PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions

Abstract

Pose estimation aims to accurately identify anatomical keypoints of humans and animals from monocular images, which is crucial for applications such as human-machine interaction, embodied AI, and autonomous driving. While current models show promising results, they are typically trained and tested on clean data, which overlooks the corruptions encountered during real-world deployment and poses safety risks in practical scenarios. To address this issue, we introduce PoseBench, a comprehensive benchmark designed to evaluate the robustness of pose estimation models against real-world corruptions. We evaluate 60 representative models, including top-down, bottom-up, heatmap-based, regression-based, and classification-based methods, across three datasets for human and animal pose estimation. Our evaluation covers 10 types of corruption in four categories: 1) blur and noise, 2) compression and color loss, 3) severe lighting, and 4) masks. Our findings reveal that state-of-the-art models are vulnerable to common real-world corruptions and behave differently on human and animal pose estimation tasks. To improve model robustness, we examine several design considerations, including input resolution, pre-training datasets, backbone capacity, post-processing, and data augmentations. We hope that our benchmark will serve as a foundation for advancing research in robust pose estimation.

Benchmark

Pose estimation results in terms of mRR (%) for 10 representative models on the COCO-C, OCHuman-C, and AP10K-C datasets. Corruptions: #1 Motion Blur, #2 Gaussian Noise, #3 Impulse Noise, #4 Pixelate, #5 JPEG Compression, #6 Color Quant, #7 Brightness, #8 Darkness, #9 Contrast, #10 Mask.
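
The caption reports mRR without defining it on this page. The sketch below shows one plausible way such a score could be computed, assuming mRR denotes a mean relative robustness: for each corruption, the score averaged over severity levels is divided by the clean score, and the ratios are then averaged over corruptions. This definition, the function names, and the example numbers are illustrative assumptions, not values or code from PoseBench.

```python
from statistics import mean


def relative_robustness(clean_score, corrupted_scores):
    """Percentage of the clean score retained under one corruption.

    `corrupted_scores` holds one score per severity level.
    Assumed definition, not taken from the PoseBench page.
    """
    if clean_score <= 0:
        raise ValueError("clean_score must be positive")
    return 100.0 * mean(corrupted_scores) / clean_score


def mean_relative_robustness(clean_score, scores_by_corruption):
    """Average the per-corruption relative robustness values."""
    if not scores_by_corruption:
        raise ValueError("at least one corruption is required")
    return mean(
        relative_robustness(clean_score, scores)
        for scores in scores_by_corruption.values()
    )


# Made-up scores for two corruptions at five severity levels each.
example_scores = {
    "Gaussian Noise": [40, 40, 40, 40, 40],
    "Mask": [80, 80, 80, 80, 80],
}
example_mrr = mean_relative_robustness(80, example_scores)
```

Under this assumed definition, a model that keeps its clean score under every corruption would reach 100%, so lower values indicate a larger robustness gap.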

BibTeX

@article{ma2024posebench,
      author    = {Ma, Sihan and Zhang, Jing and Cao, Qiong and Tao, Dacheng},
      title     = {PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions},
      journal   = {arXiv preprint arXiv:2406.14367},
      year      = {2024},
    }

PoseBench evaluates the robustness of 60 models on four groups of corruptions, covering 10 types of corruptions. Each corruption consists of five severity levels.
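
As a rough illustration of the size of this evaluation grid, the sketch below enumerates every (group, corruption, severity) setting. The assignment of the 10 corruption types to the four groups is an assumption inferred from the order in which they are listed on this page, and the names `CORRUPTION_GROUPS` and `evaluation_grid` are hypothetical, not part of any PoseBench code.

```python
from itertools import product

# Assumed grouping: inferred from the listing order, not confirmed by the page.
CORRUPTION_GROUPS = {
    "blur and noise": ["Motion Blur", "Gaussian Noise", "Impulse Noise"],
    "compression and color loss": ["Pixelate", "JPEG Compression", "Color Quant"],
    "severe lighting": ["Brightness", "Darkness", "Contrast"],
    "masks": ["Mask"],
}
SEVERITY_LEVELS = range(1, 6)  # five severity levels per corruption


def evaluation_grid():
    """Return every (group, corruption, severity) combination as a list."""
    grid = []
    for group, corruptions in CORRUPTION_GROUPS.items():
        for corruption, severity in product(corruptions, SEVERITY_LEVELS):
            grid.append((group, corruption, severity))
    return grid


grid = evaluation_grid()
print(len(grid))  # 10 corruption types x 5 severity levels = 50 settings
```

Each model would then be scored on 50 corrupted variants of a dataset in addition to the clean split.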
