Pose estimation aims to accurately identify anatomical keypoints in humans and animals from monocular images, a capability crucial for applications such as human-machine interaction, embodied AI, and autonomous driving. While current models show promising results, they are typically trained and tested on clean data, potentially overlooking the corruptions encountered during real-world deployment and thus posing safety risks in practical scenarios. To address this issue, we introduce PoseBench, a comprehensive benchmark designed to evaluate the robustness of pose estimation models against real-world corruptions. We evaluate 60 representative models, including top-down, bottom-up, heatmap-based, regression-based, and classification-based methods, across three datasets for human and animal pose estimation. Our evaluation covers 10 types of corruption in four categories: 1) blur and noise, 2) compression and color loss, 3) severe lighting, and 4) masks. Our findings reveal that state-of-the-art models are vulnerable to common real-world corruptions and exhibit distinct behaviors when tackling human and animal pose estimation tasks. To improve model robustness, we examine various design considerations, including input resolution, pre-training datasets, backbone capacity, post-processing, and data augmentations. We hope that our benchmark will serve as a foundation for advancing research in robust pose estimation.
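To make the corruption categories concrete, the following is a minimal NumPy sketch of three of them (Gaussian noise, darkness, and a square mask). It is an illustration only, not the PoseBench implementation: the function names, the severity parameters (`sigma`, `factor`, `size`), and their default values are all assumptions chosen for readability.

```python
import numpy as np


def gaussian_noise(img, sigma=0.08, rng=None):
    """Add zero-mean Gaussian noise to a uint8 image (sigma is on a 0-1 scale; assumed value)."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = img.astype(np.float32) / 255.0
    x = x + rng.normal(0.0, sigma, size=x.shape)
    return (np.clip(x, 0.0, 1.0) * 255.0).astype(np.uint8)


def darkness(img, factor=0.4):
    """Darken a uint8 image by scaling intensities (factor in (0, 1]; assumed value)."""
    return (img.astype(np.float32) * factor).astype(np.uint8)


def square_mask(img, size=32, rng=None):
    """Zero out one randomly placed size x size square (hypothetical masking scheme)."""
    rng = np.random.default_rng(0) if rng is None else rng
    h, w = img.shape[:2]
    size = min(size, h, w)
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    out = img.copy()
    out[top:top + size, left:left + size] = 0
    return out


# Usage on a synthetic mid-gray image; a real pipeline would load dataset images instead.
image = np.full((64, 64, 3), 128, dtype=np.uint8)
noisy = gaussian_noise(image)
dark = darkness(image)
masked = square_mask(image, size=16)
```

Each function returns a new array of the same shape and dtype, so a model can be run on the clean and corrupted versions of an image with an otherwise unchanged pipeline.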
Pose estimation results in terms of mRR (%) for 10 representative models on the COCO-C, OCHuman-C, and AP10K-C datasets. Corruptions: #1 Motion Blur, #2 Gaussian Noise, #3 Impulse Noise, #4 Pixelate, #5 JPEG Compression, #6 Color Quant, #7 Brightness, #8 Darkness, #9 Contrast, #10 Mask.
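The text above reports mRR without defining it. As a hedged sketch, the snippet below assumes mRR is the ratio of corrupted to clean accuracy, averaged over corruption types and expressed as a percentage; this definition, the function name `mean_relative_robustness`, and the numbers in the usage example are assumptions for illustration, not values or code from the benchmark.

```python
def mean_relative_robustness(clean_ap, corrupted_aps):
    """Return mRR in percent under the assumed definition:
    mean over corruption types of (corrupted AP / clean AP) * 100.

    clean_ap: accuracy on the clean dataset (must be positive).
    corrupted_aps: mapping from corruption name to accuracy under that corruption.
    """
    if clean_ap <= 0:
        raise ValueError("clean_ap must be positive")
    if not corrupted_aps:
        raise ValueError("corrupted_aps must contain at least one corruption")
    ratios = [ap / clean_ap for ap in corrupted_aps.values()]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical numbers, not results from the benchmark.
example_mrr = mean_relative_robustness(
    clean_ap=80.0,
    corrupted_aps={"motion_blur": 40.0, "brightness": 80.0},
)
```

Under this definition, a value of 100 means a model loses no accuracy under corruption, and lower values indicate a larger relative drop.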
@article{ma2024posebench,
author = {Ma, Sihan and Zhang, Jing and Cao, Qiong and Tao, Dacheng},
title = {PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions},
journal = {arXiv preprint arXiv:2406.14367},
year = {2024},
}