Abstract
For multi-person 2D pose estimation, current deep-learning-based methods have achieved impressive performance, but existing approaches still face unavoidable trade-offs among efficiency, robustness, and accuracy. In principle, bottom-up methods are more efficient than top-down methods, but they are less accurate. To make full use of the respective advantages of both, in this paper we design a novel bidirectional optimization coupled lightweight network (BOCLN) architecture for efficient, robust, and general-purpose multi-person 2D pose estimation from natural images. In the BOCLN framework, the bottom-up network focuses on global features, while the top-down network emphasizes detailed features. The framework shares global features along the bottom-up data stream, while the top-down data stream aims to accelerate accurate pose estimation. In particular, to exploit priors on the relationships between human joints, we propose a probability limb heat map that represents the spatial context of the joints and guides the prediction of the overall pose skeleton, so that each person's pose in cluttered scenes, including crowds, can be estimated as accurately and robustly as possible. Benefiting from the BOCLN architecture, the time-consuming refinement procedure can therefore be reduced to an efficient lightweight network. Extensive experiments on public benchmarks confirm that our method is more efficient and robust than state-of-the-art methods while still attaining competitive accuracy. BOCLN is therefore particularly promising for online applications.
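The abstract does not specify how the probability limb heat map is constructed. As a rough illustration only, the following minimal sketch assumes that each limb is rendered as a Gaussian falloff around the line segment joining two joint locations, so that pixels on the limb score 1 and the score decays with distance from it. This is a common construction, not necessarily the one used in BOCLN; the function name `limb_heatmap` and the bandwidth parameter `sigma` are hypothetical.

```python
import numpy as np


def limb_heatmap(height, width, joint_a, joint_b, sigma=2.0):
    """Hypothetical limb heat map: Gaussian of each pixel's distance to segment a-b.

    joint_a and joint_b are (x, y) pixel coordinates; sigma is an assumed bandwidth.
    """
    # Pixel coordinate grids, each of shape (height, width).
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    ax, ay = joint_a
    bx, by = joint_b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy

    # Project every pixel onto the segment and clamp to its endpoints.
    if seg_len_sq == 0:
        t = np.zeros_like(xs)  # degenerate limb: both joints coincide
    else:
        t = np.clip(((xs - ax) * dx + (ys - ay) * dy) / seg_len_sq, 0.0, 1.0)
    px = ax + t * dx
    py = ay + t * dy

    # Squared distance to the closest point on the limb, mapped to (0, 1].
    dist_sq = (xs - px) ** 2 + (ys - py) ** 2
    return np.exp(-dist_sq / (2.0 * sigma ** 2))
```

For example, `limb_heatmap(20, 30, (5, 10), (25, 10))` yields a horizontal band that peaks at 1.0 along row 10 between columns 5 and 25. Under the same assumption, maps for several limbs could be merged with an element-wise `np.maximum` to give a skeleton-level prior.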