Abstract: To achieve real-time, accurate and stable pose estimation in complex scenes where both occluded and unoccluded objects are placed in disorder, an object pose estimation algorithm combining shuffle coordinate attention with improved spatial pyramid pooling is proposed. A shuffle coordinate attention residual module, which combines coordinate, channel and spatial features, is built to improve the accuracy of key point estimation. The spatial pyramid pooling network is improved, and a multi-scale feature refinement method at the neck of the network is used to obtain highly accurate estimates of edge pose and spatial position. A self-built occlusion dataset is used to further validate the performance and generalization capability of the proposed algorithm. On the public LineMod and Partial Occlusion datasets, the proposed algorithm improves the ADD metric by 2.26% and 2.57%, respectively, and the 5cm5° metric by 5.16% and 4.1%, respectively, compared with the shuffle attention (SA)-based algorithm, while reaching a real-time processing speed of 30 FPS. It thus provides an effective method for object pose estimation in complex scenes such as those with occlusion.
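For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of how ADD and 5cm5° are commonly computed. It is not the authors' evaluation code. The function names, the use of metres for translation, and the helper `rot_z` are assumptions made for illustration only. By common convention (not stated in the abstract), a pose is counted as correct under ADD when the average distance is below 10% of the object model's diameter.

```python
import math

# 3x3 identity rotation, used as a reference pose in examples.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def rot_z(deg):
    """Hypothetical helper: rotation matrix for a rotation of `deg` degrees about z."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def transform(R, t, p):
    """Apply the rigid transform R * p + t to a single 3D point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_distance(R_gt, t_gt, R_est, t_est, points):
    """ADD: mean distance between model points under the true and estimated poses."""
    total = 0.0
    for p in points:
        total += math.dist(transform(R_gt, t_gt, p), transform(R_est, t_est, p))
    return total / len(points)


def rotation_error_deg(R_gt, R_est):
    """Angle in degrees of the relative rotation R_gt^T * R_est."""
    # trace(R_gt^T R_est) equals the element-wise product sum of the two matrices.
    trace = sum(R_gt[i][j] * R_est[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for numerical safety
    return math.degrees(math.acos(cos_angle))


def within_5cm_5deg(R_gt, t_gt, R_est, t_est):
    """5cm5°: translation error under 5 cm (0.05 m assumed) and rotation error under 5°."""
    return math.dist(t_gt, t_est) < 0.05 and rotation_error_deg(R_gt, R_est) < 5.0
```

Under these assumptions, an estimate that is off by 1 cm in translation and 3° in rotation passes `within_5cm_5deg`, while one that is off by 10° does not. The reported percentages are the share of test poses that pass each criterion.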