Abstract: This paper proposes an improved method for detecting and segmenting scattered, stacked objects. In the YOLOv5 model, the PANet neck is replaced with a BiFPN feature pyramid and combined with the GFocal loss function, which reduces missed and false detections and raises mAP@0.5 to 90.1%. Mask R-CNN is used for target object segmentation: the ResNet101 backbone is replaced with the lightweight MobileNetV3 to reduce the number of parameters, and the feature fusion mechanism is strengthened following the idea of CFNet, increasing the segmentation accuracy to 92.1%. By cascading the improved YOLOv5 with the improved Mask R-CNN, the algorithm balances real-time performance and accuracy, and extracts accurate object shape information within the effective region of interest (ROI). Compared with using the instance segmentation algorithm alone, the detection time is reduced by 1 s. Experiments show that the proposed algorithm improves both inference speed and segmentation accuracy, and addresses the problems of poor object feature extraction and slow detection in complex stacking scenes.
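The cascade described above (detect first, then segment only inside each detected ROI) can be illustrated with a minimal sketch. This is not the authors' implementation: `crop_rois`, `cascade_segment`, `toy_detect`, and `toy_segment` are hypothetical names introduced here, and the two toy functions are stand-ins for the improved YOLOv5 detector and the improved Mask R-CNN segmenter. The sketch assumes boxes are given as (x1, y1, x2, y2) pixel coordinates and that the segmenter returns one binary mask per crop.

```python
import numpy as np


def crop_rois(image, boxes, pad=0):
    """Clip each (x1, y1, x2, y2) box to the image and return (crop, box) pairs."""
    h, w = image.shape[:2]
    rois = []
    for x1, y1, x2, y2 in boxes:
        x1 = max(0, int(x1) - pad)
        y1 = max(0, int(y1) - pad)
        x2 = min(w, int(x2) + pad)
        y2 = min(h, int(y2) + pad)
        if x2 > x1 and y2 > y1:  # skip boxes that are empty after clipping
            rois.append((image[y1:y2, x1:x2], (x1, y1, x2, y2)))
    return rois


def cascade_segment(image, detect, segment, pad=0):
    """Run the detector, then the segmenter on each ROI crop only.

    Returns one full-image boolean mask per detected ROI.
    """
    h, w = image.shape[:2]
    masks = []
    for crop, (x1, y1, x2, y2) in crop_rois(image, detect(image), pad):
        local = np.asarray(segment(crop), dtype=bool)
        full = np.zeros((h, w), dtype=bool)
        full[y1:y2, x1:x2] = local  # paste the crop-level mask back in place
        masks.append(full)
    return masks


# Hypothetical stand-ins for the two models (assumptions, for illustration only).
def toy_detect(image):
    return [(2, 1, 7, 5)]  # one fixed box around the object


def toy_segment(crop):
    return crop > 0  # threshold instead of a learned mask head


image = np.zeros((8, 8), dtype=np.float32)
image[2:4, 3:6] = 1.0  # a small rectangular "object"
masks = cascade_segment(image, toy_detect, toy_segment)
print(len(masks), int(masks[0].sum()))
```

Because the segmenter only sees the cropped regions, its cost scales with the ROI area rather than the full frame, which is one plausible reading of how the cascade trades a cheap detection pass for a smaller segmentation workload.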