Abstract: Aerial images captured by UAVs pose challenges for existing object detection networks because of complex backgrounds, densely packed targets, and overlapping targets. Building on YOLOv5, this work modifies the original backbone network and embeds an improved OSA module to mitigate the gradient attenuation caused by network depth. To address the inaccurate localization of small targets and the insufficient semantic information in the original network structure, a 160×160 small-target detection layer is added, and the feature fusion network is modified to enrich semantic information. Finally, the original CIoU loss function is improved: instead of treating the width and height of the box as a single unit when computing the loss, they are optimized separately to improve the accuracy of the predicted box. Experiments on the VisDrone 2019 UAV aerial photography dataset show that, compared with the original algorithm, mAP improves by 5.2%, the detection frame rate reaches 45 fps, and the trained model size is 18.9 MB.
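
The abstract does not state the modified loss formula. As an illustration only, the sketch below shows one published way of penalizing a box's two side lengths (width and height) separately rather than through CIoU's combined aspect-ratio term: an EIoU-style loss. The choice of EIoU, the function name `eiou_style_loss`, the corner-format box layout, and the `eps` constant are all assumptions made for this example and may differ from the loss actually used in the paper.

```python
def eiou_style_loss(pred, target, eps=1e-9):
    """EIoU-style loss for two boxes given as (x1, y1, x2, y2) corners.

    Illustrative assumption, not the paper's confirmed formula.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1  # predicted width and height
    tw, th = tx2 - tx1, ty2 - ty1  # target width and height

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared center distance, normalized by the enclosing box diagonal.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height are penalized as two independent terms,
    # instead of one shared aspect-ratio term as in CIoU.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

With this formulation, a prediction that matches the target's center and height but not its width is penalized through the width term alone, so the gradient for each side length does not depend on the other.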