Abstract: The TR-YOLOv5 model is proposed to address two shortcomings of existing power patrol inspection methods: poor ability to distinguish highly similar targets and slow detection speed. CBAM attention is introduced in layer 0 of the network to strengthen the extraction of fine-grained features, and in the deepest layer of the network, features are encoded with Transformer attention to strengthen the transfer of semantic information. The 3×3 convolutions in the residual structure of the model are rank-decomposed to reduce the number of redundant parameters. In the feature fusion stage, a GPAN structure is proposed in which GSPP controls the transformation at each scale, improving the fusion of information across scales. The backbone network is also connected to the feature fusion structure at the same scale to enhance the fusion of semantic information and improve the detection capability of the model. During training, SIoU is used as the IoU (bounding-box regression) loss and cross-entropy loss as the classification loss to improve the localisation and classification ability of the model. The trained model is wrapped in a PyQt interface to improve the human-computer interaction experience. Experimental results show that the mean average precision (mAP) of the TR-YOLOv5 model reaches 97.1% and that its floating-point operations are reduced to 3.6 GFLOPs. Ablation experiments and comparison tests demonstrate that the TR-YOLOv5 model effectively addresses the aforementioned problems in the power inspection process.
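To illustrate why rank-decomposing a 3×3 convolution reduces parameters, the following is a minimal sketch rather than the paper's implementation. The abstract does not specify the decomposition scheme, so this assumes one common form: a 3×1 convolution into an intermediate channel width `rank`, followed by a 1×3 convolution. The function names, the channel width of 64, and the rank of 32 are all hypothetical values chosen for illustration.

```python
def conv_params(c_in, c_out, k_h, k_w):
    """Weight count of a bias-free 2-D convolution layer."""
    return c_in * c_out * k_h * k_w


def factorized_params(c_in, c_out, rank, k=3):
    """Weight count of a k x 1 conv (c_in -> rank) followed by a 1 x k conv (rank -> c_out).

    This factorization is an assumed example of rank decomposition,
    not necessarily the scheme used in TR-YOLOv5.
    """
    return conv_params(c_in, rank, k, 1) + conv_params(rank, c_out, 1, k)


# Hypothetical layer: 64 input channels, 64 output channels.
full = conv_params(64, 64, 3, 3)          # standard 3x3 convolution
low = factorized_params(64, 64, rank=32)  # assumed rank of 32
ratio = low / full

print(f"full 3x3: {full} weights, factorized: {low} weights, ratio: {ratio:.3f}")
```

Under these assumed sizes the factorized pair holds one third of the weights of the full 3×3 layer. The actual saving in any given model depends on the chosen rank and channel widths.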