Abstract: Unmanned aerial vehicle (UAV)-assisted wireless power supply for the Internet of Things (IoT) is an innovative network architecture in which UAVs serve as energy transmission intermediaries, effectively easing the power supply constraints of IoT devices. To address the challenge of learning multi-objective control policies in this setting, this study proposes a multi-objective twin delayed deep deterministic policy gradient (MOTD3) algorithm based on deep reinforcement learning. MOTD3 jointly optimizes multiple objectives, maximizing the total data rate and total harvested energy while minimizing energy consumption and hover time, subject to constraints on yaw angle, flight speed, and transmission power. It also adapts UAVs to dynamic changes in demand through online path planning. Simulation results show that, compared with other algorithms, the proposed algorithm improves the total data rate, total harvested energy, energy consumption, and hover time by 14.7%, 10.6%, 6.1%, and 10.3%, respectively. It also exhibits strong generalization ability, making it applicable to different communication scenarios in practice.