Abstract: Multi-agent reinforcement learning has a broad range of applications in group control. However, traditional reinforcement learning methods, such as Q-learning or policy gradient, are unsuitable for multi-agent environments. As training progresses, each agent's policy changes; when one agent makes a decision based on the current environmental information, the decisions of other agents may already have altered that information, changing the transition probability distribution and the reward function perceived by the agent. This renders the environment non-stationary and hinders training. To address these issues, this paper explores a multi-agent reinforcement learning algorithm based on multi-head self-attention. The approach takes the action strategies of other agents into account and uses a multi-head self-attention mechanism so that each agent learns which factors in the environment are most influential, thereby acquiring complex multi-agent coordination policies. In the experiments, the average return reaches 0.82, substantially higher than that of the traditional algorithms. The experimental results demonstrate that the proposed multi-head self-attention-based multi-agent reinforcement learning algorithm overcomes the challenges posed by the non-stationary environment and improves the convergence speed and stability of multi-agent reinforcement learning.
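
To make the core idea of the abstract concrete, the following is a minimal sketch of how a centralized critic might attend over all agents' observation-action pairs with multi-head self-attention, so that each agent's value estimate can weight the most influential teammates. The module name `AttentionCritic`, the dimensions, and the overall layout are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class AttentionCritic(nn.Module):
    """Hypothetical sketch: each agent's encoded observation-action pair
    attends to those of all agents via multi-head self-attention; the
    attention weights indicate which agents are most influential."""

    def __init__(self, obs_dim, act_dim, embed_dim=64, num_heads=4):
        super().__init__()
        # Encode each agent's (observation, action) pair into a shared embedding space.
        self.encoder = nn.Linear(obs_dim + act_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.value_head = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, obs, acts):
        # obs: (batch, n_agents, obs_dim), acts: (batch, n_agents, act_dim)
        x = self.encoder(torch.cat([obs, acts], dim=-1))  # (batch, n_agents, embed_dim)
        attended, weights = self.attn(x, x, x)             # every agent attends to all agents
        values = self.value_head(attended).squeeze(-1)     # (batch, n_agents) per-agent values
        return values, weights


# Usage example with assumed sizes: 3 agents, batch of 8.
critic = AttentionCritic(obs_dim=10, act_dim=4)
obs = torch.randn(8, 3, 10)
acts = torch.randn(8, 3, 4)
values, attn_weights = critic(obs, acts)
```

In this kind of design, the per-agent attention weights can be inspected during training to see which other agents dominate a given value estimate, which is one plausible reading of "learning the most influential factors in the environment."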