Attention-based information preprocessing multi-agent reinforcement learning algorithm

CLC number: TP301

    Abstract:

    Multi-agent reinforcement learning is widely used in group control. However, traditional reinforcement learning methods such as Q-learning and policy gradient perform poorly in multi-agent environments. During training, every agent's policy keeps changing. By the time one agent makes a decision based on environmental information, the decisions of the other agents may already have altered that information, so the transition probability distribution and reward function perceived by the agent change. The environment therefore becomes non-stationary and training cannot proceed effectively. To mitigate this problem, this paper studies a multi-agent reinforcement learning algorithm based on multi-head self-attention. The method takes the action policies of other agents into account and uses multi-head self-attention so that each agent learns which factors most influence its decisions, and it successfully learns complex multi-agent coordination policies. In the experiments, the average return reaches 0.82, much higher than that of traditional algorithms. The results indicate that the proposed algorithm effectively addresses the learning difficulties caused by environmental non-stationarity and improves the convergence speed and stability of multi-agent reinforcement learning.
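
    The abstract does not give the network details, so the following is only a minimal sketch of generic scaled dot-product multi-head self-attention applied across agents, not the authors' architecture. It assumes each agent is summarized by one feature vector (for example its observation and action); the names `init_params`, `multi_head_self_attention`, `W_q`, `W_k`, `W_v`, and `W_o` are hypothetical, and the weights are random rather than learned.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(dim, seed=0):
    """Hypothetical random projections W_q, W_k, W_v, W_o, each dim x dim."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return {
        name: [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]
        for name in ("W_q", "W_k", "W_v", "W_o")
    }


def multi_head_self_attention(agent_feats, params, num_heads):
    """Attend over all agents' feature vectors.

    agent_feats: N rows (one per agent), each of length dim.
    Returns (outputs, weights): outputs is N x dim, and weights[h][i][j]
    is how much agent i attends to agent j in head h.
    """
    dim = len(agent_feats[0])
    if dim % num_heads != 0:
        raise ValueError("dim must be divisible by num_heads")
    head_dim = dim // num_heads

    q = matmul(agent_feats, params["W_q"])
    k = matmul(agent_feats, params["W_k"])
    v = matmul(agent_feats, params["W_v"])

    head_outputs, weights = [], []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        # Scaled dot-product scores between every pair of agents.
        k_h_t = [list(col) for col in zip(*k_h)]
        scores = matmul(q_h, k_h_t)
        attn = [softmax([s / math.sqrt(head_dim) for s in row]) for row in scores]
        weights.append(attn)
        head_outputs.append(matmul(attn, v_h))

    # Concatenate the heads per agent, then apply the output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(agent_feats))]
    return matmul(concat, params["W_o"]), weights


# Toy usage: 3 agents, 4-dimensional features, 2 heads.
feats = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.9, 0.8, 0.7, 0.6]]
outputs, weights = multi_head_self_attention(feats, init_params(4), num_heads=2)
```

    Each row of `weights[h]` sums to 1, so it can be read as how strongly one agent weighs every agent (itself included) in that head. In a trained model, these weights would be what lets an agent focus on the agents most relevant to its decision.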

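Cite this article

Du Yongtao, Zhao Lingzhong, Zhai Zhongyi. Attention-based information preprocessing multi-agent reinforcement learning algorithm[J]. Foreign Electronic Measurement Technology (国外电子测量技术), 2024, 43(3): 91-97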

History
  • Published online: 2024-06-12