Attention-based information preprocessing multi-agent reinforcement learning algorithm

2024, Vol. 43, No. 3, pp. 91-97

Authors: Du Yongtao (杜泳韬), Zhao Lingzhong (赵岭忠), Zhai Zhongyi (翟仲毅)
Affiliation: School of Computer and Information Security, Guilin University of Electronic Technology
CLC number: TP301



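Abstract:

Multi-agent reinforcement learning is widely used in group control, but traditional reinforcement learning methods such as Q-learning and policy gradient perform poorly in multi-agent environments. During training, every agent's policy keeps changing. By the time one agent makes a decision based on environmental information, the decisions of other agents may already have changed that information, so the transition probability distribution and reward function perceived by the agent shift. The environment therefore becomes non-stationary, and training cannot proceed effectively. To mitigate this problem, this paper studies a multi-agent reinforcement learning algorithm based on multi-head self-attention. The method takes the action policies of other agents into account and uses multi-head self-attention to let each agent learn the factors that most influence its decisions, successfully learning complex multi-agent coordination policies. In the experiments, the average return reaches 0.82, far higher than that of traditional algorithms. The results show that the proposed algorithm effectively addresses the learning difficulties caused by environmental non-stationarity and improves both the convergence speed and the stability of multi-agent reinforcement learning.

Keywords: multi-agent reinforcement learning; multi-head self-attention; information preprocessing; policy gradient; non-stationarity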
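The abstract's non-stationarity argument can be made concrete with the standard marginalization view. This is a textbook formulation, not an equation taken from the paper. The notation ($s$ for the state, $a_i$ for agent $i$'s action, $\mathbf{a}_{-i}$ for the other agents' joint action, $\pi_j$ for agent $j$'s policy) is assumed here for illustration, and for simplicity every agent is assumed to condition on the full state. From agent $i$'s point of view, the other agents' policies are folded into the environment it perceives:

```latex
\begin{aligned}
\bar{P}_i(s' \mid s, a_i) &= \sum_{\mathbf{a}_{-i}} \pi_{-i}(\mathbf{a}_{-i} \mid s)\, P(s' \mid s, a_i, \mathbf{a}_{-i}) \\
\bar{R}_i(s, a_i) &= \sum_{\mathbf{a}_{-i}} \pi_{-i}(\mathbf{a}_{-i} \mid s)\, R_i(s, a_i, \mathbf{a}_{-i}) \\
\pi_{-i}(\mathbf{a}_{-i} \mid s) &= \prod_{j \neq i} \pi_j(a_j \mid s)
\end{aligned}
```

Here $P$ and $R_i$ are the fixed joint dynamics and reward. Because each $\pi_j$ is updated during training, $\bar{P}_i$ and $\bar{R}_i$ drift even though $P$ and $R_i$ do not. This drift is the non-stationarity described in the abstract. A learner that conditions on $\mathbf{a}_{-i}$ directly sees the stationary one-step dynamics $P(s' \mid s, a_i, \mathbf{a}_{-i})$ instead, which is one reason that taking other agents' actions into account can ease the problem.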

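The abstract does not specify the network architecture. As a rough illustration only, the sketch below shows a generic scaled dot-product multi-head self-attention layer (the standard Transformer formulation) applied across agents. It is not the authors' implementation. Several details are hypothetical choices made for this example:

- the names `multi_head_self_attention` and `exclude_self`;
- the per-agent embedding `x`, whose rows stand for each agent's encoded observation and action;
- the sizes and the random weights.

Each head yields a weight matrix showing how strongly every agent attends to every other agent. This is one way an agent could learn which other agents most influence its decisions.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum for numerical stability; -inf entries become 0.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads, exclude_self=True):
    """Scaled dot-product self-attention over agents, split into several heads.

    x:               (n_agents, d_model), one hypothetical embedding per agent
    w_q, w_k, w_v, w_o: (d_model, d_model) projection matrices
    Returns (out, weights): out is (n_agents, d_model) and weights is
    (num_heads, n_agents, n_agents); weights[h, i, j] is how strongly
    head h lets agent i attend to agent j.
    """
    n, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    assert n > 1 or not exclude_self, "need at least two agents to exclude self"
    d_head = d_model // num_heads

    def split_heads(t):
        # (n, d_model) -> (num_heads, n, d_head)
        return t.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n, n)
    if exclude_self:
        # Mask the diagonal so each agent attends only to the other agents.
        scores = np.where(np.eye(n, dtype=bool), -np.inf, scores)
    weights = softmax(scores, axis=-1)  # every row sums to 1
    heads = weights @ v  # (heads, n, d_head)
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)
    return concat @ w_o, weights


rng = np.random.default_rng(0)
n_agents, d_model, num_heads = 3, 8, 2
x = rng.normal(size=(n_agents, d_model))
w_q, w_k, w_v, w_o = (
    rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4)
)
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, attn.shape)  # (3, 8) (2, 3, 3)
```

Setting `exclude_self=False` gives ordinary self-attention, in which each agent also attends to itself. Splitting `d_model` across the heads keeps the cost of several heads close to that of one while letting each head weight the other agents differently.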

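Cite this article:

Du Yongtao, Zhao Lingzhong, Zhai Zhongyi. Attention-based information preprocessing multi-agent reinforcement learning algorithm[J]. Foreign Electronic Measurement Technology (国外电子测量技术), 2024, 43(3): 91-97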


History

Published online: 2024-06-12
