Dual-source emotion recognition based on Transformer and enhanced information fusion

2023, Vol. 42, No. 4, pp. 187-193

Authors: Yan Chao (闫超), Jia Zhentang (贾振堂)
College of Electronics and Information Engineering, Shanghai University of Electric Power

CLC number: TP391.4
Fund project: Supported by the National Natural Science Foundation of China (62105196)

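Abstract:

To address the unsatisfactory performance of current multimodal emotion recognition, a dual-source emotion recognition model based on Transformer and enhanced information fusion is proposed. The model consists of audio and video encoding branch networks and a dual-source enhanced feature fusion module. The video encoding branch uses MobileViTv2 to extract the spatial features of each video frame and embeds a residual structure within the Transformer encoder to strengthen the extraction of short-term correlated semantic information across frames. In the audio feature extraction part, a dimension matcher is constructed to avoid a potential heterogeneity gap and to improve the robustness of model training. At the audio-video feature fusion stage, a low-parameter cross-modal attention mechanism is introduced to enhance feature fusion from two perspectives simultaneously. Comparison and ablation experiments demonstrate the effectiveness of the method on multimodal emotion recognition tasks.

Key words: emotion recognition; Transformer; attention mechanism; multimodal fusion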
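The abstract names the fusion components but does not specify their exact form. The following is a minimal, hypothetical Python sketch of the fusion stage only, under three assumptions that are not taken from the paper: (1) the audio "dimension matcher" is a single linear projection to the video feature width; (2) the "two perspectives" of the cross-modal attention are the two query directions (video attending to audio and audio attending to video), each with a residual connection; (3) the fused representation is the concatenation of the mean-pooled outputs. The names `dimension_matcher`, `cross_attention` and `fuse` are illustrative. MobileViTv2 and the residual-embedded Transformer encoder are not modeled; random matrices stand in for their outputs.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a (n x k) @ b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def dimension_matcher(audio: Matrix, weight: Matrix) -> Matrix:
    """Hypothetical dimension matcher: a linear projection mapping audio features
    (T_a x d_audio) to the video feature width (T_a x d_model)."""
    return matmul(audio, weight)


def cross_attention(query_seq: Matrix, context_seq: Matrix) -> Matrix:
    """Scaled dot-product attention in which one modality queries the other.
    Q/K/V projections are omitted (identity) to keep the parameter count at zero;
    this is an assumption, not the paper's design."""
    d = len(query_seq[0])
    scores = matmul(query_seq, transpose(context_seq))
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scores)
    attended = matmul(weights, context_seq)
    # Residual connection keeps the querying modality's original information.
    return [[q + a for q, a in zip(qr, ar)] for qr, ar in zip(query_seq, attended)]


def mean_pool(seq: Matrix) -> List[float]:
    n = len(seq)
    return [sum(row[j] for row in seq) / n for j in range(len(seq[0]))]


def fuse(video: Matrix, audio: Matrix, weight: Matrix) -> List[float]:
    audio_m = dimension_matcher(audio, weight)
    v2a = cross_attention(video, audio_m)  # video frames attend to audio steps
    a2v = cross_attention(audio_m, video)  # audio steps attend to video frames
    return mean_pool(v2a) + mean_pool(a2v)  # list concatenation: 2 * d_model values


if __name__ == "__main__":
    rng = random.Random(0)
    T_v, T_a, d_model, d_audio = 8, 20, 16, 40
    video = random_matrix(T_v, d_model, rng)   # stand-in for per-frame video features
    audio = random_matrix(T_a, d_audio, rng)   # stand-in for audio features
    w = random_matrix(d_audio, d_model, rng)   # dimension-matcher weights
    print(len(fused := fuse(video, audio, w)))  # 32
```

Dropping the query, key and value projection matrices is one simple way to keep a cross-modal attention block nearly parameter-free; the low-parameter mechanism used in the paper may be structured differently.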

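Cite this article:

Yan Chao, Jia Zhentang. Dual-source emotion recognition based on Transformer and enhanced information fusion[J]. Foreign Electronic Measurement Technology (国外电子测量技术), 2023, 42(4): 187-193.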

Published online: 2024-10-29
