Dual-source emotion recognition based on Transformer and enhanced information fusion

CLC number: TP391.4

Fund project: Supported by the National Natural Science Foundation of China (62105196)

    Abstract:

    To address the limited performance of current multimodal emotion recognition, a dual-source emotion recognition model based on Transformer and enhanced information fusion is proposed. The model consists of audio and video encoding branch networks and a dual-source enhanced feature fusion module. The video encoding branch uses MobileViTv2 to extract spatial features from each video frame, and embeds residual connections in the Transformer encoder to strengthen the extraction of short-term associated semantic information across frames. A dimension matcher is built into the audio feature extraction stage, which avoids a potential heterogeneity gap between modalities and improves the robustness of model training. A low-parameter cross-modal attention mechanism is introduced at the audio-video fusion stage to enhance feature fusion from two perspectives simultaneously. Comparison and ablation experiments demonstrate the effectiveness of the method in multimodal emotion recognition tasks.
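The abstract's fusion pipeline (a dimension matcher aligning the audio features to the video feature width, followed by bidirectional cross-modal attention) can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the feature widths, the mean-pooled concatenation, and the choice to let one shared matrix serve as both keys and values (one way to keep the attention parameter count low) are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dim_matcher(feats, W):
    # Linear projection aligning the audio feature width to the video
    # width, bridging the dimensional "heterogeneity gap" before fusion.
    return feats @ W

def cross_modal_attention(queries, keys_values, scale):
    # Tokens of one modality attend over the other; the same matrix is
    # reused as keys and values, so no extra projections are learned here.
    scores = queries @ keys_values.T / scale          # (Nq, Nkv)
    return softmax(scores, axis=-1) @ keys_values     # (Nq, d)

rng = np.random.default_rng(0)
d = 64                                      # shared embedding width (assumed)
video = rng.standard_normal((16, d))        # 16 video-frame tokens
audio_raw = rng.standard_normal((20, 40))   # 20 audio frames, 40-dim features
W_match = rng.standard_normal((40, d)) / np.sqrt(40)

audio = dim_matcher(audio_raw, W_match)                   # (20, 64)
v_enh = cross_modal_attention(video, audio, np.sqrt(d))   # video attends to audio
a_enh = cross_modal_attention(audio, video, np.sqrt(d))   # audio attends to video
fused = np.concatenate([v_enh.mean(0), a_enh.mean(0)])    # (128,) pooled fusion
print(fused.shape)  # (128,)
```

The two attention calls enhance fusion "from two perspectives" in the sense the abstract suggests: each modality's representation is refined by attending to the other before the pooled vectors are concatenated for classification.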

Cite this article:

闫超, 贾振堂. 基于Transformer与增强信息融合的双源情感识别[J]. 国外电子测量技术, 2023, 42(4): 187-193.

History
  • Online publication date: 2024-10-29