Research on Cross-Modal Emotion Recognition Methods Based on Speech and Text
Affiliation:

School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China

CLC Number:

TP391.4; TN912.3


    Abstract:

Cross-modal emotion recognition (ERC) aims to perceive human emotions from data in different modalities. Most current research, however, still focuses on a single modality and neglects the complementary information carried by the others. This paper proposes a cross-modal emotion recognition method based on knowledge distillation, which significantly improves recognition accuracy by fusing information from the speech and text modalities. Specifically, a pre-trained text model, RoBERTa, serves as the teacher, and its high-quality textual emotion representations are transferred to a lightweight speech student model through feature distillation. In addition, bi-directional objective distillation is employed so that the teacher and student models transfer knowledge to each other. Experimental results show that the proposed method achieves superior performance on the IEMOCAP and MELD datasets.
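The abstract names two distillation objectives: feature distillation from the RoBERTa teacher to the speech student, and bi-directional objective distillation between the two models. The sketch below shows one common way such losses are formulated, as a rough illustration only: the linear projection, MSE feature matching, temperature value, and all dimensions are assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the two losses named in the abstract: feature
# distillation (teacher text features -> student speech features) and
# bi-directional objective (logit) distillation. The projection layer,
# temperature, and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistillation(nn.Module):
    """Match student speech features to teacher text features via MSE."""
    def __init__(self, speech_dim: int, text_dim: int):
        super().__init__()
        # Project speech features into the teacher's text feature space.
        self.proj = nn.Linear(speech_dim, text_dim)

    def forward(self, speech_feat: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
        # Teacher features are treated as fixed targets (no gradient).
        return F.mse_loss(self.proj(speech_feat), text_feat.detach())

def bidirectional_objective_distillation(
    student_logits: torch.Tensor,
    teacher_logits: torch.Tensor,
    temperature: float = 2.0,
) -> torch.Tensor:
    """Symmetric KL on softened logits so each model learns from the other."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.log_softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student): the teacher's distribution guides the student.
    kl_ts = F.kl_div(s, t, reduction="batchmean", log_target=True)
    # KL(student || teacher): student feedback flows back to the teacher.
    kl_st = F.kl_div(t, s, reduction="batchmean", log_target=True)
    return temperature ** 2 * (kl_ts + kl_st)

if __name__ == "__main__":
    batch, speech_dim, text_dim, n_classes = 4, 512, 768, 6  # assumed sizes
    feat_loss = FeatureDistillation(speech_dim, text_dim)
    loss_feat = feat_loss(torch.randn(batch, speech_dim), torch.randn(batch, text_dim))
    loss_logit = bidirectional_objective_distillation(
        torch.randn(batch, n_classes), torch.randn(batch, n_classes)
    )
    print(loss_feat.item(), loss_logit.item())
```

In this single joint-loss form, gradients flow into both models, which is one way to realize mutual knowledge transfer; an alternative is to compute two separate losses, each detaching the counterpart's logits.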

History
  • Received: 2024-10-28
  • Revised: 2024-12-04
  • Accepted: 2024-12-04