Research on a Cross-Modal Emotion Recognition Method Based on Speech and Text
Author:
Affiliation:

School of Electronic Information, Xi'an Polytechnic University, Xi'an 710048

CLC Number:

TP391.4; TN912.3

Abstract:

Cross-modal emotion recognition aims to perceive human emotions through data from different modalities. Most current research still focuses on a single modality and neglects the information carried by the others. This paper proposes a cross-modal emotion recognition method based on knowledge distillation, which significantly improves the accuracy of emotion recognition by fusing information from the speech and text modalities. Specifically, the pre-trained text model RoBERTa serves as the teacher model, and its high-quality textual emotion representations are transferred to a lightweight speech student model through feature distillation. In addition, the teacher and student models teach each other through bidirectional objective distillation. Experimental results show that the proposed method achieves superior performance on the IEMOCAP and MELD datasets.
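To make the two distillation objectives concrete, the sketch below shows, in PyTorch, one plausible form of the losses named in the abstract: a feature-distillation term that pulls the speech student's representations toward the RoBERTa teacher's text representations, and a bidirectional objective-distillation term in which teacher and student learn from each other's emotion predictions. This is a minimal illustration under stated assumptions, not the paper's implementation; the projection layer, feature dimensions, temperature, and the symmetric-KL form of the bidirectional loss are all assumptions.

    import torch
    import torch.nn.functional as F

    def feature_distillation_loss(student_feat, teacher_feat, proj):
        # Map the speech student's features into the text teacher's space,
        # then match them with an MSE loss; the teacher side is detached
        # so only the student (and the projection) receive gradients.
        return F.mse_loss(proj(student_feat), teacher_feat.detach())

    def bidirectional_objective_distillation(student_logits, teacher_logits, tau=2.0):
        # Symmetric KL between the two softened emotion distributions, so the
        # teacher and the student each learn from the other's predictions.
        log_p_s = F.log_softmax(student_logits / tau, dim=-1)
        log_p_t = F.log_softmax(teacher_logits / tau, dim=-1)
        kl_s = F.kl_div(log_p_s, log_p_t.exp().detach(), reduction="batchmean")
        kl_t = F.kl_div(log_p_t, log_p_s.exp().detach(), reduction="batchmean")
        return (kl_s + kl_t) * tau * tau

    # Hypothetical tensors for a batch of 8 utterances (all sizes are assumptions).
    teacher_feat = torch.randn(8, 768)      # RoBERTa-base hidden size
    student_feat = torch.randn(8, 256)      # lightweight speech encoder output
    proj = torch.nn.Linear(256, 768)        # student-to-teacher projection
    teacher_logits = torch.randn(8, 6)      # assumed number of emotion classes
    student_logits = torch.randn(8, 6)

    loss = (feature_distillation_loss(student_feat, teacher_feat, proj)
            + bidirectional_objective_distillation(student_logits, teacher_logits))

In each KL direction the target distribution is detached, so the teacher only receives gradients through its own logits and vice versa; how the two directions are weighted against the feature term is a training-time choice left open here.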

History
  • Received: 2024-10-28
  • Revised: 2024-12-04
  • Accepted: 2024-12-04