Speech emotion recognition based on amplitude filtering and hierarchical feature fusion strategy

Authors: Yu Yongzhen (喻永振), Liu Daming (刘大明)
Affiliation: School of Computer Science and Technology, Shanghai University of Electric Power
CLC number: TN912.3
Funding: Supported by the Shanghai Science and Technology Program (23010501500)


Abstract:

To address the low accuracy of speech emotion recognition on multi-language joint datasets, a speech emotion recognition method based on amplitude filtering and a hierarchical feature fusion strategy is proposed. The method first applies amplitude filtering according to the amplitude distribution pattern within the Mel spectrogram: probability superposition enlarges the differences between similar amplitudes, giving strong gain at high frequencies and weak gain at low frequencies, while probability multiplication reduces the differences between distant amplitudes, revealing mid-frequency detail in the spectrogram. On this basis, rectangular convolution extracts the temporal dynamic features of the audio signal and generates a dynamic feature map of the Mel spectrogram, which serves as the input to the hierarchical feature fusion strategy. The hierarchical feature fusion strategy compresses the feature maps to extract temporal dynamic features at different scales and from different depths. The proposed method achieves a classification accuracy of 84.44% on the multi-language joint dataset CER.

Key words: speech emotion recognition; amplitude filtering; hierarchical feature fusion strategy; dynamic feature map of Mel spectrogram
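The abstract names two operations without giving their parameters: a rectangular (non-square) convolution that extracts temporal dynamics from the Mel spectrogram, and a compression of the resulting feature map to obtain several time scales. The sketch below is only an illustration of that general idea, not the authors' network. The 1 x 3 difference kernel, the average pooling, the pooling factors, and all function names are assumptions introduced here; the amplitude filtering step is not sketched because the abstract does not define it.

```python
from typing import Dict, List, Sequence

# Rows are Mel bands, columns are time frames (assumed layout).
Matrix = List[List[float]]


def rect_conv_time(spec: Matrix, kernel: Sequence[float]) -> Matrix:
    """Slide a 1 x k kernel along the time axis of every Mel band.

    Computed as a valid-mode cross-correlation, as in most deep learning
    libraries, so each band loses k - 1 frames.
    """
    k = len(kernel)
    out: Matrix = []
    for band in spec:
        n_out = len(band) - k + 1
        out.append([
            sum(kernel[j] * band[t + j] for j in range(k))
            for t in range(n_out)
        ])
    return out


def avg_pool_time(feat: Matrix, factor: int) -> Matrix:
    """Compress the time axis by averaging non-overlapping windows."""
    pooled: Matrix = []
    for band in feat:
        n_out = len(band) // factor
        pooled.append([
            sum(band[i * factor:(i + 1) * factor]) / factor
            for i in range(n_out)
        ])
    return pooled


def multi_scale_features(
    spec: Matrix,
    kernel: Sequence[float],
    factors: Sequence[int] = (1, 2, 4),
) -> Dict[int, Matrix]:
    """Return one compressed dynamic feature map per pooling factor."""
    dynamic = rect_conv_time(spec, kernel)
    return {f: avg_pool_time(dynamic, f) for f in factors}


# Toy spectrogram: one band with growing energy, one constant band.
toy_spec = [[0, 1, 3, 6, 10, 15], [5, 5, 5, 5, 5, 5]]
toy_feats = multi_scale_features(toy_spec, kernel=[-1, 0, 1])
```

With the central-difference kernel `[-1, 0, 1]`, the constant band maps to zeros at every scale while the growing band keeps a nonzero response, which is the sense in which such a kernel captures temporal change rather than absolute level. A learned model would replace the fixed kernel with trained weights and would typically use an array library rather than nested lists.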

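Cite this article

Yu Yongzhen, Liu Daming. Speech emotion recognition based on amplitude filtering and hierarchical feature fusion strategy[J]. Foreign Electronic Measurement Technology (国外电子测量技术), 2024, 43(3): 35-42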

Article metrics

Views: 259
Downloads: 441
HTML views: 0
Citations: 0

History

Published online: 2024-06-12
