Video description generation method based on enhanced global-local feature fusion

CLC number: TP391.4; TP183
Fund project: Supported by the National Natural Science Foundation of China (61976063)

Video description generation method based on enhanced global-local feature fusion


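Abstract:

The features extracted by existing video description generation methods, and the ways those features are combined, are relatively simple. As a result, the model loses part of the important semantic information related to the video description, which limits accurate description and understanding of the video content. After analysing these shortcomings, a video description generation method based on enhanced global-local feature fusion is proposed. First, different feature extractors are applied to the video clips to extract local features and global features separately. To model the correlation between features at different levels (local and global), a feature fusion enhancement network fuses them and enriches the feature information of the model. The decoder uses a bidirectional long short-term memory network followed by a reconstruction network, which reconstructs the video feature sequence produced by the encoder, and the description sentences of the video are finally generated by the long short-term memory network. Experimental results on the MSVD and MSR-VTT datasets show that the proposed model can significantly improve the accuracy of the generated description sentences.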

Abstract:

Existing video description generation methods extract features and combine them in relatively simple ways, so the model loses some of the important semantic information related to the video description, which limits accurate description and understanding of the video content. To address these deficiencies, this paper proposes a video description generation method based on enhanced global-local feature fusion. First, different feature extractors are used to extract local and global features from the video clips. To model the correlation between the different levels of features (local and global), feature fusion is performed with a feature fusion enhancement network, which enriches the feature information of the model. The decoder uses a bidirectional long short-term memory network followed by a reconstruction network, which reconstructs the video feature sequence obtained from the encoder, and the descriptive sentences of the video are finally generated by the long short-term memory network. Experimental results on the MSVD and MSR-VTT datasets show that the proposed model can significantly improve the accuracy of the generated descriptive sentences.
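The abstract does not specify the internals of the feature fusion enhancement network, so the following is only a minimal sketch of one plausible way to relate global and local features, not the authors' architecture. It assumes (hypothetically) that each frame has one global feature vector and several local feature vectors of the same dimension, that the global feature attends over the local ones with scaled dot-product attention, and that the attended local summary is concatenated to the global feature. The names `fuse_frame` and `fuse_video` are illustrative only.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_frame(global_feat: Vector, local_feats: List[Vector]) -> Vector:
    """Fuse one frame's global feature with its local features.

    Assumption (not from the paper): the global feature is the query and
    the local features act as keys and values in scaled dot-product
    attention; the result is concatenated to the global feature.
    """
    dim = len(global_feat)
    scale = math.sqrt(dim)
    weights = softmax([dot(global_feat, lf) / scale for lf in local_feats])
    # Attention-weighted average of the local features.
    attended = [
        sum(w * lf[i] for w, lf in zip(weights, local_feats))
        for i in range(dim)
    ]
    # Concatenate: output dimension is 2 * dim.
    return global_feat + attended


def fuse_video(global_seq: List[Vector],
               local_seq: List[List[Vector]]) -> List[Vector]:
    """Apply the per-frame fusion to every frame of a clip."""
    return [fuse_frame(g, ls) for g, ls in zip(global_seq, local_seq)]


if __name__ == "__main__":
    # Toy clip: two frames, 2-dimensional features.
    global_seq = [[1.0, 0.0], [0.0, 1.0]]
    local_seq = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0]]]
    for fused in fuse_video(global_seq, local_seq):
        print(fused)
```

In a full model along the lines the abstract describes, the fused sequence would then be passed to the recurrent decoder; a learned fusion network would replace the fixed dot-product scoring used here.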


Cite this article

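Huang Feiyan, Zeng Shangyou, Qiu Hongyu. Video description generation method based on enhanced global-local feature fusion[J]. Foreign Electronic Measurement Technology (国外电子测量技术), 2024, 43(1): 1-9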


History

Published online: 2024-05-28
