RGB-D Salient Object Detection Based on Cross-Modal Feature Fusion

Affiliation: School of Intelligent Manufacturing Modern Industry (School of Mechanical Engineering), Xinjiang University

CLC number: TP391

Fund project: National Natural Science Foundation of China (General Program, Key Program, Major Program)


Abstract:

RGB-D salient object detection has received increasing attention because depth cues are effective and easy to capture. Existing work usually focuses on learning shared representations through various fusion strategies, and few approaches explicitly consider how to preserve the modality-specific features of RGB and depth. In this paper, we propose a cross-modal feature fusion network that preserves the RGB and depth modalities for RGB-D salient object detection, improving detection performance by exploring both the shared information and the individual properties of the RGB and depth modalities. Specifically, an RGB-modality network, a depth-modality network, and a shared learning network are used to generate RGB and depth saliency prediction maps as well as a shared saliency prediction map. A cross-modal feature fusion module is proposed to fuse cross-modal features in the shared learning network; the fused features are then propagated to the next layer to integrate cross-level information. In addition, we propose a multi-modal feature aggregation module that integrates the modality-specific features from each individual decoder into the shared decoder, providing rich complementary multi-modal information to boost saliency detection performance. Furthermore, skip connections are used to combine hierarchical features between the encoder and decoder layers. Experiments on four benchmark datasets against seven state-of-the-art methods show that the proposed method outperforms the other state-of-the-art methods.
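The idea of fusing RGB and depth features into a shared representation can be sketched as follows. This is a minimal, hypothetical illustration in NumPy: the function names and the gated-fusion scheme are assumptions for exposition, not the paper's actual module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_fuse(rgb_feat, depth_feat):
    """Illustrative gated fusion of two modality feature maps of shape (H, W, C).

    A per-pixel gate derived from both modalities weights the convex
    combination of the RGB and depth features. (Hypothetical sketch; the
    paper's cross-modal feature fusion module is learned, not fixed.)
    """
    joint = np.concatenate([rgb_feat, depth_feat], axis=-1)   # (H, W, 2C)
    gate = sigmoid(joint.mean(axis=-1, keepdims=True))        # (H, W, 1)
    return gate * rgb_feat + (1.0 - gate) * depth_feat        # (H, W, C)

# Toy usage with random "features" standing in for encoder outputs.
rgb = np.random.rand(8, 8, 16)
depth = np.random.rand(8, 8, 16)
fused = cross_modal_fuse(rgb, depth)
print(fused.shape)  # (8, 8, 16)
```

Because the gate produces a convex combination, each fused value lies between the corresponding RGB and depth values, so the shared branch never discards either modality entirely.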

History
  • Received: 2023-12-03
  • Revised: 2024-03-15
  • Accepted: 2024-03-26