Multi-modal fusion object detection based on gradient operator and attention

CLC number: TH741; TP391.41

Fund project: Supported by the Dream-Chasing Fund of the Jianghuai Frontier Technology Collaborative Innovation Center (2023ZM01Z025)

    Abstract:

    Infrared and visible images are complementary, so fusing the two modalities can meet the demand for high-accuracy, robust object detection in applications such as autonomous driving. Existing multimodal object detection algorithms often rely on large models with long inference times, which prevents deployment on edge devices, while simple approaches such as direct fusion fail to exploit the strengths of each modality. To address this, we propose a fusion object detection algorithm based on a gradient operator and an attention mechanism. A gradient operator is used to design a customized convolution that captures image texture; coordinate attention is added to the infrared branch to exploit its advantage in target localization; and a weight generation network adaptively weights and fuses the features of the two modalities. The algorithm is modular and lightweight, making it suitable for deployment on edge devices. In dataset experiments, mAP@0.50 and mAP@0.5:0.95 are 6.3% and 7.2% higher, respectively, than single-modal detection on visible images, and 11.3% and 9.8% higher than single-modal detection on infrared images. The inference frame rate reaches 22.7 FPS, meeting real-time requirements.

    Keywords: object detection; dual-modal; f
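
The abstract names two ideas that a small example can make concrete: a gradient operator that responds to image texture, and a weighted fusion of the two modalities. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes the Sobel operator (the abstract does not say which gradient operator is used), and it replaces the learned weight generation network with a hypothetical hand-written rule: a softmax over each modality's mean activation. The function names `correlate3x3`, `gradient_magnitude`, `fusion_weights`, and `fuse` are invented for this example. Coordinate attention is not covered.

```python
import math

# Sobel kernels: ASSUMED gradient operator; the abstract does not name one.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate3x3(image, kernel):
    """Valid-mode 3x3 cross-correlation of a 2D list with a 3x3 kernel."""
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(3) for j in range(3))
            for c in range(cols - 2)
        ]
        for r in range(rows - 2)
    ]


def gradient_magnitude(image):
    """Per-pixel gradient magnitude, a simple proxy for edge/texture strength."""
    gx = correlate3x3(image, SOBEL_X)
    gy = correlate3x3(image, SOBEL_Y)
    return [[math.hypot(x, y) for x, y in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


def mean2d(feature):
    """Mean of all values in a 2D list."""
    return sum(map(sum, feature)) / (len(feature) * len(feature[0]))


def fusion_weights(visible, infrared):
    """HYPOTHETICAL stand-in for a learned weight generation network:
    a softmax over the mean activation of each modality's feature map."""
    mean_v, mean_i = mean2d(visible), mean2d(infrared)
    peak = max(mean_v, mean_i)  # subtract the max for numerical stability
    exp_v, exp_i = math.exp(mean_v - peak), math.exp(mean_i - peak)
    total = exp_v + exp_i
    return exp_v / total, exp_i / total


def fuse(visible, infrared):
    """Element-wise weighted sum of two same-shaped feature maps."""
    w_v, w_i = fusion_weights(visible, infrared)
    return [[w_v * v + w_i * i for v, i in zip(row_v, row_i)]
            for row_v, row_i in zip(visible, infrared)]


if __name__ == "__main__":
    # A 4x4 image with a vertical edge between columns 1 and 2.
    edge_image = [[0, 0, 1, 1] for _ in range(4)]
    print(gradient_magnitude(edge_image))
    print(fuse([[2.0, 0.0]], [[0.0, 2.0]]))
```

In a trained detector the fixed kernels would typically sit inside a convolutional layer and the fusion weights would be predicted from the features by a small network; the hand-written rule here only shows the shape of the computation, in which the two weights sum to 1 and scale each modality before addition.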

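Cite this article

Li Xuezhao, Wang Wei, Xue Bing. Multi-modal fusion object detection based on gradient operator and attention[J]. Chinese Journal of Scientific Instrument (仪器仪表学报), 2024, 45(11): 224-232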

History
  • Published online: 2025-01-26