An anchor-guided 3D target detection algorithm based on Stereo RCNN

CLC number: TP391.41; TH741

Funding: Supported by the National Natural Science Foundation of China (61801323), the Suzhou People's Livelihood Science and Technology Project (SS2019029), and the China Postdoctoral Science Foundation (2021M691848).
    Abstract:

    Current anchor-based stereo 3D object detection algorithms use a large number of anchors, which slows online computation. To address this, an anchor-guided 3D object detection algorithm based on Stereo RCNN, named FGAS RCNN, is proposed. In the first stage, probability maps are generated for the left and right input images and used to produce sparse anchor points and the corresponding sparse anchor boxes; the left and right anchors are then treated as a single unit to generate 2D proposal boxes. In the second stage, a key-point generation network built on a feature pyramid network uses the sparse anchor information to produce key-point heatmaps, which are combined with a stereo regressor to generate 3D proposal boxes. Because convolution discards pixel-level information from the original image, the instance segmentation mask produced by the Mask Branch is combined with instance-level disparity estimation for pixel-level refinement, which improves the depth precision of the 3D bounding-box center. Experiments show that, without any depth or position priors as input, the method reduces computation while maintaining a high recall rate. Specifically, it reaches an average precision of 44.07% on 3D object detection at a threshold of 0.7, an improvement of 4.5% over Stereo RCNN, and its overall running time is 0.09 s shorter than that of Stereo RCNN.
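    The first-stage idea of keeping only a sparse set of anchors can be illustrated with a minimal sketch. It assumes the probability map is a 2D grid of objectness scores and that anchors are kept wherever the score exceeds a threshold. The function names, the threshold of 0.5, the stride of 16 and the fixed box size of 32 are all hypothetical choices for illustration and are not taken from the paper, which may select and shape anchors differently.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def select_sparse_anchors(
    prob_map: List[List[float]], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Keep only the (row, col) cells whose score exceeds the threshold.

    Assumption: a plain threshold stands in for whatever selection rule
    the actual network uses.
    """
    return [
        (r, c)
        for r, row in enumerate(prob_map)
        for c, score in enumerate(row)
        if score > threshold
    ]


def anchor_boxes(
    anchors: List[Tuple[int, int]], stride: int = 16, size: float = 32.0
) -> List[Box]:
    """Place one square box of a fixed (hypothetical) size at each anchor.

    Each feature-map cell is mapped back to image coordinates by taking
    the centre of the cell: (index + 0.5) * stride.
    """
    half = size / 2.0
    boxes = []
    for r, c in anchors:
        cx = (c + 0.5) * stride
        cy = (r + 0.5) * stride
        boxes.append((cx - half, cy - half, cx + half, cy + half))
    return boxes


# Toy 2x3 probability map: only two cells are confident enough to keep.
toy_map = [
    [0.10, 0.90, 0.20],
    [0.05, 0.30, 0.70],
]
kept = select_sparse_anchors(toy_map, threshold=0.5)
boxes = anchor_boxes(kept, stride=16, size=32.0)
```

    In this toy case two of the six cells survive, so only two boxes go forward to later stages instead of one or more per cell. That reduction in candidates is the kind of saving the abstract attributes to sparse anchors.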

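Cite this article

Cao Jiecheng, Tao Chongben. An anchor-guided 3D target detection algorithm based on Stereo RCNN[J]. Chinese Journal of Scientific Instrument, 2021, (12): 191-201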

History
  • Published online: 2023-06-28