Multi-Modal Pedestrian Detection with Misalignment

Fig0

In this research, we study pedestrian detection from RGB (visible)
and long-wavelength infrared (thermal) images when the two images are misaligned.
Our main concern is how to use the information from both modalities efficiently.


Contributions

Fig1

  • We analyzed the misalignment problem in existing multi-modal detection.
  • We proposed new evaluation metrics for multi-modal detection: multi-modal IoU (IoUM) and multi-modal miss rate (MRM).
  • We proposed a multi-modal Faster R-CNN for pedestrian detection based on modal-wise regression and IoUM.
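
This page does not spell out how IoUM is computed. The sketch below is one plausible reading, under a stated assumption: each pedestrian has one box per modality, and IoUM pools the intersection and union areas of a ground-truth/detection pair over both modalities before dividing. The function names, the (x1, y1, x2, y2) box format, and the pooling rule itself are illustrative assumptions, not the authors' code.

```python
from typing import Tuple

# Axis-aligned box as (x1, y1, x2, y2); this format is an assumption.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection(a: Box, b: Box) -> float:
    """Overlap area of two boxes (zero if they do not overlap)."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]),
               min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def iou(a: Box, b: Box) -> float:
    """Standard single-modality intersection over union."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def multimodal_iou(gt_rgb: Box, dt_rgb: Box,
                   gt_thermal: Box, dt_thermal: Box) -> float:
    """Hypothetical IoUM: pooled intersection over pooled union.

    Assumes the metric sums intersections and unions across the RGB
    and thermal modalities; the source page does not confirm this.
    """
    inter_rgb = intersection(gt_rgb, dt_rgb)
    inter_thermal = intersection(gt_thermal, dt_thermal)
    union_rgb = area(gt_rgb) + area(dt_rgb) - inter_rgb
    union_thermal = area(gt_thermal) + area(dt_thermal) - inter_thermal
    total_union = union_rgb + union_thermal
    return (inter_rgb + inter_thermal) / total_union if total_union > 0 else 0.0


# A detector that outputs the same box for both modalities matches the
# RGB ground truth exactly but misses half of the shifted thermal one.
gt_rgb, gt_thermal = (0, 0, 10, 10), (5, 0, 15, 10)
dt_shared = (0, 0, 10, 10)
score = multimodal_iou(gt_rgb, dt_shared, gt_thermal, dt_shared)
```

In this example the RGB-only IoU is 1.0, so a single-modality metric would rate the detection as perfect. The pooled score is lower because the thermal box is misaligned, which is the kind of error a multi-modal metric is meant to expose.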


Framework Comparison

    Fig2

    AR-CNN refers to: Lu Zhang, Xiangyu Zhu, Xiangyu Chen, Xu Yang, Zhen Lei, and Zhiyong Liu. Weakly Aligned Cross-Modal Learning for Multispectral Pedestrian Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019.


    Proposed Network Overview

    Fig3


    Visualization Examples

    Fig3

  • MSDS-RCNN refers to: Chengyang Li, Dan Song, Ruofeng Tong, and Min Tang. Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation. In British Machine Vision Conference (BMVC), 2018.
  • AR-CNN refers to: Lu Zhang, Xiangyu Zhu, Xiangyu Chen, Xu Yang, Zhen Lei, and Zhiyong Liu. Weakly Aligned Cross-Modal Learning for Multispectral Pedestrian Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019.
  • MBNet refers to: Kailai Zhou, Linsen Chen, and Xun Cao. Improving Multispectral Pedestrian Detection by Addressing Modality Imbalance Problems. In Proceedings of the European Conference on Computer Vision (ECCV), pages 787-803, 2020.


Publication

    Multi-Modal Pedestrian Detection with Large Misalignment Based on Modal-Wise Regression and Multi-Modal IoU [arXiv]

    Napat Wanchaitanawong, Masayuki Tanaka, Takashi Shibata, and Masatoshi Okutomi
    Proceedings of the 17th International Conference on Machine Vision Applications (MVA2021), July 2021.