GOAT: Global Occlusion-Aware Transformer for Robust Stereo Matching

Zihua Liu, Yizhou Li, Masatoshi Okutomi

Tokyo Institute of Technology

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

[paper], [supp], [code]


Method Overview


Figure 1. An overview of our proposed GOAT.

We decouple the stereo-matching process into matching for non-occluded regions and disparity refinement for occluded regions. In the matching phase, we propose a parallel disparity and occlusion estimation module (PDO), which leverages both positional and global correlations between the left and right views to estimate the initial disparity and the occlusion mask, respectively. In the refinement phase, we propose an iterative occlusion-aware global aggregation module (OGA) that uses restricted global correlation with occlusion guidance to optimize the disparity within the occluded regions. Finally, a context adjustment layer refines the disparity from a monocular-depth perspective. A code sketch of how these stages compose is given below.
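For concreteness, here is a minimal PyTorch-style sketch of how the three stages compose. All module names and the iteration count are illustrative assumptions rather than the paper's exact interfaces; the released code linked above is the authoritative reference.

```python
# Minimal sketch of the decoupled pipeline (assumed module interfaces,
# not the official implementation).
import torch.nn as nn

class GOATPipeline(nn.Module):
    def __init__(self, backbone, pdo, oga, context_adjust, num_iters=3):
        super().__init__()
        self.backbone = backbone              # shared feature extractor
        self.pdo = pdo                        # parallel disparity/occlusion estimation
        self.oga = oga                        # occlusion-aware global aggregation
        self.context_adjust = context_adjust  # monocular-depth-style refinement
        self.num_iters = num_iters            # assumed number of OGA iterations

    def forward(self, left, right):
        feat_l, feat_r = self.backbone(left), self.backbone(right)
        # Matching phase: initial disparity and occlusion mask from cross-view attention.
        disparity, occlusion = self.pdo(feat_l, feat_r)
        # Refinement phase: iteratively propagate non-occluded evidence into occluded pixels.
        for _ in range(self.num_iters):
            disparity = self.oga(disparity, feat_l, occlusion)
        # Final context adjustment conditioned on the left image.
        return self.context_adjust(disparity, left), occlusion
```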


Abstract

The performance of deep learning-based stereo matching in ill-conditioned regions, such as occluded regions, remains a bottleneck. Due to their limited receptive field, existing CNN-based methods struggle to handle these ill-conditioned regions effectively. To address this issue, this paper introduces a novel attention-based stereo-matching network called the Global Occlusion-Aware Transformer (GOAT), which exploits long-range dependencies and occlusion-aware global context for disparity estimation. In the GOAT architecture, a parallel disparity and occlusion estimation module (PDO) is proposed to estimate the initial disparity map and the occlusion mask using a parallel attention mechanism. To further enhance the disparity estimates in the occluded regions, an occlusion-aware global aggregation module (OGA) is proposed. This module refines the disparity in the occluded regions by leveraging restricted global correlation within the focus scope of the occluded areas. Extensive experiments were conducted on several public benchmark datasets, including SceneFlow, KITTI 2015, and Middlebury. The results show that the proposed GOAT demonstrates outstanding performance on all benchmarks, particularly in the occluded regions.

Parallel Disparity and Occlusion Estimation Module (PDO)


Figure 2. Architecture of the Parallel Disparity and Occlusion Estimation Module (PDO).

Instead of using a cost volume with a predetermined search range, we propose a global-attention-based module named PDO to compute the initial disparity and the occlusion mask. A minimal sketch of the underlying idea follows.
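The sketch below is our own minimal reading of this idea, assuming rectified inputs so that matching reduces to attention along each horizontal scanline: the initial disparity is a soft-argmax over valid candidates, and an entropy-based occlusion cue stands in for the paper's learned occlusion branch. Everything here (names, the entropy heuristic) is illustrative.

```python
# Sketch of scanline cross-attention matching (illustrative; the paper's PDO
# uses a learned parallel attention design and a learned occlusion branch).
import torch
import torch.nn.functional as F

def pdo_sketch(feat_l, feat_r):
    """feat_l, feat_r: [B, C, H, W] features of the rectified left/right views."""
    B, C, H, W = feat_l.shape
    q = feat_l.permute(0, 2, 3, 1)  # [B, H, W, C] queries from the left view
    k = feat_r.permute(0, 2, 3, 1)  # [B, H, W, C] keys from the right view
    # Global correlation along each scanline: every left pixel vs. every right pixel.
    attn = torch.einsum('bhic,bhjc->bhij', q, k) / C ** 0.5   # [B, H, W_l, W_r]
    # Positional term: candidate disparity d = x_l - x_r; only d >= 0 is physical.
    xs = torch.arange(W, device=feat_l.device)
    disp_cand = xs.view(1, 1, W, 1) - xs.view(1, 1, 1, W)     # [1, 1, W_l, W_r]
    attn = attn.masked_fill(disp_cand < 0, float('-inf'))
    w = F.softmax(attn, dim=-1)                               # matching distribution
    # Initial disparity as the expectation over candidates (soft-argmax).
    disparity = (w * disp_cand.clamp(min=0)).sum(-1)          # [B, H, W]
    # Occlusion cue: a flat (high-entropy) distribution suggests no reliable match.
    entropy = -(w * w.clamp_min(1e-8).log()).sum(-1)          # [B, H, W]
    occlusion = torch.sigmoid(entropy - entropy.mean())       # illustrative soft mask
    return disparity, occlusion
```

The key contrast with a cost volume is that no maximum disparity has to be fixed in advance: the candidate set spans the whole scanline.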


Iterative Occlusion-Aware Global Aggregation Module (OGA)


Figure 3. Iterative Occlusion-Aware Global Aggregation Module (OGA).

We propose an iterative refinement module based on self-attention, named the OGA module, which aggregates features from valid non-occluded regions into invalid occluded regions using global spatial correlation. The sketch below illustrates the occlusion-guided aggregation step.
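As a rough, hedged illustration of this aggregation, the sketch below down-weights occluded pixels as attention keys, so that occluded queries gather disparity evidence from non-occluded pixels. For simplicity it attends densely over the whole image, whereas the paper restricts the global correlation to the focus scope of the occluded areas; all names and the soft-mask blending are our assumptions.

```python
# Sketch of occlusion-guided aggregation (illustrative; dense attention here,
# whereas the paper restricts the correlation to the occluded focus scope).
import torch
import torch.nn.functional as F

def oga_sketch(disparity, feat, occlusion):
    """disparity: [B, H, W]; feat: [B, C, H, W]; occlusion: [B, H, W] soft mask in [0, 1]."""
    B, C, H, W = feat.shape
    tokens = feat.flatten(2).transpose(1, 2)                    # [B, HW, C]
    attn = tokens @ tokens.transpose(1, 2) / C ** 0.5           # [B, HW, HW]
    # Restrict the key set: suppress occluded pixels as sources of evidence.
    occ = occlusion.flatten(1)                                  # [B, HW], 1 = occluded
    attn = attn + torch.log1p(-occ.clamp(max=1 - 1e-4)).unsqueeze(1)
    w = F.softmax(attn, dim=-1)
    # Aggregate disparity from (mostly) non-occluded pixels.
    agg = (w @ disparity.flatten(1).unsqueeze(-1)).squeeze(-1)  # [B, HW]
    # Update occluded pixels only; keep reliable non-occluded matches unchanged.
    out = (1.0 - occ) * disparity.flatten(1) + occ * agg
    return out.view(B, H, W)
```

Iterating this step lets disparity propagate from well-matched structures into progressively deeper parts of the occluded regions.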


Quantitative Results

Extensive experiments were conducted on the SceneFlow, KITTI 2015, and Middlebury 2014 datasets, as well as the FAT dataset, in comparison with existing state-of-the-art methods.


Qualitative Results


Figure 4. Qualitative comparison on the SceneFlow and KITTI datasets with other SOTA methods.



Figure 5. (a) Visualization of the estimated response for disparity candidates using the proposed PDO. Compared with a cost-volume method, the PDO alleviates matching ambiguity in texture-less regions and produces a single-peak waveform. (b) Visualization of the global attention map in the occluded regions using the proposed OGA.

Publications

Global Occlusion-Aware Transformer for Robust Stereo Matching

Zihua Liu, Yizhou Li, Masatoshi Okutomi
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

[paper], [supp], [code]


Contact

Zihua Liu: zliu@ok.sc.e.titech.ac.jp
Yizhou Li: yli@ok.sc.e.titech.ac.jp
Masatoshi Okutomi: mxo@ctrl.titech.ac.jp

