DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation

Institute of Science Tokyo, Sony, Nanyang Technological University
AIM: Advances in Image Manipulation Workshop and Challenges (ICCV 2025 Workshop)
Project teaser banner

Multi-baseline stereo images generated by DMS on the KITTI dataset. From top to bottom: left-left, left, center, right, and right-right views.

Abstract

Self-supervised stereo and monocular depth estimation remain limited by photometric ambiguity from missing correspondences in occluded and out-of-frame regions. We introduce DMS, a model-agnostic framework that leverages geometric priors from diffusion models to synthesize epipolar-aligned novel views via directional prompts. By fine-tuning Stable Diffusion to generate left-shifted, right-shifted, and intermediate perspectives, DMS explicitly supplements occluded pixels, providing clearer photometric supervision. Without extra labels or training cost, DMS improves self-supervised depth learning and achieves up to 35% fewer outliers across multiple benchmarks.
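The abstract describes fine-tuning Stable Diffusion so that directional text prompts produce left-shifted, right-shifted, and intermediate views of a given image. A minimal sketch of what such prompt-driven view synthesis could look like with the Hugging Face diffusers img2img API is shown below; the checkpoint path, prompt wording, and sampling settings are illustrative assumptions, not the released DMS interface.

```python
# Hypothetical sketch: synthesizing shifted views with directional text prompts
# using a fine-tuned Stable Diffusion img2img pipeline (diffusers).
# The checkpoint name and prompt strings are illustrative assumptions.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "path/to/dms-finetuned-stable-diffusion",  # assumed fine-tuned checkpoint
    torch_dtype=torch.float16,
).to("cuda")

center = Image.open("kitti_center_view.png").convert("RGB")

# Directional prompts steer the generation toward epipolar-aligned shifts.
prompts = {
    "left-left":   "shift the camera two baselines to the left",
    "left":        "shift the camera one baseline to the left",
    "right":       "shift the camera one baseline to the right",
    "right-right": "shift the camera two baselines to the right",
}

views = {}
for name, prompt in prompts.items():
    views[name] = pipe(
        prompt=prompt,
        image=center,
        strength=0.6,        # keep most scene content, change the viewpoint
        guidance_scale=7.5,
    ).images[0]
    views[name].save(f"generated_{name}.png")
```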

Method Overview

Two-Stage Training pipeline for Self-Supervised Depth Estimation using Diffusion Models.

DMS pipeline overview

Stage 1: Starting from the given views, a fine-tuned diffusion model generates multi-baseline images.
Stage 2: The multi-baseline images provide extra geometric cues that assist self-supervised depth estimation (a sketch of this stage follows below).
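A hedged sketch of how Stage 2 could fold the generated views into a standard self-supervised photometric objective: each synthesized view is warped to the reference view using the predicted disparity scaled by its relative baseline, adding supervision for pixels occluded in the original pair. The depth network, warping helper, loss form, and baseline multipliers are placeholders, not the paper's exact formulation.

```python
# Hypothetical Stage-2 training step: generated views supply extra photometric
# targets alongside the original stereo pair. depth_net, warp_to_left(), and
# the baseline scales are illustrative placeholders.
import torch
import torch.nn.functional as F

def photometric_loss(pred, target):
    """Simple L1 photometric loss (SSIM term omitted for brevity)."""
    return (pred - target).abs().mean()

def warp_to_left(src_img, disparity):
    """Backward-warp a source image to the left view using horizontal disparity (pixels)."""
    b, _, h, w = src_img.shape
    xs = torch.linspace(-1, 1, w, device=src_img.device).view(1, 1, w).expand(b, h, w)
    ys = torch.linspace(-1, 1, h, device=src_img.device).view(1, h, 1).expand(b, h, w)
    # Shift the sampling grid by the normalized disparity.
    grid = torch.stack((xs - 2.0 * disparity.squeeze(1) / w, ys), dim=-1)
    return F.grid_sample(src_img, grid, align_corners=True)

def training_step(depth_net, left, right, extra_views, baselines):
    """extra_views: dict of generated images; baselines: relative baseline vs. the right view."""
    disp = depth_net(left)                       # disparity for the original baseline
    loss = photometric_loss(warp_to_left(right, disp), left)
    for name, view in extra_views.items():
        scale = baselines[name]                  # e.g. {"right-right": 2.0, "center": 0.5}
        loss = loss + photometric_loss(warp_to_left(view, scale * disp), left)
    return loss / (1 + len(extra_views))
```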

Network Architecture of DMS

DMS network architecture diagram

Intermediate View Approximation

Intermediate view approximation illustration

Experimental Results

Quantitative and qualitative results figures

Poster

BibTeX

@article{liu2025dms,
  title={DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation},
  author={Liu, Zihua and Li, Yizhou and Zhang, Songyan and Okutomi, Masatoshi},
  journal={arXiv preprint arXiv:2508.13091},
  year={2025}
}