Abstract
Self-supervised stereo and monocular depth estimation remain limited by photometric ambiguity from missing correspondences in occluded and out-of-frame regions. We introduce DMS, a model-agnostic framework that leverages geometric priors from diffusion models to synthesize epipolar-aligned novel views via directional prompts. By fine-tuning Stable Diffusion to generate left-shifted, right-shifted, and intermediate perspectives, DMS explicitly supplements occluded pixels, providing clearer photometric supervision. Without extra labels or training cost, DMS improves self-supervised depth learning and achieves up to 35% fewer outliers across multiple benchmarks.
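For context, self-supervised depth methods commonly train with a photometric reprojection loss of the following standard form (a formulation from the broader literature, not a definition taken from this paper):

\mathcal{L}_{\mathrm{ph}}(I, \hat{I}) = \frac{\alpha}{2}\bigl(1 - \mathrm{SSIM}(I, \hat{I})\bigr) + (1 - \alpha)\,\lVert I - \hat{I} \rVert_1

where \hat{I} is the source view warped into the reference frame using the predicted depth, and \alpha is typically around 0.85. The ambiguity DMS targets arises because occluded and out-of-frame pixels in the reference view have no valid counterpart in \hat{I}, so this loss provides them no meaningful supervision.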
Method Overview
Two-stage training pipeline for self-supervised depth estimation using diffusion models.

Stage 1: Starting from the given views, a fine-tuned diffusion model generates multi-baseline images.
Stage 2: The multi-baseline images provide extra geometric cues that assist self-supervised depth estimation. Rough sketches of both stages follow below.
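A minimal Stage 1 sketch, assuming a diffusers image-to-image pipeline: the checkpoint path, prompt wording, and sampling parameters below are illustrative assumptions, not the paper's exact settings.

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Load a hypothetical DMS-style fine-tuned Stable Diffusion checkpoint.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "path/to/dms-finetuned-sd",  # assumed checkpoint location
    torch_dtype=torch.float16,
).to("cuda")

left = Image.open("left.png").convert("RGB")

# Directional text prompts steer which baseline is synthesized; the
# exact prompt vocabulary used by DMS is an assumption here.
shifted = pipe(
    prompt="right-shifted view",
    image=left,
    strength=0.6,        # how strongly the input is re-noised (assumed)
    guidance_scale=7.5,
).images[0]
shifted.save("right_shifted.png")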
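A minimal Stage 2 sketch, assuming rectified views where disparity is a pure horizontal shift; the per-pixel minimum over baselines follows common self-supervised practice and is not necessarily the paper's exact loss.

import torch
import torch.nn.functional as F

def warp_by_disparity(src, disp):
    # Warp a (B,3,H,W) source view toward the reference view using a
    # (B,1,H,W) disparity map given in pixels.
    b, _, h, w = src.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h, device=src.device),
        torch.linspace(-1.0, 1.0, w, device=src.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).repeat(b, 1, 1, 1)
    # grid_sample expects normalized coordinates, so convert the pixel
    # disparity to the [-1, 1] range before shifting horizontally.
    grid[..., 0] = grid[..., 0] - 2.0 * disp.squeeze(1) / (w - 1)
    return F.grid_sample(src, grid, align_corners=True, padding_mode="border")

def photometric_loss(ref, views, disps):
    # L1 error against each warped baseline, reduced by a per-pixel
    # minimum so occluded pixels can be explained by whichever
    # synthesized view actually observes them.
    errs = [(ref - warp_by_disparity(v, d)).abs().mean(1, keepdim=True)
            for v, d in zip(views, disps)]
    return torch.cat(errs, dim=1).min(dim=1).values.mean()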
Network Architecture of DMS

Intermediate View Approximation

Experimental Results

Poster
BibTeX
@article{liu2025dms,
  title={DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation},
  author={Liu, Zihua and Li, Yizhou and Zhang, Songyan and Okutomi, Masatoshi},
  journal={arXiv preprint arXiv:2508.13091},
  year={2025}
}