GastroNeRF

Neural Radiance Fields for Novel View Synthesis
in Monocular Gastroscopy

(To be presented at) 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2024

Zijie Jiang¹, Yusuke Monno¹, Masatoshi Okutomi¹

Sho Suzuki², Kenji Miki³

¹Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology

²Department of Gastroenterology, International University of Health and Welfare Ichikawa Hospital

³Department of Internal Medicine, Tsujinaka Hospital Kashiwanoha

Abstract

Enabling the synthesis of arbitrarily novel viewpoint images within a patient's stomach from pre-captured monocular gastroscopic images is a promising topic in stomach diagnosis. Typical methods to achieve this objective integrate traditional 3D reconstruction techniques, including structure-from-motion (SfM) and Poisson surface reconstruction. These methods produce explicit 3D representations, such as point clouds and meshes, thereby enabling the rendering of the images from novel viewpoints. However, the existence of low-texture and non-Lambertian regions within the stomach often results in noisy and incomplete reconstructions of point clouds and meshes, hindering the attainment of high-quality image rendering. In this paper, we apply the emerging technique of neural radiance fields (NeRF) to monocular gastroscopic data for synthesizing photo-realistic images for novel viewpoints. To address the performance degradation due to view sparsity in local regions of monocular gastroscopy, we incorporate geometry priors from a pre-reconstructed point cloud into the training of NeRF, which introduces a novel geometry-based loss to both pre-captured observed views and generated unobserved views. Compared to other recent NeRF methods, our approach showcases high-fidelity image renderings from novel viewpoints within the stomach both qualitatively and quantitatively.

Paper

EMBC
arXiv

To appear

The overall process flow

Using a real monocular gastroscopic image sequence, we first apply structure-from-motion (SfM) to obtain camera poses and a reconstructed point cloud. Then, we train neural radiance fields (NeRF) of the stomach, where we propose a novel geometry-based loss exploiting the point cloud from SfM. In the application phase, RGB images and depth maps of novel views can be synthesized through volume rendering of NeRF.
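To make the final step concrete, the following is a minimal sketch of how one ray can be composited into an RGB value and a depth value using the standard NeRF volume rendering quadrature. It is not the authors' implementation: the function name `render_ray` and its list-based inputs are hypothetical, and a real system would batch this over many rays on a GPU.

```python
import math

def render_ray(sigmas, colors, t_vals):
    """Composite samples along one ray (standard NeRF quadrature).

    sigmas: densities at each sample, colors: (r, g, b) at each sample,
    t_vals: increasing distances of the samples along the ray.
    All names here are illustrative assumptions, not the paper's code.
    """
    n = len(sigmas)
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    transmittance = 1.0  # fraction of light not yet absorbed
    weights = []
    for i in range(n):
        # Interval to the next sample; the last interval is treated as very large.
        delta = t_vals[i + 1] - t_vals[i] if i + 1 < n else 1e10
        alpha = 1.0 - math.exp(-sigmas[i] * delta)  # opacity of this interval
        w = transmittance * alpha                   # contribution of this sample
        weights.append(w)
        for c in range(3):
            rgb[c] += w * colors[i][c]
        depth += w * t_vals[i]                      # expected ray termination depth
        transmittance *= 1.0 - alpha
    return rgb, depth, weights

# Example: an opaque red surface at distance 3 along the ray.
rgb, depth, weights = render_ray(
    sigmas=[0.0, 0.0, 1e9, 0.0],
    colors=[(0, 0, 0), (0, 0, 0), (1, 0, 0), (0, 1, 0)],
    t_vals=[1.0, 2.0, 3.0, 4.0],
)
```

The same per-sample weights yield both the color and the depth, which is why a single trained radiance field can produce RGB images and depth maps for a novel view.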

The overview of our proposed NeRF method

As in a standard NeRF method, we train a network to estimate color and density given a 3D point coordinate and a viewing direction as inputs. The key to our method is twofold: 1) we generate unobserved views by interpolating consecutive observed views to address view sparsity, and 2) we apply a geometry-based loss to both observed and unobserved views, using the point cloud reconstructed by SfM to effectively constrain the learned geometry. The technical details are in the methodology section.
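The two ideas can be illustrated with a rough sketch. This page does not specify the interpolation scheme or the exact form of the geometry-based loss, so everything below is an assumption: poses are interpolated with quaternion slerp plus linear interpolation of camera centers, and the loss is a stand-in L1 penalty between rendered depths and sparse prior depths (for example, depths obtained by projecting the SfM point cloud into a view). The names `slerp`, `interpolate_pose`, and `sparse_depth_loss` are hypothetical.

```python
import math

def slerp(q0, q1, u):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # flip one quaternion to follow the shorter arc
        q1 = [-c for c in q1]
        dot = -dot
    if dot > 0.9995:  # nearly identical rotations: use linear interpolation
        q = [a + u * (b - a) for a, b in zip(q0, q1)]
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - u) * theta) / math.sin(theta)
        s1 = math.sin(u * theta) / math.sin(theta)
        q = [s0 * a + s1 * b for a, b in zip(q0, q1)]
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]

def interpolate_pose(pose0, pose1, u):
    """Assumed scheme: a pose is (unit quaternion, camera center)."""
    (q0, c0), (q1, c1) = pose0, pose1
    center = [a + u * (b - a) for a, b in zip(c0, c1)]
    return slerp(q0, q1, u), center

def sparse_depth_loss(rendered_depths, prior_depths):
    """Stand-in geometry loss: mean absolute depth error over rays with a prior.

    prior_depths uses None for rays without a prior depth.
    """
    errs = [abs(r - p) for r, p in zip(rendered_depths, prior_depths)
            if p is not None]
    return sum(errs) / len(errs) if errs else 0.0

# Unobserved view halfway between two observed views (90 degree turn about z).
half = math.sqrt(0.5)
pose_a = ([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
pose_b = ([half, 0.0, 0.0, half], [2.0, 0.0, 0.0])
q_mid, c_mid = interpolate_pose(pose_a, pose_b, 0.5)

# Loss over three rays, one of which has no prior depth.
loss = sparse_depth_loss([1.0, 2.0, 3.0], [1.5, None, 2.0])
```

Because the loss in this sketch needs only a camera pose and the point cloud, it could be evaluated on interpolated views that have no captured image, which matches the role the text gives to unobserved views.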

Rendering results for a novel camera trajectory

The camera trajectory in red is the real gastroscope trajectory, which was used for training NeRF. The camera trajectory in blue is a novel trajectory used for the view synthesis application.

The video results: