Neural Radiance Fields for Novel View Synthesis in Monocular Gastroscopy

(To be presented at) 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2024

Sho Suzuki2, Kenji Miki3
1Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology
2Department of Gastroenterology, International University of Health and Welfare Ichikawa Hospital
3Department of Internal Medicine, Tsujinaka Hospital Kashiwanoha

Synthesizing images from arbitrary novel viewpoints within a patient's stomach from pre-captured monocular gastroscopic images is a promising topic in stomach diagnosis. Typical methods for this task combine traditional 3D reconstruction techniques, including structure-from-motion (SfM) and Poisson surface reconstruction. These methods produce explicit 3D representations, such as point clouds and meshes, from which images at novel viewpoints can be rendered. However, low-texture and non-Lambertian regions within the stomach often lead to noisy and incomplete point clouds and meshes, which prevents high-quality image rendering. In this paper, we apply the emerging technique of neural radiance fields (NeRF) to monocular gastroscopic data to synthesize photo-realistic images at novel viewpoints. To address the performance degradation caused by view sparsity in local regions of monocular gastroscopy, we incorporate geometry priors from a pre-reconstructed point cloud into NeRF training by introducing a novel geometry-based loss that is applied to both pre-captured observed views and generated unobserved views. Compared to other recent NeRF methods, our approach produces high-fidelity image renderings from novel viewpoints within the stomach, both qualitatively and quantitatively.


To appear

The overall process flow

Given a real monocular gastroscopic image sequence, we first apply structure-from-motion (SfM) to obtain camera poses and a reconstructed point cloud. We then train neural radiance fields (NeRF) of the stomach using our proposed geometry-based loss, which exploits the point cloud from SfM. In the application phase, RGB images and depth maps of novel views can be synthesized through volume rendering of the trained NeRF.
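The volume rendering step mentioned above can be illustrated with the standard NeRF quadrature rule for a single ray. This is a minimal sketch, not the code used in the paper: the function name `render_ray`, its arguments, and the choice to treat the last sample interval as very large are assumptions made for illustration. The densities and colors would normally come from the trained network; here they are passed in as plain lists.

```python
import math

def render_ray(densities, colors, t_vals):
    """Composite samples along one ray into an RGB color and an expected depth.

    densities: per-sample volume density (sigma), non-negative floats
    colors:    per-sample RGB triples
    t_vals:    per-sample distance from the camera, in increasing order
    All names here are illustrative and not taken from the paper.
    """
    transmittance = 1.0          # fraction of light that reaches the current sample
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    accumulated = 0.0            # sum of weights (opacity of the whole ray)
    n = len(densities)
    for i in range(n):
        # Interval to the next sample; the last interval is treated as very large.
        delta = t_vals[i + 1] - t_vals[i] if i + 1 < n else 1e10
        alpha = 1.0 - math.exp(-densities[i] * delta)
        weight = transmittance * alpha
        rgb = [acc + weight * c for acc, c in zip(rgb, colors[i])]
        depth += weight * t_vals[i]
        accumulated += weight
        transmittance *= 1.0 - alpha
    return rgb, depth, accumulated

# Example: empty space, then an opaque surface at distance 2.0.
rgb, depth, acc = render_ray(
    densities=[0.0, 1000.0, 0.5],
    colors=[[1.0, 0.0, 0.0], [0.2, 0.4, 0.6], [0.0, 0.0, 1.0]],
    t_vals=[1.0, 2.0, 3.0],
)
```

The same per-sample weights produce both the color and the depth, which is why a depth map of a novel view comes at essentially no extra cost once the RGB image is rendered.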

The overview of our proposed NeRF method

As in a standard NeRF method, we train a network to estimate color and density given a 3D point coordinate and a viewing direction as inputs. Our method has two key components: 1) we generate unobserved views by interpolating consecutive observed views to address view sparsity, and 2) we apply a geometry-based loss to both observed and unobserved views, using the point cloud reconstructed by SfM to effectively constrain the learned geometry. The technical details are given in the methodology section.
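To make the idea of interpolating consecutive observed views concrete, the sketch below generates an in-between camera pose from two neighboring poses. The interpolation scheme is an assumption: the text above does not specify one, so this example uses a common choice, spherical linear interpolation (slerp) for orientation and linear interpolation for the camera center. The pose layout (a unit quaternion `(w, x, y, z)` plus a camera center in world coordinates) and the function names are likewise hypothetical.

```python
import math

def slerp(q0, q1, s):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to take the shorter arc.
        q1 = [-c for c in q1]
        dot = -dot
    if dot > 0.9995:
        # Nearly identical orientations: linear blend avoids division by ~0.
        q = [a + s * (b - a) for a, b in zip(q0, q1)]
    else:
        theta = math.acos(dot)
        w0 = math.sin((1.0 - s) * theta) / math.sin(theta)
        w1 = math.sin(s * theta) / math.sin(theta)
        q = [w0 * a + w1 * b for a, b in zip(q0, q1)]
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]

def interpolate_pose(pose0, pose1, s):
    """Return a pose between two observed poses, with s in [0, 1].

    Each pose is (orientation quaternion, camera center in world coordinates).
    """
    q0, c0 = pose0
    q1, c1 = pose1
    center = [a + s * (b - a) for a, b in zip(c0, c1)]
    return slerp(q0, q1, s), center

# Example: halfway between an identity pose and a pose rotated 90 degrees about z.
half = math.sqrt(0.5)
pose_a = ([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
pose_b = ([half, 0.0, 0.0, half], [2.0, 0.0, 4.0])
q_mid, c_mid = interpolate_pose(pose_a, pose_b, 0.5)
```

Interpolating the camera center rather than a world-to-camera translation vector keeps the generated view on the straight segment between the two observed camera positions, which is usually the intended behavior for consecutive frames.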

Rendering results for a novel camera trajectory

The camera trajectory shown in red is the real gastroscope trajectory, which was used for training NeRF. The camera trajectory shown in blue is a novel trajectory used for the view synthesis application.

The video results:

Pre-captured images | Rendered RGBs of the novel trajectory | Rendered depths of the novel trajectory