Stomach 3D Reconstruction Based on Virtual Chromoendoscopic Image Generation

IEEE Journal of Translational Engineering in Health and Medicine, vol. 9, no. 1700211, pp. 1-11, 2021

Aji Resindra Widya1, Yusuke Monno1, Masatoshi Okutomi1, Sho Suzuki2, Takuji Gotoda2, Kenji Miki3
1Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology
2Division of Gastroenterology and Hepatology, Department of Medicine, Nihon University School of Medicine
3Department of Internal Medicine, Tsujinaka Hospital Kashiwanoha
Abstract

Gastric endoscopy is a standard clinical procedure that enables medical practitioners to diagnose various lesions inside a patient's stomach. If any lesion is found, it is very important to perceive the location of the lesion relative to the global view of the stomach. Our previous research showed that this could be addressed by reconstructing the whole stomach shape from chromoendoscopic images using a structure-from-motion (SfM) pipeline, in which indigo carmine (IC) blue dye-sprayed images were used to increase feature matches for SfM by enhancing the stomach surface's textures. However, spraying the IC dye on the whole stomach requires additional time, labor, and cost, which is not desirable for patients and practitioners. In this paper, we propose an alternative way to achieve whole stomach 3D reconstruction without the need for the IC dye by generating virtual IC-sprayed (VIC) images based on image-to-image style translation trained on unpaired real no-IC and IC-sprayed images. We have specifically investigated the effect of input and output color channel selection for generating the VIC images and found that translating no-IC green-channel images to IC-sprayed red-channel images gives the best SfM reconstruction result.

Summary

This study was conducted in accordance with the Declaration of Helsinki. The Institutional Review Board at Nihon University Hospital approved the study protocol on March 8, 2018, before patient recruitment. Informed consent was obtained from all patients before they were enrolled. This study was registered with the University Hospital Medical Information Network (UMIN) Clinical Trials Registry (identification No.: UMIN000031776) on March 17, 2018. This study was also approved by the research ethics committee of Tokyo Institute of Technology, where the 3D reconstruction experiments were conducted.

In our previous study, we tackled the problem of lesion localization by reconstructing a whole stomach 3D shape from endoscopic images based on a structure-from-motion (SfM) pipeline. Unfortunately, our previous pipeline only works with gastroendoscopy sequences in which indigo carmine (IC) blue dye has been sprayed. Although the IC dye is commonly used in gastroendoscopy, spraying it on the whole stomach surface requires additional procedure time, labor, and cost, which is not desirable for either patients or practitioners. Furthermore, because of its dark color tone, the IC dye may also hinder the visibility of some lesions and of the reconstructed stomach 3D models. In this work, we extend our previous work by proposing a framework to reconstruct the whole stomach shape from endoscope video using an SfM pipeline without the need for the IC dye.
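At a high level, the framework replaces the physical dye-spraying step with a learned image translation stage followed by SfM. The sketch below illustrates this two-stage structure only; `generate_vic` and `run_sfm` are hypothetical placeholder names, not the actual implementation (the real VIC generator is a trained network, and the SfM stage would be an off-the-shelf back end such as COLMAP):

```python
from typing import Dict, List

import numpy as np


def generate_vic(no_ic_green: np.ndarray) -> np.ndarray:
    """Stand-in for the trained image-to-image generator
    (no-IC green channel -> virtual IC red channel).
    Here it passes the frame through unchanged."""
    return no_ic_green


def run_sfm(frames: List[np.ndarray]) -> Dict[str, object]:
    """Stand-in for an SfM back end that would register cameras
    and triangulate a sparse point cloud from the VIC frames."""
    return {"registered_frames": len(frames), "points3d": []}


# Placeholder single-channel frames standing in for an endoscope video.
frames = [np.zeros((480, 640), dtype=np.uint8) for _ in range(5)]
vic_frames = [generate_vic(f) for f in frames]  # replaces physical IC spraying
model = run_sfm(vic_frames)
```

The key point of the design is that everything downstream of the translation stage is unchanged from the dye-based pipeline: SfM simply receives better-textured inputs.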

Based on the investigation, we have found that the generated VIC images significantly increase the number of extracted SIFT feature points. We have also found that translating no-IC green-channel images to IC-sprayed red-channel images significantly improves the SfM reconstruction quality. We have experimentally demonstrated that our new approach can reconstruct the whole stomach shapes of all seven subjects, and we have shown that the estimated camera poses can be used for lesion localization.

Paper

[IEEE Xplore] Stomach 3D Reconstruction Using Virtual Chromoendoscopic Images
@article{widya2021stomach,
title={Stomach 3D Reconstruction Using Virtual Chromoendoscopic Images},
author={Widya, Aji Resindra and Monno, Yusuke and Okutomi, Masatoshi and Suzuki, Sho and Gotoda, Takuji and Miki, Kenji},
journal={IEEE Journal of Translational Engineering in Health and Medicine},
volume={9},
pages={1--11},
year={2021},
publisher={IEEE}
}

[IEEE Xplore] Stomach 3D Reconstruction Based on Virtual Chromoendoscopic Image Generation
@inproceedings{widya2020stomach,
title={Stomach 3D Reconstruction Based on Virtual Chromoendoscopic Image Generation},
author={Widya, Aji Resindra and Monno, Yusuke and Okutomi, Masatoshi and Suzuki, Sho and Gotoda, Takuji and Miki, Kenji},
booktitle={2020 42nd Annual International Conference of the IEEE Engineering in Medicine \& Biology Society (EMBC)},
pages={1848--1852},
year={2020},
organization={IEEE}
}

[arXiv page] Stomach 3D Reconstruction Based on Virtual Chromoendoscopic Image Generation
@misc{widya2020stomacharxiv,
title={Stomach 3D Reconstruction Based on Virtual Chromoendoscopic Image Generation},
author={Aji Resindra Widya and Yusuke Monno and Masatoshi Okutomi and Sho Suzuki and Takuji Gotoda and Kenji Miki},
year={2020},
eprint={2004.12288},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

Flowchart

Our best result was achieved with the following setting for training the image-to-image translation network (CycleGAN):

The flow of overall pipeline.

We train the CycleGAN to translate the green channel of no-IC images to the red channel of IC images. We chose this setting because our previous study showed that the green channel yields the largest number of extracted features for no-IC images, while the red channel yields the largest number for IC-sprayed images. We also experimented with translating the red channel of no-IC images to the red channel of IC images, with less preferable results.
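The channel-selection step that prepares the two unpaired training domains can be sketched as follows (pure NumPy; the frame contents are random placeholders, not real endoscopic data):

```python
import numpy as np


def extract_channel(rgb: np.ndarray, channel: str) -> np.ndarray:
    """Return one color channel (H, W) of an (H, W, 3) RGB frame."""
    return rgb[:, :, {"r": 0, "g": 1, "b": 2}[channel]]


# Unpaired training domains for the CycleGAN:
#   domain A = green channel of no-IC frames (no-ICg)
#   domain B = red channel of IC-sprayed frames (ICr)
rng = np.random.default_rng(0)
no_ic_frame = rng.integers(0, 256, (480, 640, 3), dtype=np.uint8)  # placeholder
ic_frame = rng.integers(0, 256, (480, 640, 3), dtype=np.uint8)     # placeholder

domain_a = extract_channel(no_ic_frame, "g")  # no-ICg in the paper
domain_b = extract_channel(ic_frame, "r")     # ICr in the paper
```

Because the two domains come from different patients and frames, no pixel-wise pairing is needed, which is exactly the setting CycleGAN's cycle-consistency loss is designed for.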

Image-to-image translation results

Real red channel no-IC
no-ICr in paper

Real red channel no-IC to virtual red channel with IC
VICr2r and cGANr2r in paper

Real green channel no-IC
no-ICg in paper

Real green channel no-IC to virtual red channel with IC
VICg2r and cGANg2r in paper


As we can see from the results, both cGANr2r and cGANg2r can transfer the style of the IC-sprayed image to the input no-IC image, not only transferring the IC pattern but also raising or lowering the overall surface brightness. However, looking at the red-channel no-IC image, we can observe that the surface looks fairly texture-less. It is hard even for convolutional neural networks (CNNs) to extract features from such texture-less images. On the other hand, the green-channel no-IC image shows more texture, enabling a slightly better style transfer.

Feature matching performance

First, we show the features extracted from the three types of sequences below. The red dots in the video represent the extracted features. We can see that the virtual IC sequence generated by cGANg2r has the largest number of extracted features.


Real Green channel no-IC
no-ICg in paper

Real red channel no-IC to virtual red channel with IC
VICr2r and cGANr2r in paper

Real green channel no-IC to virtual red channel with IC
VICg2r and cGANg2r in paper

We then show the comparison of the average number of feature matches between the anchor frame and its 10 consecutive frames in the graph below. The x-axis represents the relative time stamp and the y-axis represents the average number of matches computed over 43 samples. It is clearly shown that the VIC images from cGANr2r have a higher number of matches across frames.
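The averaging behind this graph can be expressed compactly as a per-offset mean over the samples. The match counts below are random placeholders standing in for the measured values:

```python
import numpy as np

# match_counts[s, t] = number of feature matches between the anchor frame of
# sample s and the frame t+1 steps later (placeholder values,
# 43 samples x 10 consecutive frames as in the evaluation).
rng = np.random.default_rng(42)
match_counts = rng.integers(50, 300, size=(43, 10))

# One averaged value per relative time stamp: this is what the y-axis plots.
avg_matches = match_counts.mean(axis=0)
```

Averaging over many anchor frames smooths out per-frame variation (blur, specular highlights), so the curves reflect the matchability of each image type rather than of any single frame.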


Feature matches on green channel no-IC images
Feature matches on VIC images from cGANr2r
Feature matches on VIC images from cGANg2r

Reconstruction results

Please note that the embedded 3D model viewers show the inner texture of the stomach with back-face culling enabled. Also note that the order of the subjects differs from that on the other page.

Subject A VIC input images. Red dots in each frame represent extracted feature points.

Subject B VIC input images. Red dots in each frame represent extracted feature points.

Subject C VIC input images. Red dots in each frame represent extracted feature points.

Subject D VIC input images. Red dots in each frame represent extracted feature points.

Subject E VIC input images. Red dots in each frame represent extracted feature points.

Subject F VIC input images. Red dots in each frame represent extracted feature points.

Subject G VIC input images. Red dots in each frame represent extracted feature points.