Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization?

Accurate visual localization is a key technology for autonomous navigation. We demonstrate experimentally that large-scale 3D models are not strictly necessary for accurate visual localization. We create reference poses for a large and challenging urban dataset. Using these poses, we show that combining image-based methods with local reconstructions results in a pose accuracy similar to the state-of-the-art structure-based methods. Our results suggest that we might want to reconsider the current approach for accurate large-scale localization.

News

20/09/2019: New reference poses are available now.
03/10/2017: Evaluation package bug fix.
06/07/2017: Dataset release.
30/03/2017: Page open.

Publications

Torsten Sattler, Akihiko Torii, Josef Sivic, Marc Pollefeys, Hajime Taira, Masatoshi Okutomi, Tomas Pajdla: Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization? CVPR 2017. [PDF] [bib].
Akihiko Torii, Hajime Taira, Josef Sivic, Marc Pollefeys, Masatoshi Okutomi, Tomas Pajdla, Torsten Sattler: Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization? IEEE Trans. on Pattern Analysis and Machine Intelligence. [PDF] [bib].

Dataset

6DoF reference poses for the San Francisco dataset.

To compare against the results in the CVPR 2017 paper, please use "reference_poses_442.txt", included in the download above, which is the exact set used in that paper. "reference_poses_addTM_all_598.txt" is the extended set that will appear in our forthcoming TPAMI paper.

Our reference poses are computed on top of the SF-0 model provided by Li, Snavely, et al. For more details about their dataset, please refer to http://landmark.cs.cornell.edu/
Evaluation package 03/10/2017

The following older versions have a bug in plotting new results: when the result file list contains fewer than 442 entries, the graph shows misleadingly good results.

~~Evaluation package 23/09/2017~~

~~Evaluation package 14/07/2017~~
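
One way a short result list can inflate a curve is by normalizing with the number of reported results instead of the number of reference queries. The sketch below illustrates that pitfall only; it is not code from the evaluation package, and the function name `fraction_localized` and the error values are hypothetical. The only figure taken from this page is the 442 reference poses.

```python
def fraction_localized(errors_m, threshold_m, num_queries):
    """Fraction of ALL queries localized within threshold_m metres.

    errors_m holds position errors only for the queries a method reported.
    Queries missing from the result list are counted as failures.
    """
    if len(errors_m) > num_queries:
        raise ValueError("more results than reference queries")
    hits = sum(1 for e in errors_m if e <= threshold_m)
    return hits / num_queries


# Hypothetical position errors (metres) for a method that reported
# results for only 4 of the 442 reference queries.
errors = [0.5, 1.2, 3.0, 40.0]

# Dividing by the number of reported results overstates performance.
naive = sum(1 for e in errors if e <= 5.0) / len(errors)

# Dividing by the full query count treats missing queries as failures.
correct = fraction_localized(errors, 5.0, 442)

print(f"naive: {naive:.3f}, over all queries: {correct:.4f}")
```

When comparing methods, it is safest to confirm that every result file is scored against the full reference set, so that omitted queries lower the curve instead of being ignored.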

Acknowledgement

This work was partly supported by EU-H2020 project LADIO No. 731970, JSPS KAKENHI Grant Number 15H05313, ERC grant LEAP (no. 336845), CIFAR Learning in Machines & Brains program and ESIF, OP Research, development and education project IMPACT No. CZ.02.1.01/0.0/0.0/15_003/0000468, and Google Tango.