Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization?


Accurate visual localization is a key technology for autonomous navigation. We demonstrate experimentally that large-scale 3D models are not strictly necessary for accurate visual localization. We create reference poses for a large and challenging urban dataset. Using these poses, we show that combining image-based methods with local reconstructions yields pose accuracy similar to that of state-of-the-art structure-based methods. Our results suggest that we might want to reconsider the current approach to accurate large-scale localization.


  • 20/09/2019: New reference poses are now available.
  • 03/10/2017: Evaluation package bug fix.
  • 06/07/2017: Dataset release.
  • 30/03/2017: Page open.



  1. 6DoF reference poses for the San Francisco dataset.

  2. To compare with the results in our CVPR 2017 paper, please use "reference_poses_442.txt", included in the download above; it is exactly the set used in the CVPR 2017 paper. "reference_poses_addTM_all_598.txt" is an extended set that will appear in our upcoming TPAMI paper.

  3. Our reference poses are computed on top of the SF-0 model provided by Li, Snavely, et al. For more details about their dataset, please refer to their work.

  4. Evaluation package 03/10/2017

The following older versions have a bug in plotting new results: when the result file list contains fewer than 442 entries, the graph shows misleadingly good results.

Evaluation package 23/09/2017

Evaluation package 14/07/2017
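The evaluation package itself is not reproduced here. As a minimal sketch, assuming each pose is given as a world-to-camera rotation matrix R and a camera center c in metres (the actual layout of "reference_poses_442.txt" may differ), the Python below shows a common way to score localization results: position error, rotation error in degrees, and the fraction of reference queries localized within both thresholds. The function names and threshold values are hypothetical. The fraction is divided by the number of reference queries, so a query missing from a shorter result list counts as a failure; dividing by the number of submitted results instead would inflate the score, which is the kind of effect the note on older package versions describes.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical pose representation: (R, c) with R a 3x3 world-to-camera
# rotation matrix (nested lists) and c the camera center in metres.
Pose = Tuple[List[List[float]], List[float]]


def position_error(c_est: List[float], c_ref: List[float]) -> float:
    """Euclidean distance between estimated and reference camera centers."""
    return math.dist(c_est, c_ref)


def rotation_error_deg(R_est: List[List[float]], R_ref: List[List[float]]) -> float:
    """Angle (degrees) of the relative rotation R_est^T R_ref."""
    # trace(R_est^T R_ref) equals the element-wise sum of R_est * R_ref.
    tr = sum(R_est[i][j] * R_ref[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def localized_fraction(estimates: Dict[str, Pose],
                       references: Dict[str, Pose],
                       max_pos_m: float,
                       max_rot_deg: float) -> float:
    """Fraction of ALL reference queries localized within both thresholds."""
    hits = 0
    for name, (R_ref, c_ref) in references.items():
        est = estimates.get(name)
        if est is None:
            continue  # a missing result counts as a failure, not as skipped
        R_est, c_est = est
        if (position_error(c_est, c_ref) <= max_pos_m
                and rotation_error_deg(R_est, R_ref) <= max_rot_deg):
            hits += 1
    # Normalize by the number of reference queries, not len(estimates).
    return hits / len(references)
```

For example, with three reference queries where one estimate is within the thresholds, one is too far off, and one is missing, `localized_fraction` returns about 0.33; normalizing by the two submitted results would instead report 0.5.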


This work was partly supported by EU-H2020 project LADIO No. 731970, JSPS KAKENHI Grant Number 15H05313, ERC grant LEAP (no. 336845), CIFAR Learning in Machines & Brains program and ESIF, OP Research, development and education project IMPACT No. CZ.02.1.01/0.0/0.0/15_003/0000468, and Google Tango.