Indoor Topological Localization Using a Visual Landmark Sequence

This paper presents a novel indoor topological localization method based on mobile phone videos. Conventional methods suffer from dynamic environmental changes and scene ambiguity indoors. The proposed Visual Landmark Sequence-based Indoor Localization (VLSIL) method addresses these problems by taking steady indoor objects as landmarks. Unlike many localization methods based on feature or appearance matching, our method represents locations with highly abstracted landmark semantic information and is thus invariant to illumination changes, temporal variations, and occlusions. We match consistently detected landmarks against the topological map according to their occurrence order in the videos. The proposed approach contains two components: a convolutional neural network (CNN)-based landmark detector and a topological matching algorithm. The proposed detector reliably and accurately detects landmarks. The matching algorithm, built on a second-order hidden Markov model, handles environmental ambiguity by fusing semantic and connectivity information of landmarks. To evaluate the method, we conduct extensive experiments on real-world datasets collected in two indoor environments. The results show that our deep neural network-based indoor landmark detector accurately detects all landmarks, that it can be expected to work in similar environments without retraining, and that VLSIL can effectively localize indoor landmarks.


The Problem

We propose a novel visual landmark sequence-based indoor localization (VLSIL) framework, and we first illustrate its basic idea. Suppose an indoor space has seven locations, as shown in Figure 1a. Each location is represented by a landmark, as shown in Figure 1b, where the color indicates the landmark type. Pedestrians can only walk from one location to another along a linking path. Suppose pedestrians reach location L(2) without knowing it and observe the red landmark. Their location cannot be determined, since more than one location is denoted by a red landmark (e.g., L(5) and L(7)). Suppose instead that they observe red, green, and blue landmarks in sequence along their path. They can then be sure that they started from L(2), passed through L(4), and arrived at L(6), because L(2), L(4), L(6) is the only valid path. VLSIL thus achieves localization by taking photos (video) of locations and matching the sequence of detected landmarks against the topological map of the space.
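The idea above can be sketched as a small search over walks in the location graph: keep every walk whose landmark types match the observed sequence, and ambiguity disappears once the sequence is long enough. The adjacency and landmark colors below are hypothetical stand-ins in the spirit of Figure 1, not the paper's actual map:

```python
# adjacency: location -> reachable neighbouring locations (hypothetical layout)
EDGES = {1: [2, 3], 2: [1, 4], 3: [1, 5], 4: [2, 6], 5: [3, 7], 6: [4], 7: [5]}
# landmark type observed at each location (hypothetical labels)
LANDMARK = {1: "red", 2: "red", 3: "green", 4: "green",
            5: "red", 6: "blue", 7: "red"}

def consistent_walks(observed):
    """Return every walk through the graph whose landmark types match `observed`."""
    # all locations whose landmark matches the first observation
    walks = [[loc] for loc, t in LANDMARK.items() if t == observed[0]]
    # extend each surviving walk by neighbours matching the next observation
    for obs in observed[1:]:
        walks = [w + [n] for w in walks for n in EDGES[w[-1]]
                 if LANDMARK[n] == obs]
    return walks

print(consistent_walks(["red"]))                   # several candidates: ambiguous
print(consistent_walks(["red", "green", "blue"]))  # → [[2, 4, 6]]: unique walk
```

A single red observation leaves several candidate locations, while the three-landmark sequence singles out the one walk consistent with both the observations and the map connectivity.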


In this paper, we propose a robust landmark representation using semantic information. A CNN-based landmark detector is proposed to determine the landmark type. Unlike previous approaches using handcrafted features, our detector learns distinctive features that separate target objects from background. Moreover, it can be applied to new scenes without modification. The learned features are not derived from a single space but from a combination of color, gradient, and geometric spaces. With a proper training dataset, the detector stays robust to landmark variations caused by illumination changes and other deformations. We selected a CNN because of its high performance in image classification [22] and indoor scene recognition [23], where it outperforms approaches based on handcrafted features.

We propose a matching algorithm based on a second-order hidden Markov model (HMM2) that exploits landmark connectivity and semantic information for landmark recognition. An HMM2 can incorporate the walking direction into the recognition process, and the walking direction in turn constrains the landmark connectivity. In this manner, more contextual information is taken into account for landmark localization, which reduces indoor scene ambiguity.
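A minimal second-order Viterbi decoder over state pairs illustrates how conditioning each transition on the previous two locations encodes walking direction: any two-step turn that is impossible in the map simply scores zero. The states, emissions, and transition probabilities below are toy placeholders, not the paper's learned model:

```python
import itertools

STATES = ["L2", "L4", "L6"]  # hypothetical location states

def viterbi2(obs, trans2, emit, init):
    """Second-order Viterbi: obs is the observed landmark sequence,
    trans2[(a, b)][c] = P(c | a, b), emit[s][o] = P(o | s),
    init[(a, b)] = prior of the first state pair."""
    # score over ordered state pairs (state at t-1, state at t)
    score = {(a, b): init.get((a, b), 0.0) * emit[a][obs[0]] * emit[b][obs[1]]
             for a, b in itertools.product(STATES, STATES)}
    back = {}
    for t in range(2, len(obs)):
        new = {}
        for b, c in itertools.product(STATES, STATES):
            # best predecessor a given the pair (a, b) ending in b
            p, a = max((score[(a, b)] * trans2.get((a, b), {}).get(c, 0.0), a)
                       for a in STATES)
            new[(b, c)] = p * emit[c][obs[t]]
            back[(t, b, c)] = a
        score = new
    # backtrack from the best final pair
    b, c = max(score, key=score.get)
    path = [b, c]
    for t in range(len(obs) - 1, 1, -1):
        path.insert(0, back[(t, path[0], path[1])])
    return path

# toy parameters: near-deterministic emissions, one allowed two-step transition
emit = {"L2": {"red": 0.9, "green": 0.05, "blue": 0.05},
        "L4": {"red": 0.05, "green": 0.9, "blue": 0.05},
        "L6": {"red": 0.05, "green": 0.05, "blue": 0.9}}
init = {("L2", "L4"): 1.0}
trans2 = {("L2", "L4"): {"L6": 1.0}}

print(viterbi2(["red", "green", "blue"], trans2, emit, init))
# → ['L2', 'L4', 'L6']
```

Because `trans2` is indexed by the previous *pair* of states rather than a single state, the decoder naturally rules out paths that double back against the walking direction, which a first-order HMM cannot express.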



Landmark Detection Performance

Our method correctly detected all landmarks in all routes. The ANN-based detector correctly detected landmarks in Route 2 and Route 3; some walls were wrongly detected as doors in Routes 3, 5, 6, and 7. This demonstrates that our detector outperforms the detector using handcrafted features. Currently, the proposed method cannot run in real time, and the majority of the time is spent on landmark detection. Although the average time to classify one patch with our convolutional neural network is short (about 0.012 s on our machine), the average time to process a landmark image is about 7 s. The process is time-consuming for two reasons. First, we use the selective search algorithm to generate patches from landmark images, which costs about 3–4 s to produce reliable patches. Second, we feed 300 patches of a landmark image to the network to detect landmarks correctly, which takes an extra 3 s. It should be noted that the detection process can be further optimized as object detection technologies in computer vision develop.
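The per-image cost described above, together with a simple aggregation of the 300 per-patch predictions, can be sketched as follows. The majority-vote rule and the stub labels are assumptions for illustration, not the paper's exact implementation; the timing constants come from the figures reported above:

```python
from collections import Counter

PER_PATCH_S = 0.012  # avg CNN classification time per patch, as reported above
PROPOSAL_S = 3.5     # avg selective-search time (midpoint of the 3-4 s reported)

def detect(patch_labels):
    """Aggregate per-patch predictions into one landmark label by majority
    vote over non-background patches (an assumed aggregation rule)."""
    votes = Counter(lbl for lbl in patch_labels if lbl != "background")
    return votes.most_common(1)[0][0] if votes else "background"

# a hypothetical image: most proposals hit the door landmark
patches = ["door"] * 180 + ["background"] * 100 + ["wall"] * 20
print(detect(patches))  # → door
print(f"~{PROPOSAL_S + len(patches) * PER_PATCH_S:.1f} s per image")  # ~7.1 s
```

The arithmetic is consistent with the text: 300 patches at 0.012 s each is about 3.6 s of classification, which together with 3–4 s of selective search yields the reported ~7 s per landmark image.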

Localization Performance

We further compare with the HMM-based method in two situations, and the statistical results are shown in Table 5. The number of possible paths is used to report the comparison result. Notably, the HMM fails to localize any landmark sequence when the start is unknown, and only Route 5 is accurately localized given the start position. Our method outperforms the HMM-based method on the seven routes under the same conditions.


  1. Ranganathan, P.; Hayet, J.B.; Devy, M.; Hutchinson, S.; Lerasle, F. Topological navigation and qualitative localization for indoor environment using multi-sensory perception. Robot. Auton. Syst. 2002, 41, 137–144. [CrossRef] 

  2. Cheng, H.; Chen, H.; Liu, Y. Topological Indoor Localization and Navigation for Autonomous Mobile Robot. IEEE Trans. Autom. Sci. Eng. 2015, 12, 729–738. [CrossRef] 

  3. Bradley, D.M.; Patel, R.; Vandapel, N.; Thayer, S.M. Real-time image-based topological localization in large outdoor environments. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, AB, Canada, 2–6 August 2005; pp. 3670–3677. 

  4. Becker, C.; Salas, J.; Tokusei, K.; Latombe, J.C. Reliable navigation using landmarks. In Proceedings of the 1995 IEEE International Conference on Robotics and Automation, Nagoya, Japan, 21–27 May 1995; Volume 1, pp. 401–406. 

  5. Kosecka, J.; Zhou, L.; Barber, P.; Duric, Z. Qualitative image based localization in indoors environments. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Madison, WI, USA, 18–20 June 2003; Volume 2, pp. 3–8. 

  6. Li, Q.; Zhu, J.; Liu, T.; Garibaldi, J.; Li, Q.; Qiu, G. Visual landmark sequence-based indoor localization. In Proceedings of the 1st Workshop on Artificial Intelligence and Deep Learning for Geographic Knowledge Discovery, Los Angeles, CA, USA, 7–10 November 2017; pp. 14–23. 

  7. Ahn, S.J.; Rauh, W.; Recknagel, M. Circular coded landmark for optical 3D-measurement and robot vision. In Proceedings of the 1999 IEEE/RSJ International Conference on Intelligent Robots and Systems, Kyongju, Korea, 17–21 October 1999; Volume 2, pp. 1128–1133. 

Remote Sens. 2019, 11, 73 22 of 24

  8. Jang, G.; Lee, S.; Kweon, I. Color landmark based self-localization for indoor mobile robots. In Proceedings of the 2002 IEEE International Conference on Robotics and Automation, Washington, DC, USA, 11–15 May 2002; Volume 1, pp. 1037–1042. 

  9. Basiri, A.; Amirian, P.; Winstanley, A. The use of quick response (QR) codes in landmark-based pedestrian navigation. J. Navig. Obs. 2014, 2014, 897103. [CrossRef] 

  10. Briggs, A.J.; Scharstein, D.; Braziunas, D.; Dima, C.; Wall, P. Mobile robot navigation using self-similar landmarks. In Proceedings of the IEEE International Conference on Robotics and Automation, San Francisco, CA, USA, 24–28 April 2000; Volume 2, pp. 1428–1434. 

  11. Hayet, J.B.; Lerasle, F.; Devy, M. A visual landmark framework for indoor mobile robot navigation. In Proceedings of the 2002 IEEE International Conference on Robotics and Automation, Washington, DC, USA, 11–15 May 2002; Volume 4, pp. 3942–3947. 

  12. Ayala, V.; Hayet, J.B.; Lerasle, F.; Devy, M. Visual localization of a mobile robot in indoor environments using planar landmarks. In Proceedings of the 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems, Takamatsu, Japan, 31 October–5 November 2000; Volume 1, pp. 275–280. 

  13. Tian, Y.; Yang, X.; Yi, C.; Arditi, A. Toward a computer vision-based wayfinding aid for blind persons to access unfamiliar indoor environments. Mach. Vis. Appl. 2013, 24, 521–535. [CrossRef] [PubMed] 

  14. Chen, K.C.; Tsai, W.H. Vision-based autonomous vehicle guidance for indoor security patrolling by a SIFT-based vehicle-localization technique. IEEE Trans. Veh. Technol. 2010, 59, 3261–3271. [CrossRef] 

  15. Bai, Y.; Jia, W.; Zhang, H.; Mao, Z.H.; Sun, M. Landmark-based indoor positioning for visually impaired individuals. In Proceedings of the 2014 12th International Conference on Signal Processing, Hangzhou, China, 19–23 October 2014; pp. 668–671. 

  16. Serrão, M.; Rodrigues, J.M.; Rodrigues, J.; du Buf, J.H. Indoor localization and navigation for blind persons using visual landmarks and a GIS. Procedia Comput. Sci. 2012, 14, 65–73. [CrossRef] 

  17. Kawaji, H.; Hatada, K.; Yamasaki, T.; Aizawa, K. Image-based indoor positioning system: Fast image matching using omnidirectional panoramic images. In Proceedings of the 1st ACM International Workshop on Multimodal Pervasive Video Analysis, Firenze, Italy, 29 October 2010; pp. 1–4. 

  18. Zitová, B.; Flusser, J. Landmark recognition using invariant features. Pattern Recognit. Lett. 1999, 20, 541–547. [CrossRef] 

  19. Pinto, A.M.G.; Moreira, A.P.; Costa, P.G. Indoor localization system based on artificial landmarks and monocular vision. TELKOMNIKA Telecommun. Comput. Electron. Control 2012, 10, 609–620. [CrossRef] 

  20. Lin, G.; Chen, X. A robot indoor position and orientation method based on 2D barcode landmark. JCP 2011, 6, 1191–1197. [CrossRef] 

  21. Kosmopoulos, D.I.; Chandrinos, K.V. Definition and Extraction of Visual Landmarks for Indoor Robot Navigation; Springer: Berlin/Heidelberg, Germany, 2002; pp. 401–412. 

  22. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. 

  23. Zhou, B.; Lapedriza, A.; Xiao, J.; Torralba, A.; Oliva, A. Learning deep features for scene recognition using places database. In Advances in Neural Information Processing Systems; 2014; pp. 487–495. Available online: (accessed on 3 January 2019). 

  24. Werner, M.; Kessel, M.; Marouane, C. Indoor positioning using smartphone camera. In Proceedings of the 2011 International Conference on Indoor Positioning and Indoor Navigation, Guimaraes, Portugal, 21–23 September 2011; pp. 1–6. 

  25. Liang, J.Z.; Corso, N.; Turner, E.; Zakhor, A. Image based localization in indoor environments. In Proceedings of the 2013 Fourth International Conference on Computing for Geospatial Research and Application, San Jose, CA, USA, 22–24 July 2013; pp. 70–75. 

  26. Chen, C.; Yang, B.; Song, S.; Tian, M.; Li, J.; Dai, W.; Fang, L. Calibrate Multiple Consumer RGB-D Cameras for Low-Cost and Efficient 3D Indoor Mapping. Remote Sens. 2018, 10, 328. [CrossRef] 

  27. Zhao, P.; Hu, Q.; Wang, S.; Ai, M.; Mao, Q. Panoramic Image and Three-Axis Laser Scanner Integrated Approach for Indoor 3D Mapping. Remote Sens. 2018, 10, 1269. [CrossRef] 

  28. Lu, G.; Kambhamettu, C. Image-based indoor localization system based on 3D SfM model. In IS&T/SPIE Electronic Imaging; International Society for Optics and Photonics: 2014; p. 90250H. Available online: based_on_3D_SfM_model (accessed on 3 January 2019). 

  29. Van Opdenbosch, D.; Schroth, G.; Huitl, R.; Hilsenbeck, S.; Garcea, A.; Steinbach, E. Camera-based indoor positioning using scalable streaming of compressed binary image signatures. In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 2804–2808. 

  30. Hile, H.; Borriello, G. Positioning and orientation in indoor environments using camera phones. IEEE Comput. Graph. Appl. 2008, 28. [CrossRef] 

  31. Mulloni, A.; Wagner, D.; Barakonyi, I.; Schmalstieg, D. Indoor positioning and navigation with camera phones. IEEE Pervasive Comput. 2009, 8, 22–31. [CrossRef] 

  32. Lu, G.; Yan, Y.; Sebe, N.; Kambhamettu, C. Indoor localization via multi-view images and videos. Comput. Vis. Image Underst. 2017, 161, 145–160. [CrossRef] 

  33. Lu, G.; Yan, Y.; Ren, L.; Saponaro, P.; Sebe, N.; Kambhamettu, C. Where am I in the dark: Exploring active transfer learning on the use of indoor localization based on thermal imaging. Neurocomputing 2016, 173, 83–92. [CrossRef] 

  34. Piciarelli, C. Visual indoor localization in known environments. IEEE Signal Process. Lett. 2016, 23, 1330–1334. [CrossRef] 

  35. Vedadi, F.; Valaee, S. Automatic visual fingerprinting for indoor image-based localization applications. IEEE Trans. Syst. Man Cybern. Syst. 2017. [CrossRef] 

  36. Lee, N.; Kim, C.; Choi, W.; Pyeon, M.; Kim, Y. Development of indoor localization system using a mobile data acquisition platform and BoW image matching. KSCE J. Civ. Eng. 2017, 21, 418–430. [CrossRef] 

  37. Chen, Z.; Zou, H.; Jiang, H.; Zhu, Q.; Soh, Y.C.; Xie, L. Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization. Sensors 2015, 15, 715–732. [CrossRef] 

  38. Deng, Z.A.; Wang, G.; Qin, D.; Na, Z.; Cui, Y.; Chen, J. Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks. Sensors 2016, 16, 1427. [CrossRef] 

  39. Gu, F.; Khoshelham, K.; Shang, J.; Yu, F. Sensory landmarks for indoor localization. In Proceedings of the 2016 Fourth International Conference on Ubiquitous Positioning, Indoor Navigation and Location Based Services (UPINLBS), Shanghai, China, 2–4 November 2016; pp. 201–206. 

  40. Millonig, A.; Schechtner, K. Developing landmark-based pedestrian-navigation systems. IEEE Trans. Intell. Transp. Syst. 2007, 8, 43–49. [CrossRef] 

  41. Betke, M.; Gurvits, L. Mobile robot localization using landmarks. IEEE Trans. Robot. Autom. 1997, 13, 251–263. [CrossRef] 

  42. Boada, B.L.; Blanco, D.; Moreno, L. Symbolic place recognition in Voronoi-based maps by using hidden Markov models. J. Intell. Robot. Syst. 2004, 39, 173–197. [CrossRef] 

  43. Zhou, B.; Li, Q.; Mao, Q.; Tu, W.; Zhang, X. Activity sequence-based indoor pedestrian localization using smartphones. IEEE Trans. Hum.-Mach. Syst. 2015, 45, 562–574. [CrossRef] 

  44. Kosecká, J.; Li, F. Vision based topological Markov localization. In Proceedings of the IEEE International Conference on Robotics and Automation, New Orleans, LA, USA, 26 April–1 May 2004; Volume 2, pp. 1481–1486. 

  45. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems; 2015; pp. 91–99. Available online: https:// (accessed on 3 January 2019). 

  46. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems; 2012; pp. 1097–1105. Available online: https://papers. (accessed on 3 January 2019). 

  47. Uijlings, J.R.; Van De Sande, K.E.; Gevers, T.; Smeulders, A.W. Selective search for object recognition. Int. J. Comput. Vis. 2013, 104, 154–171. [CrossRef] 

  48. Thede, S.M.; Harper, M.P. A second-order hidden Markov model for part-of-speech tagging. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, College Park, MD, USA, 20–26 June 1999; pp. 175–182. 

  49. Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 675–678. 

  50. Oliva, A.; Torralba, A. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int. J. Comput. Vis. 2001, 42, 145–175. [CrossRef]