Document Type: Original/Review Paper


1 Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran.

2 School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran.



When a camera moves through an unfamiliar environment, many computer vision and robotics applications need to estimate its position and orientation. Camera tracking is perhaps the most challenging part of Visual Simultaneous Localization and Mapping (Visual SLAM) and Augmented Reality problems. This paper proposes a feature-based approach for tracking a hand-held camera that moves within an indoor space with a maximum depth of around 4-5 meters. In the first few frames, the camera observes a chessboard as a marker to bootstrap the system and construct the initial map. Thereafter, upon arrival of each new frame, the algorithm continues the camera tracking procedure. This procedure is carried out in a framework that operates using only the extracted visible natural feature points and the initial map. The constructed initial map is extended as the camera explores new areas. In addition, the proposed system employs a hierarchical method based on the Lucas-Kanade registration technique to track FAST features. For each incoming frame, the 6-DOF camera pose parameters are estimated using an Unscented Kalman Filter (UKF). The proposed algorithm is tested on real-world videos, and the performance of the UKF is compared against other camera tracking methods. Two evaluation criteria (i.e., relative pose error and absolute trajectory error) are used to assess the performance of the proposed algorithm. The reported experimental results show the accuracy and effectiveness of the presented approach. The experiments also indicate that the type of extracted feature points has no significant effect on the precision of the proposed approach.
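The two evaluation criteria named above can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that the estimated and ground-truth trajectories are already time-synchronised and, for the absolute trajectory error, already expressed in a common frame (a full evaluation would normally first align them with a rigid-body fit). Each pose is assumed to be a pair `(R, t)` holding a 3x3 camera-to-world rotation and a 3-vector position. The function names `ate_rmse` and `rpe_translation_rmse` and the `delta` parameter are hypothetical, chosen only for illustration.

```python
import math


def ate_rmse(est_positions, gt_positions):
    """Absolute trajectory error: RMSE of per-frame position differences.

    Assumption: both trajectories are synchronised and already aligned.
    """
    if len(est_positions) != len(gt_positions) or not est_positions:
        raise ValueError("trajectories must be non-empty and equally long")
    squared = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(est_positions, gt_positions)
    ]
    return math.sqrt(sum(squared) / len(squared))


def _rotate_transposed(R, v):
    """Return R^T v for a 3x3 rotation R given as nested lists."""
    return [sum(R[k][r] * v[k] for k in range(3)) for r in range(3)]


def _relative_translation(pose_a, pose_b):
    """Translation of pose_b expressed in the local frame of pose_a."""
    R_a, t_a = pose_a
    _, t_b = pose_b
    diff = [b - a for a, b in zip(t_a, t_b)]
    return _rotate_transposed(R_a, diff)


def rpe_translation_rmse(est_poses, gt_poses, delta=1):
    """Translational relative pose error over a fixed frame interval.

    For each frame i, the motion from i to i + delta is computed in the
    local frame of pose i for both trajectories; the error is the norm of
    the difference between the two relative translations.
    """
    if len(est_poses) != len(gt_poses):
        raise ValueError("trajectories must be equally long")
    n = len(est_poses) - delta
    if delta < 1 or n < 1:
        raise ValueError("delta must be >= 1 and shorter than the trajectory")
    squared = []
    for i in range(n):
        rel_est = _relative_translation(est_poses[i], est_poses[i + delta])
        rel_gt = _relative_translation(gt_poses[i], gt_poses[i + delta])
        squared.append(sum((e - g) ** 2 for e, g in zip(rel_est, rel_gt)))
    return math.sqrt(sum(squared) / n)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    gt = [(identity, [float(i), 0.0, 0.0]) for i in range(3)]
    # Estimated trajectory with a constant 0.5 m offset along x.
    est = [(identity, [i + 0.5, 0.0, 0.0]) for i in range(3)]
    print("ATE RMSE:", ate_rmse([t for _, t in est], [t for _, t in gt]))
    print("RPE RMSE:", rpe_translation_rmse(est, gt))
```

The two measures are complementary: the absolute error reflects global consistency of the whole trajectory, while the relative error measures local drift between frames and, in this formulation, is unaffected by a constant rigid offset between the two trajectories.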

