Document Type: Applied Article

Authors

1 Computer Engineering Department, Amirkabir University of Technology, Tehran, Iran.

2 Tehran Center for Urban Statistics and Observatory, Tehran, Iran.

Abstract

Geocoding, the conversion of a postal address to geographic coordinates, is a useful tool in many applications. Developing a geocoder is difficult for a developing country that does not follow a standard addressing format. Beyond the common natural language processing challenges, the main obstacles are the lack of complete reference data and the non-persistence of names. In this paper, we propose a geocoder for Persian addresses. To the best of our knowledge, our system, TehranGeocode, is the first geocoder for this language. Because Persian addresses have a non-standard structure, the address must be split into small segments, each segment must be found in the reference dataset, and the matches must be connected to locate the target of the address. To this end, we build our system on address parsing and dynamic programming. We specify the contribution of our work compared with similar studies. We discuss the main components of the program, its data, and its results, and show that the proposed framework achieves promising results, locating 83% of addresses with an error of less than 300 meters.
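
The segment-and-connect idea in the abstract can be illustrated with a small dynamic programming sketch. This is not the TehranGeocode implementation. The reference entries, the planar coordinates in meters, the comma-based splitting, the exact-name lookup, and the use of straight-line distance as the linking cost are all assumptions made for illustration, and the names `split_address`, `candidates`, and `geocode` are hypothetical.

```python
import math
import re

# Hypothetical reference data: segment name -> candidate positions (x, y) in meters.
REFERENCE = {
    "azadi street": [(0.0, 0.0)],
    "behesht alley": [(100.0, 0.0), (5000.0, 5000.0)],  # same name in two places
    "golestan dead end": [(120.0, 30.0)],
}


def split_address(address):
    """Split on Latin or Persian commas (an assumed, simplified parser)."""
    return [part.strip() for part in re.split(r"[,،]", address) if part.strip()]


def candidates(segment):
    """Return every reference position whose name matches the segment exactly."""
    return REFERENCE.get(segment.strip().lower(), [])


def geocode(segments):
    """Choose one candidate per segment so the summed hop distance is minimal.

    Returns the chosen chain of positions, or None if a segment has no match.
    """
    layers = [candidates(s) for s in segments]
    if not layers or any(not layer for layer in layers):
        return None
    # Each entry holds (total distance so far, chain of chosen positions).
    best = [(0.0, [c]) for c in layers[0]]
    for layer in layers[1:]:
        best = [
            min(
                ((cost + math.dist(chain[-1], c), chain + [c]) for cost, chain in best),
                key=lambda entry: entry[0],
            )
            for c in layer
        ]
    return min(best, key=lambda entry: entry[0])[1]


chain = geocode(split_address("Azadi Street, Behesht Alley, Golestan Dead End"))
```

In this toy data the ambiguous "Behesht Alley" resolves to the candidate near the other two segments, and the last position in the chain serves as the target of the address. A real system would need fuzzy matching and a richer cost than straight-line distance.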
