Document and Text Processing
A.R. Mazochi; S. Bourbour; M. R. Ghofrani; S. Momtazi
Abstract
Geocoding, the conversion of a postal address to geographic coordinates, is useful in many applications. Developing a geocoder is difficult for a developing country that does not follow a standard addressing format. Beyond the common natural language processing challenges, the main obstacles are the lack of complete reference data and the instability of place names. In this paper, we propose a geocoder for Persian addresses. To the best of our knowledge, our system, TehranGeocode, is the first geocoder for this language. Because Persian addresses have a non-standard structure, we split the address into small segments, find each segment in the reference dataset, and connect the segments to locate the target of the address. To this end, we build our system on address parsing and dynamic programming. We specify the contribution of our work compared to similar studies, discuss the main components of the program, its data, and its results, and show that the proposed framework achieves promising results, locating 83% of addresses with an error of less than 300 meters.
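The split-and-match step described in the abstract can be illustrated with a minimal sketch. This is not the TehranGeocode implementation, whose details the abstract does not give. It assumes a hypothetical in-memory reference table (`REFERENCE`), English transliterations in place of Persian text, placeholder coordinates, and a simple objective: choose the segmentation that covers the most tokens with reference entries. The names `segment` and `max_len` are likewise illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical reference data: place name -> (latitude, longitude).
# Names and coordinates are illustrative placeholders, not real reference data.
REFERENCE: Dict[str, Tuple[float, float]] = {
    "tehran": (35.69, 51.39),
    "valiasr street": (35.72, 51.41),
    "vanak square": (35.76, 51.41),
}


def segment(tokens: List[str], max_len: int = 3) -> List[Tuple[str, bool]]:
    """Split tokens into segments with dynamic programming.

    Returns (segment_text, matched) pairs, where matched is True when the
    segment was found in REFERENCE. The chosen segmentation maximizes the
    number of tokens covered by matched segments.
    """
    n = len(tokens)
    best = [0] * (n + 1)               # best[i]: max matched tokens in tokens[:i]
    back = [(0, False)] * (n + 1)      # back[i]: (segment start, matched flag)

    for i in range(1, n + 1):
        # Default: treat tokens[i-1] as an unmatched single-token segment.
        best[i] = best[i - 1]
        back[i] = (i - 1, False)
        # Try every candidate segment of up to max_len tokens ending at i.
        for k in range(1, min(max_len, i) + 1):
            phrase = " ".join(tokens[i - k:i])
            if phrase in REFERENCE and best[i - k] + k > best[i]:
                best[i] = best[i - k] + k
                back[i] = (i - k, True)

    # Walk the back-pointers from the end to recover the segments in order.
    segments: List[Tuple[str, bool]] = []
    i = n
    while i > 0:
        start, matched = back[i]
        segments.append((" ".join(tokens[start:i]), matched))
        i = start
    segments.reverse()
    return segments


if __name__ == "__main__":
    address = "tehran valiasr street near vanak square"
    for text, matched in segment(address.split()):
        print(text, REFERENCE.get(text) if matched else None)
```

In a fuller system, the matched segments would then be connected, for example by intersecting a street with a nearby landmark, to produce a single target coordinate. That step depends on the reference data model and is left out of this sketch.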