Document Type: Methodologies


Faculty of Engineering and Technology, Alzahra University, Tehran, Iran.


Digital images are produced in massive numbers every day, and many of them contain text. This textual information can be extracted and used in a variety of fields. Noise, blur, distortion, occlusion, font variation, alignment, and orientation are among the main challenges for text detection in natural images. Despite many advances in text detection algorithms, no single algorithm yet addresses all of these problems successfully. Furthermore, most proposed algorithms can detect only horizontal text, and very few of them consider the Farsi language. In this paper, we propose a method for detecting multi-oriented text in both Farsi and English. We define seven geometric features to distinguish text components from the background and propose a new contrast enhancement method for text detection algorithms. Our experimental results indicate that the proposed method achieves high performance in text detection on natural images.


