Document Type: Original/Review Paper

Authors

1 Department of Electrical Engineering, Shahid Bahonar University of Kerman, Kerman, Iran.

2 Department of Applied Mathematics, Graduate University of Advanced Technology, Kerman, Iran.

DOI: 10.22044/jadm.2025.16098.2728

Abstract

Farsi optical character recognition (OCR) remains challenging because of the script’s cursive structure, positional glyph variations, and frequent diacritics. This study presents a comparative evaluation of five foundational deep learning architectures widely used in OCR: two lightweight CRNN-based models aimed at efficient deployment and three Transformer-based models designed for advanced contextual modeling. The goal is to examine their suitability for the distinct characteristics of Farsi script. Performance was benchmarked on four publicly available datasets (Shotor and IDPL-PFOD2 for printed text, Iranshahr and Sadri for handwritten text), using word-level accuracy, parameter count, and computational cost as evaluation criteria. The CRNN-based models achieved high accuracy on the word-level datasets, namely 99.42% on Shotor, 97.08% on Iranshahr, and 98.86% on Sadri, while maintaining smaller model sizes and lower computational demands. However, their accuracy dropped to 78.49% on the larger and more diverse line-level IDPL-PFOD2 dataset. Transformer-based models substantially narrowed this performance gap and were more robust to variations in font, style, and layout, with the best model reaching 92.81% on IDPL-PFOD2. To the best of our knowledge, this work is among the first comprehensive comparative studies of lightweight CRNN and Transformer-based architectures for Farsi OCR that covers both printed and handwritten scripts, and it establishes a solid performance baseline for future research and deployment strategies.
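As a rough illustration of the word-level accuracy criterion named in the abstract, the sketch below counts a sample as correct only when the predicted string exactly matches its label. This is an assumption about how the metric is computed, not the authors' evaluation code; the function name `word_accuracy` and the toy strings are hypothetical and do not come from the benchmarked datasets.

```python
from typing import Sequence


def word_accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Return the fraction of samples whose prediction exactly matches the label."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("cannot compute accuracy on an empty set")
    # A sample counts as correct only if every character matches (assumed definition).
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical toy data, not taken from the paper's datasets.
preds = ["سلام", "کتاب", "مدرسه", "دانشگا"]
labels = ["سلام", "کتاب", "مدرسه", "دانشگاه"]
print(f"{100 * word_accuracy(preds, labels):.2f}%")  # prints 75.00%
```

Under this exact-match definition a single wrong or missing character makes the whole sample incorrect, which is one plausible reason scores on longer line-level samples tend to be lower than on isolated words.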
