H.3.2.2. Computer vision
Fatemeh Asadi-Zeydabadi; Ali Afkari-Fahandari; Elham Shabaninia; Hossein Nezamabadi-pour
Abstract
Farsi optical character recognition remains challenging due to the script’s cursive structure, positional glyph variations, and frequent diacritics. This study compares five foundational deep learning architectures widely used in OCR (two lightweight CRNN-based models aimed at efficient deployment and three Transformer-based models designed for advanced contextual modeling) to examine their suitability for the distinct characteristics of Farsi script. Performance was benchmarked on four publicly available datasets: Shotor and IDPL PFOD2 for printed text, and Iranshahr and Sadri for handwritten text, using word-level accuracy, parameter count, and computational cost as evaluation criteria. CRNN-based models achieved high accuracy on the word-level datasets (99.42% on Shotor, 97.08% on Iranshahr, and 98.86% on Sadri) while maintaining smaller model sizes and lower computational demands. However, their accuracy dropped to 78.49% on the larger and more diverse line-level IDPL PFOD2 dataset. Transformer-based models substantially narrowed this performance gap, exhibiting greater robustness to variations in font, style, and layout, with the best model reaching 92.81% on IDPL PFOD2. To the best of our knowledge, this work is among the first comprehensive comparative studies of lightweight CRNN and Transformer-based architectures for Farsi OCR, encompassing both printed and handwritten scripts, and it establishes a solid performance baseline for future research and deployment strategies.
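For readers unfamiliar with the accuracy metric named above, the following is a minimal sketch of word-level accuracy as it is commonly computed: the fraction of samples whose predicted string exactly matches the ground truth. The abstract does not give the authors' exact definition, so the function name `word_accuracy`, the whitespace trimming, and the Unicode NFC normalization step are all assumptions made for illustration.

```python
import unicodedata


def word_accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference string.

    Assumption: strings are compared after trimming surrounding whitespace
    and applying Unicode NFC normalization. The source does not state
    whether the authors normalize text before comparison.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0

    def normalize(text):
        return unicodedata.normalize("NFC", text.strip())

    correct = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


# Hypothetical recognizer outputs, for illustration only.
predicted = ["سلام", "کتاب", "دانشگاه", "ایران"]
expected = ["سلام", "کتاب", "دانشکده", "ایران"]
score = word_accuracy(predicted, expected)
```

Because a single wrong character makes the whole sample count as an error, this metric is stricter than character-level measures, and that strictness grows with the length of the compared string.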