[1] J. Achiam et al., "GPT-4 Technical Report," arXiv preprint arXiv:2303.08774, 2023.
[2] F. Alimorad et al., "Synthesizing an Image Dataset for Text Detection and Recognition in Images," Journal of Information and Communication Technology, vol. 53, no. 53, p. 78, 2023.
[3] J. Baek et al., "What Is Wrong with Scene Text Recognition Model Comparisons? Dataset and Model Analysis," 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019, pp. 4714-4722, doi: 10.1109/ICCV.2019.00481.
[4] D. Bautista and R. Atienza, "Scene Text Recognition with Permuted Autoregressive Sequence Models," ECCV 2022, Lecture Notes in Computer Science, vol. 13688, Springer, Cham, 2022, doi: 10.1007/978-3-031-19815-1_11.
[5] F. Borisyuk, A. Gordo, and V. Sivakumar, "Rosetta: Large Scale System for Text Detection and Recognition in Images," Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18), Association for Computing Machinery, New York, NY, USA, 2018, pp. 71–79, doi: 10.1145/3219819.3219861.
[6] R. Buoy et al., "PARSTR: Partially Autoregressive Scene Text Recognition," International Journal on Document Analysis and Recognition (IJDAR), pp. 303-316, 2024, doi: 10.1007/s10032-024-00470-1.
[7] X. Chen et al., "Text Recognition in the Wild: A Survey," ACM Comput. Surv., vol. 54, no. 2, Article 42, March 2022, doi: 10.1145/3440756.
[8] J. Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1, Minneapolis, Minnesota, pp. 4171-4186, doi: 10.18653/v1/N19-1423.
[9] A. Fateh et al., "Persian Printed Text Line Detection Based on Font Size," Multimedia Tools and Applications, vol. 82, no. 2, pp. 2393–2418, 2023, doi: 10.1007/s11042-022-13243-x.
[10] O. Golovneva et al., "Contextual Position Encoding: Learning to Count What’s Important," 13th International Conference on Learning Representations, 2024.
[11] A. Gupta et al., "Synthetic Data for Text Localisation in Natural Images," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 2315-2324, doi: 10.1109/CVPR.2016.254.
[12] K. He et al., "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
[13] S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 15 Nov. 1997, doi: 10.1162/neco.1997.9.8.1735.
[14] M. Jaderberg et al., "Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition," arXiv preprint arXiv:1406.2227, 2014.
[15] M. Jaderberg et al., "Spatial Transformer Networks," Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2 (NIPS'15), MIT Press, Cambridge, MA, USA, pp. 2017–2025, 2015.
[16] L. Kang et al., "Pay Attention to What You Read: Nonrecurrent Handwritten Text-Line Recognition," Pattern Recognition, vol. 129, 2022, doi: 10.1016/j.patcog.2022.108766.
[17] D. Karatzas et al., "ICDAR 2013 Robust Reading Competition," 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, pp. 1484-1493, doi: 10.1109/ICDAR.2013.221.
[18] D. Karatzas et al., "ICDAR 2015 Competition on Robust Reading," 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, pp. 1156-1160, doi: 10.1109/ICDAR.2015.7333942.
[19] S. Kheirinejad et al., "Persian Text Based Traffic Sign Detection with Convolutional Neural Network: A New Dataset," 2020 10th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, pp. 60-64, doi: 10.1109/ICCKE50421.2020.9303646.
[20] A. Kirillov et al., "Segment Anything," 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, pp. 3992-4003, doi: 10.1109/ICCV51070.2023.00371.
[21] J. Lee et al., "On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention," 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, pp. 2326-2335, doi: 10.1109/CVPRW50498.2020.00281.
[22] V. I. Levenshtein, "Binary Codes Capable of Correcting Deletions, Insertions, and Reversals," Soviet Physics Doklady, vol. 10, no. 8, pp. 707–710, 1966.
[24] X. Liu et al., "Learning to Encode Position for Transformer with Continuous Dynamical Model," Proceedings of the 37th International Conference on Machine Learning, pp. 6327–6335, 2020.
[25] S. Long et al., "Scene Text Detection and Recognition: The Deep Learning Era," International Journal of Computer Vision, vol. 129, pp. 161–184, 2021, doi: 10.1007/s11263-020-01369-0.
[26] Z. Raisi and J. Zelek, "Visual Place Recognition from End-to-End Semantic Scene Text Features," Frontiers in Robotics and AI, vol. 11, Article 1424883, 2024, doi: 10.3389/frobt.2024.1424883.
[27] A. Mishra et al., "Scene Text Recognition Using Higher Order Language Priors," British Machine Vision Conference (BMVC), Surrey, United Kingdom, Sep. 2012, doi: 10.5244/C.26.127.
[28] T. Q. Phan et al., "Recognizing Text with Perspective Distortion in Natural Scenes," 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, pp. 569-576, doi: 10.1109/ICCV.2013.76.
[29] A. Rahman et al., "UTRNet: High-Resolution Urdu Text Recognition in Printed Documents," International Conference on Document Analysis and Recognition, pp. 305–324, Springer, 2023, Lecture Notes in Computer Science, vol. 14191, doi: 10.1007/978-3-031-41734-4_19.
[30] M. Rahmati et al., "Printed Persian OCR System Using Deep Learning," IET Image Processing, vol. 14, no. 15, pp. 3920–3931, 2020, doi: 10.1049/iet-ipr.2019.0728.
[31] Z. Raisi and J. Zelek, "Occluded Text Detection and Recognition in the Wild," 2022 19th Conference on Robots and Vision (CRV), Toronto, ON, Canada, 2022, pp. 140-150, doi: 10.1109/CRV55824.2022.00026.
[32] Z. Raisi, M. Naiel, P. Fieguth, S. Wardell, and J. Zelek, "2D Positional Embedding-Based Transformer for Scene Text Recognition," Journal of Computational Vision and Imaging Systems, vol. 6, no. 1, pp. 1–4, 2021, doi: 10.15353/jcvis.v6i1.3533.
[33] Z. Raisi et al., "2LSPE: 2D Learnable Sinusoidal Positional Encoding Using Transformer for Scene Text Recognition," 2021 18th Conference on Robots and Vision (CRV), Burnaby, BC, Canada, 2021, pp. 119-126, doi: 10.1109/CRV52889.2021.00024.
[35] A. Ramesh et al., "Hierarchical Text-Conditional Image Generation with CLIP Latents," arXiv preprint arXiv:2204.06125, 2022.
[36] A. Risnumawan et al., "A Robust Arbitrary Text Detection System for Natural Scene Images," Expert Systems with Applications, vol. 41, no. 18, pp. 8027–8048, 2014, doi: 10.1016/j.eswa.2014.07.008.
[37] B. Shi, X. Bai, and C. Yao, "An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2298-2304, 1 Nov. 2017, doi: 10.1109/TPAMI.2016.2646371.
[38] B. Shi et al., "Robust Scene Text Recognition with Automatic Rectification," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 4168-4176, doi: 10.1109/CVPR.2016.452.
[39] B. Shi et al., "ASTER: An Attentional Scene Text Recognizer with Flexible Rectification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 9, pp. 2035-2048, 1 Sept. 2019, doi: 10.1109/TPAMI.2018.2848939.
[40] Y. Sun et al., "ICDAR 2019 Competition on Large-Scale Street View Text with Partial Labeling - RRC-LSVT," 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, 2019, pp. 1557-1562, doi: 10.1109/ICDAR.2019.00250.
[41] R. Anil et al., "Gemini: A Family of Highly Capable Multimodal Models," arXiv preprint arXiv:2312.11805, 2023.
[42] A. Vaswani et al., "Attention Is All You Need," Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), Curran Associates Inc., Red Hook, NY, USA, pp. 6000–6010, 2017.
[43] A. Veit et al., "COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images," arXiv preprint arXiv:1601.07140, 2016.
[44] B. Wang et al., "On Position Embeddings in BERT," International Conference on Learning Representations, Austria, 2021.
[45] K. Wang and S. Belongie, "Word Spotting in the Wild," ECCV 2010, Lecture Notes in Computer Science, vol. 6311, Springer, Berlin, Heidelberg, 2010, doi: 10.1007/978-3-642-15549-9_43.
[46] F. Zhan and S. Lu, "ESIR: End-To-End Scene Text Recognition via Iterative Image Rectification," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 2054-2063, doi: 10.1109/CVPR.2019.00216.
[47] H. Zhang et al., "Self-Attention Generative Adversarial Networks," Proceedings of the 36th International Conference on Machine Learning, vol. 97, pp. 7354-7363, 09-15 Jun 2019, PMLR.
[48] S. Zhao et al., "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model," arXiv preprint arXiv:2305.14014, 2023.
[49] F. Ariai et al., "Enhancing Aspect-based Sentiment Analysis with ParsBERT in Persian Language," Journal of AI and Data Mining, vol. 12, no. 1, pp. 1–14, 2024, doi: 10.22044/jadm.2023.13666.2482.