A Siamese Network Based on InceptionV3 with Custom Loss Functions for Document Image Quality Assessment (DIQA)

Khosravi, Mohammad Hossein

doi:10.22044/jadm.2026.16621.2788

Articles in Press

Document Type: Original/Review Paper

Electrical and Computer Engineering Faculty, University of Birjand, Birjand, Iran.

Abstract

Document Image Quality Assessment (DIQA) is critical for ensuring the reliability of downstream applications such as Optical Character Recognition (OCR), digital archiving, and automated document workflows. In this paper, we propose a deep-learning-based DIQA framework built on a Siamese neural network with an InceptionV3 backbone. The model is trained with a composite loss function that combines a linear regression loss with a monotonic ranking constraint, jointly optimizing score-level accuracy and perceptual consistency. Unlike prior work that relies on handcrafted features or targets a narrow set of degradation types, our approach generalizes across the diverse distortions commonly observed in scanned and photographed documents. Experiments on the SOC and SmartDoc-QA datasets show that the quality scores predicted by the proposed model correlate strongly with OCR accuracy, achieving Spearman rank-order correlation coefficient (SROCC) values of 0.952 and 0.873, respectively, and outperforming several state-of-the-art DIQA methods.
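The abstract describes the training objective only at a high level. As a minimal sketch, assuming a mean-squared-error regression term and a pairwise margin hinge for the monotonic ranking constraint (both are assumptions, not the paper's confirmed formulation), a composite loss over a batch of Siamese pairs could look like the following. The two InceptionV3 branches are assumed to share weights and to have already produced one scalar quality score per image, so only the loss arithmetic is shown; the names `composite_loss`, `lam`, and `margin` are hypothetical.

```python
# Hypothetical sketch of a composite DIQA loss for a Siamese setup.
# Assumptions (not from the paper): MSE for the regression term, a margin
# hinge for the monotonic ranking term, and the weighting factor `lam`.
from typing import Sequence


def regression_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted scores and target scores (e.g. OCR accuracy)."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def ranking_loss(
    pred_a: Sequence[float],
    pred_b: Sequence[float],
    target_a: Sequence[float],
    target_b: Sequence[float],
    margin: float = 0.0,
) -> float:
    """Average hinge penalty over Siamese pairs (a_i, b_i).

    When image a_i has the higher target, its predicted score should exceed
    that of b_i by at least `margin` (and vice versa); pairs with equal
    targets contribute zero but still count in the average.
    """
    n = len(pred_a)
    if n == 0 or not (n == len(pred_b) == len(target_a) == len(target_b)):
        raise ValueError("all pair sequences must be non-empty and equal in length")
    total = 0.0
    for pa, pb, ta, tb in zip(pred_a, pred_b, target_a, target_b):
        if ta == tb:
            continue
        sign = 1.0 if ta > tb else -1.0
        # Penalize predictions whose order disagrees with the target order.
        total += max(0.0, margin - sign * (pa - pb))
    return total / n


def composite_loss(pred_a, pred_b, target_a, target_b, lam=1.0, margin=0.0):
    """Regression loss on both branches plus `lam` times the ranking loss."""
    reg = regression_loss(list(pred_a) + list(pred_b), list(target_a) + list(target_b))
    rank = ranking_loss(pred_a, pred_b, target_a, target_b, margin)
    return reg + lam * rank
```

For example, `ranking_loss([0.9, 0.4], [0.6, 0.7], [0.95, 0.3], [0.5, 0.8])` is zero because both pairs are already ordered correctly, so `composite_loss` on the same inputs reduces to the regression term alone. The hinge only activates when the predicted order contradicts the target order, which is what lets the ranking term enforce monotonicity without overriding the absolute score fit.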
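The reported SROCC values measure how well the ranking of predicted quality scores agrees with the ranking of OCR accuracies. A minimal pure-Python sketch of the standard definition (the Pearson correlation of the two rank vectors, with tied values assigned their average rank) is shown below; the function names are illustrative, and how the paper aggregates correlations across documents is not stated here. In practice, a library routine such as `scipy.stats.spearmanr` computes the same quantity.

```python
# Minimal SROCC sketch: Pearson correlation computed on tie-averaged ranks.
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def srocc(pred: Sequence[float], target: Sequence[float]) -> float:
    """Spearman rank-order correlation between predicted scores and targets."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(pred), average_ranks(target))
```

For instance, `srocc([0.1, 0.4, 0.35, 0.8], [0.2, 0.5, 0.6, 0.9])` returns 0.8: the predicted scores swap the order of the middle two images, so the ranks agree except for one adjacent pair. Because only ranks enter the computation, SROCC rewards a model that orders documents correctly by OCR accuracy even if its raw scores are on a different scale.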

Main Subjects

H.3.2.2. Computer vision

Articles in Press, Accepted Manuscript Available Online from 07 June 2026