Document Type: Original/Review Paper

Authors

Artificial Intelligence Group, Faculty of Electrical and Computer Engineering, University of Kashan, Kashan, Iran.

DOI: 10.22044/jadm.2025.15167.2625

Abstract

This paper evaluates the performance of various fine-tuning methods on Persian natural language processing (NLP) tasks. In low-resource languages such as Persian, which lack rich and sufficient data for training large models, it is crucial to select fine-tuning techniques that mitigate overfitting and prevent the model from learning weak or surface-level patterns. The main goal of this research is to compare the effectiveness of fine-tuning approaches such as full fine-tuning, LoRA, AdaLoRA, and DoRA on model learning and task performance. We apply these techniques to three Persian NLP tasks: sentiment analysis, named entity recognition (NER), and span question answering (QA). For this purpose, we conduct experiments on three Transformer-based multilingual models with different architectures and parameter scales: multilingual BERT-base (~168M parameters, encoder-only), mT5-small (~300M parameters, encoder-decoder), and mGPT (~1.4B parameters, decoder-only). Each of these models supports Persian but differs in structure and computational requirements, which influences the effectiveness of the different fine-tuning approaches. Results indicate that fully fine-tuned multilingual BERT-base consistently outperforms the other models across all tasks on the basic metrics, particularly given the unique challenges of these embedding-based tasks. In addition, lightweight fine-tuning methods such as LoRA and DoRA offer highly competitive performance while significantly reducing computational overhead, and they outperform the other configurations on the Performance-Efficiency Score introduced in this paper. This study contributes to a better understanding of fine-tuning methods, especially for Persian NLP, and offers practical guidance for applying Large Language Models (LLMs) to downstream tasks in low-resource languages.
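
To illustrate the kind of parameter-efficient setup the paper compares, the sketch below applies LoRA to a multilingual BERT-base checkpoint for sequence classification (the sentiment-analysis case) using the Hugging Face transformers and peft libraries. The checkpoint name, rank, and other hyperparameters here are illustrative assumptions, not the authors' exact configuration.

# Minimal LoRA fine-tuning sketch (assumes transformers + peft are installed;
# checkpoint name and hyperparameters are placeholders, not the paper's setup).
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

model_name = "bert-base-multilingual-cased"  # a multilingual BERT-base checkpoint
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# LoRA freezes the pre-trained weights and trains only low-rank update matrices
# injected into the self-attention projections.
lora_config = LoraConfig(
    task_type="SEQ_CLS",                # sequence classification (sentiment analysis)
    r=8,                                # rank of the low-rank update
    lora_alpha=16,                      # scaling factor for the update
    lora_dropout=0.1,
    target_modules=["query", "value"],  # BERT attention projection layers
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()      # only a small fraction of weights are trainable

In recent peft releases, passing use_dora=True to LoraConfig switches the same low-rank update to DoRA's weight-decomposed form, and AdaLoRA is configured through a separate AdaLoraConfig; full fine-tuning corresponds to training the unwrapped model directly.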

