[1] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, 1986.
[2] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[3] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[4] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, 2019.
[5] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-Rank Adaptation of Large Language Models,” arXiv preprint arXiv:2106.09685, 2021.
[6] T. Pires, E. Schlinger, and D. Garrette, “How Multilingual Is Multilingual BERT?,” arXiv preprint arXiv:1906.01502, 2019.
[7] A. Abaskohi, S. Baruni, M. Masoudi, N. Abbasi, M. Babalou, A. Edalat, S. Kamahi, S. Mahdizadeh Sani, N. Naghavian, D. Namazifard, P. Sadeghi, and Y. Yaghoobzadeh, “Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT,” arXiv preprint arXiv:2404.02403, 2024.
[8] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, “Language Models are Few-Shot Learners,” in Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
[9] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer,” Journal of Machine Learning Research, vol. 21, pp. 1–67, 2020.
[10] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, and C. Raffel, “mT5: A massively multilingual pre-trained text-to-text transformer,” arXiv preprint arXiv:2010.11934, 2020.
[11] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language Models are Unsupervised Multitask Learners,” OpenAI, 2019.
[12] O. Shliazhko, A. Fenogenova, M. Tikhonova, V. Mikhailov, A. Kozlova, and T. Shavrina, “mGPT: Few-Shot Learners Go Multilingual,” arXiv preprint arXiv:2204.07580, 2022.
[13] M. Farahani, M. Gharachorloo, M. Farahani, and M. Manthouri, “ParsBERT: Transformer-based model for Persian language understanding,” arXiv preprint arXiv:2005.12515, 2020.
[14] M. Hamidzadeh, “Persian-NER-Dataset-500k,” Hugging Face, 2024. [Online]. Available: https://huggingface.co/datasets/mansoorhamidzadeh/Persian-NER-Dataset-500k. [Accessed: Feb. 9, 2025].
[15] S. Sabouri, “syntran-fa,” Hugging Face, [Online]. Available: https://huggingface.co/datasets/SLPL/syntran-fa. [Accessed: Feb. 9, 2025].
[16] S.-Y. Liu, C.-Y. Wang, H. Yin, P. Molchanov, Y.-C. F. Wang, K.-T. Cheng, and M.-H. Chen, “DoRA: Weight-Decomposed Low-Rank Adaptation,” arXiv preprint arXiv:2402.09353, 2024.
[17] Q. Zhang, M. Chen, A. Bukharin, N. Karampatziakis, P. He, Y. Cheng, W. Chen, and T. Zhao, “AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning,” arXiv preprint arXiv:2303.10512, 2023.
[18] Hugging Face, “Auto classes: AutoModelForSequenceClassification,” Hugging Face, [Online]. Available: https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForSequenceClassification. [Accessed: Feb. 9, 2025].
[19] Hugging Face, “Auto classes: AutoModelForTokenClassification,” Hugging Face, [Online]. Available: https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForTokenClassification. [Accessed: Feb. 9, 2025].
[20] Hugging Face, “Auto classes: AutoModelForQuestionAnswering,” Hugging Face, [Online]. Available: https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForQuestionAnswering. [Accessed: Feb. 9, 2025].
[21] R. Shuttleworth, J. Andreas, A. Torralba, and P. Sharma, “LoRA vs Full Fine-tuning: An Illusion of Equivalence,” arXiv preprint arXiv:2410.21228, 2024.