[1] P. Hambarde, "Information Retrieval: Recent Advances and Beyond," 2023.
[2] S. Minaee, N. Kalchbrenner, E. Cambria, N. Nikzad, M. Chenaghlu, and J. Gao, "Deep learning--based text classification: a comprehensive review," ACM computing surveys (CSUR), vol. 54, no. 3, pp. 1-40, 2021.
[3] K. Kowsari, K. Jafari Meimandi, M. Heidarysafa, S. Mendu, L. Barnes, and D. Brown, "Text classification algorithms: A survey," Information, vol. 10, no. 4, p. 150, 2019.
[4] M. W. Bilotti, P. Ogilvie, J. Callan, and E. Nyberg, "Structured retrieval for question answering," in Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, 2007, pp. 351-358.
[5] P. F. Brown, V. J. Della Pietra, P. V. Desouza, J. C. Lai, and R. L. Mercer, "Class-based n-gram models of natural language," Computational linguistics, vol. 18, no. 4, pp. 467-480, 1992.
[6] C. Sammut and G. I. Webb, Encyclopedia of machine learning. Springer Science & Business Media, 2011.
[7] S. Fatima and B. Srinivasu, "Text Document categorization using support vector machine," International Research Journal of Engineering and Technology (IRJET), vol. 4, no. 2, pp. 141-147, 2017.
[8] S.-B. Kim, K.-S. Han, H.-C. Rim, and S. H. Myaeng, "Some effective techniques for naive bayes text classification," IEEE transactions on knowledge and data engineering, vol. 18, no. 11, pp. 1457-1466, 2006.
[9] S. Jiang, G. Pang, M. Wu, and L. Kuang, "An improved K-nearest-neighbor algorithm for text categorization," Expert Systems with Applications, vol. 39, no. 1, pp. 1503-1509, 2012.
[10] N. Reimers, "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks," arXiv preprint arXiv:1908.10084, 2019.
[11] C. Duan, L. Cui, X. Chen, F. Wei, C. Zhu, and T. Zhao, "Attention-Fused Deep Matching Network for Natural Language Inference," in IJCAI, 2018, pp. 4033-4040.
[12] C. Tan, F. Wei, W. Wang, W. Lv, and M. Zhou, "Multiway attention networks for modeling sentence pairs," in IJCAI, 2018, pp. 4411-4417.
[13] C. Subakan, M. Ravanelli, S. Cornell, M. Bronzi, and J. Zhong, "Attention is all you need in speech separation," in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021: IEEE, pp. 21-25.
[14] A. Fan, S. Wang, and Y. Wang, "Legal Document Similarity Matching Based on Ensemble Learning," IEEE Access, 2024.
[15] G. Wang, T. Zhang, G. Xu, Y. Zheng, Z. Du, and Q. Long, "A Deep Learning Based Method to Measure the Similarity of Long Text," in 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE), 2020: IEEE, pp. 173-178.
[16] F. Safi-Esfahani, S. Rakian, and M. Nadimi-Shahraki, "English-Persian Plagiarism Detection based on a Semantic Approach," Journal of AI and Data Mining, vol. 5, no. 2, pp. 275-284, 2017.
[17] N. Jiang and M.-C. de Marneffe, "Do you know that Florence is packed with visitors? Evaluating state-of-the-art models of speaker commitment," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 4208-4213.
[18] I. Serban, A. Sordoni, Y. Bengio, A. Courville, and J. Pineau, "Building end-to-end dialogue systems using generative hierarchical neural network models," in Proceedings of the AAAI conference on artificial intelligence, 2016, vol. 30, no. 1.
[19] Q. Wang et al., "Learning deep transformer models for machine translation," arXiv preprint arXiv:1906.01787, 2019.
[20] M. Ostendorff, T. Ruas, M. Schubotz, G. Rehm, and B. Gipp, "Pairwise multi-class document classification for semantic relations between wikipedia articles," in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 2020, pp. 127-136.
[21] P. Bafna, D. Pramod, and A. Vaidya, "Document clustering: TF-IDF approach," in 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), 2016: IEEE, pp. 61-66.
[22] M. A. El-Rashidy, R. G. Mohamed, N. A. El-Fishawy, and M. A. Shouman, "An effective text plagiarism detection system based on feature selection and SVM techniques," Multimedia Tools and Applications, vol. 83, no. 1, pp. 2609-2646, 2024.
[23] L. Yang, M. Zhang, C. Li, M. Bendersky, and M. Najork, "Beyond 512 tokens: Siamese multi-depth transformer-based hierarchical encoder for long-form document matching," in Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 1725-1734.
[24] M. Ding, C. Zhou, H. Yang, and J. Tang, "Cogltx: Applying bert to long texts," Advances in Neural Information Processing Systems, vol. 33, pp. 12792-12804, 2020.
[25] A. Sharma and S. Kumar, "Ontology-based semantic retrieval of documents using Word2vec model," Data & Knowledge Engineering, vol. 144, p. 102110, 2023.
[26] R. Wu, "RecBERT: Semantic recommendation engine with large language model enhanced query segmentation for k-nearest neighbors ranking retrieval," Intelligent and Converged Networks, 2024.
[27] N. B. Korade, M. B. Salunke, A. A. Bhosle, P. B. Kumbharkar, G. G. Asalkar, and R. G. Khedkar, "Strengthening Sentence Similarity Identification Through OpenAI Embeddings and Deep Learning," International Journal of Advanced Computer Science & Applications, vol. 15, no. 4, 2024.
[28] A. Jha, V. Rakesh, J. Chandrashekar, A. Samavedhi, and C. K. Reddy, "Supervised contrastive learning for interpretable long-form document matching," ACM Transactions on Knowledge Discovery from Data, vol. 17, no. 2, pp. 1-17, 2023.
[29] H. Wang, K. Tian, Z. Wu, and L. Wang, "A short text classification method based on convolutional neural network and semantic extension," International Journal of Computational Intelligence Systems, vol. 14, no. 1, pp. 367-375, 2021.
[30] F. Ahmad and M. Faisal, "A novel hybrid methodology for computing semantic similarity between sentences through various word senses," International Journal of Cognitive Computing in Engineering, vol. 3, pp. 58-77, 2022.
[31] W. Yu, C. Xu, J. Xu, L. Pang, and J.-R. Wen, "Distribution distance regularized sequence representation for text matching in asymmetrical domains," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 721-733, 2022.
[32] D. Viji and S. Revathy, "A hybrid approach of Weighted Fine-Tuned BERT extraction with deep Siamese Bi–LSTM model for semantic text similarity identification," Multimedia tools and applications, vol. 81, no. 5, pp. 6131-6157, 2022.
[33] P. Li, G.-J. Ren, A. L. Gentile, C. DeLuca, D. Tan, and S. Gopisetty, "Long-form information retrieval for enterprise matchmaking," in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023, pp. 3260-3264.
[34] F. Mashhadirajab, M. Shamsfard, R. Adelkhah, F. Shafiee, and C. Saedi, "A Text Alignment Corpus for Persian Plagiarism Detection," FIRE (Working Notes), vol. 1737, pp. 184-189, 2016.
[35] M. R. Sharifabadi and S. A. Eftekhari, "Mahak Samim: A Corpus of Persian Academic Texts for Evaluating Plagiarism Detection Systems," FIRE (Working Notes), vol. 1737, pp. 190-192, 2016.
[36] S. Abnar, M. Dehghani, H. Zamani, and A. Shakery, "Expanded n-grams for semantic text alignment," Cappellato et al.[35], 2014.
[37] K. Khoshnavataher, V. Zarrabi, S. Mohtaj, and H. Asghari, "Developing Monolingual Persian Corpus for Extrinsic Plagiarism Detection Using Artificial Obfuscation: Notebook for PAN at CLEF 2015," in CLEF (Working Notes), 2015.
[38] A. C. Marco, A. Myers, S. J. Graham, P. D'Agostino, and K. Apple, "The USPTO patent assignment dataset: Descriptions and analysis," 2015.
[39] M. Joshi, E. Choi, D. S. Weld, and L. Zettlemoyer, "Triviaqa: A large scale distantly supervised challenge dataset for reading comprehension," arXiv preprint arXiv:1705.03551, 2017.
[40] A. Trischler et al., "Newsqa: A machine comprehension dataset," arXiv preprint arXiv:1611.09830, 2016.
[41] Z. Yang et al., "HotpotQA: A dataset for diverse, explainable multi-hop question answering," arXiv preprint arXiv:1809.09600, 2018.
[42] D. D. Lewis, Y. Yang, T. Russell-Rose, and F. Li, "Rcv1: A new benchmark collection for text categorization research," Journal of machine learning research, vol. 5, no. Apr, pp. 361-397, 2004.
[43] D. D. Lewis, "text categorization test collection," ed: Tech. Rep., http://www. ics. uci. edu/~ kdd/databases/reuters21578 …, 2004.
[44] H. Asghari, S. Mohtaj, O. Fatemi, H. Faili, P. Rosso, and M. Potthast, "Algorithms and corpora for persian plagiarism detection: overview of PAN at FIRE 2016," in Text Processing: FIRE 2016 International Workshop, Kolkata, India, December 7–10, 2016, Revised Selected Papers, 2018: Springer, pp. 61-79.