Document Type: Original/Review Paper

Author

Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran.

Abstract

In recent years, new word embedding methods have markedly improved the accuracy of NLP tasks. A review of the progress of these methods shows that the complexity of the models and the number of their training parameters have grown rapidly, so there is a need for methodological innovation in how word embeddings are produced. Most current word embedding methods train the semantic vectors of words on a large corpus of unstructured data. This paper addresses the basic idea of exploiting the structure of structured data to derive embedding vectors; accordingly, the need for high processing power, large amounts of memory, and long processing time can be reduced by using such structures and the conceptual knowledge embedded in them. For this purpose, a new embedding method, Word2Node, is proposed. It uses a well-known structured resource, WordNet, as its training corpus, under the hypothesis that the graph structure of WordNet encodes valuable linguistic knowledge that should not be ignored and can yield cost-effective, small-sized embedding vectors. The Node2Vec graph embedding method allows us to benefit from this powerful linguistic resource. Evaluating this idea on two tasks, word similarity and text classification, shows that the method performs on par with or better than the word embedding method it builds upon (Word2Vec). This result is achieved while the required training data is reduced by a factor of about 500,000 (50,000,000%). These results offer a view of the capacity of structured data to improve the quality of existing embedding methods and the resulting vectors.
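To make the pipeline concrete, the sketch below shows one way to realize the Word2Node idea: build a graph over WordNet and feed it to Node2Vec, whose random walks are then trained with Word2Vec's skip-gram internally. This is a minimal illustration assuming NLTK's WordNet interface and the open-source node2vec package; the graph-construction choices (lemma nodes, synonymy and hypernymy edges) and all hyperparameters are our assumptions, not necessarily the authors' exact configuration.

```python
# Minimal Word2Node-style sketch.
# Assumes: nltk (with nltk.download('wordnet')), networkx, and the
# open-source `node2vec` package (https://github.com/eliorc/node2vec).
import networkx as nx
from nltk.corpus import wordnet as wn
from node2vec import Node2Vec

# Build an undirected graph whose nodes are WordNet lemma names and whose
# edges connect lemmas that share a synset or are linked by hypernymy.
graph = nx.Graph()
for synset in wn.all_synsets():
    lemmas = [l.name() for l in synset.lemmas()]
    # Synonymy: connect all lemmas within the same synset.
    for i, u in enumerate(lemmas):
        for v in lemmas[i + 1:]:
            if u != v:
                graph.add_edge(u, v)
    # Hypernymy: connect each lemma to the lemmas of the hypernym synsets.
    for hyper in synset.hypernyms():
        for u in lemmas:
            for v in (l.name() for l in hyper.lemmas()):
                if u != v:
                    graph.add_edge(u, v)

# Node2Vec samples biased random walks over the graph; the walks act as
# "sentences", so fit() trains a gensim Word2Vec (skip-gram) model on them.
node2vec = Node2Vec(graph, dimensions=100, walk_length=30, num_walks=10, workers=4)
model = node2vec.fit(window=5, min_count=1)

# The learned vectors can be queried like ordinary word embeddings.
print(model.wv.most_similar("car"))
```

In this framing, the random walks over the WordNet graph play the role that sentences of an unstructured corpus play for Word2Vec, which is why the resulting vectors can be evaluated with the same word-similarity and text-classification protocols.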

Keywords
