F. Jafarinejad
Abstract
In recent years, new word embedding methods have clearly improved the accuracy of NLP tasks. A review of the progress of these methods shows that the complexity of the models and the number of their training parameters keep growing. Therefore, methodological innovation is needed in the design of new word embedding methods. Most current word embedding methods use a large corpus of unstructured data to train the semantic vectors of words. This paper explores the basic idea of exploiting the structure of structured data to build embedding vectors. The need for high processing power, a large amount of memory, and long processing time is thereby addressed by using these structures and the conceptual knowledge that lies in them. For this purpose, a new embedding vector, Word2Node, is proposed. It uses a well-known structured resource, WordNet, as its training corpus, on the hypothesis that the graph structure of WordNet contains valuable linguistic knowledge that should be exploited rather than ignored in order to provide cost-effective, small-sized embedding vectors. The Node2Vec graph embedding method allows us to benefit from this powerful linguistic resource. Evaluation of this idea on two tasks, word similarity and text classification, shows that the method performs as well as or better than the word embedding method embedded in it (Word2Vec). This result is achieved while the required training data is reduced by about 50,000,000%. These results give a view of the capacity of structured data to improve the quality of existing embedding methods and the resulting vectors.
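To make the graph-based idea more concrete, the following is a minimal sketch of the biased random walks that Node2Vec-style methods generate before passing them to a skip-gram model such as Word2Vec. It is not the authors' implementation: the toy edge list is a hypothetical stand-in for a WordNet fragment, the helper names (`build_graph`, `biased_walk`, `generate_walks`) are illustrative, and `p` and `q` are assumed to play the usual return and in-out roles.

```python
import random

# Hypothetical toy graph standing in for a small WordNet fragment.
EDGES = [
    ("animal", "dog"), ("animal", "cat"), ("dog", "puppy"),
    ("cat", "kitten"), ("animal", "bird"), ("bird", "sparrow"),
]


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def biased_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """One second-order random walk with return parameter p and in-out parameter q."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph[current])
        if not neighbors:
            break
        if len(walk) == 1:
            # No previous node yet, so the first step is uniform.
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)   # step back to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)       # stay close to the previous node
            else:
                weights.append(1.0 / q)   # move further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = sorted(graph)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(graph, node, length, p, q, rng))
    return walks


graph = build_graph(EDGES)
walks = generate_walks(graph, num_walks=2, length=5)
```

Each walk is a list of node names that can be treated as a "sentence", so the full set of walks can serve as the training input for a skip-gram model. On a real resource the graph would be built from WordNet relations rather than a hand-written edge list.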
F. Jafarinejad; R. Farzbood
Abstract
Image retrieval is a basic task in many content-based image systems. Achieving high precision while keeping computation time low is very important in relevance feedback-based image retrieval systems. This paper establishes an analogy between this task and image classification. Accordingly, in the image retrieval problem, we obtain an optimized decision surface that separates the dataset images into two categories, relevant and irrelevant, with respect to the query image. The problem is formulated and solved as an optimization problem using the particle swarm optimization (PSO) algorithm. Although PSO is widely used in the field of image retrieval, it has not previously been used directly for feature weighting. Information extracted from user feedback guides the particles toward the optimal weights of the various image features (color-, shape-, or texture-based features). Fusing these highly non-homogeneous features requires a feature weighting algorithm, which is carried out with the help of PSO. To this end, an innovative fitness function is proposed to evaluate each particle’s position. Experimental results on the Wang and Corel-10k datasets indicate that the average precision of the proposed method is higher than that of other semi-automatic and automatic approaches. Moreover, the proposed method reduces computational complexity in comparison to other PSO-based image retrieval methods.
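As an illustration of feature weighting with PSO, the sketch below searches for weights over three feature groups so that images marked relevant by the user end up closer to the query than images marked irrelevant. It is a minimal sketch under stated assumptions, not the paper's method: the per-feature distances are made-up toy values, the fitness function (mean weighted distance to irrelevant images minus mean weighted distance to relevant images) is a simple stand-in for the paper's own fitness function, which the abstract does not specify, and the PSO coefficients are common default choices.

```python
import random

# Hypothetical per-feature distances (color, shape, texture) between the
# query image and images the user marked as relevant or irrelevant.
RELEVANT = [(0.10, 0.50, 0.60), (0.20, 0.60, 0.40), (0.15, 0.40, 0.50)]
IRRELEVANT = [(0.90, 0.50, 0.50), (0.80, 0.40, 0.60), (0.85, 0.60, 0.45)]


def normalize(weights):
    """Scale weights to sum to 1; fall back to uniform if all are zero."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def weighted_distance(weights, distances):
    return sum(w * d for w, d in zip(weights, distances))


def fitness(weights):
    """Assumed stand-in fitness: larger when irrelevant images are farther
    from the query than relevant ones under the normalized weights."""
    w = normalize(weights)
    mean_rel = sum(weighted_distance(w, d) for d in RELEVANT) / len(RELEVANT)
    mean_irr = sum(weighted_distance(w, d) for d in IRRELEVANT) / len(IRRELEVANT)
    return mean_irr - mean_rel


def pso(dim, n_particles=20, iterations=60, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO that maximizes fitness over weights in [0, 1]."""
    rng = random.Random(seed)
    positions = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_fit = [fitness(p) for p in positions]
    best_idx = max(range(n_particles), key=lambda i: personal_fit[i])
    global_best = personal_best[best_idx][:]
    global_fit = personal_fit[best_idx]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                # Keep every weight inside [0, 1].
                positions[i][d] = min(1.0, max(0.0, positions[i][d] + velocities[i][d]))
            fit = fitness(positions[i])
            if fit > personal_fit[i]:
                personal_best[i], personal_fit[i] = positions[i][:], fit
                if fit > global_fit:
                    global_best, global_fit = positions[i][:], fit
    return normalize(global_best), global_fit


best_weights, best_fit = pso(dim=3)
```

In the toy data only the color distances separate the two groups, so the search should favor the color weight. In a relevance feedback loop, the two lists would be rebuilt from the user's latest feedback before each run.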