M. Taherinia; M. Esmaeili; B. Minaei Bidgoli
Abstract
The Influence Maximization Problem in social networks aims to find a minimal set of individuals that produces the highest influence on the other individuals in the network. In the last two decades, many algorithms have been proposed to address the time-efficiency and effectiveness challenges of this NP-hard problem. Undoubtedly, the CELF algorithm (besides the naive greedy algorithm) has the highest effectiveness among them, and it is also much faster than the naive greedy algorithm (about 700 times). This superiority has led many researchers to make extensive use of the CELF algorithm in their own approaches. However, the main drawback of the CELF algorithm is the very long running time of its first iteration, because, like the naive greedy algorithm, it has to estimate the influence spread of every node with expensive Monte-Carlo simulations. In this paper, a heuristic approach, the Optimized-CELF algorithm, is proposed to mitigate this drawback by avoiding unnecessary Monte-Carlo simulations. It is found that the proposed algorithm reduces the CELF running time and, consequently, improves the time efficiency of other algorithms that employ CELF as a base algorithm. Experimental results on a wide spectrum of real datasets show that the Optimized-CELF algorithm achieves running-time gains of about 88-99% and 56-98% over the CELF algorithm for k=1 and k=50, respectively, without any loss of effectiveness.
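For readers unfamiliar with the baseline being improved, the following is a minimal sketch of the standard CELF lazy-greedy selection loop under the Independent Cascade model. The graph representation, the spread probability p, and the helper simulate_spread are illustrative assumptions; the paper's Optimized-CELF heuristic for pruning first-iteration simulations is not reproduced here.

```python
import heapq
import random

def simulate_spread(graph, seeds, p=0.1, runs=1000):
    """Estimate the expected influence spread of a seed set under the
    Independent Cascade model via Monte-Carlo simulation.
    graph: dict mapping each node to a list of neighbour nodes."""
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.get(u, []):
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs

def celf(graph, k, p=0.1, runs=1000):
    """Standard CELF lazy-greedy seed selection. The first pass simulates
    every node (the costly step the paper targets); later iterations
    re-simulate only nodes whose cached marginal gain may be stale."""
    # First iteration: marginal gain of every single node.
    heap = [(-simulate_spread(graph, [v], p, runs), v, 0) for v in graph]
    heapq.heapify(heap)
    seeds, spread = [], 0.0
    while len(seeds) < k and heap:
        neg_gain, v, last = heapq.heappop(heap)
        if last == len(seeds):        # cached gain is up to date: select v
            seeds.append(v)
            spread += -neg_gain
        else:                         # lazily recompute the marginal gain
            gain = simulate_spread(graph, seeds + [v], p, runs) - spread
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, spread
```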
M. Asgari-Bidhendi; B. Janfada; O. R. Roshani Talab; B. Minaei-Bidgoli
Abstract
Named Entity Recognition (NER) is one of the essential prerequisites for many natural language processing tasks. All public corpora for Persian named entity recognition, such as ParsNERCorp and ArmanPersoNERCorpus, are based on the Bijankhan corpus, which originated from the Hamshahri newspaper in 2004. Correspondingly, most of the published named entity recognition models for Persian are tuned specifically for news data and are not flexible enough to be applied to other text categories, such as social media texts. This study introduces ParsNER-Social, a corpus for training named entity recognition models in the Persian language built from social media sources. The corpus consists of 205,373 tokens and their NER tags, crawled from the social media content of 10 Telegram channels in 10 different categories. Furthermore, three supervised methods trained on the ParsNER-Social corpus are introduced: two conditional random field models as baselines and one state-of-the-art deep learning model with six different configurations, all evaluated on the proposed dataset. The experiments show that the Mono-Lingual Persian models based on Bidirectional Encoder Representations from Transformers (MLBERT) outperform the other approaches on the ParsNER-Social corpus. Among the different configurations of the MLBERT models, the ParsBERT+BERT-TokenClass model obtained an F1-score of 89.65%.
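To illustrate the BERT-based token classification setup referred to above, here is a minimal sketch using the Hugging Face transformers library. The checkpoint name, the label set, and the tag function are assumptions for illustration only; they are not the paper's official configuration, and fine-tuning on the ParsNER-Social corpus is omitted, so the classification head here is untrained.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "HooshvareLab/bert-base-parsbert-uncased"  # assumed ParsBERT checkpoint id
LABELS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]  # illustrative tag set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

def tag(tokens):
    """Assign one NER label per input word; sub-word pieces are collapsed
    to the prediction of their first piece."""
    enc = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    pred = logits.argmax(-1)[0].tolist()
    out, seen = [], set()
    for idx, wid in enumerate(enc.word_ids(0)):
        if wid is not None and wid not in seen:
            seen.add(wid)
            out.append((tokens[wid], LABELS[pred[idx]]))
    return out
```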
H.3.5. Knowledge Representation Formalisms and Methods
N. Khozouie; F. Fotouhi Ghazvini; B. Minaei
Abstract
Context-aware systems must be interoperable and work across different platforms at any time and in any place. Context data collected from wireless body area networks (WBANs) may be heterogeneous and imperfect, which makes their design and implementation difficult. In this research, we introduce a model that takes the dynamic nature of a context-aware system into consideration. This model is constructed according to the four-dimensional-objects approach and three-dimensional events for the data collected from a WBAN. To support mobility and reasoning over temporal data transmitted from the WBAN, a hierarchical model based on an ontology is presented. It supports the relationships between heterogeneous environments and reasoning over the context data for extracting higher-level knowledge. Location is considered a temporal attribute. To support temporal entities, the reification method and Allen's interval algebra relations are used. Using reification, the new classes Time_slice and Time_Interval and the new properties ts_time_slice and ts_time_Interval are defined in the context-aware ontology. Then Allen's thirteen logical relations, such as Equal, After, and Before, are added to the properties through the OWL-Time ontology. The integration and consistency of the context-aware ontology are checked with the Pellet reasoner. This hybrid context-aware ontology is evaluated by three experts using the FOCA method, which is based on the Goal-Question-Metric (GQM) approach. This evaluation methodology diagnoses the ontology numerically and decreases the subjectivity and dependency on the evaluator's experience. The overall performance quality, according to the completeness, adaptability, conciseness, consistency, computational efficiency, and clarity metrics, is 0.9137.
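As a plain illustration of the temporal reasoning that the Time_slice/Time_Interval reification and the OWL-Time properties encode, the sketch below computes Allen's thirteen interval relations between two intervals. It is a self-contained stand-in for intuition only, not the ontology or the Pellet reasoner; the Interval type and the example values are assumptions.

```python
from collections import namedtuple

# Stands in for a Time_Interval individual: a start and an end point.
Interval = namedtuple("Interval", ["start", "end"])

def allen_relation(a, b):
    """Return which of Allen's thirteen relations holds between proper
    intervals a and b (start < end assumed for both)."""
    if a.end < b.start:   return "before"
    if b.end < a.start:   return "after"
    if a.end == b.start:  return "meets"
    if b.end == a.start:  return "met-by"
    if a.start == b.start and a.end == b.end: return "equal"
    if a.start == b.start: return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:     return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end: return "during"
    if a.start < b.start and b.end < a.end: return "contains"
    return "overlaps" if a.start < b.start else "overlapped-by"

# e.g. a sensor reading's time slice relative to a monitoring interval
print(allen_relation(Interval(3, 5), Interval(5, 9)))  # -> "meets"
```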
H.3.8. Natural Language Processing
A. Pakzad; B. Minaei Bidgoli
Abstract
Dependency parsing is a form of syntactic parsing of natural language that automatically analyzes the dependency structure of sentences and produces a dependency graph for each input sentence. Part-of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform the POS tagging task together with dependency parsing in a pipeline mode. Unfortunately, in pipeline models, tagging errors propagate to the parser, and the tagger cannot exploit useful syntactic information. The goal of joint models is to reduce the errors of the POS tagging and dependency parsing tasks simultaneously. In this research, we applied the joint model to the Persian and English languages using the Corbit software, optimizing the model's features and improving its accuracy at the same time. Corbit is an implementation of a transition-based approach for word segmentation, POS tagging, and dependency parsing. On the Persian test data, the joint accuracy of POS tagging and dependency parsing reached 85.59% for coarse-grained and 84.24% for fine-grained POS tags. On English, we attained 76.01% for coarse-grained and 74.34% for fine-grained POS tags.
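To make the kind of scores reported above concrete, the sketch below shows how coarse/fine POS accuracy and unlabeled attachment could be computed from gold and predicted files in CoNLL-X format (columns ID FORM LEMMA CPOS POS FEATS HEAD DEPREL ...). The file names are placeholders, and this evaluation sketch is not part of the Corbit toolkit or the paper's exact scoring script.

```python
def read_conll(path):
    """Read a CoNLL-X file into a list of sentences, each a list of rows."""
    sents, cur = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                if cur:
                    sents.append(cur)
                    cur = []
            elif not line.startswith("#"):
                cur.append(line.split("\t"))
    if cur:
        sents.append(cur)
    return sents

def joint_scores(gold_path, pred_path):
    """Token-level coarse POS, fine POS, and unlabeled attachment accuracy."""
    gold, pred = read_conll(gold_path), read_conll(pred_path)
    total = cpos_ok = pos_ok = head_ok = 0
    for gs, ps in zip(gold, pred):
        for g, p in zip(gs, ps):
            total += 1
            cpos_ok += g[3] == p[3]   # coarse-grained POS tag
            pos_ok  += g[4] == p[4]   # fine-grained POS tag
            head_ok += g[6] == p[6]   # predicted head index
    return {"CPOS": cpos_ok / total, "POS": pos_ok / total, "UAS": head_ok / total}

# scores = joint_scores("persian_test.gold.conll", "persian_test.pred.conll")
```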