D. Data
Zahra Ghorbani; Ali Ghorbanian
Abstract
Increasing the accuracy of time-series clustering while reducing execution time is a primary challenge in the field of time-series clustering. Researchers have recently applied approaches such as the development of distance metrics and dimensionality reduction to address this challenge. However, using segmentation and ensemble clustering to solve this issue is a key aspect that has received less attention in previous research. In this study, an algorithm based on selecting and combining the best segments created from a time-series dataset was developed. In the first step, the dataset is divided into segments of equal length. In the second step, each segment is clustered using a hierarchical clustering algorithm. In the third step, a genetic algorithm selects different segments and combines them using ensemble clustering; an internal clustering criterion evaluates and ranks the candidate solutions, and the ensemble clustering of the best-scoring subset of segments is taken as the final clustering of the dataset. The proposed algorithm was executed on 82 different datasets with 10 repetitions. The results indicated a 3.07% increase in clustering performance, reaching a value of 67.40. The results were further evaluated with respect to time-series length and dataset type, and were compared against six algorithms from the literature using statistical tests.
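As a rough illustration of the segment-then-ensemble idea, the sketch below divides each series into equal-length segments, clusters every segment hierarchically, and fuses the segment clusterings through a co-association matrix. The segment length, the consensus step, and the SciPy primitives are all assumptions, and the genetic-algorithm search over segment subsets is omitted; this is a minimal sketch, not the authors' implementation.

```python
# Minimal sketch of per-segment hierarchical clustering plus a
# co-association consensus step (assumed details, not the paper's code).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_segments(X, n_clusters, seg_len):
    """Cluster each equal-length segment of the series independently."""
    labels_per_segment = []
    for start in range(0, X.shape[1] - seg_len + 1, seg_len):
        seg = X[:, start:start + seg_len]
        Z = linkage(seg, method="average")  # hierarchical clustering
        labels_per_segment.append(fcluster(Z, n_clusters, criterion="maxclust"))
    return labels_per_segment

def consensus(labelings, n_clusters):
    """Combine selected segment clusterings via a co-association matrix."""
    n = len(labelings[0])
    co = np.zeros((n, n))
    for lab in labelings:
        co += (lab[:, None] == lab[None, :])
    co /= len(labelings)
    # Re-cluster the consensus (1 - co) distance matrix hierarchically.
    Z = linkage(squareform(1.0 - co, checks=False), method="average")
    return fcluster(Z, n_clusters, criterion="maxclust")

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 120))           # 30 series of length 120
per_seg = cluster_segments(X, n_clusters=3, seg_len=30)
final = consensus(per_seg, n_clusters=3)  # a GA would instead search
                                          # over subsets of segments
```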
D. Data
M. Zarezade; E. Nourani; Asgarali Bouyer
Abstract
Community structure is vital for discovering the important structures and latent properties of complex networks. In recent years, improving the quality of local community detection approaches has become a hot topic in the study of complex networks, owing to their linear time complexity and applicability to large-scale networks. However, these methods suffer from shortcomings such as instability, low accuracy, and randomness. The G-CN algorithm is a local method that uses the same label propagation as LPA but, unlike LPA, updates only the labels of boundary nodes at each iteration, which reduces its execution time. However, it suffers from the resolution limit and low accuracy. To overcome these problems, this paper proposes an improved community detection method called SD-GCN, which uses hybrid node scoring and synchronous label updating of boundary nodes, along with disabling random label updating in the initial updates. In the first phase, it updates the labels of boundary nodes synchronously, using a score based on degree centrality and common-neighbor measures. In addition, we define a new method for merging communities in the second phase that is faster than modularity-based methods. An extensive set of experiments was conducted to evaluate the performance of SD-GCN on small and large-scale real-world networks as well as artificial networks. These experiments verify a significant improvement in the accuracy and stability of community detection, together with shorter execution time, while retaining linear time complexity.
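The sketch below illustrates the general idea of scored, synchronous label propagation restricted to boundary nodes. The scoring formula (neighbor degree plus common-neighbor count) and the update rule are illustrative assumptions in the spirit of the description above, not the paper's exact SD-GCN definitions, and the second-phase community merging is omitted.

```python
# Scored, synchronous label propagation on boundary nodes only
# (assumed scoring rule; a sketch, not the paper's SD-GCN).
import networkx as nx
from collections import defaultdict

def scored_label_propagation(G, max_iter=20):
    labels = {v: v for v in G}  # each node starts in its own community
    for _ in range(max_iter):
        # Boundary nodes: nodes with at least one differently-labeled neighbor.
        boundary = [v for v in G
                    if any(labels[u] != labels[v] for u in G[v])]
        if not boundary:
            break
        new_labels = {}
        for v in boundary:  # compute all updates before applying any
            score = defaultdict(float)
            for u in G[v]:
                common = len(list(nx.common_neighbors(G, u, v)))
                score[labels[u]] += G.degree(u) + common
            new_labels[v] = max(score, key=score.get)
        labels.update(new_labels)  # synchronous update
    return labels

G = nx.karate_club_graph()
communities = scored_label_propagation(G)
print(len(set(communities.values())), "communities found")
```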
D. Data
M. Hajizadeh-Tahan; M. Ghasemzadeh
Abstract
Learning models and their results depend on the quality of the input data. If raw data are not properly cleaned and structured, the results tend to be incorrect. Therefore, discretization, as one of the preprocessing techniques, plays an important role in learning processes. The most important challenge in the discretization process is reducing the number of features' values. This operation should be applied in a way that preserves the relationships between the features and increases the accuracy of classification algorithms. In this paper, a new evolutionary multi-objective algorithm is presented. The proposed algorithm uses three objective functions to achieve high-quality discretization. The first and second objectives minimize the number of selected cut points and the classification error, respectively. The third objective introduces a new criterion, called the normalized cut, which uses the relationships between the features' values to preserve the nature of the data. The performance of the proposed algorithm was tested on 20 benchmark datasets. According to the comparisons and the results of nonparametric statistical tests, the proposed algorithm performs better than the other existing major methods.
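As a rough sketch of how a candidate cut-point set might be scored, the code below evaluates two of the three objectives described above (number of cut points and cross-validated classification error). The normalized-cut objective and the evolutionary search itself are omitted, and the dataset and classifier are arbitrary choices made for illustration.

```python
# Scoring one candidate discretization on two of the three objectives
# (cut-point count, classification error); assumed setup, not the paper's.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def discretize(X, cuts_per_feature):
    """Map each continuous feature to interval indices given its cut points."""
    cols = [np.digitize(X[:, j], np.sort(cuts))
            for j, cuts in enumerate(cuts_per_feature)]
    return np.column_stack(cols)

def objectives(X, y, cuts_per_feature):
    """Return (number of cut points, cross-validated error) for a candidate."""
    n_cuts = sum(len(c) for c in cuts_per_feature)
    Xd = discretize(X, cuts_per_feature)
    clf = DecisionTreeClassifier(random_state=0)
    err = 1.0 - cross_val_score(clf, Xd, y, cv=5).mean()
    return n_cuts, err

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(1)
# One random candidate: two cut points per feature, drawn from its range.
candidate = [rng.uniform(X[:, j].min(), X[:, j].max(), size=2)
             for j in range(X.shape[1])]
print(objectives(X, y, candidate))  # a multi-objective EA would minimize both
```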
D. Data
S. Taherian Dehkordi; A. Khatibi Bardsiri; M. H. Zahedi
Abstract
Data mining is an appropriate way to discover information and hidden patterns in large amounts of data, where the hidden patterns cannot easily be discovered in normal ways. One of the most interesting applications of data mining is the discovery of diseases and disease patterns by investigating patients' records. Early diagnosis of diabetes can reduce the effects of this devastating disease. A common way to diagnose the disease is a blood test, which, despite its high precision, has disadvantages such as pain, cost, patient stress, and lack of access to a laboratory. Diabetic patients' information contains hidden patterns that can help assess the risk of diabetes in individuals without performing any blood tests. Using neural networks, as powerful data mining tools, is an appropriate method for discovering these hidden patterns. In this paper, in order to discover the hidden patterns and diagnose diabetes, the water wave optimization (WWO) algorithm, a precise metaheuristic, was used along with a neural network to increase the precision of diabetes prediction. The results of our implementation in the MATLAB programming environment, using a diabetes dataset, indicated that the proposed method diagnosed diabetes with a precision of 94.73%, sensitivity of 94.20%, specificity of 93.34%, and accuracy of 95.46%, and was more sensitive than methods such as support vector machines, artificial neural networks, and decision trees.
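The paper's implementation is in MATLAB; the Python sketch below only illustrates the general idea of training a small neural network's weights with a population-based metaheuristic. A simple perturb-and-select loop stands in for WWO, and synthetic data stands in for the diabetes dataset, so every detail here is an assumption.

```python
# Training a one-hidden-layer network's weights with a population-based
# search (a stand-in for WWO, on synthetic stand-in data).
import numpy as np
from sklearn.datasets import make_classification

def forward(w, X, n_hidden):
    """One-hidden-layer network; w is a flat weight vector."""
    n_in = X.shape[1]
    W1 = w[:n_in * n_hidden].reshape(n_in, n_hidden)
    b1 = w[n_in * n_hidden:n_in * n_hidden + n_hidden]
    W2 = w[-(n_hidden + 1):-1]
    b2 = w[-1]
    h = np.tanh(X @ W1 + b1)
    return 1 / (1 + np.exp(-(h @ W2 + b2)))  # sigmoid output

def fitness(w, X, y, n_hidden):
    p = forward(w, X, n_hidden)
    return np.mean((p > 0.5) == y)  # classification accuracy

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
n_hidden, dim = 5, 8 * 5 + 5 + 5 + 1
pop = rng.normal(size=(30, dim))  # initial population of weight vectors
for _ in range(200):  # metaheuristic loop
    children = pop + 0.1 * rng.normal(size=pop.shape)  # "wave" perturbation
    both = np.vstack([pop, children])
    scores = np.array([fitness(w, X, y, n_hidden) for w in both])
    pop = both[np.argsort(scores)[-30:]]  # keep the 30 fittest
best = pop[-1]
print("training accuracy:", fitness(best, X, y, n_hidden))
```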
D. Data
M. Abdar; M. Zomorodi-Moghadam
Abstract
In this paper, the accuracy of two machine learning algorithms, SVM and Bayesian network, is investigated as two important algorithms for the diagnosis of Parkinson's disease. We use Parkinson's disease data from the University of California, Irvine (UCI). In order to optimize the SVM algorithm, different kernel functions and C parameters were used; our results show that SVM with the C parameter (C-SVM) and a polynomial kernel function, with an average accuracy of 99.18% in the testing step, performs better than the other kernel functions, such as RBF and sigmoid, as well as the Bayesian network algorithm. It is also shown that the ten important factors in the SVM algorithm are Jitter (Abs), Subject #, RPDE, PPE, Age, NHR, Shimmer APQ 11, NHR, Total-UPDRS, Shimmer (dB), and Shimmer. We also show that the accuracy of C-SVM with the polynomial and RBF kernels is directly proportional to the value of the C parameter, such that increasing C increases the accuracy of both kernel functions; unlike the polynomial and RBF kernels, the sigmoid kernel has an inverse relation with the value of C. Using these methods, we can find the most effective factors common to both genders (male and female). To the best of our knowledge, there is no study on Parkinson's disease that identifies the most effective factors common to both genders.
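A minimal scikit-learn sketch of this kind of kernel/C comparison is shown below. The synthetic data and the grid of C values are illustrative assumptions, not the authors' setup or dataset.

```python
# Comparing SVM kernels across C values with cross-validated accuracy
# (illustrative grid and synthetic data, not the paper's experiment).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
for kernel in ("poly", "rbf", "sigmoid"):
    for C in (0.1, 1, 10, 100):
        clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=C))
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(f"{kernel:8s} C={C:<6} accuracy={acc:.3f}")
```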