H.6.4. Clustering
M. Owhadi-Kareshki; M.R. Akbarzadeh-T.
Abstract
The growing scale of available data and increasingly strict privacy concerns are among the main challenges of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with consideration for data confidentiality: the consensus process uses only negotiations among local cluster centers, so no private data are transferred. Entropy is used as an internal measure of consensus clustering validity at each machine, so that the cluster centers of local machines with higher expected clustering validity have more influence on the final consensus centers. We also employ the relative cost function of the local Fuzzy C-Means (FCM) as a measure of a machine's validity relative to other machines, and the number of data points in each machine as a measure of its reliability. The utility of the proposed consensus strategy is examined on 18 datasets from the UCI repository in terms of clustering accuracy and speedup against the centralized version of FCM. Several experiments confirm that the proposed approach yields higher speedup and accuracy while maintaining data security through its protected, distributed processing.
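The idea of weighting local cluster centers by an entropy-based validity score can be illustrated with a minimal sketch. This is not the EC3 formulation from the paper: the weight `(1 - normalized entropy) * number of points`, the function names, and the assumption that cluster indices are already aligned across machines are all illustrative choices. The relative FCM cost term mentioned in the abstract is omitted.

```python
import math

def partition_entropy(memberships):
    """Mean Shannon entropy of fuzzy membership rows (each row sums to 1)."""
    total = 0.0
    for row in memberships:
        total -= sum(u * math.log(u) for u in row if u > 0)
    return total / len(memberships)

def consensus_centers(local_centers, local_memberships):
    """Weighted average of local centers, one weight per machine.

    Assumption: cluster k on one machine corresponds to cluster k on
    every other machine (no center matching is done here).
    """
    n_clusters = len(local_centers[0])
    dim = len(local_centers[0][0])
    max_entropy = math.log(n_clusters)
    weights = []
    for memberships in local_memberships:
        # Hypothetical validity score: low entropy (crisp partition) -> high weight.
        validity = 1.0 - partition_entropy(memberships) / max_entropy
        # Machines holding more points are treated as more reliable.
        weights.append(validity * len(memberships))
    total = sum(weights)  # zero only if every machine has maximal entropy
    return [
        [sum(w * c[k][d] for w, c in zip(weights, local_centers)) / total
         for d in range(dim)]
        for k in range(n_clusters)
    ]
```

Only the local centers and a per-machine weight need to leave each machine in this sketch; the data points themselves stay local, which is the confidentiality property the abstract describes.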
H.6.4. Clustering
M. Manteqipour; A.R. Ghaffari Hadigheh; R. Mahmoodvand; A. Safari
Abstract
Grouping datasets plays an important role in many areas of scientific research. Depending on data features and applications, different constraints are imposed on groups, while having groups with similar members is always a main criterion. In this paper, we propose an algorithm for grouping objects that have random labels and nominal features with many nominal attributes. In addition, a size constraint on groups is required. These conditions lead to a mixed integer optimization problem that is neither convex nor linear. It is an NP-hard problem, and exact solution methods are computationally costly. Our motivation for solving this problem comes from grouping insurance data, which is essential for fair pricing. The proposed algorithm includes two phases. First, we rank random labels using fuzzy numbers. Afterwards, an adjusted K-means algorithm is used to produce homogeneous groups satisfying a cluster size constraint. Fuzzy numbers are used to compare random labels in terms of both their observed values and their chance of occurrence. Moreover, an index is defined to measure the similarity between multi-valued attributes without perfect information and those with perfect information. Since all ranks are scaled into the interval [0,1], the result of ranking random labels does not need rescaling techniques. In the adjusted K-means algorithm, the optimum number of clusters is found using the coefficient of variation instead of Euclidean distance. Experiments demonstrate that our proposed algorithm produces fairly homogeneous and significantly different groups with the requisite mass.
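For readers unfamiliar with the coefficient of variation mentioned above, the following sketch shows how it could serve as a homogeneity score for candidate groupings. The abstract does not give the selection rule, so the helper `mean_group_cv` and the idea of averaging per-group values are assumptions for illustration only.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (mean must be nonzero)."""
    return statistics.pstdev(values) / statistics.mean(values)

def mean_group_cv(groups):
    """Hypothetical score: average CV over groups; lower means more homogeneous."""
    return sum(coefficient_of_variation(g) for g in groups) / len(groups)
```

Under this assumption, groupings produced for different numbers of clusters could be compared by their `mean_group_cv` values, with lower values indicating more homogeneous groups.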
H.6.4. Clustering
M. Lashkari; M. Moattar
Abstract
K-means is a well-known clustering algorithm. Despite advantages such as high speed and ease of use, it suffers from the problem of local optima. Many studies in clustering have attempted to overcome this problem. This paper presents a hybrid of the Extended Cuckoo Optimization Algorithm (ECOA) and K-means (K), called ECOA-K. COA was chosen for its fast convergence rate, intelligent operators, and simultaneous local and global search. In the Extended Cuckoo Algorithm, we enhance the operators of the classical Cuckoo algorithm. The proposed operator for producing the initial population is based on a chaos trail, whereas in the classical version it is based on a randomized trail. Moreover, the number of eggs allocated to each cuckoo in the revised algorithm is based on its fitness. Another improvement is in the cuckoos' migration, which is performed with different deviation degrees. The proposed method is evaluated on several standard datasets from the UCI repository, and its performance is compared with that of Black Hole (BH), Big Bang Big Crunch (BBBC), Cuckoo Search Algorithm (CSA), the traditional Cuckoo Optimization Algorithm (COA), and the K-means algorithm. The results are compared in terms of purity degree, coefficient of variation, convergence rate, and time complexity. The simulation results show that the proposed algorithm yields solutions with a higher purity degree, a faster convergence rate, and greater stability than the other algorithms compared.
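Chaos-based initialization replaces pseudo-random draws with values from a deterministic chaotic sequence. The abstract does not say which chaotic map is used, so the sketch below assumes the logistic map with r = 4, a common choice in the metaheuristics literature. The function name, the seed value, and the bounds are hypothetical.

```python
def chaotic_population(n_individuals, n_dims, lower, upper, seed=0.7, r=4.0):
    """Generate candidate solutions from a logistic-map sequence.

    Assumption: the logistic map x <- r * x * (1 - x) with r = 4 stands in
    for the unspecified chaotic generator. The seed must lie in (0, 1) and
    avoid points such as 0.25, 0.5, and 0.75 that collapse to a fixed point.
    """
    x = seed
    population = []
    for _ in range(n_individuals):
        individual = []
        for _ in range(n_dims):
            x = r * x * (1.0 - x)  # next chaotic value in [0, 1]
            individual.append(lower + x * (upper - lower))  # scale to search bounds
        population.append(individual)
    return population
```

Because the sequence is deterministic, the same seed always reproduces the same initial population, which can make experiments easier to repeat than with a randomized start.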
H.6.4. Clustering
P. Shahsamandi Esfahani; A. Saghaei
Abstract
Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, using more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two conflicting objective functions is proposed, based on maximum data compactness within clusters (the degree of proximity of data) and maximum cluster separation (the degree of remoteness of cluster centers). To solve this model, a recently proposed optimization method, the Multi-objective Improved Teaching Learning Based Optimization (MOITLBO) algorithm, is used. The algorithm is tested on several datasets, and its clusters are compared with the results of several single-objective algorithms. Furthermore, a comparison with another multi-objective model under noise shows that the proposed model is robust to noisy datasets and can thus be used efficiently for multi-objective fuzzy clustering.
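The two objectives described above can be made concrete with a small sketch. The exact objective functions of the proposed model are not given in the abstract, so the fuzzy within-cluster cost (lower means more compact) and the minimum squared distance between centers (higher means better separated) used here are common stand-ins, and all names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def compactness(points, centers, memberships, m=2.0):
    """Fuzzy within-cluster cost; m is the fuzzifier (assumed 2.0). Minimize."""
    return sum(
        memberships[i][k] ** m * sq_dist(points[i], centers[k])
        for i in range(len(points))
        for k in range(len(centers))
    )

def separation(centers):
    """Smallest squared distance between any two cluster centers. Maximize."""
    return min(
        sq_dist(centers[a], centers[b])
        for a in range(len(centers))
        for b in range(a + 1, len(centers))
    )
```

A multi-objective optimizer would evaluate both values for each candidate set of centers and keep the non-dominated trade-offs rather than collapsing them into a single score.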