Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering

Sedighi, Z.; Boostani, R.

doi:10.22044/jadm.2017.5064.1611

Document Type: Original/Review Paper

Authors

Electrical & Computer Department, Shiraz University, Shiraz, Iran.

Abstract

Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes lack robustness and stability. This paper proposes an efficient approach that elicits prior knowledge, in the form of must-link and cannot-link constraints, from the estimated distribution of the raw data in order to convert a blind clustering problem into a semi-supervised one. To estimate the density distribution of the data, a Weibull Mixture Model (WMM) is used because of its high flexibility. A further contribution of this study is a new hill and valley seeking algorithm that finds the constraints for the semi-supervised algorithm. It is assumed that each density peak sits on a cluster center; therefore, the neighboring samples of each center are treated as must-link samples, while near-centroid samples belonging to different clusters are treated as cannot-link ones. The proposed approach is applied to a standard image dataset (designed for clustering evaluation) along with some UCI datasets. The results achieved on both the image dataset and the UCI datasets demonstrate the superiority of the proposed method over conventional clustering methods.
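The constraint-extraction idea in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: a Gaussian kernel density estimate stands in for the Weibull mixture model, a plain local-maximum scan stands in for the hill and valley seeking algorithm, and the function names, the toy samples, the `bandwidth`, and the `radius` are all assumptions chosen for illustration.

```python
import math
from itertools import combinations


def kernel_density(samples, grid, bandwidth):
    """Gaussian kernel density estimate (a stand-in for the paper's WMM)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]


def find_peaks(grid, density):
    """Return grid positions that are strict local maxima (hills) of the density."""
    return [
        grid[i]
        for i in range(1, len(grid) - 1)
        if density[i] > density[i - 1] and density[i] > density[i + 1]
    ]


def extract_constraints(samples, peaks, radius):
    """Build must-link / cannot-link pairs from samples within `radius` of a peak."""
    near = {}  # sample index -> index of the first peak it lies close to
    for i, s in enumerate(samples):
        for p_idx, p in enumerate(peaks):
            if abs(s - p) <= radius:
                near[i] = p_idx
                break
    must_link, cannot_link = [], []
    for i, j in combinations(sorted(near), 2):
        # Same peak: must-link. Different peaks: cannot-link.
        (must_link if near[i] == near[j] else cannot_link).append((i, j))
    return must_link, cannot_link


# Toy data (hypothetical): two well-separated groups around 1.0 and 5.0.
samples = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
grid = [k / 10 for k in range(61)]  # evaluation grid from 0.0 to 6.0
density = kernel_density(samples, grid, bandwidth=0.4)
peaks = find_peaks(grid, density)
must_link, cannot_link = extract_constraints(samples, peaks, radius=0.3)
```

The resulting pairs of sample indices could then be passed to any constraint-based semi-supervised clustering algorithm. The choice of `radius` controls how many samples count as near a center, so it trades the number of constraints against their reliability.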

Main Subjects

H.3. Artificial Intelligence

Journal of AI and Data Mining

Volume 6, Issue 2
July 2018
Pages 287-295
