Document Type: Original/Review Paper

Authors

1 Department of Industrial Engineering, Birjand University of Technology, Birjand, Iran.

2 Department of Mathematics, Kosar University of Bojnord, Bojnord, Iran.

10.22044/jadm.2020.9021.2038

Abstract

The fuzzy c-means clustering algorithm is a useful tool for clustering, but it is suitable only for crisp, complete data. In this article, an enhancement of the algorithm is proposed that is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is presented for clustering incomplete fuzzy data. The method substitutes each missing attribute with a trapezoidal fuzzy number determined from the corresponding attribute of the q nearest neighbors. Comparisons and analysis of the experimental results demonstrate the capability of the proposed method.
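As a rough illustration of the idea summarized above, the following minimal sketch shows how a linear ranking function could induce a distance on trapezoidal fuzzy data and feed the standard fuzzy c-means membership update. The abstract does not state the ranking function or the distance, so everything here is an assumption: trapezoids are stored as (a, b, c, d) with support [a, d] and core [b, c], the ranking is a Yager-style average of the four defining points, and the names `rank`, `sq_distance`, and `memberships` are hypothetical. The q-nearest-neighbor substitution step is not sketched, because its details are not given.

```python
import numpy as np

def rank(t):
    """Assumed linear ranking: mean of the four points (a, b, c, d)."""
    return np.asarray(t, dtype=float).mean(axis=-1)

def sq_distance(x, v):
    """Assumed squared distance between objects with trapezoidal attributes:
    sum over attributes of the squared difference of ranking values."""
    diff = rank(x) - rank(v)
    return np.sum(diff ** 2, axis=-1)

def memberships(X, V, m=2.0, eps=1e-12):
    """Standard fuzzy c-means membership update with the distance above.

    X: data, shape (n, p, 4); V: cluster centers, shape (c, p, 4).
    Returns U with shape (n, c); each row sums to 1.
    """
    d2 = sq_distance(X[:, None], V[None]) + eps  # (n, c); eps avoids 0 division
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Two objects with one trapezoidal attribute each, and two centers.
X = np.array([[[1, 2, 3, 4]], [[9, 10, 11, 12]]], dtype=float)
V = np.array([[[1, 2, 3, 4]], [[9, 10, 11, 12]]], dtype=float)
U = memberships(X, V)
```

Because the ranking is linear, this assumed distance reduces each trapezoidal attribute to a single real value before comparison, so two different trapezoids with the same ranking value are treated as identical. Whether the proposed algorithm shares that property depends on the distance actually defined in the article.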

Keywords
