H.6.3.2. Feature evaluation and selection
Sayyed Mohammad Hoseini; Majid Ebtia; Mohanna Dehgardi
Abstract
The abundance of high-dimensional datasets and the computational limitations of data analysis processes when applied to high-dimensional data have made clear the importance of developing feature selection methods. The negative impact of irrelevant variables on prediction and the unnecessary computation incurred by redundant attributes lead to poor classifier performance. Feature selection is therefore applied to facilitate a better understanding of the datasets, reduce computational time, and enhance prediction accuracy. In this research, we develop a composite feature selection method that combines support vector machines and principal component analysis. The method is then applied to the k-nearest neighbor and Naïve Bayes algorithms. Three datasets from the UCI Machine Learning Repository are used to assess the performance of the proposed models. Additionally, a dataset gathered from the central library of Ayatollah Boroujerdi University was considered. This dataset comprises 1,910 instances with 30 attributes, including gender, native status, entry term, faculty code, cumulative GPA, and the number of books borrowed. After applying the proposed feature selection method, an accuracy of 70% was obtained with only five features. Experimental results demonstrate that the proposed feature selection method chooses an appropriate feature subset and yields enhanced classification performance, as evaluated by metrics such as accuracy, F-score, and the Matthews correlation coefficient.
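As a rough illustration of how such a pipeline could be assembled, the sketch below scores features with a linear SVM and with PCA loadings, keeps the five best-scoring features, and evaluates k-nearest neighbor and Naïve Bayes classifiers on the reduced data using scikit-learn. The stand-in dataset, the way the two scores are combined, and the number of retained features are illustrative assumptions, not the authors' exact procedure.

```python
# Minimal sketch: SVM- and PCA-based feature scoring, evaluated with k-NN and Naive Bayes.
# The dataset, score-combination rule, and number of selected features are assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)          # stand-in UCI-style dataset
X = StandardScaler().fit_transform(X)

# Score each feature by the magnitude of its linear-SVM weight ...
svm_score = np.abs(LinearSVC(dual=False).fit(X, y).coef_).ravel()

# ... and by its total loading on the leading principal components.
pca = PCA(n_components=5).fit(X)
pca_score = np.abs(pca.components_).sum(axis=0)

# Combine the two rankings (simple normalized sum -- an assumption, not the paper's rule).
combined = svm_score / svm_score.max() + pca_score / pca_score.max()
top5 = np.argsort(combined)[-5:]                     # keep the five best-scoring features

# Evaluate the reduced feature subset with k-NN and Naive Bayes.
for name, clf in [("k-NN", KNeighborsClassifier(n_neighbors=5)), ("Naive Bayes", GaussianNB())]:
    acc = cross_val_score(clf, X[:, top5], y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```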
Mohammad Ahmadi Livani; Mahdi Abadi; Meysam Alikhany; Meisam Yadollahzadeh Tabari
Abstract
Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in WSNs, in this paper we present a PCA-based centralized approach and a DPCA-based distributed, energy-efficient approach for detecting outliers in sensed data; such outliers can be caused by compromised or malfunctioning nodes. In the distributed approach, we use distributed principal component analysis (DPCA) and fixed-width clustering (FWC) to establish a global normal pattern and to detect outliers, and the process of establishing this pattern is distributed among all sensor nodes. We also use weighted coefficients and a forgetting curve to periodically update the established normal profile. We demonstrate that the proposed distributed approach achieves accuracy comparable to that of the centralized approach while significantly reducing communication overhead and energy consumption in the network.
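For illustration only, the following sketch shows a centralized PCA-style outlier check on simulated sensor readings: a normal subspace is fitted to (assumed clean) historical data, and readings with a large reconstruction residual are flagged. The synthetic data, the residual statistic, and the 95th-percentile threshold are assumptions; the paper's DPCA, fixed-width clustering, and profile-updating steps are not reproduced here.

```python
# Minimal sketch: centralized PCA-based outlier detection on sensed data.
# Synthetic readings and the residual threshold are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
normal = rng.normal(loc=[25.0, 40.0, 1.0], scale=[0.5, 2.0, 0.1], size=(500, 3))  # temp, humidity, voltage
faulty = rng.normal(loc=[60.0, 40.0, 0.2], scale=[1.0, 2.0, 0.1], size=(10, 3))   # compromised/faulty readings
readings = np.vstack([normal, faulty])

# Build the "normal profile" from the leading principal components of the clean history.
pca = PCA(n_components=2).fit(normal)

def residual(x):
    """Distance between each reading and its projection onto the normal subspace."""
    recon = pca.inverse_transform(pca.transform(x))
    return np.linalg.norm(x - recon, axis=1)

threshold = np.percentile(residual(normal), 95)   # illustrative cut-off on training residuals
outliers = np.where(residual(readings) > threshold)[0]
print("flagged reading indices:", outliers)
```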