H.6.3.2. Feature evaluation and selection
Zeinab Abbasi
Abstract
Storing and processing large-volume datasets is one of the most critical problems in large-scale processing, so their size needs to be reduced before further processing. This paper proposes a framework for data reduction in large-scale datasets. The proposed framework is based on the MapReduce model and has three steps. First, a subset of the dataset's instances is selected by reservoir sampling. Second, the features of the selected instances are weighted using the ReliefF algorithm. Third, the weights are averaged for each feature, and the features with the highest average weights are selected. The selected features are then used in classification. Implementation results show that the proposed framework gives a good reduction in running time. By eliminating irrelevant features, it also maintains or increases the accuracy of classification algorithms even though a large amount of data is removed.
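The three steps described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the function names (`reservoir_sample`, `relief_weights`, `select_features`) are hypothetical, plain lists stand in for MapReduce partitions, and the weighting step uses a simplified Relief with a single nearest hit and nearest miss instead of the full ReliefF algorithm. Features are assumed to be numeric and already scaled to [0, 1].

```python
import random


def reservoir_sample(stream, k, rng):
    """Algorithm R: uniform random sample of up to k items from a stream."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def relief_weights(rows, labels):
    """Simplified Relief (one nearest hit, one nearest miss per instance).

    Assumption: this stands in for ReliefF, which uses k neighbours
    and handles multi-class and missing data.
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features

    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    used = 0
    for i, row in enumerate(rows):
        hits = [r for j, r in enumerate(rows) if j != i and labels[j] == labels[i]]
        misses = [r for j, r in enumerate(rows) if labels[j] != labels[i]]
        if not hits or not misses:
            continue  # cannot score an instance without both neighbour types
        hit = min(hits, key=lambda r: dist(row, r))
        miss = min(misses, key=lambda r: dist(row, r))
        for f in range(n_features):
            # Reward features that differ on the miss and agree on the hit.
            weights[f] += abs(row[f] - miss[f]) - abs(row[f] - hit[f])
        used += 1
    return [w / used for w in weights] if used else weights


def select_features(partitions, sample_size, top_k, seed=0):
    """Sample and weight each partition, average the weights, keep top_k."""
    rng = random.Random(seed)
    per_partition = []
    for part in partitions:  # "map" phase: one weight vector per partition
        sample = reservoir_sample(part, sample_size, rng)
        rows = [r for r, _ in sample]
        labels = [label for _, label in sample]
        per_partition.append(relief_weights(rows, labels))
    n_features = len(per_partition[0])
    # "reduce" phase: average each feature's weight across partitions
    avg = [sum(w[f] for w in per_partition) / len(per_partition)
           for f in range(n_features)]
    ranked = sorted(range(n_features), key=lambda f: avg[f], reverse=True)
    return ranked[:top_k], avg


# Toy data: feature 0 equals the class label, feature 1 is unrelated to it.
data = [([float(i % 2), ((i // 2) % 10) / 10.0], i % 2) for i in range(200)]
partitions = [data[:100], data[100:]]
selected, avg_weights = select_features(partitions, sample_size=30, top_k=1)
```

In this toy setup the informative feature receives the larger average weight, so it is the one kept for the downstream classifier. In a real MapReduce job the per-partition loop would run in mappers and the averaging in a reducer.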