N. Esfandian; F. Jahani bahnamiri; S. Mavaddati
Abstract
This paper proposes a novel method for voice activity detection based on clustering in the spectro-temporal domain. In the proposed algorithms, an auditory model is used to extract the spectro-temporal features. Gaussian Mixture Model (GMM) and WK-means clustering are used to reduce the dimensionality of the spectro-temporal space. The energy and positions of the resulting clusters are then used for voice activity detection: each frame is classified as silence or speech from the attributes of its clusters and a threshold value that is updated in every frame. Because it has higher energy, the first cluster is used as the main speech section in the computation. The efficiency of the proposed method was evaluated for silence/speech discrimination under different noisy conditions. The displacement of clusters in the spectro-temporal domain was used as the criterion for assessing the robustness of the features. According to the results, the proposed method improved the speech/non-speech segmentation rate compared with temporal and spectral features at low signal-to-noise ratios (SNRs).
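The frame-level decision described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each frame is already given as a list of spectro-temporal points with an energy per point, that "WK-means" can be approximated by an energy-weighted k-means, and that the threshold is a multiple of a running noise-floor estimate. The names `weighted_kmeans`, `detect_speech`, `alpha`, and `margin` are hypothetical, and the GMM variant is not shown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Energy-weighted k-means (an assumed stand-in for WK-means).

    Each centroid is the weighted mean of its members, so high-energy
    points pull the centroid towards themselves.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)

    def assign():
        # Label every point with the index of its nearest centroid.
        return [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            total = sum(w for w, l in zip(weights, labels) if l == j)
            if total > 0:  # leave the centroid of an empty cluster unchanged
                centroids[j] = tuple(
                    sum(p[d] * w for p, w, l in zip(points, weights, labels)
                        if l == j) / total
                    for d in range(len(points[0]))
                )
    return centroids, assign()


def detect_speech(frames, k=2, alpha=0.9, margin=2.0):
    """Return one speech/silence decision (True = speech) per frame.

    frames: list of (points, energies) pairs.
    Assumption: the first frame contains only noise and initialises the
    noise floor; the threshold is margin * noise_floor.
    """
    noise_floor = None
    decisions = []
    for points, energies in frames:
        _, labels = weighted_kmeans(points, energies, k)
        cluster_energy = [0.0] * k
        for label, energy in zip(labels, energies):
            cluster_energy[label] += energy
        primary = max(cluster_energy)  # energy of the highest-energy cluster
        if noise_floor is None:
            noise_floor = primary
        is_speech = primary > margin * noise_floor
        if not is_speech:
            # Update the threshold only on frames judged to be silence.
            noise_floor = alpha * noise_floor + (1 - alpha) * primary
        decisions.append(is_speech)
    return decisions
```

Calling `detect_speech` on a sequence of frames returns a list of booleans, one per frame. Updating the noise floor only on frames judged to be silence is one common design choice for adaptive thresholds; the abstract does not specify the update rule used in the paper.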