Unsupervised and supervised VAD systems using combination of time and frequency domain features

dc.contributor.author: Korkmaz, Yunus
dc.contributor.author: Boyaci, Aytug
dc.date.accessioned: 2024-04-24T16:10:58Z
dc.date.available: 2024-04-24T16:10:58Z
dc.date.issued: 2020
dc.department: Dicle Üniversitesi [en_US]
dc.description.abstract: Voice Activity Detection (VAD), also referred to as Speech Activity Detection (SAD), is the process of identifying speech and non-speech regions in digital speech recordings. It is used as a preliminary stage to reduce errors and increase effectiveness in most speech-based applications, such as automatic speech recognition (ASR), speaker identification/verification, speech enhancement and speaker diarization. In this study, two independent VAD structures were proposed, one unsupervised and one supervised, both using time- and frequency-domain features. In the supervised approach, autocorrelation-based pitch contour estimation was used together with a 1NN cosine classifier trained on a 21-column feature matrix comprising Energy, Zero Crossing Rate (ZCR), 13th-order Mel Frequency Cepstral Coefficients (MFCC) and Shannon entropies of a Daubechies-filtered, depth-5 Wavelet Packet Transform (WPT) to obtain the VAD decision. In the unsupervised approach, normalization, thresholding and median filtering were applied to the same feature set. The proposed unsupervised VAD achieved error rates of 4%, 19%, 0.02% and 0.7% for FEC, MSC, OVER and NDS, respectively, at 0 dB SNR. The VAD decisions of both the supervised and unsupervised systems showed that the proposed methods can be used efficiently either in silent environments or in environments with noise similar to Additive White Gaussian Noise (AWGN). (C) 2020 Elsevier Ltd. All rights reserved. [en_US]
dc.identifier.doi: 10.1016/j.bspc.2020.102044
dc.identifier.issn: 1746-8094
dc.identifier.issn: 1746-8108
dc.identifier.scopus: 2-s2.0-85086406902
dc.identifier.scopusquality: Q1
dc.identifier.uri: https://doi.org/10.1016/j.bspc.2020.102044
dc.identifier.uri: https://hdl.handle.net/11468/15207
dc.identifier.volume: 61 [en_US]
dc.identifier.wos: WOS:000551927300038
dc.identifier.wosquality: Q2
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en [en_US]
dc.publisher: Elsevier Sci Ltd [en_US]
dc.relation.ispartof: Biomedical Signal Processing and Control
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Voice Activity Detection [en_US]
dc.subject: ZCR [en_US]
dc.subject: WPT [en_US]
dc.subject: MFCC [en_US]
dc.subject: ACF Based Pitch [en_US]
dc.subject: kNN Classification [en_US]
dc.title: Unsupervised and supervised VAD systems using combination of time and frequency domain features [en_US]
dc.type: Article [en_US]
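
The abstract describes the unsupervised approach only in outline: normalization, thresholding and median filtering applied to frame-level features. The Python sketch below illustrates that general idea. It is not the authors' implementation. It uses only two of the listed features (short-time energy and ZCR) and leaves out MFCC, WPT entropies and pitch. The frame length, hop size, thresholds and filter width are hypothetical values chosen for a synthetic 16 kHz test signal, and all function names are illustrative.

```python
import math
import random
from statistics import median


def frame_signal(x, frame_len, hop):
    """Split a list of samples into overlapping frames (partial tail dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def min_max_normalize(values):
    """Scale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def median_filter(labels, width):
    """Odd-width sliding median with edge replication, to remove isolated flips."""
    half = width // 2
    padded = [labels[0]] * half + list(labels) + [labels[-1]] * half
    return [int(median(padded[i:i + width])) for i in range(len(labels))]


def simple_vad(x, frame_len=400, hop=160, energy_thr=0.1, zcr_thr=0.3, smooth=5):
    """Return one 0/1 decision per frame (1 = speech-like).

    All default values are assumptions for illustration, not values from the paper.
    """
    frames = frame_signal(x, frame_len, hop)
    energy = min_max_normalize([short_time_energy(f) for f in frames])
    zcr = [zero_crossing_rate(f) for f in frames]
    # A frame is marked active when it is energetic and not noise-like (low ZCR).
    raw = [1 if e > energy_thr and z < zcr_thr else 0 for e, z in zip(energy, zcr)]
    return median_filter(raw, smooth)


def demo_signal(sr=16000, seed=0):
    """Synthetic test input: 0.5 s noise, 0.5 s 200 Hz tone plus noise, 0.5 s noise."""
    rng = random.Random(seed)
    n = sr // 2
    noise = lambda: rng.gauss(0.0, 0.01)
    head = [noise() for _ in range(n)]
    tone = [0.5 * math.sin(2 * math.pi * 200 * t / sr) + noise() for t in range(n)]
    tail = [noise() for _ in range(n)]
    return head + tone + tail


if __name__ == "__main__":
    decisions = simple_vad(demo_signal())
    print("".join(str(d) for d in decisions))
```

Running the script prints one decision per 10 ms hop for the synthetic signal. A sine tone stands in for voiced speech here, so the thresholds would need retuning on real recordings, and the noise robustness reported in the abstract cannot be expected from this reduced feature set.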
