Hybrid voice activity detection system based on LSTM and auditory speech features

dc.contributor.author: Korkmaz, Yunus
dc.contributor.author: Boyaci, Aytug
dc.date.accessioned: 2024-04-24T16:10:58Z
dc.date.available: 2024-04-24T16:10:58Z
dc.date.issued: 2023
dc.department: Dicle Üniversitesi [en_US]
dc.description.abstract: Voice Activity Detection (VAD), sometimes called Speech Activity Detection, is the process of extracting speech regions from audio recordings that contain many types of sound. Because undesired data adds computational cost and wastes time, most speech-based applications process only the speech part (the region of interest) and ignore the rest. This is the main reason VAD serves as a preliminary stage in applications such as automatic speech recognition (ASR), speaker identification/verification, speech enhancement, and speaker diarization. In this study, a successful semi-supervised, two-stage VAD system, which we named hybrid-VAD, was proposed, especially for environments with a high signal-to-noise ratio (SNR). First, a VAD decision was obtained from a relatively simple Long Short-Term Memory (LSTM) network trained on auditory speech features such as energy, zero crossing rate (ZCR), and 13th-order Mel Frequency Cepstral Coefficients (MFCC). A second VAD decision was then obtained by applying a reasonable thresholding strategy to the same features, and the two decisions were combined with logical operators. Surprisingly, the results showed that the final VAD decision has low FEC and OVER errors, which are especially critical for any speaker diarization system, mostly in environments with high SNR. [en_US]
dc.identifier.doi: 10.1016/j.bspc.2022.104408
dc.identifier.issn: 1746-8094
dc.identifier.issn: 1746-8108
dc.identifier.scopus: 2-s2.0-85141927780
dc.identifier.scopusquality: Q1
dc.identifier.uri: https://doi.org/10.1016/j.bspc.2022.104408
dc.identifier.uri: https://hdl.handle.net/11468/15209
dc.identifier.volume: 80 [en_US]
dc.identifier.wos: WOS:000891441500006
dc.identifier.wosquality: Q2
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en [en_US]
dc.publisher: Elsevier Sci Ltd [en_US]
dc.relation.ispartof: Biomedical Signal Processing and Control
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Academic Staff [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Voice Activity Detection [en_US]
dc.subject: Zero Crossing Rate [en_US]
dc.subject: Mfcc [en_US]
dc.subject: Lstm [en_US]
dc.title: Hybrid voice activity detection system based on LSTM and auditory speech features [en_US]
dc.type: Article [en_US]
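
The abstract describes a second VAD decision obtained by thresholding frame-level features and then combined with the LSTM decision through logical operators. The record does not give the frame size, the thresholds, or which operator was used, so the following is only a minimal sketch under stated assumptions: the 400-sample frames, the 160-sample hop, the threshold values, the "high energy and low ZCR" rule, and the function names are all hypothetical, and `model_mask` is a hand-written stand-in for the output of the LSTM, which is not reproduced here. MFCC features are omitted for brevity.

```python
import math
import random


def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a < 0) != (b < 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def threshold_vad(frames, energy_thr, zcr_thr):
    # Assumed rule (not taken from the paper): a frame counts as speech
    # when its energy is high and its zero crossing rate is low.
    return [short_time_energy(f) > energy_thr and zero_crossing_rate(f) < zcr_thr
            for f in frames]


def combine(mask_a, mask_b, op="and"):
    """Combine two frame-level decisions with a logical operator."""
    if op == "and":
        return [a and b for a, b in zip(mask_a, mask_b)]
    if op == "or":
        return [a or b for a, b in zip(mask_a, mask_b)]
    raise ValueError("op must be 'and' or 'or'")


# Synthetic test signal: 0.5 s of weak noise, 0.5 s of a 200 Hz tone
# standing in for voiced speech, then 0.5 s of weak noise again.
random.seed(0)
sr = 16000
signal = (
    [random.gauss(0.0, 0.01) for _ in range(8000)]
    + [0.5 * math.sin(2 * math.pi * 200 * n / sr) + random.gauss(0.0, 0.01)
       for n in range(8000)]
    + [random.gauss(0.0, 0.01) for _ in range(8000)]
)

frames = frame_signal(signal, frame_len=400, hop=160)
threshold_mask = threshold_vad(frames, energy_thr=0.01, zcr_thr=0.25)

# Hypothetical stand-in for the LSTM decision: slightly too wide on both
# sides, so the logical AND has something to trim.
model_mask = [45 <= i <= 105 for i in range(len(frames))]

final_mask = combine(model_mask, threshold_mask, op="and")
```

In this sketch the AND keeps only frames that both decisions mark as speech, which trims the extra frames that the stand-in mask adds around the tone. Whether the published system uses AND, OR, or a mix of the two is not stated in this record.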
