Hybrid voice activity detection system based on LSTM and auditory speech features
dc.contributor.author | Korkmaz, Yunus | |
dc.contributor.author | Boyaci, Aytug | |
dc.date.accessioned | 2024-04-24T16:10:58Z | |
dc.date.available | 2024-04-24T16:10:58Z | |
dc.date.issued | 2023 | |
dc.department | Dicle Üniversitesi | en_US |
dc.description.abstract | Voice Activity Detection (VAD), sometimes called Speech Activity Detection, is the process of extracting speech regions from audio recordings that contain many types of sound. Because undesired data causes both computational complexity and wasted time, most speech-based applications consider only the speech part (the region of interest) and ignore the rest. This is the main reason VAD serves as a preliminary stage in applications such as automatic speech recognition (ASR), speaker identification/verification, speech enhancement, and speaker diarization. In this study, a successful two-stage semi-supervised VAD system, which we call hybrid-VAD, is proposed, especially for environments with a high signal-to-noise ratio (SNR). First, a VAD decision is obtained from a relatively simple Long Short-Term Memory (LSTM) network trained on auditory speech features such as energy, zero crossing rate (ZCR), and 13th-order Mel Frequency Cepstral Coefficients (MFCC). A second VAD decision is then obtained by applying a reasonable thresholding strategy to the same features, and the two decisions are combined with logical operators. The results surprisingly show that the final VAD decision yields low FEC and OVER errors, which are particularly critical for any speaker diarization system, mostly in environments with high SNR. | en_US |
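The abstract describes a two-stage decision fusion: a threshold-based decision on frame-level features (energy, ZCR) is combined with an LSTM-based decision using logical operators. The following is a minimal NumPy sketch of that fusion idea, not the authors' implementation: the frame length, hop size, threshold values, and the lstm_decision placeholder array are illustrative assumptions.

import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz assumed)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def threshold_vad(frames, energy_thr=1e-3, zcr_thr=0.3):
    """Second-stage decision: simple thresholds on short-time energy and ZCR."""
    energy = np.mean(frames ** 2, axis=1)
    # Fraction of sign changes per frame approximates the zero crossing rate.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    # Speech frames tend to show higher energy and lower ZCR than background noise.
    return (energy > energy_thr) & (zcr < zcr_thr)

def hybrid_vad(lstm_decision, threshold_decision):
    """Fuse the two per-frame decisions with a logical operator (AND shown here)."""
    return np.logical_and(lstm_decision, threshold_decision)

# Illustrative usage with synthetic data; lstm_decision stands in for the
# output of the trained LSTM network described in the paper.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000) * 0.002              # low-level noise floor
signal[4000:12000] += 0.1 * np.sin(2 * np.pi * 200 * np.arange(8000) / 16000)
frames = frame_signal(signal)
lstm_decision = rng.random(len(frames)) > 0.5             # placeholder LSTM output
final_decision = hybrid_vad(lstm_decision, threshold_vad(frames))
print(final_decision.astype(int))

With an AND fusion, a frame is labeled as speech only when both stages agree, which is one plausible way to suppress false detections in high-SNR conditions; an OR fusion would instead favor recall.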
dc.identifier.doi | 10.1016/j.bspc.2022.104408 | |
dc.identifier.issn | 1746-8094 | |
dc.identifier.issn | 1746-8108 | |
dc.identifier.scopus | 2-s2.0-85141927780 | |
dc.identifier.scopusquality | Q1 | |
dc.identifier.uri | https://doi.org/10.1016/j.bspc.2022.104408 | |
dc.identifier.uri | https://hdl.handle.net/11468/15209 | |
dc.identifier.volume | 80 | en_US |
dc.identifier.wos | WOS:000891441500006 | |
dc.identifier.wosquality | Q2 | |
dc.indekslendigikaynak | Web of Science | |
dc.indekslendigikaynak | Scopus | |
dc.language.iso | en | en_US |
dc.publisher | Elsevier Sci Ltd | en_US |
dc.relation.ispartof | Biomedical Signal Processing and Control | |
dc.relation.publicationcategory | Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı | en_US |
dc.rights | info:eu-repo/semantics/closedAccess | en_US |
dc.subject | Voice Activity Detection | en_US |
dc.subject | Zero Crossing Rate | en_US |
dc.subject | MFCC | en_US |
dc.subject | LSTM | en_US |
dc.title | Hybrid voice activity detection system based on LSTM and auditory speech features | en_US |
dc.type | Article | en_US |