Yazar "Boyaci, Aytug" seçeneğine göre listele
Showing 1 - 3 of 3
Item: A comprehensive Turkish accent/dialect recognition system using acoustic perceptual formants (Elsevier Sci Ltd, 2022). Korkmaz, Yunus; Boyaci, Aytug.
Accent or dialect recognition is one of the hot topics of emerging technology in speech processing. In an audio recording, extracting accent clues from a speech signal can help investigators form an idea of where the speaker is from. It is mostly used for detecting the regional origin/ethnicity of speakers in real-time surveillance systems, as well as in demographic research on how human voices change with geography. In this work, Turkish-language accent analysis was performed using the formant frequencies (F1, F2 and F3) of vowels. We divided our work into two approaches, statistical and classification. In both, evaluations were done by virtually splitting Turkey's map into 2 and 3 dialect regions. A total of 112 monolingual university students (72 males, 40 females) uttered 103 meaningful Turkish syllables. Because formant frequencies can vary with gender, males and females were evaluated separately in both the statistical and the classification analysis. The results surprisingly showed that the isolated vowel 'e' in particular can classify a male speaker, known in advance to be from either the Mediterranean or the Eastern Anatolia region, with an accuracy of 90% using a KNN classifier. (c) 2022 Elsevier Ltd. All rights reserved.

Item: Hybrid voice activity detection system based on LSTM and auditory speech features (Elsevier Sci Ltd, 2023). Korkmaz, Yunus; Boyaci, Aytug.
Voice Activity Detection (VAD), sometimes called Speech Activity Detection, is the process of extracting speech regions from audio recordings containing many types of sounds. Because undesired data causes both computational complexity and wasted time, most speech-based applications consider only the speech part (the region of interest) and ignore the rest. This is the main reason VAD serves as a preliminary stage in applications such as automatic speech recognition (ASR), speaker identification/verification, speech enhancement, speaker diarization, etc. In this study, a successful semi-supervised VAD system, which we named hybrid-VAD, was proposed in a two-stage manner, especially for environments with a high signal-to-noise ratio (SNR). First, a VAD decision was obtained from a relatively simple Long Short-Term Memory (LSTM) network trained on auditory speech features such as energy, zero crossing rate (ZCR) and 13th-order Mel Frequency Cepstral Coefficients (MFCC). After applying a reasonable thresholding strategy to the same features to obtain a second VAD decision, we combined both decisions with logical operators. The results surprisingly showed that the final VAD decision has low FEC and OVER errors, which are especially critical for any speaker diarization system, mostly in environments with high SNR.
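A minimal sketch, not the paper's code or data, of how the formant-based KNN classification in the first item above could look in Python with scikit-learn. The formant values, region labels and number of neighbors are purely illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Each row: illustrative [F1, F2, F3] in Hz for the vowel 'e' from one male speaker.
X = np.array([
    [530, 1840, 2480],   # hypothetical Mediterranean speakers
    [545, 1810, 2500],
    [560, 1760, 2550],   # hypothetical Eastern Anatolia speakers
    [575, 1720, 2590],
])
y = np.array(["Mediterranean", "Mediterranean",
              "Eastern Anatolia", "Eastern Anatolia"])

# Fit a KNN classifier on the formant features and predict an unseen speaker.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
new_speaker = [[550, 1800, 2520]]  # F1, F2, F3 of 'e' from an unseen male speaker
print("Predicted region:", knn.predict(new_speaker))
```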
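A rough sketch of the two-stage decision fusion described in the hybrid-VAD item above, assuming a 16 kHz signal, 25 ms / 10 ms framing and illustrative thresholds; a random array stands in for the trained LSTM's per-frame speech probabilities. The paper does not specify which logical operator is used, so AND is shown as one plausible choice.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def threshold_vad(frames, energy_thr=1e-3, zcr_thr=0.3):
    """Second-stage decision: energy high enough and ZCR low enough (assumed thresholds)."""
    energy = np.mean(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return (energy > energy_thr) & (zcr < zcr_thr)

fs = 16000
x = 0.01 * np.random.randn(fs)             # stand-in for one second of audio
frames = frame_signal(x)

lstm_probs = np.random.rand(len(frames))   # stand-in for LSTM speech probabilities
lstm_decision = lstm_probs > 0.5

final_decision = lstm_decision & threshold_vad(frames)   # logical fusion of both decisions
print(final_decision.astype(int))
```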
Item: Unsupervised and supervised VAD systems using combination of time and frequency domain features (Elsevier Sci Ltd, 2020). Korkmaz, Yunus; Boyaci, Aytug.
Voice Activity Detection (VAD), also referred to as Speech Activity Detection (SAD), is the process of identifying speech/non-speech regions in digital speech recordings. It is used as a preliminary stage to reduce errors and increase effectiveness in most speech-based applications such as automatic speech recognition (ASR), speaker identification/verification, speech enhancement, speaker diarization, etc. In this study, two independent VAD structures were proposed for unsupervised and supervised approaches using both time and frequency domain features. In the supervised approach, autocorrelation-based pitch contour estimation was used together with a 1NN cosine classifier trained on a 21-column feature matrix comprising energy, zero crossing rate (ZCR), 13th-order Mel Frequency Cepstral Coefficients (MFCC) and Shannon entropies of a Daubechies-filtered, depth-5 Wavelet Packet Transform (WPT), while in the unsupervised approach, normalization, thresholding and median filtering were applied to the same feature set to obtain the VAD decision. The proposed unsupervised VAD achieved error rates of 4%, 19%, 0.02% and 0.7% for FEC, MSC, OVER and NDS, respectively, at 0 dB SNR. The VAD decisions of both the supervised and the unsupervised systems showed that the proposed methods can be used efficiently either in silent environments or in environments with noise similar to Additive White Gaussian Noise (AWGN). (C) 2020 Elsevier Ltd. All rights reserved.
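A brief sketch of the unsupervised chain (normalization, thresholding, median filtering) described in the third item, applied here to a single short-time energy contour. The threshold, filter length and synthetic contour are assumptions; the paper's full 21-column feature set is not reproduced.

```python
import numpy as np
from scipy.signal import medfilt

def unsupervised_vad(feature, thr=0.3, medfilt_len=5):
    """Return a binary VAD decision from one frame-level feature contour."""
    norm = (feature - feature.min()) / (feature.max() - feature.min() + 1e-12)  # min-max normalize
    decision = (norm > thr).astype(float)                                       # threshold
    return medfilt(decision, kernel_size=medfilt_len).astype(bool)              # median-filter smoothing

# Synthetic energy contour: low-energy frames surrounding a burst of "speech".
energy = np.concatenate([np.full(50, 0.01),
                         np.full(40, 0.8) + 0.05 * np.random.randn(40),
                         np.full(50, 0.01)])
print(unsupervised_vad(energy).astype(int))
```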