milVAD: A bag-level MNIST modelling of voice activity detection using deep multiple instance learning

dc.authorid: 0000-0002-6315-5750 (en_US)
dc.contributor.author: Korkmaz, Yunus
dc.contributor.author: Boyacı, Aytuğ
dc.date.accessioned: 2023-03-29T10:54:16Z
dc.date.available: 2023-03-29T10:54:16Z
dc.date.issued: 2022 (en_US)
dc.department: Dicle University, Diyarbakır Vocational School of Technical Sciences, Department of Computer Technologies (en_US)
dc.description.abstract: Voice Activity Detection (VAD), which is used as an initial step in most applications in the Digital Speech Processing (DSP) area, is the process of identifying the speech regions in an audio recording. It is mostly used in automatic speech recognition, speaker identification/verification, speech enhancement, speaker diarization, etc., in order to reduce output errors and increase the overall effectiveness of these systems. In this study, a bag-level MNIST modelling of VAD is proposed using a Deep Multiple Instance Learning (Deep MIL) approach. Because, to the best of our knowledge, this is the first attempt in the literature to model VAD as a MIL problem, we named it "milVAD". The MNIST dataset was modified to obtain a bag-level classifier model for the VAD framework, while the MIL algorithm was implemented inside a Convolutional Neural Network (CNN) as an embedded layer using the Noisy-And pooling method. The proposed modelling scenario achieved a high training accuracy of approximately 99.91% in only nine epochs via Deep MIL at bag level. These results show that MIL can be used efficiently for VAD systems framed as binary classification. (en_US)
dc.identifier.citation: Korkmaz, Y. and Boyacı, A. (2022). milVAD: A bag-level MNIST modelling of voice activity detection using deep multiple instance learning. Biomedical Signal Processing and Control, 74, 103520. (en_US)
dc.identifier.doi: 10.1016/j.bspc.2022.103520 (en_US)
dc.identifier.endpage: 8 (en_US)
dc.identifier.issn: 1746-8094
dc.identifier.issn: 1746-8108
dc.identifier.scopus: 2-s2.0-85122670745 (en_US)
dc.identifier.scopusquality: Q1 (en_US)
dc.identifier.startpage: 1 (en_US)
dc.identifier.uri: https://www.sciencedirect.com/science/article/pii/S1746809422000428?via%3Dihub
dc.identifier.uri: https://hdl.handle.net/11468/11551
dc.identifier.volume: 74 (en_US)
dc.identifier.wos: WOS:000777305800005
dc.identifier.wosquality: Q2
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.institutionauthor: Korkmaz, Yunus
dc.language.iso: en (en_US)
dc.publisher: Elsevier SCI LTD. (en_US)
dc.relation.ispartof: Biomedical Signal Processing and Control (en_US)
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member (en_US)
dc.rights: info:eu-repo/semantics/closedAccess (en_US)
dc.subject: Voice Activity Detection (VAD) (en_US)
dc.subject: Convolutional Neural Networks (CNN) (en_US)
dc.subject: Semisupervised learning (en_US)
dc.subject: Machine learning (en_US)
dc.title: milVAD: A bag-level MNIST modelling of voice activity detection using deep multiple instance learning (en_US)
dc.type: Article (en_US)
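
The abstract mentions Noisy-And pooling as the MIL layer that turns instance-level scores into a single bag-level score. As a rough illustration only, and not the authors' implementation, the sketch below applies the commonly cited Noisy-And formulation to a bag of instance probabilities using NumPy. The function names `sigmoid` and `noisy_and_pool` and the default values `a=10.0` and `b=0.5` are assumptions chosen for this example; in a trained network the threshold `b` would typically be learned rather than fixed.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def noisy_and_pool(instance_probs, a=10.0, b=0.5):
    """Pool instance probabilities into one bag-level probability.

    instance_probs: array whose last axis indexes the instances of a bag,
                    with values in [0, 1].
    a: slope of the activation (assumed value, fixed here).
    b: soft threshold on the mean instance probability (assumed value).
    """
    # Mean instance probability of the bag.
    p_mean = np.mean(instance_probs, axis=-1)
    # Normalisation terms so that p_mean = 0 maps to 0 and p_mean = 1 maps to 1.
    low = sigmoid(-a * b)
    high = sigmoid(a * (1.0 - b))
    return (sigmoid(a * (p_mean - b)) - low) / (high - low)


# Hypothetical bags of four instance probabilities each.
mostly_negative = np.array([0.1, 0.0, 0.2, 0.1])
mostly_positive = np.array([0.9, 1.0, 0.8, 0.9])
print(noisy_and_pool(mostly_negative), noisy_and_pool(mostly_positive))
```

With these assumed parameters, the bag score stays close to 0 until the mean instance probability approaches the threshold `b` and then rises steeply, which is the behaviour that distinguishes Noisy-And from plain mean or max pooling.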

Files

Original bundle
Name: milVAD.pdf
Size: 4.14 MB
Format: Adobe Portable Document Format
Description: Article File

License bundle
Name: license.txt
Size: 1.44 KB
Format: Item-specific license agreed upon to submission