milVAD: A bag-level MNIST modelling of voice activity detection using deep multiple instance learning

Date

2022

Journal Title

Journal ISSN

Volume Title

Publisher

Elsevier SCI LTD.

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Voice Activity Detection (VAD), the process of identifying speech regions in an audio recording, is used as an initial step in the majority of applications in the Digital Speech Processing (DSP) area. It is mostly used in automatic speech recognition, speaker identification/verification, speech enhancement, speaker diarization, etc., in order to reduce output errors and increase the overall effectiveness of these systems. In this study, a bag-level MNIST modelling of VAD is proposed using the Deep Multiple Instance Learning (Deep MIL) approach. Because, to the best of our knowledge, this is the first attempt in the literature to model VAD as a MIL problem, we named the approach "milVAD". The MNIST dataset was modified to obtain a bag-level classifier model for the VAD framework, while the MIL algorithm was implemented inside a Convolutional Neural Network (CNN) as an embedded layer using the Noisy-And pooling method. The proposed modelling scenario achieved a high training accuracy of approximately 99.91% in only nine epochs via Deep MIL at the bag level. These results show that MIL can be used efficiently for VAD systems framed as binary classification.
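To make the two ideas named in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly cited form of Noisy-And pooling, in which the mean of the instance probabilities in a bag is passed through a rescaled sigmoid with slope a and soft threshold b (b is typically learned during training; here it is fixed for illustration). The bag-labelling rule (a bag is positive if it contains at least one instance of a chosen target digit) and the names make_bag, target_digit, and bag_size are hypothetical; the abstract does not state how the MNIST bags were built.

```python
import numpy as np


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def noisy_and(instance_probs, a=10.0, b=0.5):
    """Pool instance-level probabilities into one bag-level probability.

    Assumed form: (s(a*(p_mean - b)) - s(-a*b)) / (s(a*(1 - b)) - s(-a*b)),
    where s is the sigmoid and p_mean is the mean instance probability.
    The values of a and b here are illustrative, not taken from the paper.
    """
    p_mean = np.mean(instance_probs)
    low = sigmoid(-a * b)           # value of the curve at p_mean = 0
    high = sigmoid(a * (1.0 - b))   # value of the curve at p_mean = 1
    return (sigmoid(a * (p_mean - b)) - low) / (high - low)


def make_bag(labels, rng, bag_size=10, target_digit=9):
    """Sample a bag of instance indices and derive its bag-level label.

    Hypothetical rule: the bag is positive (1) if any sampled instance
    carries the target digit, otherwise negative (0).
    """
    idx = rng.choice(len(labels), size=bag_size, replace=False)
    bag_label = int(np.any(labels[idx] == target_digit))
    return idx, bag_label


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for MNIST digit labels; real labels would come from the dataset.
    labels = rng.integers(0, 10, size=1000)
    idx, bag_label = make_bag(labels, rng)
    print("bag label:", bag_label)
    print("pooled probability:", noisy_and(np.array([0.1, 0.2, 0.9, 0.8])))
```

Under this form, the pooled output is 0 when every instance probability is 0 and 1 when every instance probability is 1, and it rises steeply once the mean instance probability crosses b. In a VAD reading of the bag analogy, a bag would correspond to an audio segment and its instances to frames, though that mapping is an interpretation and is not detailed in the abstract.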

Description

Keywords

Voice Activity Detection (VAD), Convolutional Neural Networks (CNN), Semisupervised learning, Machine learning

Source

Biomedical Signal Processing and Control

WoS Q Value

Q2

Scopus Q Value

Q1

Volume

74

Issue

Citation

Korkmaz, Y. and Boyacı, A. (2022). milVAD: A bag-level MNIST modelling of voice activity detection using deep multiple instance learning. Biomedical Signal Processing and Control, 74, 103520.