Voice Activity Detector (VAD) in the Qualcomm-ICSI-OGI Front End by 팁휘

The VAD in this front end uses a multilayer perceptron (MLP) whose input is 6 MFCCs from each of 9 frames of the input speech waveform.
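
As a rough illustration of how 9 frames of 6 MFCCs become one 9*6 = 54-dimensional input, here is a minimal sketch. The function name stack_context, the centering of the window on the current frame, and the repetition of edge frames are assumptions made for this example; the note above does not say how the 9 frames are positioned or how utterance boundaries are handled.

```python
def stack_context(mfcc_frames, context=9):
    """Return one flattened vector per frame, built from `context` neighboring frames.

    Assumption: the window is centered on the current frame.
    """
    if context % 2 != 1:
        raise ValueError("context must be odd so the window can be centered")
    half = context // 2
    last = len(mfcc_frames) - 1
    stacked = []
    for t in range(len(mfcc_frames)):
        window = []
        for offset in range(-half, half + 1):
            # Assumption: clamp at the edges, i.e. repeat the first or last frame.
            idx = min(max(t + offset, 0), last)
            window.extend(mfcc_frames[idx])
        stacked.append(window)
    return stacked


# Toy data: 20 frames of 6 coefficients, each frame filled with its own index.
frames = [[float(t)] * 6 for t in range(20)]
vectors = stack_context(frames)
print(len(vectors), len(vectors[0]))
```

Each element of vectors is what a single MLP evaluation would see for one frame.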

The MLP has 50 hidden units and 1 output unit. Each hidden unit takes 9*6 = 54 inputs, applies its corresponding weights, passes the weighted sum through the sigmoid function f(x) = 1/(1+e^-x), and produces one output. The output unit forms its result as a weighted sum of the hidden-unit outputs. The MLP is trained with two output targets, speech and silence.
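
The forward pass described above can be sketched as follows. This is only an illustration under stated assumptions: the weights are random placeholders rather than the trained front-end weights, the bias terms are hypothetical (the note does not mention them), and the output layer is given two rows (silence and speech) to match the two training outputs.

```python
import math
import random


def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Sigmoid hidden layer followed by a linear (weighted-sum) output layer."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]


# Placeholder parameters (assumption): 54 inputs, 50 hidden units, 2 outputs.
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(54)] for _ in range(50)]
b_hidden = [0.0] * 50  # hypothetical biases
w_out = [[rng.uniform(-0.1, 0.1) for _ in range(50)] for _ in range(2)]
b_out = [0.0] * 2      # hypothetical biases

x = [0.0] * 54  # one stacked 9*6 input vector
outputs = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
print(outputs)
```

With an all-zero input and zero biases, every hidden unit outputs 0.5, which makes the result easy to check by hand.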

The probability that a given frame is silence is computed as e^silence / (e^silence + e^speech), where silence and speech are the output values for the two classes.
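
A minimal sketch of this computation is shown below. The function name silence_probability is made up for the example. Subtracting the larger of the two values before exponentiating is a standard numerical-stability step added here; it does not change the result of the formula.

```python
import math


def silence_probability(silence, speech):
    """Compute e^silence / (e^silence + e^speech) without overflow."""
    m = max(silence, speech)
    e_sil = math.exp(silence - m)
    e_sp = math.exp(speech - m)
    return e_sil / (e_sil + e_sp)


# Equal outputs give an even split between silence and speech.
print(silence_probability(0.0, 0.0))
```

The speech probability is simply one minus this value, since the two terms share the same denominator.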

Notably, the authors apply a low-pass filter before the DCT computation during MFCC extraction.

References
Adami et al., "Qualcomm-ICSI-OGI Features for ASR," ICSLP 2002.

