MFCC

MFCC is based on human hearing perception, which does not resolve frequency on a linear scale: perceived pitch changes roughly linearly with frequency below about 1 kHz and roughly logarithmically above it. In other words, MFCC is based on the known variation of the human ear’s critical bandwidth with frequency [8–10]. Accordingly, the MFCC filter bank uses filters spaced linearly at low frequencies below 1000 Hz and logarithmically above 1000 Hz. The Mel frequency scale represents this subjective pitch, which lets MFCC capture the phonetically important characteristics of speech.

1) Pre-emphasis
This step passes the signal through a filter that emphasizes higher frequencies, which increases the energy of the signal at high frequencies.

$$Y[n] = X[n] - 0.95\,X[n-1]$$

Here the pre-emphasis coefficient is a = 0.95, which means that 95% of any one sample is presumed to originate from the previous sample.
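A minimal sketch of the pre-emphasis filter in plain Python (standard library only). The function name `pre_emphasis` is hypothetical, and copying the first sample through unchanged is an assumption, since the formula needs X[n−1], which does not exist for the first sample.

```python
def pre_emphasis(x, a=0.95):
    """Return Y[n] = X[n] - a * X[n-1] for a list of samples x.

    Assumption: the first sample has no predecessor, so it is copied through
    unchanged (some implementations drop it instead).
    """
    if not x:
        return []
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]


# A slowly varying (low-frequency) signal is almost cancelled after the first
# sample, while a rapidly alternating (high-frequency) one is boosted.
low = pre_emphasis([1.0, 1.0, 1.0, 1.0])
high = pre_emphasis([1.0, -1.0, 1.0, -1.0])
print(low)
print(high)
```

The two test signals make the effect of the filter visible: the constant input shrinks to about 0.05 per sample, while the alternating input grows to about 1.95 in magnitude.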

2) Framing

This step segments the speech samples obtained from analog-to-digital conversion (ADC) into small frames 20 to 40 ms long. The voice signal is divided into frames of N samples, and the starts of adjacent frames are separated by M samples (M < N), so consecutive frames overlap by N − M samples. Typical values are M = 100 and N = 256.
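A sketch of framing with the typical values N = 256 and M = 100. At an assumed sampling rate of 8 kHz, N = 256 samples span 32 ms (inside the 20 to 40 ms range) and M = 100 samples is a 12.5 ms step. The name `frame_signal` is hypothetical, and dropping incomplete trailing frames is an assumption.

```python
def frame_signal(x, N=256, M=100):
    """Split x into frames of N samples whose start points are M samples apart.

    Because M < N, consecutive frames share N - M samples. Trailing samples
    that cannot fill a whole frame are dropped (an assumption; zero-padding
    the last frame is a common alternative).
    """
    if not 0 < M < N:
        raise ValueError("expected 0 < M < N")
    return [x[start:start + N] for start in range(0, len(x) - N + 1, M)]


samples = list(range(1000))          # stand-in for 1000 ADC samples
frames = frame_signal(samples)       # N = 256, M = 100
print(len(frames), len(frames[0]))
```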

3) Hamming windowing

A Hamming window is applied to each frame with the next block in the feature extraction chain (the FFT) in mind: it tapers the frame toward small values at its edges, reducing the discontinuities that would otherwise smear energy across the closest frequency lines. If the window is defined as W(n), 0 ≤ n ≤ N − 1, the Hamming window is given by

$$W(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$

With

N = number of samples in each frame

Y(n) = output signal

X(n) = input signal

W(n) = Hamming window,

the result of windowing the signal is:

$$Y(n) = X(n)\,W(n)$$
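A direct transcription of the Hamming window and of the sample-by-sample product Y(n) = X(n)W(n). The function names `hamming` and `apply_window` are hypothetical; returning [1.0] for a one-sample window is an assumption to avoid dividing by N − 1 = 0.

```python
import math


def hamming(N):
    """Hamming window W(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1)), 0 <= n <= N - 1."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def apply_window(frame):
    """Windowed frame Y(n) = X(n) * W(n), sample by sample."""
    w = hamming(len(frame))
    return [x * wn for x, wn in zip(frame, w)]


w = hamming(256)
print(round(w[0], 2), round(max(w), 3))  # edges are small, the middle is near 1
```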

4) Fast Fourier Transform

The FFT converts each frame of N samples from the time domain into the frequency domain. The Fourier transform turns the convolution of the glottal pulse U[n] and the vocal tract impulse response H[n] in the time domain into a multiplication in the frequency domain, as the equation below shows:

$$Y(\omega) = \mathrm{FFT}\left[h(t) * x(t)\right] = H(\omega)\,X(\omega)$$

where X(ω), H(ω) and Y(ω) are the Fourier transforms of x(t), h(t) and y(t), respectively, and * denotes convolution.
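A sketch of this step in plain Python. To stay dependency-free, a naive DFT stands in for the FFT (an FFT computes the same values, only faster). It also checks the convolution property numerically on small made-up signals. The names `dft`, `power_spectrum` and `convolve` are hypothetical, and scaling the power spectrum by 1/n_fft is one common convention, not something the text prescribes.

```python
import cmath
import math


def dft(x, n_fft=None):
    """Naive O(n_fft^2) DFT of x zero-padded to n_fft points.

    Gives the same values as an FFT, just more slowly; fine for a demonstration.
    """
    n_fft = len(x) if n_fft is None else n_fft
    if n_fft < len(x):
        raise ValueError("n_fft must be at least len(x)")
    x = list(x) + [0.0] * (n_fft - len(x))
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft)]


def power_spectrum(frame, n_fft):
    """Periodogram |X(k)|^2 / n_fft for the bins k = 0 .. n_fft // 2."""
    X = dft(frame, n_fft)
    return [abs(X[k]) ** 2 / n_fft for k in range(n_fft // 2 + 1)]


def convolve(h, x):
    """Direct linear convolution (h * x)[n] = sum_i h[i] x[n - i]."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y


# Convolution theorem: transforming h * x gives H(k) X(k), provided both
# signals are zero-padded to the full convolution length.
h, x = [1.0, 0.5], [1.0, 2.0, 3.0]
L = len(h) + len(x) - 1
lhs = dft(convolve(h, x), L)
rhs = [a * b for a, b in zip(dft(h, L), dft(x, L))]
print(max(abs(a - b) for a, b in zip(lhs, rhs)))  # only floating-point rounding error
```

The zero-padding matters: without it the DFT product corresponds to circular rather than linear convolution.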

5) Mel Filter Bank Processing

The frequency range of the FFT spectrum is very wide, and the voice signal does not follow a linear scale, so the spectrum is passed through a bank of filters spaced according to the Mel scale.

Each filter’s magnitude frequency response is triangular: it equals unity at the filter’s center frequency and decreases linearly to zero at the center frequencies of the two adjacent filters [7, 8]. Each filter output is then the sum of its filtered spectral components. The following equation computes the Mel value for a given frequency f in Hz:

$$F(\text{Mel}) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$
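A sketch of the Mel conversion and of a bank of triangular filters that peak at 1 on their center bin and fall to 0 at the centers of their neighbors. The number of filters (26), FFT size (512) and sampling rate (16 kHz) in the check below are assumptions, as is the bin-mapping convention floor((n_fft + 1) · f / sample_rate); all function names are hypothetical.

```python
import math


def hz_to_mel(f):
    """F(Mel) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, f_low=0.0, f_high=None):
    """Triangular filters equally spaced on the Mel scale over bins 0 .. n_fft // 2.

    Filter m is 1 at its center bin and falls linearly to 0 at the center bins
    of its two neighbors. Returns n_filters lists of n_fft // 2 + 1 weights.
    """
    if f_high is None:
        f_high = sample_rate / 2
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    # n_filters + 2 equally spaced Mel points: each filter uses three consecutive
    # points as its left edge, center and right edge.
    mels = [lo + i * (hi - lo) / (n_filters + 1) for i in range(n_filters + 2)]
    # Map each point to an FFT bin (one common convention; others round instead).
    bins = [math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):          # rising edge
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge, 1.0 at k == center
            weights[k] = (right - k) / (right - center)
        bank.append(weights)
    return bank


def filterbank_energies(power_spec, bank):
    """Each filter output: the weighted sum of the spectral components it covers."""
    return [sum(w * p for w, p in zip(weights, power_spec)) for weights in bank]


bank = mel_filterbank(26, 512, 16000)
print(len(bank), len(bank[0]))
```

Because the filters are equally spaced in Mel rather than in Hz, they become wider at higher frequencies, which is how the bank realizes the roughly logarithmic spacing above 1 kHz.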

6) Discrete Cosine Transform

In this step, the log Mel spectrum is converted back into the time domain using the Discrete Cosine Transform (DCT). The results of the conversion are the Mel Frequency Cepstral Coefficients, and the set of coefficients for one frame is called an acoustic vector. Therefore, each input utterance is transformed into a sequence of acoustic vectors.
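A sketch of this step: take the logarithm of the Mel filter-bank energies and apply an orthonormal DCT-II, keeping the first coefficients as the acoustic vector. The natural logarithm, the `eps` floor against log(0), the orthonormal scaling, and keeping 13 coefficients are all assumptions; the function names are hypothetical.

```python
import math


def dct_ii(x, n_coeffs=None):
    """Orthonormal DCT-II: c[k] = s_k * sum_n x[n] * cos(pi * k * (2n + 1) / (2N))."""
    N = len(x)
    n_coeffs = N if n_coeffs is None else n_coeffs
    out = []
    for k in range(n_coeffs):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                               for n in range(N)))
    return out


def mfcc_from_filterbank(energies, n_coeffs=13, eps=1e-10):
    """Log of the Mel filter-bank energies, then DCT; keep the first n_coeffs."""
    log_mel = [math.log(max(e, eps)) for e in energies]
    return dct_ii(log_mel, n_coeffs)


vector = mfcc_from_filterbank([1.0 + i for i in range(26)])  # made-up energies
print(len(vector))
```

For the 39-feature scheme in step 7, one common choice (an assumption here) is to keep coefficients 1 to 12 and use the frame's log energy in place of coefficient 0.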

7) Delta Energy and Delta Spectrum

The voice signal changes from frame to frame, for example in the slope of a formant at its transitions. Therefore, features related to the change in cepstral features over time need to be added: the 13 static features (12 cepstral features plus energy) are extended with 13 delta (velocity) features and 13 double-delta (acceleration) features, giving 39 features per frame. The energy in a frame for a signal x in a window from time sample t1 to time sample t2 is given by:

$$\text{Energy} = \sum_{t=t_1}^{t_2} x^2[t]$$
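A direct transcription of the energy sum, treating t1 and t2 as inclusive sample indices (an assumption). A log-energy helper is included because many front ends use the logarithm of this value as the energy feature; that choice, the `eps` floor, and both function names are assumptions.

```python
import math


def frame_energy(x, t1, t2):
    """Energy = sum of x[t]**2 for t from t1 to t2, both endpoints included."""
    return sum(x[t] ** 2 for t in range(t1, t2 + 1))


def log_frame_energy(x, t1, t2, eps=1e-10):
    """Log energy; eps guards against log(0) on silent frames."""
    return math.log(max(frame_energy(x, t1, t2), eps))


print(frame_energy([1.0, 2.0, 3.0, 4.0], 1, 2))
```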

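Each of the 13 delta features represents the change between frames in the corresponding cepstral or energy feature, while each of the 13 double delta features represents the change between frames in the corresponding delta feature. A simple estimate of the delta d(t) of a feature c(t) at frame t is half the difference between the following and the preceding frame: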

$$d(t) = \frac{c(t+1) - c(t-1)}{2}$$
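A sketch that applies d(t) to every coefficient, then applies it again to the deltas to get the double deltas, stacking 13 + 13 + 13 = 39 values per frame. Repeating the edge frame at the first and last frame is an assumption (other edge rules exist), and the function names are hypothetical.

```python
def deltas(features):
    """d(t) = (c(t+1) - c(t-1)) / 2 for every coefficient of every frame.

    features is a list of per-frame vectors. Assumption: at the first and last
    frame the missing neighbor is replaced by the edge frame itself.
    """
    T = len(features)
    out = []
    for t in range(T):
        prev = features[max(t - 1, 0)]
        nxt = features[min(t + 1, T - 1)]
        out.append([(a - b) / 2 for a, b in zip(nxt, prev)])
    return out


def add_deltas(features):
    """Append delta and double-delta features to each frame (13 -> 39 values)."""
    d = deltas(features)
    dd = deltas(d)  # double delta = delta of the delta features
    return [list(s) + v + a for s, v, a in zip(features, d, dd)]


# Five hypothetical frames of 13 features that grow by 1 per frame.
feats = [[float(t)] * 13 for t in range(5)]
full = add_deltas(feats)
print(len(full[0]))
```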
