
Reading Report

 

            
            
My degree thesis focuses on the topic "Study and comparison of the algorithms for pitch determination in Speech Recognition". Over the past three weeks I have studied the basic theory of speech recognition, reading the relevant chapters of two books as well as four reference papers. A rudimentary picture of speech recognition has formed in my mind, and here I give a brief review as a basis for the research that follows.
The speech signal is a non-stationary stochastic process. Because its characteristics vary with time, we divide the signal into a sequence of short segments, each called a frame, typically 5 to 50 ms long in general applications. Within each frame the signal can be treated as stationary, so existing signal processing theory can be used to analyze the speech.
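The framing step can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the function name `split_into_frames`, the 25 ms frame length and the 10 ms hop between frame starts are not taken from the books or papers, and no window function is applied.

```python
def split_into_frames(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Cut a sequence of samples into short, possibly overlapping frames.

    frame_ms and hop_ms are illustrative defaults (assumptions), chosen
    inside the 5 to 50 ms range mentioned in the text.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must each span at least one sample")

    frames = []
    start = 0
    # Trailing samples that cannot fill a whole frame are dropped.
    while start + frame_len <= len(signal):
        frames.append(list(signal[start:start + frame_len]))
        start += hop_len
    return frames
```

With a 16 kHz sampling rate, each frame then holds 400 samples and consecutive frames start 160 samples apart. Each frame is what the later analysis treats as stationary.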
There are two types of speech sound, voiced and unvoiced. A voiced sound is periodic to some extent, while an unvoiced sound is close to random noise. The period of a voiced sound sets its pitch. The difference between voiced and unvoiced sounds plays an important role in pitch determination, which aims to detect the periodicity of the voiced sound so that the produced sound can be recognized or synthesized.
Pitch determination is one of the most important and difficult tasks in speech processing. Many pitch determination algorithms have been developed, three of which will be covered in my future work: the short-time Average Magnitude Difference Function (AMDF), homomorphic speech signal processing, and Linear Predictive Coding (LPC). All of these methods exploit the periodicity of voiced sound. Their general ideas will be discussed later. First, let us look at two fundamental but essential features of the speech signal: short-time stationarity and the periodicity of voiced sound.
These two characteristics lead us to the auto-correlation function, computed frame by frame, as a way to tell the periodic voiced segments from the noise-like unvoiced ones in the seemingly random speech signal.
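To make the idea concrete, here is a minimal sketch of how a single frame could be examined with the short-time auto-correlation, with the AMDF placed next to it for comparison. Everything specific in it is an assumption of mine rather than something taken from the references: the function names, the 60 to 400 Hz search range, and the voicing threshold of 0.3 on the normalized auto-correlation peak. The AMDF version makes no voiced/unvoiced decision at all.

```python
import math


def autocorrelation(frame, lag):
    """Short-time auto-correlation R(k) = sum of x(n) * x(n + k)."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def amdf(frame, lag):
    """Average magnitude difference D(k) = mean of |x(n) - x(n + k)|."""
    count = len(frame) - lag
    return sum(abs(frame[n] - frame[n + lag]) for n in range(count)) / count


def lag_range(frame, sample_rate, min_f0, max_f0):
    """Lags (in samples) that correspond to the assumed pitch range."""
    min_lag = max(1, int(sample_rate / max_f0))
    max_lag = min(int(sample_rate / min_f0), len(frame) - 1)
    return range(min_lag, max_lag + 1)


def pitch_by_autocorrelation(frame, sample_rate, min_f0=60.0, max_f0=400.0,
                             voicing_threshold=0.3):
    """Return a pitch estimate in Hz, or None if the frame looks unvoiced."""
    energy = autocorrelation(frame, 0)
    lags = lag_range(frame, sample_rate, min_f0, max_f0)
    if energy == 0 or len(lags) == 0:
        return None  # silent or too-short frame
    # A periodic frame correlates strongly with itself one period later.
    best_lag = max(lags, key=lambda k: autocorrelation(frame, k))
    # Assumed rule: a weak peak relative to R(0) means "unvoiced".
    if autocorrelation(frame, best_lag) / energy < voicing_threshold:
        return None
    return sample_rate / best_lag


def pitch_by_amdf(frame, sample_rate, min_f0=60.0, max_f0=400.0):
    """Return the pitch in Hz at the deepest AMDF valley (no voicing check)."""
    lags = lag_range(frame, sample_rate, min_f0, max_f0)
    if len(lags) == 0:
        return None
    # A periodic frame nearly cancels itself when shifted by one period.
    best_lag = min(lags, key=lambda k: amdf(frame, k))
    return sample_rate / best_lag
```

On a synthetic 100 Hz sine sampled at 8000 Hz (a 40 ms frame of 320 samples), both functions pick a lag of 80 samples, one period of the tone. Real speech is far less clean, so this sketch only shows the principle.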

