Silence, Voiced, and Unvoiced Speech Classification

Classifying speech segments as silence, voiced, or unvoiced is an important component of the speech timescale modification algorithm. This classification is accomplished using the short time magnitude and zero crossing rate of the speech signal. Many published algorithms for classifying speech segments as voiced or unvoiced focus on determining the exact endpoints of these [...]

Zero Crossing Rate in Octave

Here is a prototype of the short time average zero crossing rate. Note that the zero crossing rate is high near the beginning of the phrase, where the average magnitude is low. The combination of zero crossing rate and average magnitude can be used in an algorithm to classify components of speech.
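To make the idea of combining the two measurements concrete, here is a minimal Python sketch (not the Octave prototype from the post). The function names, the sign convention, the normalization of the crossing rate, and both thresholds are assumptions chosen for illustration; real thresholds would have to be tuned on actual speech.

```python
def sign(x):
    """Assumed sign convention: non-negative samples are +1, negative are -1."""
    return 1 if x >= 0 else -1


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs in the frame whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if sign(a) != sign(b))
    return crossings / (len(frame) - 1)


def average_magnitude(frame):
    """Mean absolute value of the samples in the frame."""
    return sum(abs(x) for x in frame) / len(frame)


def classify_frame(frame, silence_mag=0.02, voiced_zcr=0.25):
    """Toy three-way decision; both thresholds are made-up placeholders."""
    if average_magnitude(frame) < silence_mag:
        return "silence"      # too quiet to be speech, whatever the crossing rate
    if zero_crossing_rate(frame) < voiced_zcr:
        return "voiced"       # strong signal with few sign changes
    return "unvoiced"         # strong enough, but noise-like sign changes


# Synthetic 320-sample frames standing in for real speech.
voiced_like = ([0.5] * 40 + [-0.5] * 40) * 4   # slow sign changes, large magnitude
unvoiced_like = [0.1, -0.1] * 160              # sign flips on every sample
quiet = [0.001, -0.001] * 160                  # high crossing rate, tiny magnitude

for name, frame in [("voiced_like", voiced_like),
                    ("unvoiced_like", unvoiced_like),
                    ("quiet", quiet)]:
    print(name, classify_frame(frame))
```

The magnitude check comes first because a quiet stretch can still show a high zero crossing rate, which matches the observation about the beginning of the phrase.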

Average zero crossing [...]

Short Time Zero Crossing Rate of Speech

The short time average zero crossing rate of a speech signal can be used in conjunction with the short time average energy (or magnitude) to discriminate between voiced speech, unvoiced speech, and silence. The short time average zero crossing rate of a digitally sampled speech signal is defined in Digital Processing of Speech Signals (Rabiner & [...]

Short Time Energy in Octave

Here is a quick prototype of the short time energy function in GNU Octave for the speech sample “Mister Meryk”. The plot below shows the average magnitude of the phrase using a window size of 320 samples, calculated every 80 samples.
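For readers without Octave at hand, the same windowing scheme can be sketched in plain Python. This is an illustration, not the code used to generate the plot: the function name is hypothetical, a rectangular window is assumed, and frames that would run past the end of the signal are simply dropped.

```python
def short_time_average_magnitude(samples, window=320, hop=80):
    """Mean of |x| over a rectangular window, evaluated every `hop` samples."""
    values = []
    # Only full windows are used; a trailing partial window is ignored.
    for start in range(0, len(samples) - window + 1, hop):
        frame = samples[start:start + window]
        values.append(sum(abs(x) for x in frame) / window)
    return values


# Synthetic input: 640 samples at full scale followed by 320 samples of silence.
signal = [1.0] * 640 + [0.0] * 320
magnitudes = short_time_average_magnitude(signal)
print(len(magnitudes), magnitudes[0], magnitudes[-1])
```

Because the window is four times the hop, consecutive windows overlap by 240 samples, which smooths the resulting contour.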

Average magnitude function.

Here is the code that I used to generate the [...]

Short Time Energy of Speech Signals

The short time energy measurement of a speech signal can be used to determine voiced vs. unvoiced speech.  Short time energy can also be used to detect the transition from unvoiced to voiced speech and vice versa.  The energy of voiced speech is much greater than the energy of unvoiced speech.
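As a rough illustration of how an energy contour could flag such transitions, here is a small Python sketch. It assumes a rectangular window and an unnormalized sum of squares, which may differ from the post's own equation in windowing or scaling; the function names and the threshold are hypothetical.

```python
def short_time_energy(samples, window=320, hop=80):
    """Sum of squared samples in each full window, taken every `hop` samples."""
    energies = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = samples[start:start + window]
        energies.append(sum(x * x for x in frame))
    return energies


def find_transitions(energies, threshold):
    """Indices of frames whose energy is on the other side of the threshold
    from the previous frame (a crude low/high energy boundary detector)."""
    above = [e >= threshold for e in energies]
    return [i for i in range(1, len(above)) if above[i] != above[i - 1]]


# Synthetic input: a low-amplitude stretch followed by a high-amplitude stretch.
signal = [0.1] * 640 + [1.0] * 640
energies = short_time_energy(signal, window=320, hop=320)
print(energies)
print(find_transitions(energies, threshold=100.0))
```

A single fixed threshold is the simplest possible choice; on real recordings it would likely need to adapt to the recording level.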

Equation 1 Short time [...]

BeagleBoard Project Update

My project, Speed Reader, has been approved for the BeagleBoard Sponsored Projects Program.  Now I’ll receive a BeagleBoard to prototype a Speech Timescale Modification application for playing audio books.  I’m one step closer to building a real application!

Zooming in on speech

The plot from last time doesn’t reveal much about the nature of speech so I’ve been looking at some smaller bits.  First I isolated the first phrase from the sample speech: “Mister Maryk”:

Plot of the phrase "Mister Meryk"

This is a good chunk of speech for experimenting with classification algorithms, but it is still too [...]

Handling Speech Samples with GNU Octave

The first step is to get some sample speech to work with. I found this clip on the Web. The first problem is that all of the speech samples I could find on the Web were encoded in MP3 format. Speech processing requires linear encoding. I also wanted to sample at 8 kHz which is [...]

Algorithm Prototyping: GNU Octave

Signal processing algorithms are usually prototyped and tested using high-level math tools such as MATLAB or Mathcad. These tools use a high-level language that closely models actual mathematical equations. They include many built-in math functions and utilities to plot results. MATLAB even has add-on packages that can [...]

Basic Technique

Timescale modification of speech is accomplished by first dividing the speech into segments. Segments are then deleted to speed up the speaking rate or repeated to slow it down. The two key issues are how to segment the speech and which segments can be deleted or repeated without degrading intelligibility.
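The delete-or-repeat mechanism can be sketched in a few lines of Python. This is only an illustration under strong assumptions: segments have a fixed length and every `every`-th one is chosen blindly, whereas choosing segments that preserve intelligibility is exactly the open question raised above. The function and parameter names are hypothetical.

```python
def change_speed(samples, segment_len, every, mode):
    """Delete ("faster") or repeat ("slower") every `every`-th fixed-length segment."""
    if mode not in ("faster", "slower"):
        raise ValueError("mode must be 'faster' or 'slower'")
    segments = [samples[i:i + segment_len]
                for i in range(0, len(samples), segment_len)]
    out = []
    for index, segment in enumerate(segments, start=1):
        if index % every == 0:
            if mode == "faster":
                continue             # drop this segment entirely
            out.extend(segment)      # "slower": emit one extra copy first
        out.extend(segment)
    return out


# Integers stand in for audio samples so the effect is easy to read.
signal = list(range(12))
print(change_speed(signal, segment_len=3, every=2, mode="faster"))
print(change_speed(signal, segment_len=3, every=2, mode="slower"))
```

With `every=2`, "faster" halves the duration and "slower" increases it by half; other ratios follow from changing how often a segment is selected.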

Speech [...]