Silence, Voiced, and Unvoiced Speech Classification

Classifying speech segments as silence, voiced, or unvoiced is an important component of the speech timescale modification algorithm. This classification is accomplished using the short time magnitude and zero crossing rate of the speech signal. Many algorithms have been published for classifying speech segments as voiced or unvoiced that focus on determining the exact endpoints of these [...]

Zero Crossing Rate in Octave

Here is a prototype of the short time average zero crossing rate. Note that the zero crossing rate is high near the beginning of the phrase, where the average magnitude is low. Together, the zero crossing rate and the average magnitude can be used in an algorithm to classify components of speech.

Average zero crossing [...]
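As a rough illustration of how the two measurements might be combined, here is a minimal Python sketch (not the Octave prototype behind the plot). The frame size of 320 samples and hop of 80 samples are borrowed from the short time energy post; the function names and both thresholds are hypothetical and would need tuning against real speech.

```python
def frames(signal, size=320, hop=80):
    """Yield successive frames of `size` samples, advancing `hop` samples each time."""
    for start in range(0, len(signal) - size + 1, hop):
        yield signal[start:start + size]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def average_magnitude(frame):
    """Mean absolute sample value of the frame."""
    return sum(abs(s) for s in frame) / len(frame)


def classify(frame, silence_mag=0.01, unvoiced_zcr=0.25):
    """Label one frame. Both thresholds are illustrative guesses, not measured values."""
    if average_magnitude(frame) < silence_mag:
        return "silence"    # very little signal, whatever the crossing rate
    if zero_crossing_rate(frame) > unvoiced_zcr:
        return "unvoiced"   # noise-like: many sign changes
    return "voiced"         # strong signal with few sign changes


def classify_signal(signal, size=320, hop=80):
    """Return one label per frame of the signal."""
    return [classify(f) for f in frames(signal, size, hop)]
```

Checking the magnitude first matters: a quiet, noisy stretch has a high zero crossing rate but should still be labelled silence, which matches the behaviour noted near the beginning of the phrase.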

Short Time Zero Crossing Rate of Speech

The short time average zero crossing rate of a speech signal can be used in conjunction with the short time average energy (or magnitude) to discriminate between voiced speech, unvoiced speech, and silence. The short time average zero crossing rate of a digitally sampled speech signal is defined in Digital Processing of Speech Signals (Rabiner & [...]

Short Time Energy in Octave

Here is a quick prototype of the short time energy function in GNU Octave for the speech sample “Mister Meryk”. The plot below shows the average magnitude of the phrase using a window size of 320 samples, calculated every 80 samples.

Average magnitude function.

Here is the code that I used to generate the [...]
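The Octave listing is cut off in this excerpt. The following is not that code, only a minimal Python sketch of the same measurement: the average magnitude over a 320-sample window, evaluated every 80 samples. It assumes a rectangular (unweighted) window, and the function name is hypothetical.

```python
def short_time_average_magnitude(signal, window=320, hop=80):
    """Mean absolute value of each `window`-sample frame, one value per `hop` samples.

    Assumes a rectangular window; frames that would run past the end are skipped.
    """
    result = []
    for start in range(0, len(signal) - window + 1, hop):
        frame = signal[start:start + window]
        result.append(sum(abs(s) for s in frame) / window)
    return result
```

With these defaults, consecutive windows overlap by 240 samples, so the resulting curve changes smoothly from one value to the next.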

Short Time Energy of Speech Signals

The short time energy measurement of a speech signal can be used to distinguish voiced from unvoiced speech. Short time energy can also be used to detect the transition from unvoiced to voiced speech and vice versa. The energy of voiced speech is much greater than the energy of unvoiced speech.

Equation 1 Short time [...]
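The equation is truncated in this excerpt. One common textbook form sums the squared samples inside a sliding window, and the Python sketch below follows that form under stated assumptions: a rectangular window, and a window length of 320 samples with a hop of 80 samples borrowed from the Octave prototype post. The function name is hypothetical.

```python
def short_time_energy(signal, window=320, hop=80):
    """Sum of squared samples in each `window`-sample frame, one value per `hop` samples.

    Assumes a rectangular window; a tapered window would weight each sample
    before squaring.
    """
    energies = []
    for start in range(0, len(signal) - window + 1, hop):
        frame = signal[start:start + window]
        energies.append(sum(s * s for s in frame))
    return energies
```

Because the samples are squared, a loud (voiced-like) stretch produces far larger values than a quiet (unvoiced-like) one, and a sharp rise or fall in the sequence marks a likely transition between the two.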

BeagleBoard Project Update

My project, Speed Reader, has been approved for the BeagleBoard Sponsored Projects Program.  Now I’ll receive a BeagleBoard to prototype a Speech Timescale Modification application for playing audio books.  I’m one step closer to building a real application!

Zooming in on speech

The plot from last time doesn’t reveal much about the nature of speech, so I’ve been looking at some smaller bits. First, I isolated the first phrase from the sample speech: “Mister Maryk”:

Plot of the phrase "Mister Meryk"

This is a good chunk of speech for experimenting with classification algorithms, but this is still too [...]

Algorithm Prototyping: GNU Octave

Signal processing algorithms are usually prototyped and tested using high-level math tools such as MATLAB or Mathcad. These tools use a high-level language that closely models actual mathematical equations. Tools like these include many built-in math functions and utilities to plot results. MATLAB even has add-on packages that can [...]