The plot from last time doesn’t reveal much about the nature of speech, so I’ve been looking at some smaller pieces. First I isolated the opening phrase of the sample speech, “Mister Maryk”:
This is a good chunk of speech for experimenting with classification algorithms, but it is still too large a sample to show the fine-grained structure of the sound waveform. Next, let’s take a look at the “eh” sound from the phrase “Mister Maryk”.
You can clearly see a periodic waveform, though it is a bit noisy. This is voiced speech, and I’ll use a pitch detection algorithm on sounds like this to determine the pitch period. Judging by eye, the period looks to be about 75 samples, which at an 8 kHz sample rate works out to 8000 / 75, or about 107 Hz. Yes, Humphrey Bogart had a pretty low voice. Note that all of the plots were made using GNU Octave and gnuplot.
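The by-eye estimate can be checked with a few lines of code. The sketch below is written in Python rather than Octave, and it is only an illustration: the post does not say which pitch detection algorithm will be used, so the normalized autocorrelation search here is an assumption, as are the function names, the synthetic test signal, and the 40 to 120 sample lag range.

```python
import math

def period_to_hz(period_samples, sample_rate_hz):
    """Convert a pitch period in samples to a frequency in Hz."""
    return sample_rate_hz / period_samples

def synthetic_voiced(num_samples, period_samples):
    """Hypothetical stand-in for voiced speech: a fundamental plus one harmonic."""
    w = 2.0 * math.pi / period_samples
    return [math.sin(w * n) + 0.5 * math.sin(2.0 * w * n) for n in range(num_samples)]

def estimate_pitch_period(samples, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest normalized autocorrelation.

    This is one common approach, assumed here for illustration only.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        num = sum(samples[i] * samples[i + lag] for i in range(n))
        energy_a = sum(samples[i] ** 2 for i in range(n))
        energy_b = sum(samples[i + lag] ** 2 for i in range(n))
        denom = math.sqrt(energy_a * energy_b)
        score = num / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# The arithmetic from the text: 75 samples at 8 kHz.
print(round(period_to_hz(75, 8000), 1))  # 106.7

# Recover the period of a clean synthetic signal built with a 75-sample period.
signal = synthetic_voiced(400, 75)
print(estimate_pitch_period(signal, 40, 120))
```

The lag range is kept below two full periods on purpose. A perfectly periodic signal correlates just as well at 150 samples as at 75, so a wider search would need some rule for preferring the shorter lag.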
Two gross measurements are used to classify the component sounds of speech in this application: the short-term average magnitude and the short-term zero-crossing rate. I’ll talk about those in my next post.
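As a rough preview, here is a minimal Python sketch of the two measurements using their common textbook definitions. The post does not give exact definitions, so everything here is an assumption: the function names, the non-overlapping frames, the frame length, and the choice to report the zero-crossing rate as a fraction of adjacent sample pairs.

```python
def short_term_average_magnitude(samples, frame_len):
    """Mean absolute sample value for each non-overlapping frame."""
    return [
        sum(abs(s) for s in samples[start:start + frame_len]) / frame_len
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def short_term_zero_crossing_rate(samples, frame_len):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    rates = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        rates.append(crossings / (frame_len - 1))
    return rates

# Toy input: one frame that flips sign on every sample, one that never does.
toy = [1, -1, 1, -1, 2, 2, 2, 2]
print(short_term_average_magnitude(toy, 4))   # [1.0, 2.0]
print(short_term_zero_crossing_rate(toy, 4))  # [1.0, 0.0]
```

Broadly, voiced sounds such as the “eh” above tend to show a high average magnitude with a low zero-crossing rate, while noise-like unvoiced sounds tend to show the reverse. That contrast is what makes the pair useful for classification.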









