In 1995, John Benedetto and Anthony Teolis published “A Wavelet Auditory Model and Data Compression,” in which they proposed the Wavelet Auditory Model (WAM), a continuous transform designed to mimic the auditory response of the human ear.
From 2009 to 2011, I worked with John Benedetto on a research project to develop a software implementation of WAM. I hired a team of students at the University of Maryland; we developed it on Microsoft Windows using MATLAB and published the software on SourceForge. The results are described in this video:
I’m now working to expand WAM into a full speech recognition system.
My first task is to port the existing code to Octave, the GNU Project’s free and open alternative to MATLAB. Since Octave is designed to run MATLAB code, this isn’t too difficult, except for the GUI, which relies heavily on MATLAB-specific features.
So far, I’ve gotten the main window and the “Filter Inspector” working. The dialog box for editing parameters doesn’t work yet, but the parameters can be edited in the initializeGlobals.m file before running the program.
Update: The entire GUI now works.
I don’t want to spend too much time porting code, since the Pioneer contest only runs for a month.
Here’s my first progress report for the Pioneer contest, which shows the program running on Ubuntu Linux with Octave:
My next step is to interface the output of the WAM transform to an artificial neural net, configured as a self-organizing map, to recognize patterns in the output.
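As a rough illustration of the idea, here is a minimal self-organizing map sketch. It is written in plain Python rather than Octave, and it is not the project's code: the function names train_som and best_matching_unit, the grid size, and the learning-rate and neighborhood schedules are all assumptions, as is the premise that each frame of WAM output can be treated as a fixed-length vector of numbers.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            # Squared Euclidean distance between the node weights and the input.
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(frames, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rows x cols map on a list of equal-length feature vectors.

    `frames` stands in for transform output (hypothetical format: one list
    of floats per time step). Returns weights indexed as weights[row][col].
    """
    rng = random.Random(seed)
    dim = len(frames[0])
    # Start every node with random weights in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(frames)
    step = 0
    for _ in range(epochs):
        order = list(frames)
        rng.shuffle(order)  # present the frames in a new order each epoch
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood shrinks, floor 0.5
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian falloff with grid distance from the winning node.
                    grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_dist2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


if __name__ == "__main__":
    # Synthetic stand-ins for two different sounds, four channels each.
    demo_frames = [
        [1.0, 0.9, 0.1, 0.0], [0.9, 1.0, 0.0, 0.1],
        [0.0, 0.1, 0.9, 1.0], [0.1, 0.0, 1.0, 0.9],
    ]
    som = train_som(demo_frames)
    for frame in demo_frames:
        print(frame, "->", best_matching_unit(som, frame))
```

After training, similar frames should land on the same or neighboring nodes, so the sequence of winning nodes over time gives a compact description of a sound that a later recognition stage could work with.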
You can follow my progress (or help contribute) by cloning the git repository at
From the matlab directory, you can run Octave, then type wam at the Octave prompt. Hopefully, a dialog box will appear, prompting you to select a .wav file (or some other audio format understood by Octave), after which the WAM transform will be computed and displayed.
Be warned: the code does not currently run in real time! As the video shows, it takes about 20 seconds to process a quarter second of audio, which is roughly 80 times slower than real time.