This evening I thought I'd take a crack at voice recognition in Flash. The idea came after realising that when using my phone I have almost entirely given up on the keypad. Instead I simply utter a few words (when passers-by are out of earshot) and a voice appears at the other end. Wonderful! I haven't read up on voice recognition at all, so in part I am working from the knowledge I've gained from image recognition and tracking software. Being a big Aphex Twin fan, and having written a spectrograph to visualize
this song in Flash before, I had at least a bit of a grounding. The idea is to record a small sample of voice using some microphone thresholding, draw a spectrograph of the sample, then compare that spectrograph against a database of possible patterns and determine which command it most closely matches. I'm pleased to say I have completed the first half of this today, and you can click to see a demo below (the top image will launch it):
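Roughly, the record-then-draw step looks like this. Here is a minimal Python sketch (my actual code is ActionScript, so everything below, including the threshold value and the window sizes, is illustrative rather than lifted from my source): gate the signal on amplitude, then slide a window along it and take the FFT magnitude of each frame.

```python
import numpy as np

def record_above_threshold(samples, threshold=0.1):
    """Keep only the region where activity exceeds the threshold:
    a crude stand-in for microphone activity gating.
    (The threshold value is an assumption; tune it per microphone.)"""
    active = np.flatnonzero(np.abs(samples) > threshold)
    if active.size == 0:
        return samples[:0]
    return samples[active[0]:active[-1] + 1]

def spectrogram(samples, window=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform:
    slide a Hann window over the signal and take the FFT magnitude
    of each frame. Rows are time frames, columns are frequency bins."""
    win = np.hanning(window)
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = samples[start:start + window] * win
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)

# Demo: a 440 Hz tone at an 8 kHz sample rate, padded with silence.
rate = 8000
t = np.arange(rate) / rate
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
signal = np.concatenate([np.zeros(1000), tone, np.zeros(1000)])

clipped = record_above_threshold(signal, threshold=0.1)
spec = spectrogram(clipped)
peak_bin = int(np.argmax(spec.mean(axis=0)))
peak_hz = peak_bin * rate / 256  # bin width = rate / window
```

Averaging the spectrogram over time and taking the strongest bin recovers a frequency within one bin width (31.25 Hz here) of the 440 Hz tone, which is exactly why the harmonics of the glass show up as a few clean horizontal lines.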
The images are of a whistling sequence, a glass being rung, and me saying hello. Interestingly, you can see the harmonics of the rung glass (a few set frequencies). My voice is a bit harder to interpret, but with some analysis I'm sure it will be possible.
Anyway, as usual, the source code can be downloaded
here; let me know if you come up with anything using it. How about auto-tuning?!
The plan from here is to save individual voice clips as images, then train a neural network to find patterns in both the volume and the spectrograph for a set number of commands. I will report my findings shortly!
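Until the network is in place, a much simpler baseline gives the flavour of the comparison step: pool each spectrograph down to a fixed-size vector and pick the stored command with the smallest Euclidean distance. A hedged Python sketch follows (all names, the pooling trick, and the synthetic data are illustrative, not from my Flash code):

```python
import numpy as np

def nearest_command(sample_spec, templates):
    """Match an incoming spectrogram against stored templates by
    flattening each to a fixed-size vector and picking the smallest
    Euclidean distance. A deliberately simple baseline to be replaced
    by a trained neural network later."""
    def to_vector(spec, shape=(16, 16)):
        # Crude fixed-size summary: average-pool the spectrogram down
        # to a small grid so clips of different lengths can compare.
        rows = np.array_split(spec, shape[0], axis=0)
        pooled = [np.array_split(r.mean(axis=0), shape[1]) for r in rows]
        return np.array([[c.mean() for c in row] for row in pooled]).ravel()

    v = to_vector(sample_spec)
    best, best_dist = None, np.inf
    for name, tmpl in templates.items():
        dist = np.linalg.norm(v - to_vector(tmpl))
        if dist < best_dist:
            best, best_dist = name, dist
    return best

# Toy demo: two synthetic "commands" with energy in different bands.
rng = np.random.default_rng(0)
low = rng.random((40, 64)) * np.linspace(1, 0, 64)   # energy in low bins
high = rng.random((40, 64)) * np.linspace(0, 1, 64)  # energy in high bins
templates = {"hello": low, "whistle": high}
query = rng.random((48, 64)) * np.linspace(1, 0, 64)  # another low-band clip
match = nearest_command(query, templates)
```

Nearest-template matching like this falls over quickly with real speech (timing and pitch vary between utterances), which is precisely the kind of variation I'm hoping the neural network will absorb.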