This is a small testing program that uses both CMUSphinx and GTK+ to demonstrate a keyword spotting (KWS) algorithm.
KWS is the technique used to detect a predefined keyword at any time in an audio stream. Yes, this is the technique behind “Okay, Google” and “Hey, Siri”. Whenever the machine hears the keyword, a callback function is fired.
History
- Originally, keyword spotting was built on Hidden Markov Model (HMM) systems
- In 2014, Google demonstrated that a Deep Neural Network (DNN) outperforms HMM systems
- In 2015, Google demonstrated that Convolutional Neural Networks (CNNs) outperform DNNs, since DNNs:
  - ignore input topology: the (fixed-size) input can be presented in any order without affecting the performance of the network
  - are not explicitly designed to model translational variance within speech signals, which can arise from different speaking styles; CNNs capture translational invariance with far fewer parameters by averaging the outputs of hidden units
Tools
CMU Sphinx Project by Carnegie Mellon University
- Developed by the CMU Language Technologies Institute (LTI)
- Designed to run on different platforms including iOS, Android, Raspberry Pi, etc.
- License: BSD-style (nice!)
Raspberry Pi 2 – Speech Recognition on device
- Upload a word list to http://www.speech.cs.cmu.edu/tools/lmtool-new.html
- Download the generated .lm and .dict files, then run (pick the sample rate that matches your audio source):
pocketsphinx_continuous -inmic yes -lm 0730.lm -dict 0730.dic -samprate 16000/8000/48000
My Code
Github link: https://github.com/heronyang/kws-color-demo
Components
In main.c, the program fires up a thread to handle GUI jobs right after it starts. It then sets up pocketsphinx and calls recognize_from_microphone or recognize_from_file for the audio input. Since argc/argv is passed into the settings, the user can specify the dictionary file or log file as written in run.sh.
Run
> ./run.sh