Monday, April 4, 2016

Keyword Spotting for Controlling Window Background Color

This is a small test program that uses both CMUSphinx and GTK+ to demonstrate a keyword spotting (KWS) algorithm.

KWS is a technique for detecting a keyword at any time in a continuous audio stream. Yes, this is the technique behind “Okay, Google” and “Hey, Siri”. Whenever the machine hears the keyword, a callback function is fired.
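
The "keyword heard, callback fired" idea can be sketched in a few lines of C. This is only an illustration and not the code from the demo: the names kws_spotter, kws_callback, kws_feed, and count_detection are hypothetical, and the sketch matches the keyword against already-decoded text, whereas a real spotter such as pocketsphinx scores the audio itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: invoked each time the keyword is spotted. */
typedef void (*kws_callback)(const char *keyword, void *user_data);

/* Hypothetical spotter state: one keyword and the callback bound to it. */
typedef struct {
    const char  *keyword;
    kws_callback on_detect;
    void        *user_data;
} kws_spotter;

/*
 * Feed one decoded hypothesis (plain text from a recognizer) to the spotter.
 * Fires the callback and returns 1 if the keyword occurs in the text,
 * otherwise returns 0. Assumption: text matching stands in for the
 * acoustic scoring a real keyword spotter performs.
 */
int kws_feed(const kws_spotter *s, const char *hypothesis)
{
    if (s == NULL || s->keyword == NULL || s->keyword[0] == '\0' ||
        hypothesis == NULL)
        return 0;
    if (strstr(hypothesis, s->keyword) == NULL)
        return 0;
    if (s->on_detect != NULL)
        s->on_detect(s->keyword, s->user_data);
    return 1;
}

/* Example callback: counts detections through the user_data pointer. */
void count_detection(const char *keyword, void *user_data)
{
    (void)keyword;
    ++*(int *)user_data;
}
```

A caller would fill in a kws_spotter with its keyword and callback, then pass every new hypothesis to kws_feed; the user_data pointer lets the callback reach application state without global variables.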

History

  • Originally: Hidden Markov Model (HMM) systems
  • Google, 2014: Deep Neural Networks (DNNs), shown to outperform the HMM system
  • Google, 2015: Convolutional Neural Networks (CNNs), shown to outperform the DNN. CNNs address two drawbacks of DNNs:
    • DNNs ignore input topology, as the (fixed) input can be presented in any order without affecting the performance of the network
    • DNNs are not explicitly designed to model translational variance within speech signals, which can exist due to different speaking styles; CNNs capture translational invariance with far fewer parameters by averaging the outputs of hidden units
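
To make "averaging the outputs of hidden units" concrete, here is a minimal sketch of one-dimensional average pooling in C. It is an illustration only, not code from any of the systems above; the function name avg_pool_1d and the non-overlapping window layout are assumptions made for this example.

```c
#include <stddef.h>

/*
 * Non-overlapping 1-D average pooling: out[i] is the mean of
 * in[i*window] .. in[i*window + window - 1]. Returns the number of
 * outputs written (n / window); trailing samples that do not fill a
 * whole window are dropped.
 */
size_t avg_pool_1d(const float *in, size_t n, size_t window, float *out)
{
    if (in == NULL || out == NULL || window == 0)
        return 0;

    size_t n_out = n / window;
    for (size_t i = 0; i < n_out; ++i) {
        float sum = 0.0f;
        for (size_t j = 0; j < window; ++j)
            sum += in[i * window + j];
        out[i] = sum / (float)window;
    }
    return n_out;
}
```

With a window of 2, the inputs {0, 1, 0, 0} and {1, 0, 0, 0} both pool to {0.5, 0}: an activation that shifts by one position inside a window leaves the pooled output unchanged, which is the invariance the bullet refers to.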

Tools

CMU Sphinx Project by Carnegie Mellon University

  • CMU LTI, Language Technologies Institute
  • Designed to run on different platforms including iOS, Android, Raspberry Pi, etc.
  • License: BSD-style (nice!)

Raspberry Pi 2 – Speech Recognition on device

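  • Upload a word list to http://www.speech.cs.cmu.edu/tools/lmtool-new.html to generate the language model (.lm) and dictionary (.dic) files
  • Point pocketsphinx at the generated .lm and .dic files, command: pocketsphinx_continuous -inmic yes -lm 0730.lm -dict 0730.dic -samprate 16000 (use the sample rate that matches the microphone: 16000, 8000, or 48000)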

My Code

GitHub link: https://github.com/heronyang/kws-color-demo

Components

In main.c, the program starts a thread to handle GUI jobs right after it launches. It then sets up pocketsphinx and calls recognize_from_microphone or recognize_from_file for the audio input. Since argc/argv are passed into the settings, the user can specify the dictionary file or log file, as is done in run.sh.
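
Once the recognizer returns a word, the remaining step is to turn it into a background color. The sketch below shows one way that lookup could be written; it is not taken from the repository. The rgb_color type, the color_table contents, and the color_for_keyword function are all hypothetical, and the actual word list used by the demo may differ. Handing the color over to the GUI thread is left out.

```c
#include <stddef.h>
#include <string.h>

/* RGB components in the 0.0-1.0 range. */
typedef struct {
    double red, green, blue;
} rgb_color;

/* Hypothetical keyword table; the demo's real word list may differ. */
static const struct {
    const char *word;
    rgb_color   color;
} color_table[] = {
    { "red",   { 1.0, 0.0, 0.0 } },
    { "green", { 0.0, 1.0, 0.0 } },
    { "blue",  { 0.0, 0.0, 1.0 } },
};

/*
 * Look up the color for a recognized word (exact match).
 * Writes the color to *out and returns 1 on a match; returns 0 and
 * leaves *out untouched if the word is not in the table.
 */
int color_for_keyword(const char *hyp, rgb_color *out)
{
    if (hyp == NULL || out == NULL)
        return 0;

    for (size_t i = 0; i < sizeof color_table / sizeof color_table[0]; ++i) {
        if (strcmp(hyp, color_table[i].word) == 0) {
            *out = color_table[i].color;
            return 1;
        }
    }
    return 0;
}
```

Keeping the mapping in a table means a new color keyword only needs a new row, and the return value lets the caller ignore words that are recognized but have no color attached.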

Run

> ./run.sh

Demo