Speech recognition

Speech recognition is a technology that allows computers to recognize and translate spoken words into text. Automatic speech recognition systems can also be used by people to give voice commands to computers and electronic devices.

Advantages

  • Once a speech recognition system has been trained to recognize a user’s voice and way of speaking, dictating is faster than typing
  • Even when the time spent correcting mistakes is included, dictating a text document is faster than typing it
  • It is more convenient to interact with a device’s virtual assistants verbally rather than by typing
  • People with physical disabilities that make typing difficult can operate computers and electronic devices by voice
  • Organizations offering customer support can identify callers while they wait for an operator: callers state their personal details and the reason for their call, so agents already know who they are talking to and how they can help

Disadvantages

  • Speech recognition has a learning curve: users have to practice a clearer dictation style before the system understands them at its best
  • Although speech recognition has been around for decades, the technology is still not perfect; only recently has it come close to human parity
  • The accuracy of speech recognition is severely affected by background noise, so users have to speak as close as possible to the microphone and use the system in quiet environments
  • Speech recognition programs make more errors when transcribing people who speak with an accent or in a dialect
  • This technology also has trouble with homophones, words that sound alike but are spelled differently, and struggles to distinguish between their, there, and they’re
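
Sound-alike words such as their, there, and they’re are identical to the acoustic part of a recognizer, so the choice between them usually falls to a language model that scores the surrounding words. The sketch below is a minimal illustration of that idea, not the method of any particular product: it trains an add-one-smoothed bigram model on a tiny made-up corpus and picks the spelling that makes the whole sentence most probable. The names CORPUS, HOMOPHONES, train_bigrams, score, and pick_homophone are hypothetical, introduced only for this example.

```python
from collections import Counter

# Tiny made-up corpus (an assumption for illustration); real language
# models are trained on vastly more text.
CORPUS = [
    "they left their coats over there",
    "they're going to their house",
    "put it over there",
    "i think they're late",
    "their car is over there",
]

# Spellings that the acoustic model cannot tell apart.
HOMOPHONES = {"their", "there", "they're"}


def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with sentence markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def score(words, unigrams, bigrams):
    """Sentence probability as a product of add-one-smoothed bigram probabilities."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + words + ["</s>"]
    probability = 1.0
    for prev, cur in zip(padded, padded[1:]):
        probability *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
    return probability


def pick_homophone(words, index, unigrams, bigrams):
    """Return the homophone that gives the sentence the highest score at `index`."""
    def sentence_score(candidate):
        trial = words[:index] + [candidate] + words[index + 1:]
        return score(trial, unigrams, bigrams)

    return max(sorted(HOMOPHONES), key=sentence_score)


unigrams, bigrams = train_bigrams(CORPUS)
# The recognizer heard "put it over <their/there/they're>".
print(pick_homophone("put it over their".split(), 3, unigrams, bigrams))
```

With this corpus the call selects "there", because "over there" and a sentence-final "there" were both seen in training while the alternatives were not. With so little data the model fails quickly on unseen contexts, which mirrors the limitation described in the list above.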

Components

  • Speech capturing device
  • Digital signal processor
  • Preprocessed signal storage
  • Hidden Markov Models
  • Language model
  • Pattern matching algorithm
  • Neural networks
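
To show how a Hidden Markov Model and a pattern matching algorithm can work together, here is a minimal sketch of Viterbi decoding, a standard way to find the most likely sequence of hidden states behind a sequence of observations. The two states, the "low"/"high" energy observations, and every probability below are made-up assumptions chosen to keep the example small; a real recognizer uses phoneme-level states and acoustic feature vectors.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for the observations.

    Works in log space to avoid numeric underflow on long sequences.
    All probabilities must be non-zero.
    """
    # Each column maps state -> (log probability of best path, previous state).
    first = observations[0]
    columns = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Best predecessor for state s at this time step.
            prev, logp = max(
                ((p, columns[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (logp + math.log(emit_p[s][obs]), prev)
        columns.append(column)

    # Backtrack from the best final state.
    path = [max(states, key=lambda s: columns[-1][s][0])]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical toy model: is each audio frame silence or speech?
STATES = ("silence", "speech")
START_P = {"silence": 0.8, "speech": 0.2}
TRANS_P = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
EMIT_P = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.3, "high": 0.7},
}

frames = ["low", "high", "high", "low", "low"]
print(viterbi(frames, STATES, START_P, TRANS_P, EMIT_P))
```

For these frames the decoded path is silence, speech, speech, silence, silence. In a full system the language model would additionally weight which word sequences are plausible, and neural networks are commonly used to supply the emission scores.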

Development tools

  • Microsoft Speech Platform SDK
  • Microsoft Bing Speech API
  • Google Cloud Speech API
  • CMUSphinx
  • JustSay-Ah
  • Kaldi
  • HTK
  • Simon
  • Julius
  • OpenEars