An Exploration of Today’s Voice Recognition Reality

2018-02-06

Whether it’s Siri, Cortana, or Alexa, it seems talking to virtual entities has become a regular part of our day-to-day lives. What once seemed solely relegated to the realm of science fiction is now very much a reality. Given how rapidly the technology is evolving, it would appear there is almost no limit to what voice recognition can do.

Me: Alexa, write a 1000-word article on the emergence of voice recognition technology.

Alexa: [Prolonged Silence]

So, while it is clear the technology still has some limitations, they seem to be becoming fewer by the day. Voice recognition has become so much a part of our lives that it’s easy to forget that, not long ago, the technology was not advanced enough to have any significant impact.

The Early Days of Voice Recognition

The first name in voice recognition was not Siri but rather Audrey, who made her virtual debut in 1952. The brainchild of Bell Laboratories, Audrey’s appeal was hampered by severely limited functionality. She could only understand numbers spoken by specific people. In the 1960s, IBM unveiled its Shoebox machine, but it was not much more advanced in what it could do. Shoebox could understand 16 words, but only from one particular speaker.

The first breakthrough in the technology came in the 1980s with the advent of the Hidden Markov Model (HMM). In his 2016 piece, A Brief History of Voice Control, Josh Benji touches on why the development was so significant.

“The HMM drastically altered the development of a viable speech recognition software. By way of HMM, speech recognition went from using templates to understanding words to a statistical method that measured the probability of unknown sounds being words. This allowed for the number of understandable words to go from a few hundred to a few thousand. The potential to recognise an unlimited number of words was on the horizon.”
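To make the statistical idea concrete, here is a minimal sketch in Python of how an HMM can score a string of sounds against competing word models. It is a toy illustration only: the two word models, their states, and every probability below are invented for the example and do not come from any real recognizer.

```python
def forward_probability(observations, start, trans, emit):
    """Probability that an HMM produced the observation sequence (forward algorithm)."""
    states = list(start)
    # alpha[s] = probability of the sounds heard so far, ending in state s
    alpha = {s: start[s] * emit[s].get(observations[0], 0.0) for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[p] * trans[p].get(s, 0.0) for p in states)
            * emit[s].get(obs, 0.0)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical word models: (start, transition, emission) probabilities.
WORD_MODELS = {
    "yes": (
        {"y": 1.0, "eh": 0.0, "s": 0.0},
        {"y": {"y": 0.5, "eh": 0.5}, "eh": {"eh": 0.5, "s": 0.5}, "s": {"s": 1.0}},
        {"y": {"y": 0.9, "n": 0.1}, "eh": {"e": 0.8, "o": 0.2}, "s": {"s": 1.0}},
    ),
    "no": (
        {"n": 1.0, "oh": 0.0},
        {"n": {"n": 0.5, "oh": 0.5}, "oh": {"oh": 1.0}},
        {"n": {"n": 0.9, "y": 0.1}, "oh": {"o": 0.8, "e": 0.2}},
    ),
}

# An "unknown" utterance, already reduced to a sequence of sound labels.
heard = ["n", "o", "o"]

# Score the utterance against every word model and keep the most probable word.
scores = {word: forward_probability(heard, *model) for word, model in WORD_MODELS.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Rather than matching the utterance against a fixed template, the sketch asks how likely each word model is to have produced the sounds and picks the winner, which is the shift the quote describes.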

Through the 1990s, there were some valiant attempts to build on this breakthrough with software that offered voice-to-text capabilities. In 1990, Dragon Dictate became the first commercial software of its kind on the market, but at a price point of $9,000, it had very few takers. In 1997, the same company released Dragon NaturallySpeaking for $695. This was the most functional software to date but still wasn’t practical enough to break into the mainstream.

The First Golden Age of Voice Recognition

It would take the introduction of the smartphone to make a significant breakthrough in voice recognition. It wasn’t because the technology was inherently better but because it finally had a critical mass of users. In her article History of voice recognition: from Audrey to Siri, Melanie Pinola touches on why the Google voice search app for the iPhone was a real turning point in the development of the technology.

“The impact of Google’s app is significant for two reasons. First, cell phones and other mobile devices are ideal vehicles for speech recognition, as the desire to replace their tiny on-screen keyboards serves as an incentive to develop better, alternative input methods. Second, Google had the ability to offload the processing for its app to its cloud data centres, harnessing all that computing power to perform the large-scale data analysis necessary to make matches between the user’s words and the enormous number of human-speech examples it gathered.”

Apple introduced the iPhone’s Siri in 2011, and it would not take long for others to follow. In July of 2012, Google launched Google Now. In 2014, Microsoft launched Cortana, and Amazon released Alexa and the Echo system. In little more than three years, the tech giants all had voice recognition systems in the mainstream.

What Does It Mean For Us Today?

Voice recognition is becoming a bigger part of our lives, in some ways that we welcome and in some ways that we don’t. A lot of people took to the new phenomenon of being able to ask their phone to play a certain song or call a friend. There is a real convenience in being able to operate hands-free, especially when smartphone keyboards are relatively small. The technology seems less than ideal when we are asked to state our reason for calling to a virtual customer service AI bot over and over again.

Whether we embrace the technology or not, one thing that is clear is that it is not going away. As we move more and more to the Internet of Things (IoT), voice recognition remains a central part of it.

Perhaps there is no better example than the current use of Alexa as part of an integrated ‘smart home.’ Today you can ask Alexa to dim the lights, set the thermostat to a specific temperature, operate home entertainment consoles, turn on and off major appliances, lock the front door, and operate cameras and home security systems. All with nothing more than a voice command.

Voice recognition’s reach is not limited to the home. Banks are beginning to adopt voice recognition as one of their strongest weapons against identity theft and fraud. Voices are like fingerprints in that they are nearly impossible to forge.

[ms-protect-content id=”4069,4129″]

Voice Recognition Tomorrow

The idea of Alexa and the smart home would have been thought of as fantasy only ten years ago. So it’s anyone’s guess as to what may be possible ten years from now. That said, we’re already getting a glimpse of what that future may have in store. Homeowners can now ask their refrigerators for advice on what to make for dinner. The refrigerator can offer suggestions and point out which food items are close to their expiration date and should be used right away.

One of the more interesting commercial applications is found in Anryze, an American company that provides speech recognition services for brokerage firms, compliance departments and call centres. Anryze uses blockchain technology to provide real-time transcription of calls or recorded audio. Their breakthrough was in borrowing the distributed computing approach that cryptocurrency miners use to produce Bitcoin. By breaking up the source audio into small portions and farming those out to computers in the network, they are able to process these bite-sized transcriptions almost immediately and then piece them all together as one completed transcription file.
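The split, farm out, and reassemble pattern described here can be sketched in a few lines of Python. This is not Anryze’s actual system: the function names are hypothetical, a local thread pool stands in for the network of computers, and the "audio" is simply a list of words so that the stand-in transcriber has something to return.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(samples, chunk_size):
    """Cut a sequence of audio samples into consecutive fixed-size pieces."""
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


def transcribe_chunk(chunk):
    """Hypothetical stand-in for a real speech-to-text worker.

    Each 'sample' here is already a word, so the worker just joins them.
    """
    return " ".join(chunk)


def transcribe_distributed(samples, chunk_size=3, workers=4):
    """Transcribe chunks in parallel, then stitch the pieces back together."""
    chunks = split_into_chunks(samples, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, whatever order the workers finish in
        pieces = list(pool.map(transcribe_chunk, chunks))
    return " ".join(pieces)


audio = "please transfer fifty dollars to my savings account".split()
transcript = transcribe_distributed(audio)
print(transcript)
```

The key design point is that the pieces are reassembled in their original order no matter which worker finishes first, which is what lets many small transcriptions become one complete file.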

So as the technology continues to improve at an exponential rate, who knows what might be possible in the not too distant future?

Me: Alexa, end this article with a pithy yet brilliant observation.

Alexa: [Prolonged Silence]

Maybe one of these days…

[/ms-protect-content]