Have you ever dreamed of being able to control every device in your home with your voice? Many of us have long-standing fantasies of being able to speak to our homes to turn on the lights, play our favorite music, and get dinner started. Voice control is the holy grail of home automation.
We already have limited capabilities in this respect. For example, you can control your lighting with a Vivint smart home hub or a smart speaker device from Amazon or Google. Just install smart lighting sockets and connect them to your network. Now you can issue a command to control every light in your home.
Turning lights on and off with your voice is a novelty. But have you ever noticed that for voice control to work, commands need to be straightforward and simple? And you have to use the exact same command every time. This is because voice control is still quite limited. The biggest challenge of improving it is us. That’s right, humans are the one hurdle that needs to be overcome.
Simple Commands Are Easy
The earliest voice recognition systems were programmed to recognize single phrases. Simple enough. Builders could digitally record a voice command and use it to program a device’s memory. That device would then ‘listen’ for that particular command. If the command was heard, the action was taken.
That’s basically how smart speakers work. They listen for specific commands. For example, tell your Google device to give you the weather. It is listening for only a few specific words in your sentence – with ‘weather’ being the most important one.
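To make the idea concrete, here is a minimal sketch of that kind of keyword spotting, assuming the audio has already been transcribed to text. The command names and device actions are hypothetical, invented purely for illustration:

```python
# A toy keyword spotter: scan a transcribed utterance for known trigger
# words and map the first match to a device action. The COMMANDS table
# and action names are hypothetical examples, not a real smart-speaker API.

COMMANDS = {
    "weather": "report_weather",
    "lights": "toggle_lights",
    "music": "play_music",
}

def spot_keyword(transcript):
    """Return the action for the first recognized keyword, or None."""
    for word in transcript.lower().split():
        word = word.strip(".,?!")   # ignore trailing punctuation
        if word in COMMANDS:
            return COMMANDS[word]
    return None  # no keyword found -> the device does nothing

print(spot_keyword("What's the weather like today?"))   # report_weather
print(spot_keyword("How should I dress for the day?"))  # None
```

Notice how the second question gets no response: without the literal word "weather", a pure keyword system has nothing to match, which is exactly the limitation described above.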
If you were to ask your smart speaker how you should dress for the day, the software inside would not necessarily understand you are inquiring about the weather. In your mind, the weather will dictate how you dress. To your smart speaker software, the two topics are unrelated.
In short, simple commands are easy. Complex commands are not. Moreover, interpreting human language is extremely difficult for machines.
The Speech Recognition Problem
You have probably heard the term ‘speech recognition’. As a science, speech recognition involves a lot more than just recording sounds made by the human voice. It is also about interpreting those sounds to understand words, then interpreting those words to understand intent. Getting it right is not so easy.
Having a face-to-face conversation with someone allows your brain to understand what is going on by paying attention to a variety of cues. While your ears are hearing the conversation and transmitting information to your brain, your eyes are also paying attention to body language, mouth movements, and facial expressions.
This interaction alone helps you understand conversations in ways that machines cannot. But it goes further still. Your brain understands how people use certain words and phrases. Your brain associates certain words with other words that tend to follow closely after them. How does this help?
Your brain can self-correct when someone else says something incorrectly. You could be having a conversation with someone who makes a clear grammatical mistake. You do not even recognize it because your brain accounts for it. But if you were to play that same conversation to a machine, the grammar mistake would throw it off completely.
Speech Recognition and Home Automation
This very real limit of speech recognition is what makes it difficult to advance home automation beyond what it can currently do. Right now, smart speakers cannot determine intent. They cannot self-correct when users speak commands incorrectly.
To overcome that, big tech is working on something known as natural language processing (NLP). This is the ability of machines to interpret intent based on the entire context of what someone is saying. NLP requires a combination of machine learning, deep learning, and artificial intelligence.
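A crude way to see the difference from keyword matching is to score the whole utterance against each possible intent rather than hunting for a single trigger word. The sketch below does this with hand-written vocabularies and word overlap; real NLP systems use machine-learned models, but the shift from "match one word" to "weigh the entire context" is the same:

```python
# A toy illustration of intent interpretation. Each intent is described
# by a small hand-written vocabulary (a stand-in for what a trained
# model would learn), and the utterance is scored against every intent.

INTENT_VOCAB = {
    "get_weather": {"weather", "rain", "temperature", "dress", "umbrella"},
    "play_music": {"play", "music", "song", "album", "listen"},
}

def classify_intent(utterance):
    """Return the intent whose vocabulary best overlaps the utterance."""
    words = {w.strip(".,?!").lower() for w in utterance.split()}
    scores = {intent: len(words & vocab)
              for intent, vocab in INTENT_VOCAB.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# "dress" never mentions the weather, yet context still points to it.
print(classify_intent("How should I dress for the day?"))  # get_weather
```

Even this crude version handles the earlier "how should I dress" question, because the word "dress" contributes context that a single-keyword listener would ignore.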
Speech Recognition and Accents
Another challenge in making voice-controlled home automation better is human accents. Again, this is a bigger deal than you might think. If you do not understand why, take a few minutes and search online for audio files featuring different forms of English.
We Americans have a specific accent. So do English people, Scots, Irish, Australians, and New Zealanders. We can all utter the exact same sentence yet pronounce the words in five different ways. Speech recognition doesn't handle that well. It has to be trained to recognize accents one at a time. This is not so easy.
Within a single accent there may be multiple dialects as well. That makes speech recognition even more complicated. Remember that machines cannot self-correct like the human brain can. Our brains can overcome different accents and dialects because we already know intent. We understand what words and phrases tend to go together.
Now, you may find a Scotsman nearly impossible to understand. Even so, you have a better shot at figuring out what he is saying than a machine that hasn’t been programmed to understand a Scottish accent.
We Will Get There
Despite the many challenges of speech recognition and voice-controlled home automation, we will get to where we want to go eventually. It will just take time. How long though is anyone’s guess. Will it be fifty years? Will it be one hundred? No one really knows.
The big challenge right now is duplicating what the human brain can do. Our brains are able to successfully overcome poor pronunciation. Our brains can deal with different accents and dialects. They can account for poor grammar and incorrect vocabulary. Machines cannot.
Once they can, voice-controlled automation will have a lot more to offer. We will be able to control all sorts of devices in the home by speaking a variety of commands. We will probably even be able to speak to our homes like we speak to one another.
Until then, controlling things like lighting and security cameras requires very specific commands spoken in a clear, concise manner. We have to give our smart speakers a lot of help. But still, it is a lot better than the way things used to be. Voice control has come a long way.