Experts agree: 2018 will be the year of voice user interfaces. Instead of clicking through more and more tabs, you will be able to ask the tool to perform the desired task, and what’s more: its interface will tell you what you should do…
…if, of course, designers are able to face the challenges that lie ahead. And there are plenty of them.
The principles of designing voice user interfaces (VUI) differ significantly from those of designing graphical user interfaces (GUI).
The first and most important difference relates to the method of interaction preferred by the users. A user interacting with a voice user interface expects something different than a user interacting with a graphical user interface. This may sound like a truism, but in practice many designers forget about it. As a result, voice user interfaces are created in a way that is not adapted to the actual needs and expectations of users.
The second difference concerns instructions. Even the most transparent interface needs to explain itself to the user in some way. Relying only on the user’s intuition usually ends badly for the design. Designers of graphical user interfaces have extensive experience in this matter; the interfaces they create guide users through their most important functions, without having to resort to external instructions or extensive manuals. The introduction does not have to be a separate module. A tutorial integrated with the tool significantly accelerates the whole onboarding, and the learning process takes place “on the go”, that is, while working with the functions themselves.
The third difference concerns the lack of a compendium of proven solutions. GUI designers have it much easier: there are a lot of well-designed graphical user interfaces, and the principles of their design have been repeatedly codified, refined and verified. People working on VUI, on the other hand, can be compared to discoverers of new lands. Many things cannot be predicted. Seemingly irrelevant errors can result in many hours of programmers’ work. Other types of errors can make the interface a target of mockery on the Internet. These problems also happen to industry leaders, e.g. the recent situation with Siri, which mistook the national anthem of Bulgaria for “Despacito”. The press all over the world wrote about it.
Many voice user interfaces are completely non-intuitive products that do not meet the users’ expectations as to the communication process. This problem also applies to well-designed VUIs such as Siri, Alexa or Google Assistant. In part, this is due to technical limitations. Although interpersonal communication is significantly different from communication with an interface, a person using a voice user interface expects it to be closer to the former. As a result, the user tries to communicate with the interface as if with another person. Meanwhile, even the best VUIs still have a problem with this kind of communication: they do not manage to fully recognize the actual intentions of the user or the context. So if you have sometimes wondered why Siri or Alexa so often misreads your intentions or gives absurd answers, this is the reason.
How can designers of voice user interfaces deal with these and other problems? Below is a list of recommended practices and ways to avoid problems.
The possibilities and limitations of graphical user interfaces are visible to the naked eye. With voice user interfaces, things are different. It is good practice to inform the user about the interface’s features. How to achieve this? It’s best to get straight to the point. Your interface can inform the user about its basic functions by asking a question: “Do you want to turn up the music/switch on the light in the living room/check the weather forecast for the weekend?”. It can also inform the user directly: “If you want, I can turn up the music for you. I can do it in such and such a range.”
Informing the user about limitations helps to avoid irritation. An interface that says up front that the music can’t be turned up any louder, or the lights made any brighter, is better designed than one that reveals this only after the user has tried.
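The idea of announcing a limit before the user runs into it can be sketched in a few lines of Python. This is only an illustration: the `VolumeControl` class, its 0 to 10 range and the wording of the replies are assumptions made for the example, not part of any real assistant.

```python
from dataclasses import dataclass


@dataclass
class VolumeControl:
    """Hypothetical volume feature that knows and announces its own limits."""
    level: int = 5
    min_level: int = 0    # assumed range, chosen only for illustration
    max_level: int = 10

    def describe(self) -> str:
        # Tell the user what is possible before they have to guess.
        return (f"I can set the music volume anywhere from {self.min_level} "
                f"to {self.max_level}. It is currently at {self.level}.")

    def turn_up(self, step: int = 1) -> str:
        if self.level >= self.max_level:
            return f"The volume is already at its maximum of {self.max_level}."
        self.level = min(self.level + step, self.max_level)
        if self.level == self.max_level:
            # Mention the limit as soon as it is reached, not on the next attempt.
            return f"Volume set to {self.level}. That is as loud as I can go."
        return f"Volume set to {self.level}."
```

The point of the sketch is the second branch of `turn_up`: the limit is mentioned at the moment it is reached, so the user does not have to discover it through a failed command.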
If the user invokes any of the interface’s functions, he should have no doubt that the command was received. If the given function takes a long time, the interface should inform the user that it has accepted the instruction and is working on it (e.g. “I’m ordering your shopping list items right now”). This information should not be repeated. It is better to replace any repetitions with information about the progress (e.g. “It will take me 6 seconds to submit your order to the store”).
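A minimal sketch of “acknowledge once, then report progress” might look as follows. The function name `feedback_messages`, the fixed reporting interval and the message wording are assumptions for illustration; a real device would emit these messages over time rather than build a list.

```python
from typing import List


def feedback_messages(task: str, total_seconds: int, interval: int = 2) -> List[str]:
    """Acknowledge the command once, then report remaining time instead of repeating."""
    messages = [f"I'm {task} right now."]
    # One progress update per interval; the acknowledgement is never repeated.
    for elapsed in range(interval, total_seconds, interval):
        remaining = total_seconds - elapsed
        messages.append(f"About {remaining} seconds left.")
    messages.append("Done.")
    return messages
```

For a six-second task, the user hears the acknowledgement, two different progress updates and a completion message, and no sentence is spoken twice.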
It is also worth making sure that complete information is provided. Notice that the most popular voice user interfaces, especially Alexa, always communicate using complete sentences. Thanks to this, the user has no problem identifying which command a response refers to (which may be important for tasks that take longer than a dozen or so minutes to carry out). Nor does the user have to wonder whether the interface understood the command correctly. Alexa tells him directly, which makes it possible to verify and, if necessary, modify the earlier instructions.
In a similar way, the voice interface can show the user what form of reply it expects. The general rule: the more complete the sentences used by the interface, the shorter the user’s answers can be. Such an exchange can take the following form:
User: “Check the weather forecast for me.”
Interface: “At the moment I can check the weather forecast for you for today, for the upcoming week or for the weekend. Which forecast would you like to hear?”
This is a much better solution than an interface that communicates, for example, in this way: “If you want to know the weather forecast for today, say the word «Today». If you want to know the weekly forecast, say the word «Week». If you want to know the forecast for the weekend…”, etc. However, in some situations the latter solution may also be justified, for example when there is a small number of clearly distinct options, or in the case of voice-graphical user interfaces (such as telephone helplines).
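The difference between the two styles can be illustrated with a short Python sketch: the interface names all options in one complete sentence, asks an open question, and then accepts a short reply. The helper names `build_prompt` and `match_answer` and the simple keyword matching are assumptions for the example; real assistants use far more capable language understanding.

```python
import re
from typing import List, Optional


def build_prompt(topic: str, options: List[str]) -> str:
    """Name every option in one complete sentence, then ask an open question."""
    if len(options) > 1:
        listed = ", ".join(options[:-1]) + " or " + options[-1]
    else:
        listed = options[0]
    return (f"At the moment I can check the {topic} for {listed}. "
            f"Which {topic} would you like to hear?")


def match_answer(answer: str, options: List[str]) -> Optional[str]:
    """Accept a short reply by looking for the key (last) word of an option."""
    words = re.findall(r"[a-z]+", answer.lower())
    for option in options:
        if option.split()[-1].lower() in words:
            return option
    return None  # nothing matched: the interface should ask again
```

With the options `["today", "the upcoming week", "the weekend"]`, a reply as short as “Today” or “The weekend, please” is enough, which is exactly the trade-off described above: the interface talks in full sentences so that the user does not have to.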
It is also good practice to tell the user when an instruction has not been understood, preferably right away and with the reasons given. An interface that tells you why it did not understand your command and what you can do about it will always have an advantage over one that only informs you of the misunderstanding. Example:
User: “Will you check it for me?”
Interface: “Unfortunately, I do not understand your instructions. We were talking about the weather forecast and Poland’s match with Germany. Do you want me to check the weather forecast? Or maybe you’re interested in the result of the match?”
This solution will certainly work better than an interface informing about the weather forecast in response to the question about the result of the match.
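One possible way to produce such a reply is to keep a short list of recently discussed topics and offer them back to the user when a command is ambiguous. The sketch below assumes a hypothetical `clarify` helper and a plain list of topic strings; how the topics are tracked in a real system is outside its scope.

```python
from typing import List


def clarify(recent_topics: List[str]) -> str:
    """Explain the misunderstanding and offer the recent topics as choices."""
    if not recent_topics:
        return "Unfortunately, I do not understand. What would you like me to check?"
    if len(recent_topics) == 1:
        # Only one candidate: a simple yes/no confirmation is enough.
        return f"Do you want me to check {recent_topics[0]}?"
    listed = ", ".join(recent_topics[:-1]) + " and " + recent_topics[-1]
    return ("Unfortunately, I do not understand your instruction. "
            f"We were talking about {listed}. Which one should I check?")
```

The reply states the reason for the confusion (several possible referents) and gives the user a short way out, instead of guessing and answering the wrong question.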
We all know how annoying overly complex voice-graphical user interfaces can be. “Bank’s offer – press 1. Your products – press 2. Cash loans – press 3. Mortgages – press 4…” and so on and on, all the way to number nine, before you finally find out that you need to press 8, not 9, to talk to a consultant. If you add product advertisements to this (which happens quite often in such situations), the result is very bad usability. And an even worse user experience.
The user must know that his commands are being heard. He should also know when the device supported by your interface switches from standby to sleep mode and vice versa. The device status is best indicated using non-sound signals (e.g. a glowing LED can inform the user about the standby status). This approach is used by the leading manufacturers of this type of device, with the already mentioned Alexa leading the way.
An alternative is to inform the user about status changes by means of simple, easily recognizable but not intrusive sound signals. However, this solution has one serious drawback: the signal lasts just a moment, while the status may last much longer. Problems also arise when designing such signals for devices that often change from one status to another, or when the status change itself takes a long time (e.g. in industrial devices or smart homes).
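The difference between the two kinds of signal can be shown with a small model: a light is a function of the current status and stays on for as long as the status lasts, while a chime is a one-off event at the moment of the change. The `Status` values, the LED colors and the `Device` class below are all hypothetical and chosen only for illustration.

```python
from enum import Enum
from typing import List


class Status(Enum):
    SLEEP = "sleep"
    STANDBY = "standby"
    LISTENING = "listening"


# Assumed mapping of status to LED color; real devices differ.
LED_COLOR = {Status.SLEEP: "off", Status.STANDBY: "white", Status.LISTENING: "blue"}


class Device:
    def __init__(self) -> None:
        self.status = Status.SLEEP
        self.chimes: List[str] = []  # one-off sound events, gone once played

    def set_status(self, new_status: Status) -> None:
        if new_status is not self.status:
            # A chime marks only the moment of change.
            self.chimes.append(f"chime:{new_status.value}")
            self.status = new_status

    @property
    def led(self) -> str:
        # The LED reflects the current status at any time it is looked at.
        return LED_COLOR[self.status]
```

A user who walks into the room a minute after the change has missed the chime, but can still read the status from the LED.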
Even the best voice user interfaces sometimes happen to “hear something” incorrectly. In such situations, forcing the user to repeat the entire command is tantamount to punishing him for the equipment’s error. This, in turn, translates into a significant increase in irritation.
A well-designed voice user interface allows the user to quickly make corrections without having to repeat the entire command. This is especially important for more extensive programs. If the user instructs the interface to send a text message to a specific person and the interface incorrectly saves its content, the user should be able to easily identify the parts that need correction and to replace them. The same applies to a situation in which the error is the fault of the user himself. Making corrections to an earlier command should be simpler and less time-consuming than undoing it and re-instructing the interface from the beginning.
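For the text message example, a correction that touches only the misheard fragment might be sketched like this. The `TextMessage` type and the `correct` function are assumptions made for the illustration; they stand in for whatever draft representation a real interface keeps.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TextMessage:
    recipient: str
    body: str


def correct(message: TextMessage, wrong: str, right: str) -> TextMessage:
    """Replace only the misheard fragment; the rest of the command is kept."""
    if wrong not in message.body:
        raise ValueError(f"'{wrong}' does not occur in the message")
    # Only the first occurrence is replaced, and the recipient is untouched.
    return replace(message, body=message.body.replace(wrong, right, 1))
```

The user can then say something like “change nine to five” instead of dictating the recipient and the whole message again.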
Creating a catalog of the errors most often made by your interface may prove time-consuming and require additional tests. Implemented correctly, however, it can significantly improve the user experience. With an automatic error correction system built on such a catalog, the interface can easily cope with many potentially troublesome situations (such as those described in the previous section). Automatic correction also reduces the number of serious misunderstandings (e.g. when the device actually performs an action following a misheard command). An example of a well-implemented correction mechanism is the accent-recognition function.
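In its simplest form, such a catalog can be thought of as a lookup table from frequently misrecognized phrases to the commands users actually meant. The entries below are invented for the example, and exact-match lookup is a deliberate simplification; a production system would more likely rely on fuzzy matching or a trained model.

```python
from typing import Dict

# Hypothetical catalog: misrecognized transcript -> intended command.
ERROR_CATALOG: Dict[str, str] = {
    "turn on the lice": "turn on the lights",
    "play jess": "play jazz",
}


def autocorrect(transcript: str, catalog: Dict[str, str] = ERROR_CATALOG) -> str:
    """Normalize the transcript and replace it if it is a known misrecognition."""
    normalized = " ".join(transcript.lower().split())
    return catalog.get(normalized, normalized)
```

Transcripts that are not in the catalog pass through unchanged (apart from normalization), so the correction step cannot turn a correctly heard command into a different one.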
According to forecasts by J. Koetsier, a well-known Canadian entrepreneur and publicist associated with the most widely read outlets covering new technologies, devices using voice user interfaces were present in 33 million American homes at the end of 2017. That is 24.5 million more than at the end of 2016. The most popular of these was, of course, Amazon Echo.
Although voice user interfaces are not expected to replace graphical ones in the near future, says Koetsier, the demand for them will continue to grow exponentially. The devices themselves will feature more and more new skills. For example, Alexa currently has approximately 25,000 of them. That’s over 15,000 more than a year ago (when it still had fewer than 10,000). Skills are, in the simplest terms, individual functions of the interface (e.g. the ability to control a coffee machine or to send text messages via a telephone integrated with the interface).
The emergence of new functions should be understood as a direct response to the growing user demands and requirements. There is no doubt that the world has welcomed VUIs with great enthusiasm. The question is: for how long – and what will happen next?
Designers are still asking themselves questions about the preferred direction of equipment development. Should the interfaces they design become more similar to people? And if so, in what sense? And how can they avoid the dangerous phenomenon of the “uncanny valley”?
The uncanny valley, in a nutshell, means that the more closely the behavior of a device (e.g. the way it talks) resembles that of a human, the more repulsive users find even relatively small deviations from generally accepted human norms. As a consequence, we are more disturbed by small errors in the animation of realistically presented faces than by large errors in unrealistic animations. This is one of the reasons designers decide, at least for the time being, to design devices that are nothing like people; it can be seen not only in the external appearance of the devices, but also in the ways they talk (which are always somewhat exaggerated, but not always in a way justified by usability or the considerations mentioned above). There is also no indication that anything will change in this matter, at least not in the near future.
Although the mainstream technology press is eager to announce the end of graphical user interfaces and the advent of a new era of voice user interfaces, a much more likely scenario is a rise in the popularity of devices that use both. At the moment there are no authoritative studies regarding the advantage of VUI over GUI or the possible replacement of one by the other. Voice user interfaces are still a very new thing. This opens up completely new possibilities for designers. Time will tell if they manage to take advantage of them.