With some melancholy one thinks of the time when companies could be called by phone and an actual human would be available to answer your questions. Alas, those times are over. Phone numbers are getting harder to find, or they are simply absent. Progressively, direct human contact is being replaced by “chatbots”, which some companies lovingly call their “digital colleagues”. The Dutch postal service (PostNL), for example, boasts that their digital colleague “Daan” successfully handles over 160,000 queries per month.
You may have asked yourself: what system lies behind the interaction with a digital colleague? When you access PostNL’s Daan via the website, it is fairly easy to spot that some form of closed-question chatbot system is being used (e.g., first ask question A; if the answer is such-and-such, go to question B; otherwise, go to question C). For example, Daan might ask you “What kind of package is involved?” and then you could select either “Package within the Netherlands.” or “Package outside of the Netherlands.”, upon which it proceeds to a particular follow-up question.
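To make the closed-question idea concrete, here is a minimal sketch of such a question tree in Python. It is not PostNL's actual system: the node names, the follow-up questions, and the `next_node` helper are all invented for illustration, and only the first question and its two options come from the example above.

```python
# Hypothetical question tree. Only the "start" question and its two options
# come from the text; every other node is an illustrative assumption.
TREE = {
    "start": {
        "question": "What kind of package is involved?",
        "options": {
            "Package within the Netherlands.": "domestic",
            "Package outside of the Netherlands.": "international",
        },
    },
    "domestic": {
        "question": "Do you have a tracking code?",
        "options": {"Yes": "end", "No": "end"},
    },
    "international": {
        "question": "Which country is the package going to?",
        "options": {},
    },
    "end": {"question": "Thank you!", "options": {}},
}


def next_node(current, choice):
    """Return the node that follows `choice`.

    If the choice is not one of the offered options, stay on the
    current node so the same question can be asked again.
    """
    return TREE[current]["options"].get(choice, current)
```

Because the user can only pick from the listed options, the system never has to interpret free text: every answer is a simple lookup.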
A more impressive form of interaction, however, is that one can also connect to Daan via a Google Home speaker by asking (in Dutch) to “connect me to PostNL”. Google will then relay you to Daan, who awaits your spoken input. Here, he only allows you two options: “Track a package” and “Find out what the nearest post-office is”. When you access Daan in this way you may, however, find out that he is not as smart as a human. For example, at some point he might ask you “may I know your location?” and if you simply answer “yes” then that is fine. However, there are many ways for a human to say “yes”. For example, one could say: “no problem”, “alright then”, “go ahead”, “okay” or “just this once”. Dialog systems that are activated through speech often have difficulty mapping whatever you say (which could literally be anything) to a certain “intent” (in this case “yes”, you may use my location, or “no”, you may not). So, Daan may not realize that “no problem” is supposed to map to the “yes” intent and produce an error instead.
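The yes/no problem can be illustrated with a small Python sketch. This is an assumption about how a very simple system might work, not a description of Daan: the phrase lists and the function names `normalize` and `yes_no_intent` are made up for this example.

```python
import string

# Illustrative phrase lists; a real system would need far more variants.
YES_PHRASES = {"yes", "no problem", "alright then", "go ahead", "okay", "just this once"}
NO_PHRASES = {"no", "no thanks", "rather not"}


def normalize(utterance):
    """Lowercase the utterance, drop ASCII punctuation, collapse whitespace."""
    cleaned = utterance.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def yes_no_intent(utterance):
    """Return "yes", "no", or None when the utterance is not recognized."""
    text = normalize(utterance)
    if text in YES_PHRASES:
        return "yes"
    if text in NO_PHRASES:
        return "no"
    return None
```

Note that whole phrases are compared rather than single words: a naive check for the word “no” would wrongly treat “no problem” as a refusal. Anything outside the lists returns `None`, which is exactly the situation in which a chatbot falls back to an error message.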
Usually, chatbot creators supply various “training sentences” for given intents. For example, a weather chatbot must learn to recognize that “What’s the weather gonna be?”, “Is it going to rain soon?”, and “Do I need to bring an umbrella today?” refer to the same intent (i.e., “provide a weather report”). Furthermore, the intent, once recognized, often needs to be supplemented with “entities”, that is, additional information related to the intent. For example, a weather update needs an entity for “location” and also a “date and time” entity. If the user already supplies this information in the question (e.g., “Do I need to bring an umbrella today at 17:00?”), it can be used immediately and the system will only need one more follow-up question to obtain the location. Typically, the creator of a chatbot provides a large variety of example sentences, including cases with and without entities, and an algorithm learns to assign them to the appropriate intent through “machine learning”.
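As a rough illustration of intents and entities, the Python sketch below assigns an utterance to the intent whose training sentence shares the most words with it, and pulls a time entity out with a regular expression. Real chatbots use trained machine-learning models; the word-overlap score used here is a deliberately simple stand-in, and the intent names, the `tell_joke` examples, and the function names are assumptions made for this example.

```python
import re

# Training sentences per intent. The weather examples follow the text;
# the "tell_joke" intent and all names are illustrative assumptions.
TRAINING = {
    "weather_report": [
        "what's the weather gonna be",
        "is it going to rain soon",
        "do i need to bring an umbrella today",
    ],
    "tell_joke": [
        "tell me a joke",
        "make me laugh",
    ],
}


def tokens(sentence):
    """Split a sentence into a set of lowercase words."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def classify(utterance):
    """Return the intent of the most similar training sentence, or None.

    Similarity is the share of words two sentences have in common
    (shared words divided by all distinct words in both).
    """
    words = tokens(utterance)
    best_intent, best_score = None, 0.0
    for intent, examples in TRAINING.items():
        for example in examples:
            example_words = tokens(example)
            score = len(words & example_words) / len(words | example_words)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent


def extract_time(utterance):
    """Return a time entity such as "17:00" if one is mentioned, else None."""
    match = re.search(r"\b([01]?\d|2[0-3])[:.]([0-5]\d)\b", utterance)
    if match is None:
        return None
    return f"{int(match.group(1)):02d}:{match.group(2)}"
```

For the question “Do I need to bring an umbrella today at 17.00?” this sketch recognizes the weather intent and finds the time, so only the location would still have to be asked for.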
Actually, Daan’s “big brother”, the Google Home speaker that we used earlier to connect to him, is extremely good at this kind of intent recognition. When not activated, it continuously listens for a so-called “hot word” like “OK Google” and, upon hearing this cue, tries to guess what you want from it (i.e., your intent). Considering the vast range of things one could ask (e.g., “what is the phone number of KLM?”, “what is the color of snow?”, “tell me a joke”, “translate bicycle into Japanese”, “show me a picture of a panda on my phone”, “make an appointment for tomorrow 15:00”, etc.), it is truly remarkable how accurate it is most of the time.
However, for many people, talking to a loudspeaker or a phone, even one as advanced as Google’s, is not the same as actually talking to something “with a face”. A lot of our communication is in fact nonverbal (why else would we put happy or sad emojis in our texts?). Many companies have realized that this is important, as reflected in the increasing number of “embodied assistants” being offered both online (do you remember Microsoft Office’s “paperclip”?) and in real life. Physically embodied examples range from smart, cute, and affordable “desktop pets” (like VECTOR and EMO) to the more professionally oriented market in which social robots like PEPPER and the FURHAT play a significant role (see images below).
The Furhat robot actually represents a rather unique concept in the social robotics world as it projects a full-fledged, digitally flexible human face onto a plastic mask. Whereas cute desktop assistants try to convey non-verbal gestures by changing their eye patterns (e.g., the EMO seems to be “happy” in the picture by squinting its eyes in a particular way) and Pepper cannot make facial expressions at all, the Furhat is able to display even very subtle gestures, just like a human. In fact, you can now even “record” an actual human’s facial gestures with your phone and simply “play” them back on the Furhat. In addition, it has the ability to change its entire face into a different ethnicity and/or gender and change its voice as well.
It seems only a matter of time before embodied chatbots, like the Furhat and others, become more commonplace as a relevant and hopefully enjoyable point of interaction at train stations, hospitals, schools, and perhaps even at your local PostNL point as a physically embodied version of Daan. In fact, in Japan there are already robot-run hotels and the Pepper robot is present at many SoftBank stores to help you.
Although we will likely remain intuitively aware that we are dealing with a machine (Clark & Fischer, 2022), the more “human” the verbal and non-verbal communication appears, the more we are prone to be immersed in it.
For a moment we might even forget we are talking to a computer…
This blog post was written by Rinus Verdonschot
Clark, H. H., & Fischer, K. (2022). Social robots as depictions of social agents. Behavioral and Brain Sciences, 1–33. https://doi.org/10.1017/S0140525X22000668