My daughter Nora is working as a host at a local restaurant this summer. Recently she told me about a strange phenomenon she’s noticed when she answers the phone.
Often, the caller is Google Assistant, phoning on behalf of a potential diner who wants to know whether there’s a wait for a table. So far, so good. This is how Nora describes the conversation with Google Assistant:
Nora: Hello, this is Yummy Local Restaurant!
GA: This is Google Assistant calling on behalf of my client, and this call is being recorded. My client would like to know how long the wait is for a table for two.
Nora: There’s currently no wait.
(This is where it gets interesting.)
GA: (slowly) Oohhhhhh. I gotcha. Thanks.
Nora describes this tone as disappointed and confused. Regardless of what she tells Google Assistant (“there’s no wait,” “45-minute wait,” “15-minute wait”), the response is exactly the same.
Google has, roughly speaking, all the money in the world to workshop, focus group, and record these responses. The voice sounds human, complete with pauses and “um”s, which makes it an even stranger experience.
What made Google choose this tone for its response? I’m assuming it’s like a film director choosing from a bunch of takes for the final edit. “Okay, we’ve got the ‘calm’ and ‘appreciative’ responses, the ‘enthusiastic’ one, the ‘indifferent’ one, the one that sounds like the guy’s in a hurry and needs to go…oh, here’s the ‘disappointed and confused’ one. You know what? Let’s go with that.”
Does this matter a lot in the long run? No, probably not. But in the final analysis, there is only one human taking part in this conversation. Shouldn’t someone choose an automated response that makes that human’s experience as frictionless as possible?
You can check out a version of the Google Assistant/Host conversation here: