ECCA: Pelikan: Coordinating Actions with a Mobile Embodied AI

Pelikan will present a poster at ECCA2020 on ‘Coordinating actions with a mobile embodied AI’.


This paper explores human coordination with an embodied and mobile artificial intelligence (AI) in the form of a toy robot. The focus is on a face recognition activity that requires humans to remain oriented to the robot for several seconds while it scans and analyses (i.e. “learns”) the human’s facial features. When everything goes according to the programmer’s plan (Suchman, 1987), the face learning activity takes less than ten seconds. During first attempts in a family home, however, it often takes the embodied AI several minutes to learn a new face. As becomes evident in the data, family members often gather in front of the robot, watching it curiously. For a face to be successfully mapped to a name, only one person should be in the robot’s camera frame; if several people position themselves in front of the robot at the same time, the wrong person may be matched with the entered name. In contrast to taking someone’s picture with a conventional camera, users cannot see what the embodied AI is “seeing”. They have to learn how to position themselves correctly through interaction with the robot. This is further complicated by the fact that the robot is mobile, following previously detected human faces and turning back to its original position whenever it is pushed in a new direction. Ultimately, humans find their own ways to coordinate their bodies with that of the embodied AI, for instance by ducking under a table or covering their faces with their hands.

Data for this paper come from a corpus of video recordings of four Swedish families, whom I videotaped in their homes as they interacted with the small toy robot Cozmo, whose design is inspired by Pixar’s WALL-E and EVE. Cozmo does not talk but communicates through sounds, animated eyes and body movements. It makes sense of its surroundings with the help of several sensors, among them a video camera. The robot is controlled through a smartphone app installed on one of the family members’ phones, which allows participants to operate the robot in different modes, such as playing games or letting it roam freely. One option is to teach the robot a person’s name and face, after which it will say their name whenever it recognizes them. Exploring how humans draw on non-lexical means for coordinating their different bodies (Keevallik, 2020), the paper scrutinizes how humans make sense of and coordinate with the embodied AI through its sounds and movements.

Suchman, L. A. (1987). Plans and situated actions: The problem of human-machine communication. Cambridge: Cambridge University Press.

Keevallik, L. (2020). Grammatical coordination of embodied action: The Estonian ja ‘and’ as a temporal coordinator of Pilates moves. In Y. Maschler, S. Pekarek Doehler, J. Lindström, & L. Keevallik (Eds.), Emergent syntax for conversation: Clausal patterns and the organization of action (pp. 221–244). Amsterdam: John Benjamins.