One of the essential assets of human beings is our ability to coordinate action and collaborate on shared tasks. This project studies vocal practices for achieving embodied coordination in real time, with a focus on non-lexical but nevertheless communicative vocalizations, such as "ugh" and "aargh". Targeting the boundary between body and language, and between individual voice and intersubjectivity, the project empirically contextualizes a dialogical theory of language and mind in the temporal organization of coordinated multimodal action. It examines four physical activities in which vocalizations are likely to occur (dancing, improvisational art, sports coaching, and manual labor in construction) in order to lay the empirical foundations for a multimodal theory of action. It problematizes the traditional boundaries of linguistics, within which uses of voice such as grunts and groans are not treated as part of the lexicon. As a complement to seminal interactional studies of action in highly intellectual professional settings, such as surgery, the flight control room, and the architectural studio, the current project targets areas of human activity where embodied activities are profiled and simple work tasks are carried out. By starting from some of the most down-to-earth human activities and analyzing them with the cutting-edge methods of multimodal interaction analysis, the project aims to disclose the basic temporal organization of action through the closely coordinated deployment of linguistic-vocal and embodied semiotic means (Goodwin 2000; Keevallik 2013). In contrast to the majority of interaction research, which focuses on the sequential formation of action (Levinson 2013), the current project targets simultaneity and the continuous unfolding and mutual recalibration of interpretable action.