User egocentric views (top) and room perspectives (bottom) of our MR telepresence spaces A (left) and B (right). In space B, avatar X′ represents user X from space A; in space A, avatar Y′ represents user Y from space B. Virtual avatars and objects are augmented at different locations and scales in each space. Both users direct their eye gaze at the other party's avatar. Each points at a particular object (Jupiter) with one hand (user X with the left hand, user Y with the right) while performing explanatory gestures with the other. Our system accurately translates these upper-body gestures into the corresponding avatar motions in the remote space in real time, enabling effective bi-directional interaction between users in remote locations.
In mixed reality (MR) avatar-mediated telepresence, avatar movement must be adjusted to convey the user's intent in a dissimilar space. This paper presents a novel neural network-based framework for translating upper-body gestures, which adjusts virtual avatar movements in dissimilar environments to accurately reflect the user's intended gestures in real time. Our framework translates a wide range of upper-body gestures, including eye gaze, deictic gestures, free-form gestures, and the transitions between them. A key feature of our framework is its ability to generate natural upper-body gestures for users of different sizes, irrespective of handedness and eye dominance, even though it is trained on data from a single person. Unlike previous methods that require paired user-avatar motion data for training, our framework uses an unpaired approach, significantly reducing training time and enabling a wider variety of motion types. These advantages are made possible by two separate networks: the Motion Progression Network, which interprets sparse tracking signals from the user to determine motion progression, and the Upper-body Gesture Network, which autoregressively generates the avatar's pose from these progressions. We demonstrate the effectiveness of our framework through quantitative comparisons with state-of-the-art methods, qualitative animation results, and a user evaluation in MR telepresence scenarios.
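To make the two-network pipeline concrete, the following is a minimal, purely illustrative sketch of the per-frame data flow the abstract describes: sparse tracking signals are mapped to a motion-progression value, which then drives autoregressive pose generation. All class and function names here are hypothetical, and the toy heuristics stand in for the learned networks; this is not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrackingSignal:
    """Sparse upper-body tracking input: head and hand positions (x, y, z).
    Hypothetical structure; actual inputs depend on the tracking hardware."""
    head: Tuple[float, float, float]
    left_hand: Tuple[float, float, float]
    right_hand: Tuple[float, float, float]


class MotionProgressionNetwork:
    """Stand-in for the Motion Progression Network: maps the user's sparse
    tracking signals to a scalar progression value in [0, 1]. A real
    implementation would be a learned neural network."""

    def progression(self, signal: TrackingSignal, prev: float) -> float:
        # Toy heuristic: advance progression by a quantity derived from the
        # tracked hand position, clamped to [0, 1]. Purely illustrative.
        delta = sum(abs(c) for c in signal.right_hand) * 0.01
        return min(1.0, prev + delta)


class UpperBodyGestureNetwork:
    """Stand-in for the Upper-body Gesture Network: autoregressively produces
    the avatar's next pose from its previous pose and the progression value."""

    def next_pose(self, prev_pose: List[float], prog: float) -> List[float]:
        # Toy autoregressive step: blend the previous pose toward a fixed
        # target pose in proportion to progression (illustrative only).
        target = [1.0] * len(prev_pose)
        return [p + (t - p) * prog for p, t in zip(prev_pose, target)]


def translate_step(mpn: MotionProgressionNetwork,
                   ugn: UpperBodyGestureNetwork,
                   signal: TrackingSignal,
                   prev_prog: float,
                   prev_pose: List[float]) -> Tuple[float, List[float]]:
    """One real-time frame: tracking signal -> progression -> avatar pose."""
    prog = mpn.progression(signal, prev_prog)
    pose = ugn.next_pose(prev_pose, prog)
    return prog, pose
```

Running `translate_step` once per frame mirrors the described real-time loop: the progression network decouples *where the user is in a gesture* from *how the avatar's body should move*, which is what allows retargeting across dissimilar spaces and body sizes.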
@article{kang2025real,
title={Real-time Translation of Upper-body Gestures to Virtual Avatars in Dissimilar Telepresence Environments},
author={Kang, Jiho and Kim, Taehei and Kim, Hyeshim and Lee, Sung-Hee},
journal={IEEE Transactions on Visualization and Computer Graphics},
year={2025},
publisher={IEEE}
}