InteractAvatar

Gold definition · Updated Apr 2, 2026

Definition

InteractAvatar is a dual-stream framework for generating full-body talking avatars capable of grounded human-object interaction (GHOI). It decouples environmental perception and interaction planning from video synthesis. Specialized modules produce text-aligned interaction motions and, from them, vivid avatar videos, which addresses the control-quality dilemma in GHOI.

At a glance

Executive summary

InteractAvatar is an AI system that creates realistic digital avatars able to interact with objects in their environment according to text commands. It tackles the difficulty of making avatars perform complex object interactions by separating how they perceive their surroundings and plan movements from how the video itself is generated. This separation yields higher-quality, more controllable interactions.

TL;DR

InteractAvatar generates lifelike talking avatars that interact with objects according to text instructions. It splits the task into three parts: perceiving the environment, planning actions, and synthesizing the video.

Key points

Dual-stream framework that decouples perception and planning from video synthesis. PIM generates the interaction motion, AIM generates the video, and a motion-to-video aligner couples the two streams so they are generated together.
Targets the open challenge of generating grounded human-object interaction (GHOI) for talking avatars, which requires both perceiving the environment and resolving the control-quality dilemma.
Relevant to researchers and engineers in video generation, virtual reality, gaming, and advanced human-computer interaction who need realistic, interactive digital humans.
Unlike existing methods, which struggle with GHOI, InteractAvatar tackles it directly by decoupling perception and planning from synthesis and by enabling text-aligned interactions, offering a more robust solution.
Focuses on enhancing avatar realism and interactivity, particularly in complex scenarios that require environmental awareness and object manipulation, rather than on motion generation alone.
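The decoupled, co-generating data flow described in the key points can be illustrated with a toy Python sketch. It is not InteractAvatar's implementation: the class and function names (ToyPIM, ToyAIM, MotionToVideoAligner, co_generate), the float "latents", and the step-by-step refinement loop are all hypothetical stand-ins. The sketch only shows the structural idea: the perception/planning stream sees the scene and the text command, the video stream sees only what the aligner passes it, and both streams advance together.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy sketch: ToyPIM, ToyAIM, MotionToVideoAligner and
# co_generate are illustrative names, not InteractAvatar's actual API.
# Real modules operate on learned latents; here each "latent" is a float.


@dataclass
class Scene:
    object_positions: List[float]  # toy stand-in for perceived objects


class ToyPIM:
    """Perception/planning stream: sees the scene and the text command."""

    def init_motion(self, scene: Scene, command: str) -> Tuple[float, float]:
        # Toy "grounding": reach for the first object if asked to pick something up.
        target = scene.object_positions[0] if "pick" in command else 0.0
        return (0.0, target)  # (current hand position, planned target)

    def step(self, motion: Tuple[float, float]) -> Tuple[float, float]:
        hand, target = motion
        # Refine the motion: move the hand halfway toward the target.
        return (hand + 0.5 * (target - hand), target)


class MotionToVideoAligner:
    """Maps the motion state to a conditioning signal for the video stream."""

    def __call__(self, motion: Tuple[float, float]) -> float:
        return motion[0]  # expose only the current hand position


class ToyAIM:
    """Video stream: never sees the scene or command, only aligned motion."""

    def step(self, frame: float, conditioning: float) -> float:
        # Pull the toy frame state toward the motion conditioning.
        return 0.5 * (frame + conditioning)


def co_generate(scene: Scene, command: str, steps: int = 4):
    pim, aim, aligner = ToyPIM(), ToyAIM(), MotionToVideoAligner()
    motion = pim.init_motion(scene, command)
    frame = 0.0
    for _ in range(steps):
        # Both streams advance together; the aligner couples them each step.
        motion = pim.step(motion)
        frame = aim.step(frame, aligner(motion))
    return motion, frame


if __name__ == "__main__":
    print(co_generate(Scene(object_positions=[2.0]), "pick up the cup"))
```

Running the script prints `((1.875, 2.0), 1.625)`: the motion stream has nearly reached the planned object position, and the video stream follows it through the aligner alone. This illustrates, in miniature, why decoupling lets the planning side carry control while the synthesis side focuses on output quality.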

Use cases

Virtual Reality and Gaming: Creating highly interactive NPCs (Non-Player Characters) that can realistically pick up, use, and respond to objects in virtual worlds.
Digital Assistants and Customer Service: Developing advanced AI assistants that can visually demonstrate product features or perform tasks in a virtual environment, enhancing user engagement.
Telepresence and Remote Collaboration: Enabling more immersive virtual meetings where participants' avatars can interact with shared digital objects and whiteboards naturally.
Content Creation and Animation: Automating the generation of complex character animations involving object interaction for films, advertisements, or educational content, reducing manual effort.
Robotics Simulation and Training: Generating realistic human-robot interaction scenarios in simulated environments for training and testing autonomous systems.