InteractAvatar is a dual-stream framework for generating full-body talking avatars that perform grounded human-object interaction (GHOI). It decouples perception and planning from video synthesis, using specialized modules to produce text-aligned motions and vivid avatar videos. This design addresses the control-quality dilemma in GHOI, the trade-off between precisely controlling an interaction and keeping the generated video visually convincing.
In plain terms, InteractAvatar is an AI system that creates realistic digital avatars able to interact with objects in their surroundings in response to text commands. It tackles the difficulty of getting avatars to perform complex actions by separating how the avatar understands its environment and plans its movements from how its video is generated. The result is both higher visual quality and more controllable interactions.