Current research in robotics AI increasingly focuses on making robotic systems more adaptable and efficient in complex environments. Recent advances highlight multimodal learning, combining visual and tactile data to improve manipulation in occluded scenarios. Techniques like the Cosmos Policy streamline the adaptation of pretrained video models for real-time action generation, markedly improving performance in both simulated and real-world tasks. Frameworks such as BayesianVLA address the challenge of generalizing language instructions in robot manipulation, mitigating biases that lead to ineffective learning. Collaborative approaches, exemplified by COHORT, optimize resource allocation among multiple robots, which is crucial for mission-critical applications where bandwidth and battery life are limited. The field is also exploring world models for reinforcement learning, which show promise in bridging the gap between simulated training and real-world deployment; together, these directions point toward more robust, generalizable robotic systems capable of handling diverse tasks autonomously.
Humanoid robot manipulation is a crucial research area for executing diverse human-level tasks, involving high-level semantic reasoning and low-level action generation. However, precise scene understa...
Despite the promise of Vision-Language-Action (VLA) models as generalist robotic controllers, their robustness against perceptual noise and environmental variations in out-of-distribution (OOD) tasks ...
Tactile information plays a crucial role in human manipulation tasks and has recently garnered increasing attention in robotic manipulation. However, existing approaches mostly focus on the alignment ...
Vision-Language-Action (VLA) models have shown promise in robot manipulation but often struggle to generalize to new instructions or complex multi-task scenarios. We identify a critical pathology in c...
Recent video generation models demonstrate remarkable ability to capture complex physical interactions and scene evolution over time. To leverage their spatiotemporal priors, robotics works have adapt...
Vision-language-action (VLA) models enable robots to follow natural-language instructions grounded in visual observations, but the instruction channel also introduces a critical vulnerability: small t...
Foundation vision-language models are becoming increasingly relevant to robotics because they can provide richer semantic perception than narrow task-specific pipelines. However, their practical adopt...
Robot learning from interacting with the physical world is fundamentally bottlenecked by the cost of physical interaction. The two alternatives, supervised finetuning (SFT) from expert demonstrations ...
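The abstract above contrasts costly physical interaction with supervised finetuning (SFT) from expert demonstrations. As a minimal illustration of the SFT side only, here is a toy behavior-cloning sketch: a scalar linear policy is fit to expert (observation, action) pairs by gradient descent on squared action error. All names and numbers are illustrative assumptions, not taken from the paper.

```python
# Toy SFT / behavior-cloning sketch (illustrative only, not the paper's method).
# Expert policy: action = 2.0 * obs; we recover the gain from demonstrations alone.
demos = [(o / 10.0, 2.0 * (o / 10.0)) for o in range(-10, 11)]

w = 0.0   # policy parameter: action = w * obs
lr = 0.5  # learning rate
for _ in range(100):
    # Gradient of mean squared error between predicted and expert actions.
    grad = sum((w * o - a) * o for o, a in demos) / len(demos)
    w -= lr * grad
# w converges to the expert gain (about 2.0) without any physical interaction.
```

The point of the sketch is the trade-off named in the abstract: SFT needs no environment steps at all, but it can only ever recover what the demonstrations contain.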
Determining the occupancy status of locations in the environment is a fundamental task for safety-critical robotic applications. Traditional occupancy grid mapping methods subdivide the environment in...
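Traditional occupancy grid mapping, as referenced above, subdivides the environment into cells and updates each cell's occupancy belief from sensor returns. A standard formulation stores log-odds per cell, so Bayesian updates reduce to additions. The sketch below is a minimal version of that classic scheme; the class name and sensor probabilities are illustrative assumptions, not from the paper.

```python
import math

# Minimal log-odds occupancy grid (classic formulation, illustrative values).
L_OCC = math.log(0.7 / 0.3)   # log-odds increment when a beam hits the cell
L_FREE = math.log(0.3 / 0.7)  # log-odds decrement when a beam passes through

class OccupancyGrid:
    def __init__(self, width, height):
        # 0.0 log-odds corresponds to p = 0.5, i.e. "unknown".
        self.logodds = [[0.0] * width for _ in range(height)]

    def update(self, x, y, hit):
        # Bayesian update in log-odds space is a simple addition.
        self.logodds[y][x] += L_OCC if hit else L_FREE

    def prob(self, x, y):
        # Convert log-odds back to an occupancy probability.
        return 1.0 / (1.0 + math.exp(-self.logodds[y][x]))

grid = OccupancyGrid(4, 4)
grid.update(1, 2, hit=True)
grid.update(1, 2, hit=True)
# Two "hit" measurements push the cell's occupancy probability well above 0.5,
# while untouched cells remain at exactly 0.5.
```

Practical implementations usually also clamp the log-odds to keep the map responsive to change, which is one of the limitations the abstract's alternative approaches target.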
Robotic manipulation often requires memory: occlusion and state changes can make decision-time observations perceptually aliased, making action selection non-Markovian at the observation level because...
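A common minimal remedy for the perceptual aliasing described above is to condition action selection on a short observation history rather than a single frame, so that two identical-looking observations map to different stacked states. The wrapper below is a generic sketch of that idea under stated assumptions; the class name and padding policy are hypothetical, not the paper's method.

```python
from collections import deque

# Illustrative fixed-length observation history ("frame stacking") wrapper.
class HistoryWrapper:
    def __init__(self, k):
        self.k = k
        self.buf = deque(maxlen=k)  # oldest observation is dropped automatically

    def reset(self, obs):
        # Pad the history with the first observation of the episode.
        self.buf.clear()
        for _ in range(self.k):
            self.buf.append(obs)
        return tuple(self.buf)

    def step(self, obs):
        self.buf.append(obs)
        return tuple(self.buf)

w = HistoryWrapper(3)
s0 = w.reset("occluded")
s1 = w.step("drawer_open")
# Observations that look identical in isolation now yield distinct stacked
# states, restoring (approximate) Markovity at the level the policy sees.
```

A fixed window only helps when the disambiguating event falls inside it; longer-horizon aliasing is what motivates learned memory in the abstract above.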