Embodied task planning requires vision-language models to generate action sequences that are both visually grounded and causally coherent over time. However, existing training paradigms face a critical...
World Action Models (WAMs) have emerged as a promising alternative to Vision-Language-Action (VLA) models for embodied control because they explicitly model how visual observations may evolve under ac...
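The contrast with plain VLA policies is easiest to see in code. Below is a minimal, illustrative sketch of the world-action-model idea, assuming a latent dynamics step between perception and action; every name, dimension, and module here (`WorldActionModel`, `obs_encoder`, `dynamics`) is a hypothetical stand-in, not the architecture from any of the papers listed.

```python
# Illustrative sketch only: a generic world action model that predicts the
# next observation latent from (observation, action), then scores actions.
# All names and dimensions are hypothetical assumptions.
import torch
import torch.nn as nn

class WorldActionModel(nn.Module):
    def __init__(self, obs_dim=512, act_dim=7, hidden=256):
        super().__init__()
        self.obs_encoder = nn.Linear(obs_dim, hidden)   # encode current view
        self.dynamics = nn.Sequential(                  # predict next latent
            nn.Linear(hidden + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        self.action_head = nn.Linear(hidden, act_dim)   # propose next action

    def forward(self, obs_feat, action):
        z = self.obs_encoder(obs_feat)
        z_next = self.dynamics(torch.cat([z, action], dim=-1))  # imagined future
        return z_next, self.action_head(z_next)

model = WorldActionModel()
obs = torch.randn(1, 512)   # e.g. a frozen vision-encoder feature
act = torch.randn(1, 7)     # e.g. a 7-DoF end-effector command
z_next, next_act = model(obs, act)
```

A plain VLA policy would map `obs_feat` straight to an action with no `dynamics` step; the imagined `z_next` is the "how observations evolve under actions" component the abstract describes.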
Existing data generation methods suffer from exploration limits, embodiment gaps, and low signal-to-noise ratios, leading to performance degradation during self-iteration. To address these challenges,...
We propose Question-Asking Navigation (QAsk-Nav), the first reproducible benchmark for Collaborative Instance Object Navigation (CoIN) that enables an explicit, separate assessment of embodied navigat...
Effective embodied exploration requires agents to accumulate and retain spatial knowledge over time. However, existing scene representations, such as discrete scene graphs or static view-based snapsho...
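For concreteness, here is a minimal sketch of what a "discrete scene graph" representation looks like, with objects as nodes and relations as edges accumulated during exploration; all class and field names are illustrative assumptions, not any paper's API.

```python
# Minimal, hypothetical sketch of a discrete scene graph that an agent
# accumulates while exploring; field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)   # object_id -> (label, xyz)
    edges: set = field(default_factory=set)     # (id_a, relation, id_b)

    def observe(self, object_id, label, xyz):
        # Later observations overwrite earlier ones: only the most recent
        # discrete snapshot of each object is retained.
        self.nodes[object_id] = (label, xyz)

    def relate(self, id_a, relation, id_b):
        self.edges.add((id_a, relation, id_b))

g = SceneGraph()
g.observe("mug_1", "mug", (1.2, 0.4, 0.9))
g.observe("table_3", "table", (1.0, 0.0, 0.0))
g.relate("mug_1", "on", "table_3")
```

The limitation the abstract gestures at is visible in `observe`: each object keeps only its most recent discrete snapshot, so the history of how the scene evolved is lost.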
There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agent...
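The two settings differ mainly in how the goal is specified, which a small sketch makes concrete (the type names are hypothetical, for illustration only):

```python
# Hypothetical task specifications contrasting the two navigation settings.
from dataclasses import dataclass

@dataclass
class VLNTask:
    instruction: str       # e.g. "go past the sofa, then turn left into the kitchen"

@dataclass
class OGNTask:
    object_category: str   # e.g. "microwave"; the agent must find any instance
```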
Multimodal LLMs are increasingly deployed as perceptual backbones for autonomous agents in 3D environments, from robotics to virtual worlds. These applications require agents to perceive rapid state c...
Spatial understanding is fundamental for embodied agents, yet most spatial VLMs and benchmarks remain offline, evaluating post-hoc QA over pre-recorded inputs and overlooking two crucial deployment-cri...
Affordance prediction serves as a critical bridge between perception and action in embodied AI. However, existing research is confined to pinhole camera models, which suffer from narrow Fields of View...
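The narrow-FoV constraint is a direct consequence of pinhole geometry: the horizontal field of view is 2 * atan(w / (2f)) for image-plane width w and focal length f, and the model breaks down entirely as the FoV approaches 180 degrees. A quick check, with the example numbers below being assumptions:

```python
# Horizontal field of view of an ideal pinhole camera:
#   FoV = 2 * atan(sensor_width / (2 * focal_length))
# Standard projective geometry; the numbers are just example values.
import math

def pinhole_hfov_deg(sensor_width_mm: float, focal_length_mm: float) -> float:
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

print(pinhole_hfov_deg(36.0, 50.0))   # ~39.6 deg: a typical "normal" lens
print(pinhole_hfov_deg(36.0, 8.0))    # ~132.1 deg: nearing the model's practical limit
```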
Learning long-horizon embodied behaviors from synthetic data remains challenging because generated scenes are often physically implausible, language-driven programs frequently "succeed" without satisf...