Vision-Language-Action (VLA) models enable generalist robotic manipulation but suffer from high inference latency. This bottleneck stems from the massive number of visual tokens processed by large lan...
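The abstract is cut off before naming the method, so the sketch below shows only a generic version of the idea it gestures at: pruning the visual token sequence to its top-k most important tokens before they enter the language model. The function name, the importance scores, and the token counts are illustrative assumptions, not this paper's technique.

import torch

def prune_visual_tokens(tokens, scores, keep):
    # tokens: (B, N, D) visual token embeddings from the vision encoder
    # scores: (B, N) per-token importance (e.g., attention paid by a CLS query)
    idx = scores.topk(keep, dim=1).indices                   # (B, keep)
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))  # (B, keep, D)
    return tokens.gather(1, idx)

tokens = torch.randn(2, 256, 1024)    # 256 visual tokens per image
scores = torch.rand(2, 256)
pruned = prune_visual_tokens(tokens, scores, keep=64)
assert pruned.shape == (2, 64, 1024)  # 4x fewer tokens reach the LLM prefill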
Vision-language-action (VLA) models have demonstrated exceptional performance in natural-language-driven perception and control. However, the high computational cost of these models poses significant ef...
We propose a standalone autoregressive (AR) Action Expert that generates actions as a continuous causal sequence while conditioning on refreshable vision-language prefixes. In contrast to existing Vis...
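As a rough illustration of the decoding pattern this abstract describes, the hypothetical PyTorch module below autoregresses over a continuous action sequence (causal self-attention) while cross-attending to a vision-language prefix that can be swapped out when a new frame arrives; every dimension and layer choice here is an assumption, not the paper's architecture.

import torch
import torch.nn as nn

class ARActionExpert(nn.Module):
    # Hypothetical sketch of a standalone AR action decoder.
    def __init__(self, act_dim=7, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        self.in_proj = nn.Linear(act_dim, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.out_proj = nn.Linear(d_model, act_dim)

    def forward(self, prev_actions, vl_prefix):
        # prev_actions: (B, T, act_dim); vl_prefix: (B, P, d_model)
        causal = nn.Transformer.generate_square_subsequent_mask(prev_actions.size(1))
        h = self.decoder(self.in_proj(prev_actions), vl_prefix, tgt_mask=causal)
        return self.out_proj(h)  # next-action prediction at each position

model = ARActionExpert()
vl_prefix = torch.randn(1, 64, 256)  # stand-in for VLM prefix features
actions = torch.zeros(1, 1, 7)       # begin-of-sequence action
for _ in range(8):                   # causal rollout; recompute vl_prefix
    nxt = model(actions, vl_prefix)[:, -1:]  # whenever a fresh frame arrives
    actions = torch.cat([actions, nxt], dim=1)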
Vision-Language-Action (VLA) tasks require reasoning over complex visual scenes and executing adaptive actions in dynamic environments. While recent studies on reasoning VLAs show that explicit chain-...
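The abstract truncates before stating its findings, so the following is only a generic sketch of what explicit chain-of-thought decoding looks like in a VLA: reasoning tokens are generated first, then a chunk of action tokens conditioned on them. StubVLA and its generate interface are invented stand-ins for a real model.

from typing import List, Optional

ACT_MARKER = "<act>"

class StubVLA:
    # Hypothetical stand-in for a reasoning VLA's token interface.
    def generate(self, prefix: List[str], stop: Optional[str] = None,
                 max_new_tokens: int = 4) -> List[str]:
        if stop is not None:  # canned reasoning; a real model decodes this
            return ["drawer", "is", "shut,", "so", "grasp", "handle", stop]
        return [f"<a{i}>" for i in range(max_new_tokens)]  # action tokens

def act(model, instr_tokens, obs_tokens, use_reasoning=True):
    prefix = instr_tokens + obs_tokens
    if use_reasoning:
        # Stage 1: decode an explicit chain of thought up to an action marker.
        prefix = prefix + model.generate(prefix, stop=ACT_MARKER)
    # Stage 2: decode a fixed-length chunk of discretized action tokens.
    return model.generate(prefix, max_new_tokens=4)

print(act(StubVLA(), ["open", "the", "drawer"], ["<img>"]))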
Vision-language-action (VLA) models for closed-loop robot control are typically cast under the Markov assumption, making them prone to errors on tasks requiring historical context. To incorporate memo...
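One simple way to lift the Markov assumption, shown below purely as an assumed baseline rather than this paper's mechanism, is to wrap a memoryless policy with a rolling window of past observation features so each action can depend on history.

import collections
import torch

class HistoryWrapper:
    # Hypothetical sketch: give a Markov policy a fixed-horizon memory.
    def __init__(self, policy, horizon=8, feat_dim=128):
        self.policy = policy
        self.feat_dim = feat_dim
        self.buf = collections.deque(maxlen=horizon)

    def reset(self):
        self.buf.clear()

    def step(self, obs_feat):
        self.buf.append(obs_feat)
        # Zero-pad so the policy always sees a (horizon, feat_dim) input.
        pad = [torch.zeros(self.feat_dim)] * (self.buf.maxlen - len(self.buf))
        return self.policy(torch.stack(pad + list(self.buf)))

agent = HistoryWrapper(policy=lambda h: h.mean(0)[:7])  # dummy 7-DoF policy
for t in range(3):
    action = agent.step(torch.randn(128))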
Vision-Language-Action (VLA) models map visual observations and language instructions directly to robotic actions. While effective for simple tasks, standard VLA models often struggle with complex, mu...
In this paper, we introduce a novel kinematics-rich vision-language-action (VLA) task, in which language commands densely encode diverse kinematic attributes (such as direction, trajectory, orientatio...
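Since the abstract's attribute list is truncated after "orientatio...", the schema below records only the attributes it explicitly names; it is an assumed data structure for grounding such commands, not part of the proposed task definition.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class KinematicAttributes:
    # Assumed schema; real commands may encode further attributes.
    direction: Optional[Tuple[float, float, float]] = None    # unit motion vector
    trajectory: Optional[str] = None                          # e.g., "straight", "arc"
    orientation: Optional[Tuple[float, float, float]] = None  # roll/pitch/yaw in rad

# "Slide the cup leftward along an arc, keeping it upright" might ground to:
cmd = KinematicAttributes(direction=(-1.0, 0.0, 0.0), trajectory="arc",
                          orientation=(0.0, 0.0, 0.0))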
Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for robotic manipulation, in which reliable action prediction critically depends on accurately interpreting and integr...