ARXIV:2603.22078 · ROBOTICS · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Source: PDF linked (Verified) | PaperPack: citation fields available (Verified) | Proof: unverified proof status (Partial)

Do World Action Models Generalize Better than VLAs? A Robustness Study

Zhanguang Zhang · Zhiyuan Li · Behnam Rahmati · Rui Heng Yang · Yintao Ma · Amir Rasouli · +7 at arXiv

This research compares world action models and vision-language-action models for robot control, demonstrating superior robustness of world action models in challenging scenarios and providing insights for future development.

Ship in 2-4 weeks | Score 7.0 | Evidence unverified

Opportunity summary

Pain: This research compares world action models and vision-language-action models for robot control, demonstrating superior robustness of world action models in challenging scenarios and providing insights for future development.

Evidence: 0 refs | 0 sources | 17% coverage

Blocker: Evidence unverified

Full abstract

Robot action planning in the real world is challenging as it requires not only understanding the current state of the environment but also predicting how it will evolve in response to actions. Vision-language-action (VLA) models, which repurpose large-scale vision-language models for robot action generation using action experts, have achieved notable success across a variety of robotic tasks. Nevertheless, their performance remains constrained by the scope of their training data, exhibiting limited generalization to unseen scenarios and vulnerability to diverse contextual perturbations. More recently, world models have been revisited as an alternative to VLAs. These models, referred to as world action models (WAMs), are built upon world models that are trained on large corpora of video data to predict future states. With minor adaptations, their latent representation can be decoded into robot actions. It has been suggested that their explicit dynamic prediction capacity, combined with spatiotemporal priors acquired from web-scale video pretraining, enables WAMs to generalize more effectively than VLAs. In this paper, we conduct a comparative study of prominent state-of-the-art VLA policies and recently released WAMs. We evaluate their performance on the LIBERO-Plus and RoboTwin 2.0-Plus benchmarks under various visual and language perturbations. Our results show that WAMs achieve strong robustness, with LingBot-VA reaching a 74.2% success rate on RoboTwin 2.0-Plus and Cosmos-Policy achieving 82.2% on LIBERO-Plus. While VLAs such as $\pi_{0.5}$ can achieve comparable robustness on certain tasks, they typically require extensive training with diverse robotic datasets and varied learning objectives. Hybrid approaches that partially incorporate video-based dynamic learning exhibit intermediate robustness, highlighting the importance of how video priors are integrated.
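The abstract reports robustness as success rate under visual and language perturbations. As a rough illustration of that kind of bookkeeping, the sketch below aggregates per-episode outcomes into success rates per policy and perturbation category, then computes the percentage-point drop relative to an unperturbed baseline. Everything here is an assumption for illustration: the `Episode` record, the category names, the `robustness_drop` measure, and the toy data are hypothetical and are not taken from the paper or from the LIBERO-Plus or RoboTwin 2.0-Plus tooling.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Episode(NamedTuple):
    policy: str        # hypothetical policy label
    perturbation: str  # hypothetical category, e.g. "none", "visual", "language"
    success: bool      # whether the rollout completed the task


def success_rates(episodes: Iterable[Episode]) -> Dict[str, Dict[str, float]]:
    """Return {policy: {perturbation: success rate in percent}}."""
    # Each cell holds [number of successes, number of trials].
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for ep in episodes:
        cell = counts[ep.policy][ep.perturbation]
        cell[0] += int(ep.success)
        cell[1] += 1
    return {
        policy: {p: 100.0 * s / n for p, (s, n) in by_pert.items()}
        for policy, by_pert in counts.items()
    }


def robustness_drop(rates: Dict[str, float], baseline: str = "none") -> Dict[str, float]:
    """Percentage-point drop of each perturbed condition relative to the baseline."""
    base = rates[baseline]
    return {p: base - r for p, r in rates.items() if p != baseline}


# Made-up episodes for illustration only; these are not results from the paper.
toy = [
    Episode("policy_a", "none", True),
    Episode("policy_a", "none", True),
    Episode("policy_a", "visual", True),
    Episode("policy_a", "visual", False),
    Episode("policy_a", "language", False),
    Episode("policy_a", "language", False),
]
rates = success_rates(toy)
drops = robustness_drop(rates["policy_a"])
print(rates["policy_a"])
print(drops)
```

Reporting the drop alongside the raw success rate is one possible design choice: it separates how good a policy is on clean inputs from how much it degrades when the scene or instruction changes. The paper may define and aggregate its metrics differently.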

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. It has been suggested that their explicit dynamic prediction capacity, combined with spatiotemporal priors acquired from web-scale video pretraining, enables WAMs to generalize more…

WHY NOW

Robotics moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Competitive landscape

Segment: Robotics

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

arXiv: 2603.22078 (https://arxiv.org/abs/2603.22078)

Published: 2026-03-23T15:13:15.000Z

Authors: Zhanguang Zhang · Zhiyuan Li · Behnam Rahmati · Rui Heng Yang · Yintao Ma · Amir Rasouli · Sajjad Pakdamansavoji · Yangzheng Wu · Lingfeng Zhang · Tongtong Cao · Feng Wen · Xingyue Quan · Yingxue Zhang

Research domain: Robotics | Viability score: 7 | Commercial readiness: code | Free to access: yes

Page: https://sciencetostartup.com/paper/do-world-action-models-generalize-better-than-vlas-a-robustness-study
