ARXIV:2605.15618 · VIDEO FOUNDATION MODELS · SUBMITTED 18 MAY · 20:32 UTC · FRESHNESS STALE

Verified Source: PDF linked | Verified PaperPack: citation fields available | Partial Proof: unverified proof status

Latent Video Prediction Learns Better World Models

Ali J Alrasheed · Aryan Yazdan Parast · Basim Azam · James Bailey · Naveed Akhtar · arXiv

This research systematically evaluates latent-prediction video models across robustness axes, showing their advantages over pixel-reconstruction models for world modeling.

Ship in 2-4 weeks | Score 4.0 | Evidence unverified

Opportunity summary

Pain This research systematically evaluates latent-prediction video models across robustness axes, showing their advantages over pixel-reconstruction models for world modeling.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

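Self-supervised video models are increasingly framed as world models, yet their evaluation remains largely confined to a single top-1 accuracy score on clean benchmarks. This leaves a major gap in understanding their potential as world models. The paper addresses that gap by systematically evaluating latent-prediction video models across five robustness axes and comparing them with pixel-reconstruction models.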

METHOD

Full abstract

Self-supervised video models are increasingly framed as world models, yet their evaluation remains largely confined to a single top-1 accuracy score on clean benchmarks. This leaves a major gap in comprehending their potential as world models. We present the first systematic study addressing this gap, analyzing four matched-capacity frontier video foundation models, V-JEPA 2.1, V-JEPA 2, VideoPrism, and VideoMAEv2, across five robustness axes relevant to their deployment as video world models: feature discriminability, corruption robustness, fine-grained discrimination, occlusion robustness, and sensitivity to temporal direction. Our evaluations establish that across all five axes, latent-prediction models form a distinct and consistent profile. They degrade more gracefully under pixel corruption, preserve usable class structure rather than mere geometric stability under occlusion, capture fine-grained physical contact cues without reconstructing pixels, and uniquely encode the arrow of time. These advantages can even survive task adaptation: a frozen V-JEPA 2 backbone with a lightweight attentive probe outperforms a fully fine-tuned VideoMAE and a supervised TimeSformer on corruption and occlusion robustness. Our extensive results offer concrete new evidence in favor of latent prediction for robust world modeling.
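One of the five axes named in the abstract, sensitivity to temporal direction, can be illustrated with a toy example. The sketch below is not the paper's metric, since this page does not give the evaluation protocol. It assumes a simple proxy: one minus the cosine similarity between an encoder's features for a clip and for the same clip played in reverse. The two encoders, the function name `temporal_direction_sensitivity`, and the toy clip are hypothetical stand-ins, not V-JEPA 2 or VideoMAEv2.

```python
import math
from typing import Callable, List, Sequence

# A clip is a list of frames; each frame is a small feature vector (toy data).
Clip = List[List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_pool_encoder(clip: Clip) -> List[float]:
    """Hypothetical order-invariant encoder: plain average over frames."""
    n = len(clip)
    dims = len(clip[0])
    return [sum(frame[d] for frame in clip) / n for d in range(dims)]


def time_weighted_encoder(clip: Clip) -> List[float]:
    """Hypothetical order-aware encoder: later frames get larger weights."""
    total = sum(range(1, len(clip) + 1))
    dims = len(clip[0])
    return [
        sum((t + 1) * frame[d] for t, frame in enumerate(clip)) / total
        for d in range(dims)
    ]


def temporal_direction_sensitivity(
    encode: Callable[[Clip], List[float]], clip: Clip
) -> float:
    """Assumed proxy: 1 - cosine(features of clip, features of reversed clip)."""
    forward = encode(clip)
    backward = encode(clip[::-1])
    return 1.0 - cosine(forward, backward)


# Toy clip in which the content drifts from the first dimension to the second.
clip = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

invariant_score = temporal_direction_sensitivity(mean_pool_encoder, clip)
aware_score = temporal_direction_sensitivity(time_weighted_encoder, clip)

print(f"order-invariant encoder: {invariant_score:.3f}")
print(f"order-aware encoder:     {aware_score:.3f}")
```

On this toy clip the order-invariant encoder scores about 0, because reversing the frames leaves its output unchanged, while the order-aware encoder scores about 0.2. A real evaluation would replace the toy encoders with frozen video backbones and average the score over a dataset.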

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. Our extensive results offer concrete new evidence in favor of latent prediction for robust world modeling. Code availability is flagged in the production record;…

WHY NOW

Video Foundation Models moved forward this cycle; last verified May 2026. Public score 4.0/10. Production flags indicate code availability.

Opportunity summary

Score 4.0

Pain This research systematically evaluates latent-prediction video models across robustness axes, showing their advantages over pixel-reconstruction models for world modeling.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Competitive landscape

Segment: Video Foundation Models

Adoption evidence: No public code link in the paper record yet

Commercial read: 4.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified
