ARXIV:2603.26599 · GENERATIVE VIDEO · SUBMITTED 30 MAR · 22:19 UTC · FRESHNESS STALE

Verified · Source: PDF linked | Verified · PaperPack: citation fields available | Partial · Proof: unverified proof status

VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward

Zhaochong An · Orest Kupyn · Théo Uscidda · Andrea Colaco · Karan Ahuja · Serge Belongie · Mar Gonzalez-Franco · Marta Tintore Gazulla · arXiv

A latent geometry-guided framework for post-training video diffusion models to achieve world-consistent generation with improved camera stability and geometric coherence.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A latent geometry-guided framework for post-training video diffusion models to achieve world-consistent generation with improved camera stability and geometric coherence.

Evidence 6 refs | 3 sources | 50% coverage

Blocker Evidence unverified


PROBLEM

A latent geometry-guided framework for post-training video diffusion models to achieve world-consistent generation with improved camera stability and geometric coherence. Prior approaches improve consistency either by augmenting the generator with additional modules or applying…

METHOD

Full abstract

Large-scale video diffusion models achieve impressive visual quality, yet often fail to preserve geometric consistency. Prior approaches improve consistency either by augmenting the generator with additional modules or applying geometry-aware alignment. However, architectural modifications can compromise the generalization of internet-scale pretrained models, while existing alignment methods are limited to static scenes and rely on RGB-space rewards that require repeated VAE decoding, incurring substantial compute overhead and failing to generalize to highly dynamic real-world scenes. To preserve the pretrained capacity while improving geometric consistency, we propose VGGRPO (Visual Geometry GRPO), a latent geometry-guided framework for geometry-aware video post-training. VGGRPO introduces a Latent Geometry Model (LGM) that stitches video diffusion latents to geometry foundation models, enabling direct decoding of scene geometry from the latent space. By constructing LGM from a geometry model with 4D reconstruction capability, VGGRPO naturally extends to dynamic scenes, overcoming the static-scene limitations of prior methods. Building on this, we perform latent-space Group Relative Policy Optimization with two complementary rewards: a camera motion smoothness reward that penalizes jittery trajectories, and a geometry reprojection consistency reward that enforces cross-view geometric coherence. Experiments on both static and dynamic benchmarks show that VGGRPO improves camera stability, geometry consistency, and overall quality while eliminating costly VAE decoding, making latent-space geometry-guided reinforcement an efficient and flexible approach to world-consistent video generation.
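The abstract describes scoring a group of generated samples with a camera motion smoothness reward and then optimizing with Group Relative Policy Optimization. The following is a minimal sketch of those two ideas, not the authors' implementation. The reward form used here (negative mean squared second difference of camera positions) and the function names `smoothness_reward` and `group_relative_advantages` are assumptions made for illustration; the paper's exact reward definition and its latent-space geometry decoding are not reproduced. The within-group standardization follows the commonly used GRPO advantage.

```python
from statistics import mean, pstdev


def smoothness_reward(positions):
    """Hypothetical smoothness reward: negative mean squared acceleration.

    `positions` is a list of (x, y, z) camera centers, one per frame.
    This form is an assumption; the paper may define the reward differently.
    """
    if len(positions) < 3:
        return 0.0
    total = 0.0
    for prev, cur, nxt in zip(positions, positions[1:], positions[2:]):
        # The second finite difference approximates acceleration at `cur`.
        total += sum((p - 2.0 * c + n) ** 2 for p, c, n in zip(prev, cur, nxt))
    return -total / (len(positions) - 2)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two toy trajectories for the same prompt: one straight, one jittery.
smooth = [(float(t), 0.0, 0.0) for t in range(6)]
jittery = [(float(t), 0.5 if t % 2 else -0.5, 0.0) for t in range(6)]

rewards = [smoothness_reward(smooth), smoothness_reward(jittery)]
advantages = group_relative_advantages(rewards)
```

In this toy group the straight trajectory receives a positive advantage and the jittery one a negative advantage, so a policy update weighted by these values would favor the smoother sample. In the setting the abstract describes, the camera poses would come from geometry decoded directly from the diffusion latents rather than from hand-written lists.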

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Large-scale video diffusion models achieve impressive visual quality, yet often fail to preserve geometric consistency. Code availability is flagged in the production record; the…

WHY NOW

Generative Video moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.


Opportunity summary

Score 7.0

Pain A latent geometry-guided framework for post-training video diffusion models to achieve world-consistent generation with improved camera stability and geometric coherence.

Evidence 6 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

A latent geometry-guided framework for post-training video diffusion models to achieve world-consistent generation with improved camera stability and geometric coherence.


Competitive landscape

A latent geometry-guided framework for post-training video diffusion models to achieve world-consistent generation with improved camera stability and geometric coherence.

Segment: Generative Video

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


{ "@context": "https://schema.org", "@graph": [ { "@type": "WebPage", "@id": "https://sciencetostartup.com/paper/vggrpo-towards-world-consistent-video-generation-with-4d-latent-reward#webpage", "url": "https://sciencetostartup.com/paper/vggrpo-towards-world-consistent-video-generation-with-4d-latent-reward", "name": "VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward", "description": "A latent geometry-guided framework for post-training video diffusion models to achieve world-consistent generation with improved camera stability and geometric coherence.", "isPartOf": { "@id": "https://sciencetostartup.com/#website" } }, { "@type": "ScholarlyArticle", "@id": "https://sciencetostartup.com/paper/vggrpo-towards-world-consistent-video-generation-with-4d-latent-reward#scholarlyArticle", "headline": "VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward", "description": "A latent geometry-guided framework for post-training video diffusion models to achieve world-consistent generation with improved camera stability and geometric coherence.", "url": "https://sciencetostartup.com/paper/vggrpo-towards-world-consistent-video-generation-with-4d-latent-reward", "sameAs": "https://arxiv.org/abs/2603.26599", "identifier": { "@type": "PropertyValue", "propertyID": "arXiv", "value": "2603.26599" }, "isAccessibleForFree": true, "isPartOf": { "@id": "https://sciencetostartup.com/#website" }, "datePublished": "2026-03-27T16:57:51.000Z", "author": [ { "@type": "Person", "name": "Zhaochong An" }, { "@type": "Person", "name": "Orest Kupyn" }, { "@type": "Person", "name": "Théo Uscidda" }, { "@type": "Person", "name": "Andrea Colaco" }, { "@type": "Person", "name": "Karan Ahuja" }, { "@type": "Person", "name": "Serge Belongie" }, { "@type": "Person", "name": "Mar Gonzalez-Franco" }, { "@type": "Person", "name": "Marta Tintore Gazulla" } ], "additionalProperty": [ { "@type": "PropertyValue", "propertyID": "viabilityScore", "value": 7 }, { "@type": 
"PropertyValue", "propertyID": "researchDomain", "value": "Generative Video" }, { "@type": "PropertyValue", "propertyID": "commercialReadiness", "value": "code" } ] }, { "@type": "BreadcrumbList", "itemListElement": [ { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://sciencetostartup.com" }, { "@type": "ListItem", "position": 2, "name": "Generative Video", "item": "https://sciencetostartup.com/topics" }, { "@type": "ListItem", "position": 3, "name": "VGGRPO: Towards World-Consistent Video Generation with 4D La", "item": "https://sciencetostartup.com/paper/vggrpo-towards-world-consistent-video-generation-with-4d-latent-reward" } ] } ] }


