ARXIV:2604.01761 · GENERATIVE VIDEO · SUBMITTED 03 APR · 20:50 UTC · FRESHNESS STALE

Verified Source: PDF linked | Verified PaperPack: citation fields available | Partial Proof: unverified proof status

Control-DINO: Feature Space Conditioning for Controllable Image-to-Video Diffusion

Edoardo A. Dominici · Thomas Deixelberger · Konstantinos Vardis · Markus Steinberger · arXiv

Control image-to-video diffusion models for tasks like domain transfer and 3D scene generation by conditioning on disentangled appearance features.

Ship in 2-4 weeks | Score 7.0 | Evidence unverified

Opportunity summary

Pain: Control image-to-video diffusion models for tasks like domain transfer and 3D scene generation by conditioning on disentangled appearance features.

Evidence: 0 refs | 0 sources | 33% coverage

Blocker: Evidence unverified

PROBLEM

Control image-to-video diffusion models for tasks like domain transfer and 3D scene generation by conditioning on disentangled appearance features. Many applications in generation and transfer rely on conditioning these models, typically through perceptual, geometric,…

METHOD

Full abstract

Video models have recently been applied with success to problems in content generation, novel view synthesis, and, more broadly, world simulation. Many applications in generation and transfer rely on conditioning these models, typically through perceptual, geometric, or simple semantic signals, fundamentally using them as generative renderers. At the same time, high-dimensional features obtained from large-scale self-supervised learning on images or point clouds are increasingly used as a general-purpose interface for vision models. The connection between the two has been explored for subject-specific editing and for aligning and training video diffusion models, but not in the role of a more general conditioning signal for pretrained video diffusion models. Features obtained through self-supervised learning, such as DINO features, contain a lot of entangled information about the style, lighting, and semantics of the scene. This makes them great at reconstruction tasks but limits their generative capabilities. In this paper, we show how we can use the features for tasks such as video domain transfer and video-from-3D generation. We introduce a lightweight architecture and training strategy that decouples appearance from other features that we wish to preserve, enabling robust control for appearance changes such as stylization and relighting. Furthermore, we show that low spatial resolution can be compensated by higher feature dimensionality, improving controllability in generative rendering from explicit spatial representations.
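The abstract's closing claim, that low spatial resolution can be compensated by higher feature dimensionality, can be made concrete with a back-of-envelope count. The sketch below assumes a ViT-style encoder that emits one feature vector per square patch. The 224x224 frame, the patch size of 14, and the three feature widths are illustrative assumptions (they match common DINOv2 configurations) and are not taken from the paper; the helper names `patch_grid` and `signal_size` are hypothetical. It only counts values per frame and says nothing about how the paper injects features into the diffusion model.

```python
def patch_grid(height, width, patch_size):
    """Return (rows, cols) of patch tokens for a ViT-style encoder."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of the patch size")
    return height // patch_size, width // patch_size


def signal_size(height, width, patch_size, feature_dim):
    """Total number of scalar values in one frame's patch feature map."""
    rows, cols = patch_grid(height, width, patch_size)
    return rows * cols * feature_dim


# Assumed example values, not taken from the paper.
HEIGHT, WIDTH = 224, 224
PATCH_SIZE = 14

# Scalar values in the raw RGB frame, for comparison.
rgb_values = HEIGHT * WIDTH * 3

for feature_dim in (384, 768, 1024):
    rows, cols = patch_grid(HEIGHT, WIDTH, PATCH_SIZE)
    total = signal_size(HEIGHT, WIDTH, PATCH_SIZE, feature_dim)
    print(
        f"{rows}x{cols} grid, dim {feature_dim}: {total} values "
        f"({total / rgb_values:.2f}x the RGB frame)"
    )
```

Under these assumptions the feature grid is far coarser than the pixel grid, yet a wide enough feature vector per patch carries a comparable or larger number of values per frame, which is the trade-off the abstract points to.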

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. In this paper, we show how we can use the features for tasks such as video domain transfer and video-from-3D generation. Code availability is…

WHY NOW

Generative Video moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score: 7.0

Pain: Control image-to-video diffusion models for tasks like domain transfer and 3D scene generation by conditioning on disentangled appearance features.

Evidence: 0 refs | 0 sources | 33% coverage

Blocker: no shell-level blocker reported

Competitive landscape

Control image-to-video diffusion models for tasks like domain transfer and 3D scene generation by conditioning on disentangled appearance features.

Segment: Generative Video

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

arXiv: https://arxiv.org/abs/2604.01761 | Published: 2026-04-02T08:27:48.000Z | Research domain: Generative Video | Viability score: 7 | Commercial readiness: code | Page: https://sciencetostartup.com/paper/control-dino-feature-space-conditioning-for-controllable-image-to-video-diffusion
