ARXIV:2604.02265 · TEXT-TO-IMAGE SAFETY · SUBMITTED 03 APR · 20:50 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

Modular Energy Steering for Safe Text-to-Image Generation with Foundation Models

Yaoteng Tan · Zikui Cai · M. Salman Asif · arXiv

A modular, training-free framework that uses existing foundation models to steer text-to-image generation towards safe outputs without sacrificing quality.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: A modular, training-free framework that uses existing foundation models to steer text-to-image generation towards safe outputs without sacrificing quality.

Evidence: 0 refs | 0 sources | 33% coverage

Blocker: Evidence unverified


PROBLEM

A modular, training-free framework that uses existing foundation models to steer text-to-image generation towards safe outputs without sacrificing quality. Existing safety approaches typically rely on model fine-tuning or curated datasets, which can degrade generation…

METHOD

Full abstract

Controlling the behavior of text-to-image generative models is critical for safe and practical deployment. Existing safety approaches typically rely on model fine-tuning or curated datasets, which can degrade generation quality or limit scalability. We propose an inference-time steering framework that leverages gradient feedback from frozen pretrained foundation models to guide the generation process without modifying the underlying generator. Our key observation is that vision-language foundation models encode rich semantic representations that can be repurposed as off-the-shelf supervisory signals during generation. By injecting such feedback through clean latent estimates at each sampling step, our method formulates safety steering as an energy-based sampling problem. This design enables modular, training-free safety control that is compatible with both diffusion and flow-matching models and can generalize across diverse visual concepts. Experiments demonstrate state-of-the-art robustness against NSFW red-teaming benchmarks and effective multi-target steering, while preserving high generation quality on benign non-targeted prompts. Our framework provides a principled approach for utilizing foundation models as semantic energy estimators, enabling reliable and scalable safety control for text-to-image generation.
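The energy-based steering described in the abstract can be sketched for a diffusion sampler. The NumPy example below is a minimal illustration under stated assumptions, not the authors' implementation. The frozen vision-language model is replaced by a hypothetical quadratic energy that penalizes alignment of the clean latent estimate with the rows of `U`. These rows stand in for targeted "unsafe" concepts, and using several rows gives multi-target steering. The noise prediction `eps_hat` is a placeholder for the frozen generator's output. The gradient is taken through the DDPM clean estimate with the denoiser output held fixed, and the update is a deterministic DDIM step. All function names are hypothetical.

```python
import numpy as np


def clean_estimate(x_t, eps_hat, alpha_bar):
    """Estimate the clean latent x_0 from x_t and predicted noise (DDPM parameterization)."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)


def concept_energy(x0, U):
    """Hypothetical energy 0.5 * ||U x0||^2; rows of U are stand-in 'unsafe' concept directions.

    Placeholder for a score from a frozen vision-language model: lower means the
    clean estimate is less aligned with the targeted concepts. Several rows of U
    give multi-target steering.
    """
    y = U @ x0
    return 0.5 * float(y @ y)


def concept_energy_grad(x0, U):
    """Analytic gradient of concept_energy with respect to x0: U^T U x0."""
    return U.T @ (U @ x0)


def steered_ddim_step(x_t, eps_hat, alpha_bar, alpha_bar_prev, U, scale):
    """One deterministic DDIM step (eta = 0) with energy steering on the clean estimate."""
    x0_hat = clean_estimate(x_t, eps_hat, alpha_bar)
    # Chain rule through x0_hat with the denoiser output held fixed
    # (a stop-gradient assumption): d x0_hat / d x_t = 1 / sqrt(alpha_bar).
    grad_xt = concept_energy_grad(x0_hat, U) / np.sqrt(alpha_bar)
    x_t_steered = x_t - scale * grad_xt
    x0_steered = clean_estimate(x_t_steered, eps_hat, alpha_bar)
    # Standard DDIM update toward the less noisy timestep, using the steered estimate.
    x_prev = np.sqrt(alpha_bar_prev) * x0_steered + np.sqrt(1.0 - alpha_bar_prev) * eps_hat
    return x_prev, x0_hat, x0_steered


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 16, 2
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # d x k with orthonormal columns
    U = Q.T  # k x d: two orthonormal stand-in concept directions
    x_t = rng.standard_normal(d)
    eps_hat = rng.standard_normal(d)  # placeholder for the frozen generator's prediction
    x_prev, x0_hat, x0_steered = steered_ddim_step(x_t, eps_hat, 0.5, 0.6, U, scale=0.1)
    print(concept_energy(x0_hat, U), concept_energy(x0_steered, U))
```

With orthonormal rows in `U`, one step shrinks each concept projection by a factor of `1 - scale / alpha_bar`. The energy therefore decreases only while `0 < scale < 2 * alpha_bar`, and `scale=0` recovers the plain DDIM update.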

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. This design enables modular, training-free safety control that is compatible with both diffusion and flow-matching models and can generalize across diverse visual concepts. Code…
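The same clean-estimate steering can also be sketched for a flow-matching sampler. This is a hedged sketch only. It assumes the rectified-flow convention `x_t = (1 - t) * x0 + t * noise` with velocity `v = noise - x0`, which is an assumption because the parameterization is not given here. Under that convention the clean estimate is `x_t - t * v_hat`, and steering reduces to a gradient step on `x_t` before an Euler update. The quadratic concept energy, the placeholder velocity `v_hat`, and all function names are hypothetical stand-ins for the frozen foundation-model feedback.

```python
import numpy as np


def clean_estimate_fm(x_t, v_hat, t):
    """Clean-latent estimate under the assumed convention
    x_t = (1 - t) * x0 + t * noise, with velocity v = noise - x0."""
    return x_t - t * v_hat


def concept_energy(x0, U):
    """Hypothetical energy 0.5 * ||U x0||^2 over stand-in concept directions (rows of U)."""
    y = U @ x0
    return 0.5 * float(y @ y)


def concept_energy_grad(x0, U):
    """Analytic gradient of concept_energy with respect to x0."""
    return U.T @ (U @ x0)


def steered_euler_step(x_t, v_hat, t, t_prev, U, scale):
    """One Euler step from t to t_prev (< t, toward data) with energy steering."""
    x0_hat = clean_estimate_fm(x_t, v_hat, t)
    # With v_hat held fixed (stop-gradient assumption), d x0_hat / d x_t is the identity.
    x_t_steered = x_t - scale * concept_energy_grad(x0_hat, U)
    x0_steered = clean_estimate_fm(x_t_steered, v_hat, t)
    # Euler integration of dx/dt = v toward the data end of the path.
    x_prev = x_t_steered + (t_prev - t) * v_hat
    return x_prev, x0_hat, x0_steered


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 16, 3
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    U = Q.T  # three orthonormal stand-in concept directions
    x_t = rng.standard_normal(d)
    v_hat = rng.standard_normal(d)  # placeholder for the frozen model's velocity prediction
    x_prev, x0_hat, x0_steered = steered_euler_step(x_t, v_hat, 0.8, 0.6, U, scale=0.3)
    print(concept_energy(x0_hat, U), concept_energy(x0_steered, U))
```

Because the velocity is held fixed, the clean estimate moves by exactly the gradient step. With orthonormal rows in `U`, each concept projection therefore scales by `1 - scale`, and `scale=0` reduces to a plain Euler step.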

WHY NOW

Text-to-Image Safety moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.



Competitive landscape

Segment: Text-to-Image Safety

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified
