ARXIV:2604.06916 · REINFORCEMENT LEARNING · SUBMITTED 10 APR · 02:46 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling

Yitong Li · Junsong Chen · Shuchen Xue · Pengcuo Zeren · Siyuan Fu · Dinghao Yang · +5 at arXiv

A novel framework that enhances reinforcement learning for diffusion models through efficient rollout scaling.

Ship in 2-4 weeks · Score 6.0 · Evidence unverified

Opportunity summary

Pain: A novel framework that enhances reinforcement learning for diffusion models through efficient rollout scaling.

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: Evidence unverified

PROBLEM

A novel framework that enhances reinforcement learning for diffusion models through efficient rollout scaling. In recent studies, increasing the rollout group size yields pronounced performance improvements, indicating substantial room for further alignment gains.

METHOD

Full abstract

Reinforcement-Learning-based post-training has recently emerged as a promising paradigm for aligning text-to-image diffusion models with human preferences. In recent studies, increasing the rollout group size yields pronounced performance improvements, indicating substantial room for further alignment gains. However, scaling rollouts on large-scale foundational diffusion models (e.g., FLUX.1-12B) imposes a heavy computational burden. To alleviate this bottleneck, we explore the integration of FP4 quantization into Diffusion RL rollouts. Yet, we identify that naive quantized pipelines inherently introduce risks of performance degradation. To overcome this dilemma between efficiency and training integrity, we propose Sol-RL (Speed-of-light RL), a novel FP4-empowered Two-stage Reinforcement Learning framework. First, we utilize high-throughput NVFP4 rollouts to generate a massive candidate pool and extract a highly contrastive subset. Second, we regenerate these selected samples in BF16 precision and optimize the policy exclusively on them. By decoupling candidate exploration from policy optimization, Sol-RL integrates the algorithmic mechanisms of rollout scaling with the system-level throughput gains of NVFP4. This synergistic algorithm-hardware design effectively accelerates the rollout phase while reserving high-fidelity samples for optimization. We empirically demonstrate that our framework maintains the training integrity of BF16 precision pipeline while fully exploiting the throughput gains enabled by FP4 arithmetic. Extensive experiments across SANA, FLUX.1, and SD3.5-L substantiate that our approach delivers superior alignment performance across multiple metrics while accelerating training convergence by up to $4.64\times$, unlocking the power of massive rollout scaling at a fraction of the cost.
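The two-stage loop described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `rollout` is a hypothetical stand-in for sampling an image and scoring it, the "best and worst half" rule in `select_contrastive` is only one plausible reading of "highly contrastive subset", and the group-normalised advantage is an assumed choice of training signal that the abstract does not specify.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    seed: int      # noise seed that identifies the rollout
    reward: float  # scalar reward assigned to the generated image


def rollout(prompt: str, seed: int, precision: str) -> Sample:
    """Hypothetical stand-in for 'sample an image, then score it'."""
    quality = random.Random(f"{prompt}|{seed}").gauss(0.0, 1.0)
    # Toy assumption: low-precision sampling perturbs the reward slightly.
    drift = 0.0
    if precision == "fp4":
        drift = random.Random(f"fp4|{prompt}|{seed}").gauss(0.0, 0.1)
    return Sample(seed, quality + drift)


def select_contrastive(pool: list[Sample], keep: int) -> list[Sample]:
    """Assumed rule: keep the keep//2 lowest and keep//2 highest rewards."""
    half = keep // 2
    if half < 1 or 2 * half > len(pool):
        raise ValueError("keep must be at least 2 and no larger than the pool")
    ranked = sorted(pool, key=lambda s: s.reward)
    return ranked[:half] + ranked[-half:]


def two_stage_step(prompt: str, pool_size: int, keep: int) -> list[tuple[int, float]]:
    # Stage 1: cheap exploration over a large candidate pool.
    pool = [rollout(prompt, seed, "fp4") for seed in range(pool_size)]
    chosen = select_contrastive(pool, keep)
    # Stage 2: regenerate only the chosen seeds at full precision.
    regenerated = [rollout(prompt, s.seed, "bf16") for s in chosen]
    rewards = [s.reward for s in regenerated]
    mean, std = statistics.fmean(rewards), statistics.pstdev(rewards)
    # Assumed group-normalised advantage that would weight the policy update.
    return [(s.seed, (s.reward - mean) / (std + 1e-8)) for s in regenerated]
```

Calling `two_stage_step("a red fox", pool_size=64, keep=8)` returns eight `(seed, advantage)` pairs. The point of the structure is that the expensive high-precision pass touches only the `keep` selected seeds, while the `pool_size` exploratory rollouts run on the cheap path.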

RESULT

ScienceToStartup currently rates this 6.0/10 on the public viability pass. We empirically demonstrate that our framework maintains the training integrity of BF16 precision pipeline while fully exploiting the throughput gains enabled by FP4 arithmetic.…

WHY NOW

Reinforcement Learning moved forward this cycle; last verified April 2026. Public score 6.0/10. Production flags indicate code availability.

Opportunity summary

Score: 6.0

Pain: A novel framework that enhances reinforcement learning for diffusion models through efficient rollout scaling.

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: no shell-level blocker reported

Analysis summary

A novel framework that enhances reinforcement learning for diffusion models through efficient rollout scaling.

Competitive landscape

A novel framework that enhances reinforcement learning for diffusion models through efficient rollout scaling.

Segment: Reinforcement Learning

Adoption evidence: No public code link in the paper record yet

Commercial read: 6.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Published: 2026-04-08 · arXiv: https://arxiv.org/abs/2604.06916

Authors: Yitong Li, Junsong Chen, Shuchen Xue, Pengcuo Zeren, Siyuan Fu, Dinghao Yang, Yangyang Tang, Junjie Bai, Ping Luo, Song Han, Enze Xie

Research domain: Reinforcement Learning · Viability score: 6 · Commercial readiness: code

