ARXIV:2604.06491 · REINFORCEMENT LEARNING · SUBMITTED 09 APR · 20:10 UTC · FRESHNESS UNKNOWN

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

Discrete Flow Matching Policy Optimization

Maojiang Su · Po-Chung Hsieh · Weimin Wu · Mingcheng Lu · Jiunhau Chen · Jerry Yao-Chieh Hu · Han Liu

A unified RL framework for fine-tuning discrete flow matching models, improving controllable discrete sequence generation.

Blocked on: Code · Score: 4.0 · Evidence: unverified

Opportunity summary

Pain: A unified RL framework for fine-tuning discrete flow matching models, improving controllable discrete sequence generation.

Evidence: 0 refs | 0 sources | 0% coverage

Blocker: Evidence unverified


PROBLEM

A unified RL framework for fine-tuning discrete flow matching models, improving controllable discrete sequence generation. Our key idea is to view the DFM sampling procedure as a multi-step Markov Decision Process.
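To make the MDP view concrete, the following is a minimal toy sketch in Python. It is not the paper's implementation. It assumes a hypothetical multi-step sampler in which each position either keeps its token or jumps to a token drawn from a learned categorical distribution, treats every sampler step as one action, and applies a plain score-function (REINFORCE) update to the exact per-step transition probabilities. The names `theta`, `rollout`, `train`, the jump schedule, and the "fraction of token 0" reward are all illustrative assumptions.

```python
import math
import random

VOCAB = 4      # toy alphabet size, e.g. A, C, G, T (assumption)
SEQ_LEN = 8    # toy sequence length (assumption)
N_STEPS = 4    # number of sampler steps = MDP horizon (assumption)


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def reward(seq):
    # Hypothetical terminal reward: fraction of positions holding token 0.
    return sum(1 for tok in seq if tok == 0) / len(seq)


def rollout(theta, rng):
    """Run the toy sampler once.

    Returns the final sequence and the gradient of the trajectory
    log-probability with respect to theta (a VOCAB x VOCAB logit table).
    """
    x = [rng.randrange(VOCAB) for _ in range(SEQ_LEN)]  # initial state: uniform noise
    grad = [[0.0] * VOCAB for _ in range(VOCAB)]
    for k in range(N_STEPS):
        jump = 1.0 / (N_STEPS - k)  # toy schedule; the last step resamples every position
        nxt = []
        for a in x:
            p = softmax(theta[a])
            # Transition kernel: stay at a with prob 1 - jump, otherwise draw from p.
            kernel = [jump * p[b] + ((1.0 - jump) if b == a else 0.0)
                      for b in range(VOCAB)]
            b = rng.choices(range(VOCAB), weights=kernel)[0]
            # d log kernel[b] / d theta[a][c] = jump * p[b] * (1[b == c] - p[c]) / kernel[b]
            for c in range(VOCAB):
                indicator = 1.0 if b == c else 0.0
                grad[a][c] += jump * p[b] * (indicator - p[c]) / kernel[b]
            nxt.append(b)
        x = nxt
    return x, grad


def train(theta, rng, iters=100, batch=64, lr=1.0):
    """REINFORCE with a batch-mean baseline; updates theta in place."""
    for _ in range(iters):
        samples = [rollout(theta, rng) for _ in range(batch)]
        rewards = [reward(seq) for seq, _ in samples]
        baseline = sum(rewards) / batch  # simple variance-reduction baseline
        for (_, grad), r in zip(samples, rewards):
            adv = (r - baseline) / batch
            for a in range(VOCAB):
                for c in range(VOCAB):
                    theta[a][c] += lr * adv * grad[a][c]
    return theta


def mean_reward(theta, rng, n=200):
    return sum(reward(rollout(theta, rng)[0]) for _ in range(n)) / n
```

Starting from `theta = [[0.0] * VOCAB for _ in range(VOCAB)]` and a seeded `random.Random`, calling `mean_reward` before and after `train` shows the average reward rising. The point of the sketch is that the update uses only the sampler's own transition probabilities, so the sampler itself is left unchanged.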

METHOD

Full abstract

We introduce Discrete flow Matching policy Optimization (DoMinO), a unified framework for Reinforcement Learning (RL) fine-tuning of Discrete Flow Matching (DFM) models under a broad class of policy gradient methods. Our key idea is to view the DFM sampling procedure as a multi-step Markov Decision Process. This perspective provides a simple and transparent reformulation of fine-tuning reward maximization as a robust RL objective. Consequently, it not only preserves the original DFM samplers but also avoids biased auxiliary estimators and likelihood surrogates used by many prior RL fine-tuning methods. To prevent policy collapse, we also introduce new total-variation regularizers to keep the fine-tuned distribution close to the pretrained one. Theoretically, we establish an upper bound on the discretization error of DoMinO and tractable upper bounds for the regularizers. Experimentally, we evaluate DoMinO on regulatory DNA sequence design. DoMinO achieves stronger predicted enhancer activity and better sequence naturalness than the previous best reward-driven baselines. The regularization further improves alignment with the natural sequence distribution while preserving strong functional performance. These results establish DoMinO as a useful framework for controllable discrete sequence generation.
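As a rough illustration of what a total-variation regularizer measures, the sketch below computes the total-variation distance between two categorical distributions and subtracts a weighted sum of per-step distances from a reward. The abstract does not give the form of the paper's regularizers or their bounds, so the per-step penalty, the weight `lam`, and the function names here are assumptions made only for illustration.

```python
def tv_distance(p, q):
    """Total-variation distance between two categorical distributions.

    TV(p, q) = 0.5 * sum_i |p_i - q_i|, which lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def regularized_return(reward, step_tvs, lam):
    """Hypothetical penalized objective: reward minus lam times the summed
    per-step TV between fine-tuned and pretrained transition distributions."""
    return reward - lam * sum(step_tvs)


# Example: fine-tuned vs. pretrained next-token distributions at one step.
fine_tuned = [0.70, 0.10, 0.10, 0.10]
pretrained = [0.25, 0.25, 0.25, 0.25]
penalty = tv_distance(fine_tuned, pretrained)
```

A larger `lam` keeps the fine-tuned sampler closer to the pretrained one at the cost of reward. Setting `lam` to zero recovers plain reward maximization.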

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. DoMinO achieves stronger predicted enhancer activity and better sequence naturalness than the previous best reward-driven baselines.

WHY NOW

Reinforcement Learning moved forward this cycle; last verified April 2026. Public score 4.0/10.


Opportunity summary

Score: 4.0

Pain: A unified RL framework for fine-tuning discrete flow matching models, improving controllable discrete sequence generation.

Evidence: 0 refs | 0 sources | 0% coverage

Blocker: no shell-level blocker reported

Analysis summary

A unified RL framework for fine-tuning discrete flow matching models, improving controllable discrete sequence generation.


Competitive landscape

A unified RL framework for fine-tuning discrete flow matching models, improving controllable discrete sequence generation.

Segment

Reinforcement Learning

Adoption evidence

No public code link in the paper record yet

Commercial read

4.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified


arXiv: https://arxiv.org/abs/2604.06491 · Published: 2026-04-07 21:49:29 UTC · Free to access
Authors: Maojiang Su, Po-Chung Hsieh, Weimin Wu, Mingcheng Lu, Jiunhau Chen, Jerry Yao-Chieh Hu, Han Liu

