ARXIV:2602.05051 · REINFORCEMENT LEARNING · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Source: PDF linked (Verified)
PaperPack: 3 of 4 citation fields filled (Partial)
Missing fields: authors (Missing)
Proof: unverified proof status (Partial)

ReFORM: Reflected Flows for On-support Offline RL via Noise Manipulation

arXiv

ReFORM optimizes flow policies to enhance offline RL by reducing out-of-distribution errors and maximizing policy performance.

Blocked on Code › Score 5.0 · Evidence unverified

Opportunity summary

Pain ReFORM optimizes flow policies to enhance offline RL by reducing out-of-distribution errors and maximizing policy performance.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

ReFORM optimizes flow policies to enhance offline RL by reducing out-of-distribution errors and maximizing policy performance. One common challenge that arises in this setting is the out-of-distribution (OOD) error, which occurs when the policy…

METHOD

Full abstract

Offline reinforcement learning (RL) aims to learn the optimal policy from a fixed dataset generated by behavior policies without additional environment interactions. One common challenge that arises in this setting is the out-of-distribution (OOD) error, which occurs when the policy leaves the training distribution. Prior methods penalize a statistical distance term to keep the policy close to the behavior policy, but this constrains policy improvement and may not completely prevent OOD actions. Another challenge is that the optimal policy distribution can be multimodal and difficult to represent. Recent works apply diffusion or flow policies to address this problem, but it is unclear how to avoid OOD errors while retaining policy expressiveness. We propose ReFORM, an offline RL method based on flow policies that enforces the less restrictive support constraint by construction. ReFORM learns a behavior cloning (BC) flow policy with a bounded source distribution to capture the support of the action distribution, then optimizes a reflected flow that generates bounded noise for the BC flow while keeping the support, to maximize the performance. Across 40 challenging tasks from the OGBench benchmark with datasets of varying quality and using a constant set of hyperparameters for all tasks, ReFORM dominates all baselines with hand-tuned hyperparameters on the performance profile curves.
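The abstract describes two stages: a BC flow with a bounded source distribution, fed by a reflected flow that produces bounded noise. The minimal Python sketch below illustrates only the second stage. It is not the authors' implementation: the box domain [-1, 1], the Euler integrator, the mirror-reflection rule, and the names reflect, reflected_noise, and toy_velocity are all assumptions made for illustration, since the abstract does not specify them. The sketch shows why reflecting at the boundary keeps the generated noise inside the bounded source domain, whatever the velocity field does.

```python
import random


def reflect(x, lo=-1.0, hi=1.0):
    """Mirror a scalar back into [lo, hi], as if bouncing off the walls."""
    width = hi - lo
    y = (x - lo) % (2.0 * width)  # position within one back-and-forth period
    if y > width:
        y = 2.0 * width - y  # second half of the period travels backwards
    return lo + y


def reflected_noise(z0, velocity, steps=20):
    """Euler-integrate dz/dt = velocity(z, t) on t in [0, 1], reflecting each step."""
    z = list(z0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity(z, k * dt)
        z = [reflect(zi + dt * vi) for zi, vi in zip(z, v)]
    return z


def toy_velocity(z, t):
    """Hypothetical stand-in for a learned noise-flow network; pushes outward."""
    return [4.0 + zi for zi in z]


rng = random.Random(0)
z0 = [rng.uniform(-1.0, 1.0) for _ in range(4)]  # sample from the bounded source
z1 = reflected_noise(z0, toy_velocity)
print(all(-1.0 <= zi <= 1.0 for zi in z1))  # prints True
```

Because the output never leaves the source domain, a BC flow that maps that domain onto the dataset's action support would, under these assumptions, still produce on-support actions after the noise has been reshaped.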

RESULT

ScienceToStartup currently rates this 5.0/10 on the public viability pass. We propose ReFORM, an offline RL method based on flow policies that enforces the less restrictive support constraint by construction.

WHY NOW

Reinforcement Learning moved forward this cycle; last verified April 2026. Public score 5.0/10.

Opportunity summary

Score 5.0

Pain ReFORM optimizes flow policies to enhance offline RL by reducing out-of-distribution errors and maximizing policy performance.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

ReFORM optimizes flow policies to enhance offline RL by reducing out-of-distribution errors and maximizing policy performance.

Competitive landscape

ReFORM optimizes flow policies to enhance offline RL by reducing out-of-distribution errors and maximizing policy performance.

Segment

Reinforcement Learning

Adoption evidence

No public code link in the paper record yet

Commercial read

5.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified


Published 2026-02-04 · arXiv https://arxiv.org/abs/2602.05051 · Page https://sciencetostartup.com/paper/reform-reflected-flows-for-on-support-offline-rl-via-noise-manipulation
