ARXIV:2601.21008 · OPERATIONS RESEARCH AI · SUBMITTED 19 MAR · 21:31 UTC · FRESHNESS STALE

Verified (Source: PDF linked) | Partial (PaperPack: 3 of 4 citation fields filled) | Missing (Missing fields: authors) | Error (Proof: failed)

Solver-in-the-Loop: MDP-Based Benchmarks for Self-Correction and Behavioral Rationality in Operations Research

arXiv

Two solver-in-the-loop benchmarks for iterative self-correction and behavioral bias in operations research; on them, a domain-trained 8B model surpasses frontier APIs in recovery rate, diagnostic accuracy, and steps to resolution.

Blocked on Code › Score 9.0 · Evidence failed

Opportunity summary

Pain: Existing LLM benchmarks evaluate OR as one-shot translation and ignore the iterative loop practitioners use to diagnose and repair infeasible models.

Evidence: 0 refs | 0 sources | 33% coverage

Blocker: Evidence failed


PROBLEM

Operations Research practitioners routinely debug infeasible models through an iterative process. Yet existing LLM benchmarks evaluate OR as one-shot translation -- given a problem description,…

METHOD

Full abstract

Operations Research practitioners routinely debug infeasible models through an iterative process: analyzing Irreducible Infeasible Subsystems (IIS), identifying constraint conflicts, and systematically repairing formulations until feasibility is achieved. Yet existing LLM benchmarks evaluate OR as one-shot translation -- given a problem description, generate solver code -- ignoring this diagnostic loop entirely. We introduce two benchmarks that place the solver in the evaluation loop. ORDebug evaluates iterative self-correction through 5,000+ problems spanning 9 error types; each repair action triggers solver re-execution and IIS recomputation, providing deterministic, verifiable feedback. ORBias evaluates behavioral rationality through 2,000 newsvendor instances (1,000 ID + 1,000 OOD), measuring systematic deviations from closed-form optimal policies. Across 26 models and 12,000+ samples, we find that domain-specific RLVR training enables an 8B model to surpass frontier APIs: 95.3% vs 86.2% recovery rate (+9.1%), 62.4% vs 47.8% diagnostic accuracy (+14.6%), and 2.25 vs 3.78 steps to resolution (1.7× faster). On ORBias, curriculum training achieves the only negative ID→OOD bias drift among models evaluated (-9.6%), reducing systematic bias by 48% (from 20.0% to 10.4%). These results demonstrate that process-level evaluation with verifiable oracles enables targeted training that outperforms scale.
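The "closed-form optimal policies" mentioned in the abstract can be illustrated with the textbook newsvendor solution, where the optimal order quantity is the demand quantile at the critical ratio c_u / (c_u + c_o). The sketch below is only an illustration under stated assumptions: normally distributed demand, a relative-deviation bias measure, and the function names `critical_ratio`, `optimal_order`, and `relative_bias` are all hypothetical and not taken from the paper, whose exact instance design and bias metric are not given in this record.

```python
from statistics import NormalDist


def critical_ratio(price: float, cost: float, salvage: float = 0.0) -> float:
    """Critical fractile c_u / (c_u + c_o) of the classical newsvendor model."""
    underage = price - cost    # margin lost per unit of unmet demand
    overage = cost - salvage   # loss per unsold unit
    if underage <= 0 or overage <= 0:
        raise ValueError("need salvage < cost < price")
    return underage / (underage + overage)


def optimal_order(mean: float, std: float, price: float, cost: float,
                  salvage: float = 0.0) -> float:
    """Closed-form optimal order quantity, assuming normal demand."""
    ratio = critical_ratio(price, cost, salvage)
    return NormalDist(mean, std).inv_cdf(ratio)


def relative_bias(order: float, optimum: float) -> float:
    """Signed deviation from the optimum as a fraction of the optimum.

    This is one plausible bias measure (an assumption), not the paper's metric.
    """
    return (order - optimum) / optimum


if __name__ == "__main__":
    # Hypothetical instance, not drawn from the benchmark.
    q_star = optimal_order(mean=100, std=20, price=10, cost=3)
    model_order = 100.0  # e.g. a model that anchors on mean demand
    print(f"q* = {q_star:.2f}, bias = {relative_bias(model_order, q_star):+.1%}")
```

Because the optimum is available in closed form, any order a model proposes can be scored deterministically, which is what makes systematic deviations measurable across in-distribution and out-of-distribution instances.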

RESULT

ScienceToStartup currently rates this 9.0/10 on the public viability pass. Across 26 models and 12,000+ samples, we find that domain-specific RLVR training enables an 8B model to surpass frontier APIs: 95.3% vs 86.2% recovery…
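The recovery figure quoted above is the outcome of a repair loop in which each action is followed by a re-solve and a fresh IIS. A minimal sketch of how one such episode could be scored is given below. It assumes a toy model (interval constraints on a single variable) in place of a real solver, uses the standard deletion filter to find one IIS, and treats "drop a conflicting constraint" as the only repair action. The names `is_feasible`, `deletion_filter_iis`, and `run_episode` are hypothetical, and nothing here reproduces the paper's environment or action space.

```python
import math
from typing import Callable, Dict, List, Tuple

Bounds = Tuple[float, float]   # (lower, upper) bound on a single variable x
Model = Dict[str, Bounds]      # constraint name -> bounds


def is_feasible(model: Model) -> bool:
    """Toy stand-in for a solver: interval constraints are feasible iff they overlap."""
    if not model:
        return True
    lo = max(b[0] for b in model.values())
    hi = min(b[1] for b in model.values())
    return lo <= hi


def deletion_filter_iis(model: Model) -> List[str]:
    """Return one irreducible infeasible subset using the deletion filter."""
    if is_feasible(model):
        return []
    core = dict(model)
    for name in list(model):
        trial = {k: v for k, v in core.items() if k != name}
        if not is_feasible(trial):   # still infeasible without it, so it is not needed
            core = trial
    return list(core)


def run_episode(model: Model, policy: Callable[[List[str]], str],
                max_steps: int = 5) -> Tuple[bool, int]:
    """Repair loop: act on the IIS, re-solve, repeat. Returns (recovered, steps)."""
    model = dict(model)              # work on a copy
    steps = 0
    while steps < max_steps and not is_feasible(model):
        iis = deletion_filter_iis(model)
        model.pop(policy(iis))       # simplified repair action: drop one constraint
        steps += 1
    return is_feasible(model), steps


if __name__ == "__main__":
    # Hypothetical infeasible model: demand of at least 50 against capacity of at most 40.
    broken = {
        "demand": (50.0, math.inf),
        "capacity": (-math.inf, 40.0),
        "budget": (-math.inf, 80.0),
        "nonneg": (0.0, math.inf),
    }
    print(deletion_filter_iis(broken))
    print(run_episode(broken, policy=lambda iis: iis[0]))
```

Averaging the first element of `run_episode` over many problems gives a recovery rate, and averaging the second gives steps to resolution. Both are deterministic given the policy, since the feedback comes from the feasibility check rather than from a judge model.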

WHY NOW

Operations Research AI moved forward this cycle; last verified April 2026. Public score 9.0/10.



Competitive landscape

Segment: Operations Research AI
Adoption evidence: No public code link in the paper record yet
Commercial read: 9.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified


Published: 2026-01-28 · arXiv: https://arxiv.org/abs/2601.21008 · Page: https://sciencetostartup.com/paper/solver-in-the-loop-mdp-based-benchmarks-for-self-correction-and-behavioral-rationality-in-operations-research



