ARXIV:2605.12474 · RL EVALUATION · SUBMITTED 13 MAY · 21:04 UTC · FRESHNESS STALE

Verified · Source: PDF linked | Verified · PaperPack: citation fields available | Partial · Proof: unverified proof status

Reward Hacking in Rubric-Based Reinforcement Learning

Anas Mahmoud · MohammadHossein Rezaei · Zihao Wang · Anisha Gunjal · Bing Liu · Yunzhong He · arXiv

An investigation of reward hacking in rubric-based reinforcement learning, aimed at understanding and reducing the divergence between training verifiers and a cross-family panel of reference judges.

Blocked on Code › Score 3.0 · Evidence unverified

Opportunity summary

Pain Policies optimized against a rubric-based training verifier can earn proxy-reward gains that reference judges do not confirm.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

We study reward hacking in rubric-based RL, where a policy is optimized against a training verifier…

METHOD

Full abstract

Reinforcement learning with verifiable rewards has enabled strong post-training gains in domains such as math and coding, though many open-ended settings rely on rubric-based rewards. We study reward hacking in rubric-based RL, where a policy is optimized against a training verifier but evaluated against a cross-family panel of three frontier judges, reducing dependence on any single evaluator. Our framework separates two sources of divergence: verifier failure, where the training verifier credits rubric criteria that reference verifiers reject, and rubric-design limitations, where even strong rubric-based verifiers favor responses that rubric-free judges rate worse overall. Across medical and science domains, weak verifiers produce large proxy-reward gains that do not transfer to the reference verifiers; exploitation grows over training and concentrates in recurring failures such as partial satisfaction of compound criteria, treating implicit content as explicit, and imprecise topical matching. Stronger verifiers substantially reduce, but do not eliminate, verifier exploitation. We also introduce a self-internalization gap, a verifier-free diagnostic based on policy log-probabilities, which tracks reference-verifier quality, detecting when the policy trained using the weak verifier stops improving. Finally, in our setting, stronger verification does not prevent reward hacking when the rubric leaves important failure modes unspecified: rubric-based verifiers prefer the RL checkpoint, while rubric-free judges prefer the base model. These disagreements coincide with gains concentrated in completeness and presence-based criteria, alongside declines in factual correctness, conciseness, relevance, and overall quality. Together, these results suggest that stronger verification reduces reward hacking, but does not by itself ensure that rubric gains correspond to broader quality gains.
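The verifier-failure notion in the abstract (the training verifier credits rubric criteria that the reference verifiers reject) can be illustrated with a small sketch. This is not the paper's metric or code: the `CriterionVerdict` schema, the strict-majority vote over the reference panel, and the name `exploitation_rate` are all assumptions made for illustration, and the example verdicts are invented toy data.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class CriterionVerdict:
    """Verdicts for one rubric criterion on one response (hypothetical schema)."""
    training_credit: bool              # training verifier marked the criterion satisfied
    reference_credits: Sequence[bool]  # one verdict per reference judge


def reference_majority(verdict: CriterionVerdict) -> bool:
    """True when a strict majority of reference judges credit the criterion."""
    votes = sum(verdict.reference_credits)
    return 2 * votes > len(verdict.reference_credits)


def exploitation_rate(verdicts: Sequence[CriterionVerdict]) -> float:
    """Share of training-credited criteria that the reference majority rejects."""
    credited = [v for v in verdicts if v.training_credit]
    if not credited:
        return 0.0  # nothing was credited, so nothing could be exploited
    rejected = sum(1 for v in credited if not reference_majority(v))
    return rejected / len(credited)


def proxy_and_reference_score(verdicts: Sequence[CriterionVerdict]) -> Tuple[float, float]:
    """Fraction of criteria credited by the training verifier vs. the reference majority."""
    if not verdicts:
        return 0.0, 0.0
    n = len(verdicts)
    proxy = sum(v.training_credit for v in verdicts) / n
    reference = sum(reference_majority(v) for v in verdicts) / n
    return proxy, reference


# Toy data (invented): four criteria judged by a panel of three reference judges.
example = [
    CriterionVerdict(True, (True, True, False)),    # both sides credit it
    CriterionVerdict(True, (False, False, True)),   # credited in training, rejected by panel
    CriterionVerdict(True, (False, False, False)),  # credited in training, rejected by panel
    CriterionVerdict(False, (False, True, False)),  # not credited by either side
]

rate = exploitation_rate(example)
proxy, reference = proxy_and_reference_score(example)
```

On the toy data, three criteria are credited by the training verifier and two of those are rejected by the panel majority, so the proxy score (0.75) sits well above the reference score (0.25). A gap of this kind is what the abstract describes as proxy-reward gains that do not transfer to the reference verifiers.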

RESULT

ScienceToStartup currently rates this 3.0/10 on the public viability pass. Together, these results suggest that stronger verification reduces reward hacking, but does not by itself ensure that rubric gains correspond to broader quality gains.

WHY NOW

RL Evaluation moved forward this cycle; last verified May 2026. Public score 3.0/10.

Opportunity summary

Score 3.0

Pain Policies optimized against a rubric-based training verifier can earn proxy-reward gains that reference judges do not confirm.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Competitive landscape

An investigation of reward hacking in rubric-based reinforcement learning, aimed at understanding and reducing the divergence between training verifiers and a cross-family panel of reference judges.

Segment: RL Evaluation

Adoption evidence: No public code link in the paper record yet

Commercial read: 3.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Source: https://arxiv.org/abs/2605.12474 · Published 2026-05-12T17:54:25Z · Free to access · Page: https://sciencetostartup.com/paper/reward-hacking-in-rubric-based-reinforcement-learning
