ARXIV:2603.17544 · REINFORCEMENT LEARNING · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Source: PDF linked (verified) · Paper pack: 3 of 4 citation fields filled (partial) · Missing fields: authors · Proof: unverified proof status (partial)

Per-Domain Generalizing Policies: On Learning Efficient and Robust Q-Value Functions (Extended Version with Technical Appendix)

arXiv

This paper explores a novel approach to learning Q-value functions for efficient planning in reinforcement learning.

Blocked on Code · Score 2.0 · Evidence unverified

Opportunity summary

Pain This paper explores a novel approach to learning Q-value functions for efficient planning in reinforcement learning.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

This paper explores a novel approach to learning Q-value functions for efficient planning in reinforcement learning. Standard approaches learn state-value functions represented as graph neural networks using supervised learning on optimal plans generated by…

METHOD

Full abstract

Learning per-domain generalizing policies is a key challenge in learning for planning. Standard approaches learn state-value functions represented as graph neural networks using supervised learning on optimal plans generated by a teacher planner. In this work, we advocate for learning Q-value functions instead. Such policies are drastically cheaper to evaluate for a given state, as they need to process only the current state rather than every successor. Surprisingly, vanilla supervised learning of Q-values performs poorly as it does not learn to distinguish between the actions taken and those not taken by the teacher. We address this by using regularization terms that enforce this distinction, resulting in Q-value policies that consistently outperform state-value policies across a range of 10 domains and are competitive with the planner LAMA-first.
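The abstract does not spell out the regularization terms, so the following is only a minimal sketch of one plausible form, not the authors' method. It assumes that Q-values are cost-to-go estimates (lower is better), that a supervised label exists only for the action the teacher took, and that the regularizer is a hinge penalty. The names `regularized_q_loss`, `greedy_action`, `margin`, and `reg_weight` are hypothetical.

```python
from typing import Sequence


def regularized_q_loss(
    q_pred: Sequence[float],
    teacher_idx: int,
    teacher_target: float,
    margin: float = 1.0,
    reg_weight: float = 1.0,
) -> float:
    """Squared error on the teacher's action plus a hinge penalty (assumed form)."""
    q_teacher = q_pred[teacher_idx]
    # Supervised term: fit the teacher action's Q-value to its cost-to-go label.
    supervised = (q_teacher - teacher_target) ** 2
    # Assumed regularizer: every action the teacher did not take should score
    # at least `margin` worse (higher cost) than the teacher's action.
    penalty = sum(
        max(0.0, q_teacher + margin - q)
        for i, q in enumerate(q_pred)
        if i != teacher_idx
    )
    return supervised + reg_weight * penalty


def greedy_action(q_pred: Sequence[float]) -> int:
    """Pick the lowest-cost action from one pass over the current state's Q-values."""
    return min(range(len(q_pred)), key=lambda i: q_pred[i])


# One state, three applicable actions; the teacher took action 0.
q_pred = [3.0, 3.0, 5.0]
vanilla = regularized_q_loss(q_pred, teacher_idx=0, teacher_target=3.0, reg_weight=0.0)
regularized = regularized_q_loss(q_pred, teacher_idx=0, teacher_target=3.0)
print(vanilla, regularized)  # prints: 0.0 1.0
```

With `reg_weight=0.0` the loss is zero even though actions 0 and 1 are tied, which mirrors the failure the abstract attributes to vanilla supervised learning. In this sketch, only the hinge term penalizes the tie. `greedy_action` needs just the Q-values of the current state, whereas a state-value policy would have to evaluate every successor state.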

RESULT

ScienceToStartup currently rates this 2.0/10 on the public viability pass. We address this by using regularization terms that enforce this distinction, resulting in Q-value policies that consistently outperform state-value policies across a range of…

WHY NOW

Reinforcement Learning moved forward this cycle; last verified April 2026. Public score 2.0/10.

Opportunity summary

Blocker missing authors

Competitive landscape

Segment: Reinforcement Learning
Adoption evidence: No public code link in the paper record yet
Commercial read: 2.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified

arXiv record: https://arxiv.org/abs/2603.17544 · Published: 2026-03-18T09:48:38.000Z · Free to access
