ARXIV:2604.25379 · REINFORCEMENT LEARNING · SUBMITTED 29 APR · 02:44 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

Safe-Support Q-Learning: Learning without Unsafe Exploration

Yeeun Lim · Narim Jeong · Donghwan Lee · arXiv

A Q-learning framework for safe reinforcement learning that prevents unsafe state visitation during training.

Ship in 2-4 weeks · Score 3.0 · Evidence unverified

Opportunity summary

Pain: A Q-learning framework for safe reinforcement learning that prevents unsafe state visitation during training.

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: Evidence unverified

PROBLEM

A Q-learning framework for safe reinforcement learning that prevents unsafe state visitation during training. While most safe RL methods mitigate risk through constraints or penalization, they still allow exploration of unsafe states during training.

METHOD

Full abstract

Ensuring safety during reinforcement learning (RL) training is critical in real-world applications where unsafe exploration can lead to devastating outcomes. While most safe RL methods mitigate risk through constraints or penalization, they still allow exploration of unsafe states during training. In this work, we adopt a stricter safety requirement that eliminates unsafe state visitation during training. To achieve this goal, we propose a Q-learning-based safe RL framework that leverages a behavior policy supported on a safe set. Under the assumption that the induced trajectories remain within the safe set, this policy enables sufficient exploration within the safe region without requiring near-optimality. We adopt a two-stage framework in which the Q-function and policy are trained separately. Specifically, we introduce a KL-regularized Bellman target that constrains the Q-function to remain close to the behavior policy. We then derive the policy induced from the trained Q-values and propose a parametric policy extraction method to approximate the optimal policy. Our approach provides a unified framework that can be adapted to different action spaces and types of behavior policies. Experimental results demonstrate that the proposed method achieves stable learning and well-calibrated value estimates and yields safer behavior with comparable or better performance than existing baselines.
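
The abstract does not spell out the KL-regularized Bellman target, so the following is only a minimal sketch of one standard form such a target can take, for a discrete action space. It assumes the regularized objective is max over π of E_π[Q] − τ·KL(π ‖ μ), where μ is the safe-supported behavior policy and τ is a temperature; under that assumption the value is τ·log Σ_a μ(a|s)·exp(Q(s,a)/τ) and the induced policy is proportional to μ(a|s)·exp(Q(s,a)/τ). The function names, the temperature parameter, and the tabular setup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def kl_regularized_value(q_values, behavior_probs, temperature):
    """Closed-form value of max_pi E_pi[Q] - tau * KL(pi || mu).

    Assumed form (not taken from the paper): tau * log sum_a mu(a) exp(Q(a)/tau),
    evaluated only over actions where mu(a) > 0. Assumes mu has at least one
    action with positive probability.
    """
    q = np.asarray(q_values, dtype=float)
    mu = np.asarray(behavior_probs, dtype=float)
    support = mu > 0
    logits = q[support] / temperature + np.log(mu[support])
    shift = logits.max()  # subtract the max for a numerically stable log-sum-exp
    return temperature * (shift + np.log(np.exp(logits - shift).sum()))


def induced_policy(q_values, behavior_probs, temperature):
    """Policy induced by the Q-values: pi(a) proportional to mu(a) * exp(Q(a)/tau)."""
    q = np.asarray(q_values, dtype=float)
    mu = np.asarray(behavior_probs, dtype=float)
    support = mu > 0
    logits = np.full(q.shape, -np.inf)  # actions outside supp(mu) keep weight 0
    logits[support] = q[support] / temperature + np.log(mu[support])
    logits -= logits[support].max()
    weights = np.exp(logits)
    return weights / weights.sum()


def bellman_target(reward, next_q_values, next_behavior_probs,
                   gamma, temperature, done=False):
    """One-step target r + gamma * V(s'), with V the KL-regularized value."""
    if done:
        return float(reward)
    return reward + gamma * kl_regularized_value(
        next_q_values, next_behavior_probs, temperature
    )


if __name__ == "__main__":
    # Hypothetical state with three actions; action 1 is outside the safe support.
    q_next = [1.0, 5.0, 2.0]
    mu_next = [0.5, 0.0, 0.5]
    print(induced_policy(q_next, mu_next, temperature=1.0))
    print(bellman_target(0.5, q_next, mu_next, gamma=0.9, temperature=1.0))
```

In this sketch the action with the highest Q-value (action 1) receives exactly zero probability because the behavior policy assigns it none, which illustrates how a policy derived this way can stay within the behavior policy's support. A small temperature pushes the induced policy toward the best supported action, while a large temperature keeps it close to μ.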

RESULT

ScienceToStartup currently rates this 3.0/10 on the public viability pass. The paper proposes a Q-learning-based safe RL framework that leverages a behavior policy supported on a safe set. Code availability is…

WHY NOW

Reinforcement Learning moved forward this cycle; last verified April 2026. Public score 3.0/10. Production flags indicate code availability.

Analysis summary

A Q-learning framework for safe reinforcement learning that prevents unsafe state visitation during training.

Competitive landscape

A Q-learning framework for safe reinforcement learning that prevents unsafe state visitation during training.

Segment: Reinforcement Learning

Adoption evidence: No public code link in the paper record yet

Commercial read: 3.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified
