ARXIV:2603.27884 · SAFE REINFORCEMENT LEARNING · SUBMITTED 31 MAR · 20:25 UTC · FRESHNESS STALE

Source: PDF linked (verified) · PaperPack: citation fields available (verified) · Proof: unverified proof status (partial)

Near-Optimal Primal-Dual Algorithm for Learning Linear Mixture CMDPs with Adversarial Rewards

Kihyun Yu · Seoungbin Bae · Dabeen Lee · arXiv

A theoretical algorithm for safe reinforcement learning in linear mixture CMDPs with adversarial rewards, achieving near-optimal regret bounds.

Blocked on code · Score 3.0 · Evidence unverified

Opportunity summary

Pain: A theoretical algorithm for safe reinforcement learning in linear mixture CMDPs with adversarial rewards, achieving near-optimal regret bounds.

Evidence: 41 refs | 3 sources | 50% coverage

Blocker: Evidence unverified


PROBLEM

A theoretical algorithm for safe reinforcement learning in linear mixture CMDPs with adversarial rewards, achieving near-optimal regret bounds.
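For readers new to the setting, here is a minimal sketch of what "linear mixture" means and how weighted ridge regression can estimate the mixture weights. It assumes the standard linear mixture model, in which the unknown transition kernel is a combination of $d$ known basis kernels, $P(s' \mid s,a) = \langle \phi(s' \mid s,a), \theta^* \rangle$, and it runs on hypothetical toy data in a small tabular representation. The function names, the oracle per-sample variances used as weights, and all numbers are illustrative assumptions; this is not the authors' estimator, and their extension of weighted ridge regression to the constrained setting is not reproduced.

```python
import numpy as np


def mixture_kernel(basis, theta):
    """Combine d known basis kernels: P[s, a, s'] = sum_i theta[i] * basis[i, s, a, s'].

    basis: shape (d, S, A, S); each basis[i, s, a] is a next-state distribution.
    theta: shape (d,); nonnegative and summing to 1 here, so P is a valid kernel.
    """
    return np.tensordot(theta, basis, axes=1)


def value_features(basis, V, s, a):
    """phi_V(s, a)[i] = sum_{s'} basis[i, s, a, s'] * V[s'].

    Under the mixture kernel, E[V(s') | s, a] = <phi_V(s, a), theta>, so observed
    next-state values are noisy linear measurements of theta.
    """
    return basis[:, s, a, :] @ V


def weighted_ridge(X, y, sigma2, lam=1.0):
    """theta_hat = (lam*I + sum_t x_t x_t^T / sigma2[t])^{-1} sum_t x_t y_t / sigma2[t]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma2, dtype=float)   # inverse-variance weights
    Xw = X * w[:, None]                          # row x_t scaled by 1 / sigma2[t]
    gram = lam * np.eye(X.shape[1]) + Xw.T @ X   # weighted, regularized Gram matrix
    return np.linalg.solve(gram, Xw.T @ y)


# Hypothetical toy data: 3 basis kernels over 4 states and 2 actions.
rng = np.random.default_rng(0)
d, S, A = 3, 4, 2
basis = rng.dirichlet(np.ones(S), size=(d, S, A))   # shape (d, S, A, S)
theta_true = np.array([0.5, 0.3, 0.2])
P = mixture_kernel(basis, theta_true)

# Collect (feature, observed next-state value, variance) triples from simulated transitions.
V = rng.uniform(size=S)
X, y, sig = [], [], []
for _ in range(2000):
    s, a = rng.integers(S), rng.integers(A)
    s_next = rng.choice(S, p=P[s, a])
    X.append(value_features(basis, V, s, a))
    y.append(V[s_next])
    mean = P[s, a] @ V
    # Oracle variance with a floor: uses the true P, for illustration only.
    sig.append(max(P[s, a] @ V**2 - mean**2, 1e-2))
theta_hat = weighted_ridge(X, y, sig)
```

With constant `sigma2`, `weighted_ridge` reduces to ordinary ridge regression; the inverse-variance weights matter when some transitions have much less next-state variability than others, since their low-noise measurements then count for more.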

METHOD

Full abstract

We study safe reinforcement learning in finite-horizon linear mixture constrained Markov decision processes (CMDPs) with adversarial rewards under full-information feedback and an unknown transition kernel. We propose a primal-dual policy optimization algorithm that achieves regret and constraint violation bounds of $\widetilde{O}(\sqrt{d^2 H^3 K})$ under mild conditions, where $d$ is the feature dimension, $H$ is the horizon, and $K$ is the number of episodes. To the best of our knowledge, this is the first provably efficient algorithm for linear mixture CMDPs with adversarial rewards. In particular, our regret bound is near-optimal, matching the known minimax lower bound up to logarithmic factors. The key idea is to introduce a regularized dual update that enables a drift-based analysis. This step is essential, as strong duality-based analysis cannot be directly applied when reward functions change across episodes. In addition, we extend weighted ridge regression-based parameter estimation to the constrained setting, allowing us to construct tighter confidence intervals that are crucial for deriving the near-optimal regret bound.
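The abstract's key idea pairs a policy (primal) update with a regularized dual update on the Lagrange multiplier. Below is a minimal sketch of one common way to write such a loop, on a hypothetical one-step, single-state problem with a known utility vector `g`, a threshold `b`, and reward vectors revealed after each episode (full information). The exponential-weights primal step, the regularized Lagrangian $V_r + \lambda(V_g - b) + \tfrac{\tau}{2}\lambda^2$, the projection onto $[0, \lambda_{\max}]$, and every function name and parameter value are assumptions for illustration; the paper's actual updates, step sizes, and value estimates over an unknown linear mixture kernel are not reproduced.

```python
import math


def exp_weights_step(policy, q, alpha):
    """Primal step: pi'(a) proportional to pi(a) * exp(alpha * q(a)) (KL-regularized ascent)."""
    logits = [math.log(p) + alpha * qa for p, qa in zip(policy, q)]
    m = max(logits)                          # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def regularized_dual_step(lam, v_g, b, eta, tau, lam_max):
    """Dual step: projected gradient descent in lam on
    L(pi, lam) = V_r + lam * (V_g - b) + (tau / 2) * lam**2.

    The gradient in lam is (V_g - b) + tau * lam, so the step is
    lam' = clip((1 - eta * tau) * lam + eta * (b - V_g), 0, lam_max).
    """
    new_lam = (1.0 - eta * tau) * lam + eta * (b - v_g)
    return min(max(new_lam, 0.0), lam_max)


def run_primal_dual(rewards, g, b, alpha=0.5, eta=0.5, tau=0.1, lam_max=10.0):
    """Toy loop: one state, horizon 1, reward vector rewards[k] revealed after episode k."""
    n = len(g)
    policy = [1.0 / n] * n
    lam = 0.0
    history = []                             # (V_r, V_g, lam) in effect during each episode
    for r in rewards:
        v_r = sum(p * x for p, x in zip(policy, r))
        v_g = sum(p * x for p, x in zip(policy, g))
        history.append((v_r, v_g, lam))
        q = [ri + lam * gi for ri, gi in zip(r, g)]   # Lagrangian action values r + lam * g
        policy = exp_weights_step(policy, q, alpha)
        lam = regularized_dual_step(lam, v_g, b, eta, tau, lam_max)
    return policy, lam, history
```

For example, `run_primal_dual([[1.0, 0.0]] * 200, g=[0.0, 1.0], b=0.5)` feeds a reward that always favors action 0 while the constraint asks for action 1 at least half the time; λ stays at 0 while the constraint holds and starts to grow once the policy drifts toward action 0. The τ term shrinks λ by a factor $(1 - \eta\tau)$ every episode, so if $b - V_g \le c$ the multiplier never exceeds $c/\tau$ even before the clip at `lam_max`; how the paper's regularization enters its drift-based analysis is not captured by this toy.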

RESULT

ScienceToStartup currently rates this 3.0/10 on the public viability pass.
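To read the $\widetilde{O}(\sqrt{d^2 H^3 K})$ bound stated in the abstract numerically, note that $\sqrt{d^2 H^3 K} = d H^{3/2} \sqrt{K}$, so cumulative regret and constraint violation grow sublinearly in $K$ and the per-episode average decays like $1/\sqrt{K}$. The hypothetical helpers below evaluate only this leading term; they drop the constants and logarithmic factors hidden by $\widetilde{O}$, so their outputs illustrate scaling, not the paper's actual bounds.

```python
import math


def leading_term(d, H, K):
    """sqrt(d^2 H^3 K): the bound's leading term without constants or log factors."""
    return math.sqrt(d**2 * H**3 * K)


def per_episode_average(d, H, K):
    """Leading term divided by K; shrinks like 1 / sqrt(K)."""
    return leading_term(d, H, K) / K


# Example: d = 2 features, horizon H = 4, K = 100 episodes.
total = leading_term(2, 4, 100)            # sqrt(4 * 64 * 100) = 160.0
average = per_episode_average(2, 4, 100)   # 160 / 100 = 1.6
```

Quadrupling $K$ doubles the leading term and halves the per-episode average, which is what sublinear $\sqrt{K}$ growth means in practice.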

WHY NOW

Safe Reinforcement Learning moved forward this cycle; last verified April 2026. Public score 3.0/10.


Competitive landscape

A theoretical algorithm for safe reinforcement learning in linear mixture CMDPs with adversarial rewards, achieving near-optimal regret bounds.

Segment: Safe Reinforcement Learning

Adoption evidence: No public code link in the paper record yet

Commercial read: 3.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

