ARXIV:2603.12110 · REINFORCEMENT LEARNING · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Source: PDF linked (verified) · PaperPack: 3 of 4 citation fields filled (partial) · Missing fields: authors · Proof: unverified proof status (partial)

Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives

arXiv

A framework for learning robust reinforcement learning policies against external disturbances.

Blocked on Code · Score 6.0 · Evidence unverified

Opportunity summary

Pain: A framework for learning robust reinforcement learning policies against external disturbances.

Evidence: 0 refs | 0 sources | 17% coverage

Blocker: Evidence unverified


PROBLEM

RL agents often exhibit unstable or degraded performance when deployed in environments subject to unexpected external disturbances and model uncertainties. The paper proposes a framework for learning reinforcement learning policies that stay robust against such disturbances.

METHOD

Full abstract

Reinforcement learning (RL) has achieved remarkable success in a wide range of control and decision-making tasks. However, RL agents often exhibit unstable or degraded performance when deployed in environments subject to unexpected external disturbances and model uncertainties. Consequently, ensuring reliable performance under such conditions remains a critical challenge. In this paper, we propose minimax deep deterministic policy gradient (MMDDPG), a framework for learning disturbance-resilient policies in continuous control tasks. The training process is formulated as a minimax optimization problem between a user policy and an adversarial disturbance policy. In this problem, the user learns a robust policy that minimizes the objective function, while the adversary generates disturbances that maximize it. To stabilize this interaction, we introduce a fractional objective that balances task performance and disturbance magnitude. This objective prevents excessively aggressive disturbances and promotes robust learning. Experimental evaluations in MuJoCo environments demonstrate that the proposed MMDDPG achieves significantly improved robustness against both external force perturbations and model parameter variations.
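The abstract does not state the exact form of the fractional objective. As a minimal sketch, assume an L2-gain-style ratio of the kind used in disturbance-attenuation work: accumulated task cost divided by accumulated disturbance energy. The user would lower this ratio and the adversary would raise it, so a larger disturbance only pays off for the adversary if it raises cost faster than it raises its own energy. The function names `fractional_objective` and `rollout`, the `eps` regularizer, and the toy 1-D system are hypothetical illustrations, not taken from the paper.

```python
import numpy as np


def fractional_objective(costs, disturbances, eps=1e-8):
    """Accumulated task cost divided by accumulated disturbance energy.

    Assumed form (not from the paper): sum_t c_t / (sum_t ||w_t||^2 + eps).
    """
    costs = np.asarray(costs, dtype=float)
    disturbances = np.asarray(disturbances, dtype=float)
    energy = float(np.sum(disturbances ** 2))
    return float(np.sum(costs)) / (energy + eps)


def rollout(k, w, x0=0.0):
    """Toy linear system x_{t+1} = x_t + u_t + w_t with user feedback u_t = -k * x_t.

    Returns the per-step cost x_{t+1}^2 (hypothetical cost choice).
    """
    x, costs = x0, []
    for w_t in w:
        u = -k * x
        x = x + u + w_t
        costs.append(x ** 2)
    return costs


w_small = [0.1] * 20  # weak constant disturbance
w_large = [1.0] * 20  # same shape, ten times larger

for k in (0.2, 0.8):
    j_small = fractional_objective(rollout(k, w_small), w_small)
    j_large = fractional_objective(rollout(k, w_large), w_large)
    print(f"gain k={k}: ratio(small w)={j_small:.4f}, ratio(large w)={j_large:.4f}")
```

In this toy linear case the ratio is essentially unchanged when the disturbance is scaled up, which illustrates how a ratio-type objective can remove the incentive to win by magnitude alone, while a stronger feedback gain yields a lower ratio. Whether MMDDPG uses this exact ratio is not confirmed by the abstract.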

RESULT

ScienceToStartup currently rates this 6.0/10 on the public viability pass. Experimental evaluations in MuJoCo environments demonstrate that the proposed MMDDPG achieves significantly improved robustness against both external force perturbations and model parameter variations.

WHY NOW

Reinforcement Learning moved forward this cycle; last verified April 2026. Public score 6.0/10.


Opportunity summary

Score: 6.0

Pain: A framework for learning robust reinforcement learning policies against external disturbances.

Evidence: 0 refs | 0 sources | 17% coverage

Blocker: missing authors

Analysis summary

A framework for learning robust reinforcement learning policies against external disturbances.


Competitive landscape

A framework for learning robust reinforcement learning policies against external disturbances.

Segment: Reinforcement Learning

Adoption evidence: No public code link in the paper record yet

Commercial read: 6.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

References (17)

Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control
2025 · Taeho Lee, Donghwan Lee

Learning H-Infinity Locomotion Control
2024 · Junfeng Long, Wenye Yu et al.

Robust Deep Reinforcement Learning Through Adversarial Attacks and Training: A Survey
2024 · Lucas Schott, Joséphine Delas et al.

Robust Adversarial Reinforcement Learning with Dissipation Inequation Constraint
2022 · Peng Zhai, Jie Luo et al.

Robust Deep Reinforcement Learning for Quadcopter Control
2021 · A. Deshpande, A. Minai et al.

Robust Reinforcement Learning using Adversarial Populations
2020 · Eugene Vinitsky, Yuqing Du et al.

Action Robust Reinforcement Learning and Applications in Continuous Control
2019 · Chen Tessler, Yonathan Efroni et al.

QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
2018 · Dmitry Kalashnikov, A. Irpan et al.

Mastering the game of Go without human knowledge
2017 · David Silver, Julian Schrittwieser et al.

Robust Adversarial Reinforcement Learning
2017 · Lerrel Pinto, James Davidson et al.

Continuous control with deep reinforcement learning
2015 · T. Lillicrap, Jonathan J. Hunt et al.

Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games
2015 · J. Pérolat, B. Scherrer et al.

Human-level control through deep reinforcement learning
2015 · Volodymyr Mnih, K. Kavukcuoglu et al.

Deterministic Policy Gradient Algorithms
2014 · David Silver, Guy Lever et al.

MuJoCo: A physics engine for model-based control
2012 · E. Todorov, Tom Erez et al.

A dynamic games approach to controller design: disturbance rejection in discrete time
1989 · T. Başar

On the theory of Brownian motion
1973 · R. Mazo

arXiv record: https://arxiv.org/abs/2603.12110 · Published 2026-03-12 16:15 UTC


