ARXIV:2605.10816 · REINFORCEMENT LEARNING · SUBMITTED 12 MAY · 20:16 UTC · FRESHNESS FRESH

Verified · Source: PDF linked
Verified · PaperPack: citation fields available
Partial · Proof: unverified proof status

Policy Gradient Methods for Non-Markovian Reinforcement Learning

Avik Kar · Siddharth Chandak · Rahul Singh · Soumitra Sinhahajari · Eric Moulines · Shalabh Bhatnagar · Nicholas Bambos

A novel policy gradient method for non-Markovian reinforcement learning that jointly optimizes agent state dynamics and control policy for improved performance.

Ship in 2-4 weeks · Score 5.0 · Evidence unverified

Opportunity summary

Pain A novel policy gradient method for non-Markovian reinforcement learning that jointly optimizes agent state dynamics and control policy for improved performance.

Evidence 0 refs | 0 sources | 0% coverage

Blocker Evidence unverified

PROBLEM

A novel policy gradient method for non-Markovian reinforcement learning that jointly optimizes agent state dynamics and control policy for improved performance. To handle this dependence, the agent maintains an internal state that is recursively…

METHOD

Full abstract

We study policy gradient methods for reinforcement learning in non-Markovian decision processes (NMDPs), where observations and rewards depend on the entire interaction history. To handle this dependence, the agent maintains an internal state that is recursively updated to provide a compact summary of past observations and actions. In contrast to approaches that treat the agent state dynamics as fixed or learn it via predictive objectives, we propose a reward-centric formulation that jointly optimizes the agent state dynamics and the control policy to maximize the expected cumulative reward. To this end, we consider a class of Agent State-Markov (ASM) policies, comprising an agent state dynamics and a control policy that maps the agent state to actions. We establish a novel policy gradient theorem for ASM policies, extending the classical policy gradient results from the Markovian setting to episodic and infinite-horizon discounted NMDPs. Building on this gradient expression, we propose the Agent State-Markov Policy Gradient (ASMPG) algorithm, which leverages the recursive structure of the agent state dynamics for efficient optimization. We establish finite-time and almost sure convergence guarantees, and empirically demonstrate that, on a range of non-Markovian tasks, ASMPG outperforms baselines that learn state representations via predictive objectives.
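The abstract describes a policy made of two parts: a recursively updated agent state and a control policy that maps that state to actions, both tuned for reward. The block below is a minimal sketch of that idea and not the paper's ASMPG algorithm. It assumes a scalar, deterministic agent state update `z_t = tanh(w_z * z_{t-1} + w_o * o_t)` and a Bernoulli control policy `pi(a=1 | z) = sigmoid(w_p * z)`. The parameter names `w_z`, `w_o`, `w_p` and both function names are hypothetical. It computes the score-function (REINFORCE-style) gradient, carrying the sensitivity `dz/dtheta` forward through the recursion so that the state-update parameters receive a reward gradient as well.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_prob_and_grad(theta, observations, actions):
    """Return sum_t log pi(a_t | z_t) and its gradient w.r.t. theta.

    theta = (w_z, w_o, w_p) are hypothetical parameters, not the paper's:
      agent state:    z_t = tanh(w_z * z_{t-1} + w_o * o_t), with z_0 = 0
      control policy: pi(a_t = 1 | z_t) = sigmoid(w_p * z_t)
    """
    w_z, w_o, w_p = theta
    z = 0.0
    dz = [0.0, 0.0, 0.0]  # dz/dtheta, propagated along the recursion
    logp = 0.0
    grad = [0.0, 0.0, 0.0]
    for o, a in zip(observations, actions):
        # Agent state update and its sensitivity to each parameter.
        z_new = math.tanh(w_z * z + w_o * o)
        d_tanh = 1.0 - z_new * z_new
        dz = [
            d_tanh * (z + w_z * dz[0]),  # direct term + path through z_{t-1}
            d_tanh * (o + w_z * dz[1]),
            d_tanh * (w_z * dz[2]),      # stays 0: z does not depend on w_p
        ]
        z = z_new
        # Control policy: Bernoulli with logit w_p * z.
        p1 = sigmoid(w_p * z)
        logp += math.log(p1 if a == 1 else 1.0 - p1)
        dlogit = a - p1  # d log pi / d logit for a Bernoulli-sigmoid policy
        grad[0] += dlogit * w_p * dz[0]
        grad[1] += dlogit * w_p * dz[1]
        grad[2] += dlogit * (z + w_p * dz[2])
    return logp, grad


def reinforce_gradient(theta, episodes):
    """Monte Carlo score-function estimate of the gradient of expected return.

    episodes: list of (observations, actions, episode_return) tuples.
    """
    total = [0.0, 0.0, 0.0]
    for observations, actions, episode_return in episodes:
        _, g = log_prob_and_grad(theta, observations, actions)
        total = [t + episode_return * gi for t, gi in zip(total, g)]
    return [t / len(episodes) for t in total]
```

Because the agent state is a deterministic function of the history and the parameters, the environment's observation probabilities drop out of the likelihood-ratio gradient and only the policy log-probabilities remain. A gradient ascent step would then be `theta[i] += step_size * g[i]` for each component `g[i]` of the estimate, with `step_size` chosen by the user.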

RESULT

ScienceToStartup currently rates this 5.0/10 on the public viability pass. We establish a novel policy gradient theorem for ASM policies, extending the classical policy gradient results from the Markovian setting to episodic and infinite-horizon…

WHY NOW

Reinforcement Learning moved forward this cycle; last verified May 2026. Public score 5.0/10. Production flags indicate code availability.

Opportunity summary

Score 5.0

Pain A novel policy gradient method for non-Markovian reinforcement learning that jointly optimizes agent state dynamics and control policy for improved performance.

Evidence 0 refs | 0 sources | 0% coverage

Blocker no shell-level blocker reported

Competitive landscape

Segment: Reinforcement Learning

Adoption evidence: No public code link in the paper record yet

Commercial read: 5.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Paper record

arXiv: https://arxiv.org/abs/2605.10816

Published: 2026-05-11 16:34:28 UTC

Authors: Avik Kar, Siddharth Chandak, Rahul Singh, Soumitra Sinhahajari, Eric Moulines, Shalabh Bhatnagar, Nicholas Bambos
