ARXIV:2605.23652 · GAME AI · SUBMITTED 25 MAY · 20:33 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents

Yoosung Hong · arXiv

A single reinforcement learning policy for scalable game agents that can adopt diverse personas and maintain consistency.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A single reinforcement learning policy for scalable game agents that can adopt diverse personas and maintain consistency.

Evidence 0 refs | 4 sources | 67% coverage

Blocker Evidence unverified

PROBLEM

A single reinforcement learning policy for scalable game agents that can adopt diverse personas and maintain consistency. Life simulation games require hundreds to thousands of non-player characters (NPCs) that behave consistently with distinct personalities…

METHOD

Full abstract

On a 300-persona life-simulation benchmark, pcsp achieves compositional zero-shot persona identification up to 17x above chance, Spearman ρ ≈ 0.73 semantic-behavioral alignment, and 22x faster inference than an LLM-as-policy baseline. Life simulation games require hundreds to thousands of non-player characters (NPCs) that behave consistently with distinct personalities while remaining controllable through designer-authored natural language. Existing methods fail on constraints like persona consistency, controllability, or real-time inference. We introduce pcsp (Persona Conditioned Shared Policy), a single reinforcement learning policy conditioned on frozen LLM embeddings of free-form persona descriptions. pcsp combines once-per-NPC persona encoding, low-rank persona projection, neural persona conditioning, and a PPO + InfoNCE consistency + KL diversity training objective. Across three experimental settings, ablations show that the InfoNCE trajectory-consistency objective is load-bearing: removing it collapses zero-shot persona identification to chance. External validation on Melting Pot 2.4.0 substrates confirms that our method produces persona-conditioned behavioral divergence in multi-agent strategic environments. We distinguish two senses of held-out evaluation: compositional zero-shot and vocabulary-expansion held-out. Finally, a UE5 deployment reproduces the in-engine persona-conditioning ablation at 64 agents with a low failure rate, showing that the sub-frame inference profile survives in a commercial game engine. These results indicate that shared RL policies can support scalable, real-time, persona-conditioned NPC control.
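To make two of the components named in the abstract concrete, here is a minimal sketch, not the paper's implementation. It assumes the low-rank persona projection is a two-matrix bottleneck applied to a frozen embedding, and that the InfoNCE consistency term scores trajectory features against persona codes with temperature-scaled cosine similarity. The names `low_rank_project` and `info_nce`, the toy matrices, and the temperature value are all hypothetical; PPO and the KL diversity term are omitted.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def low_rank_project(embedding, down, up):
    """Assumed rank-r bottleneck: up @ (down @ embedding)."""
    return matvec(up, matvec(down, embedding))

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def info_nce(traj_feats, persona_codes, temperature=0.1):
    """Mean InfoNCE loss where traj_feats[i] should match persona_codes[i]."""
    total = 0.0
    for i, feat in enumerate(traj_feats):
        logits = [cosine(feat, code) / temperature for code in persona_codes]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_z - logits[i]  # cross-entropy against the true persona
    return total / len(traj_feats)

# Toy data (hypothetical): three 4-d "frozen" embeddings, rank-2 bottleneck.
embeddings = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
down = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # 2 x 4
up = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]            # 3 x 2
codes = [low_rank_project(e, down, up) for e in embeddings]

aligned = info_nce(codes, codes)                   # trajectories match personas
shuffled = info_nce(codes[1:] + codes[:1], codes)  # trajectories mismatched
print(aligned < shuffled)
```

In this toy setup the loss is low when each trajectory feature lines up with its own persona code and high when they are mismatched, which is the pressure a trajectory-consistency term would put on a shared policy.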

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. On a 300-persona life-simulation benchmark, pcsp achieves compositional zero-shot persona identification up to 17x above chance, Spearman ρ ≈ 0.73 semantic-behavioral alignment, and 22x…
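The two headline metric types can be illustrated on toy numbers. The sketch below is illustrative only: it assumes "x above chance" means accuracy divided by the uniform-guess rate 1/N over N candidate personas, and that alignment is a Spearman rank correlation between paired semantic and behavioral similarity scores. The function names and sample values are hypothetical, and the candidate-set size used in the paper's evaluation is not stated here.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean rank of the tie group
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

def multiple_of_chance(accuracy, num_candidates):
    """Accuracy as a multiple of uniform guessing (chance = 1 / N)."""
    return accuracy * num_candidates

# Hypothetical paired scores for five persona pairs.
semantic_similarity = [0.10, 0.35, 0.50, 0.70, 0.90]
behavioral_similarity = [0.20, 0.15, 0.55, 0.60, 0.95]
print(spearman(semantic_similarity, behavioral_similarity))
print(multiple_of_chance(0.25, 20))
```

Because Spearman works on ranks, it rewards any monotonic relationship between semantic and behavioral similarity, not only a linear one.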

WHY NOW

Game AI moved forward this cycle; last verified May 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.

Opportunity summary

Score 7.0

Pain A single reinforcement learning policy for scalable game agents that can adopt diverse personas and maintain consistency.

Evidence 0 refs | 4 sources | 67% coverage

Blocker no shell-level blocker reported

Analysis summary

A single reinforcement learning policy for scalable game agents that can adopt diverse personas and maintain consistency.


Competitive landscape

A single reinforcement learning policy for scalable game agents that can adopt diverse personas and maintain consistency.

Segment: Game AI

Adoption evidence: Public code linked for build inspection

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Paper: https://arxiv.org/abs/2605.23652

Code: https://github.com/yoosunghong/pcsp

Published: 2026-05-22
