ARXIV:2605.20423 · THEORY OF MIND · SUBMITTED 21 MAY · 20:31 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind

Sharmin Sultana Srishty · Kazi Mahathir Rahman · Malaika Parizat Sakkhi · Samia Shahid Prianna · Shaikhul Islam Sinat · arXiv

OSCToM enhances LLMs' Theory of Mind reasoning by modeling nested belief conflicts using reinforcement learning.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain OSCToM enhances LLMs' Theory of Mind reasoning by modeling nested belief conflicts using reinforcement learning.

Evidence 0 refs | 4 sources | 67% coverage

Blocker Evidence unverified

PROBLEM

LLMs' Theory of Mind (ToM) reasoning is still uneven in complex social settings. Existing benchmarks, including ExploreToM, do not always test the recursive beliefs and information asymmetries that make these settings difficult.

METHOD

Full abstract

Large Language Models (LLMs) perform well on many language tasks, but their Theory of Mind (ToM) reasoning is still uneven in complex social settings. Existing benchmarks, including ExploreToM, do not always test the recursive beliefs and information asymmetries that make these settings difficult. This paper presents OSCToM (Observer-Self Conflict Theory of Mind), an approach for modeling nested belief conflicts in LLM-based ToM tasks. The key case is one in which an observer's view of another agent conflicts with the observer's own belief state. Such cases go beyond simple perspective-taking and require recursive, multi-layered reasoning. OSCToM combines reinforcement learning (RL), an extended domain-specific language, and compositional surrogate models to generate observer-self conflicts. In our experiments, OSCToM-8B gives the best overall result among the systems tested. It improves on the reported ExploreToM results on FANToM and remains competitive on Hi-ToM and BigToM. On the information-asymmetric FANToM benchmark, OSCToM reaches 76% accuracy, compared with the 0.2% reported by ExploreToM. The data-synthesis procedure is also 6x more efficient, indicating that targeted training data can help smaller models handle advanced cognitive reasoning. The project code is available at https://github.com/sharminsrishty/osct.
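The observer-self conflict described in the abstract can be illustrated with a toy belief tracker. The sketch below is not the paper's DSL, RL generator, or surrogate models; it is a minimal illustration under stated assumptions. The function names (`belief_chains`, `track_beliefs`, `observer_self_conflicts`), the event format, and the Sally-Anne style scenario are all hypothetical. It also assumes that agents who witness an event together know that the others witnessed it, so a nested belief updates only when every agent in the chain saw the move.

```python
from itertools import product


def belief_chains(agents, max_order):
    """All agent chains, e.g. ('Anne', 'Sally') = 'what Anne thinks Sally thinks'."""
    chains = []
    for order in range(1, max_order + 1):
        for chain in product(agents, repeat=order):
            # Skip chains with immediate repeats such as ('Anne', 'Anne').
            if all(a != b for a, b in zip(chain, chain[1:])):
                chains.append(chain)
    return chains


def track_beliefs(agents, events, max_order=2):
    """Replay events of the form (object, new_location, witnesses).

    Assumption: co-present witnesses are mutually aware of each other,
    so a chain's belief changes only if all of its agents saw the move.
    """
    beliefs = {chain: {} for chain in belief_chains(agents, max_order)}
    for obj, location, witnesses in events:
        for chain, state in beliefs.items():
            if all(agent in witnesses for agent in chain):
                state[obj] = location
    return beliefs


def observer_self_conflicts(beliefs, observer, obj):
    """Agents whose belief, as modeled by the observer, differs from the observer's own."""
    own = beliefs[(observer,)].get(obj)
    return {
        chain[1]: state.get(obj)
        for chain, state in beliefs.items()
        if len(chain) == 2 and chain[0] == observer and state.get(obj) != own
    }


agents = ["Sally", "Anne"]
events = [
    ("marble", "basket", {"Sally", "Anne"}),  # both watch the marble go in the basket
    ("marble", "box", {"Anne"}),              # Sally has left; only Anne sees the move
]
beliefs = track_beliefs(agents, events)
conflicts = observer_self_conflicts(beliefs, "Anne", "marble")
print(conflicts)  # {'Sally': 'basket'}
```

Here Anne believes the marble is in the box while modeling Sally as still believing it is in the basket, which is the kind of conflict between an observer's own belief and the observer's view of another agent that the abstract refers to. Raising `max_order` extends the same update rule to deeper chains such as "what Anne thinks Sally thinks Anne thinks".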

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. The authors report that OSCToM-8B gives the best overall result among the systems they tested. A public repository is linked, so build verification can inspect implementation…

WHY NOW

Theory of Mind moved forward this cycle; last verified May 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.

Opportunity summary

Score 7.0

Pain OSCToM enhances LLMs' Theory of Mind reasoning by modeling nested belief conflicts using reinforcement learning.

Evidence 0 refs | 4 sources | 67% coverage

Blocker no shell-level blocker reported

Competitive landscape

OSCToM enhances LLMs' Theory of Mind reasoning by modeling nested belief conflicts using reinforcement learning.

Segment: Theory of Mind

Adoption evidence: Public code linked for build inspection

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Published 2026-05-19 19:19 UTC · arXiv: https://arxiv.org/abs/2605.20423 · Code: https://github.com/sharminsrishty/osct
