ARXIV:2605.09826 · AGENTS · SUBMITTED 12 MAY · 20:15 UTC

Verified · Source: PDF linked | Verified · PaperPack: citation fields available | Partial · Proof: unverified proof status

EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents

Gurusha Juneja · Dylan Lu · Saaket Agashe · Parth Diwane · Edward Gunn · Jayanth Srinivasa · Gaowen Liu · William Yang Wang · Yali Du · Xin Eric Wang

A new benchmark for embodied AI agents to test functional theory of mind, revealing significant gaps in current frontier models' ability to coordinate and act on implicit beliefs.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: A new benchmark for embodied AI agents to test functional theory of mind, revealing significant gaps in current frontier models' ability to coordinate and act on implicit beliefs.

Evidence: 0 refs | 0 sources | 0% coverage

Blocker: Evidence unverified


PROBLEM

A new benchmark for embodied AI agents to test functional theory of mind, revealing significant gaps in current frontier models' ability to coordinate and act on implicit beliefs. AI agents need the same capacity…

METHOD

Full abstract

Theory of Mind (ToM), the ability to track others' epistemic states, makes humans efficient collaborators. AI agents need the same capacity in multi-agent settings, yet existing benchmarks mostly test literal ToM by asking direct belief questions. The ability to act optimally on implicit beliefs in embodied environments, called functional ToM, remains largely untested. We introduce EnactToM, an evolving benchmark of 300 embodied multi-agent tasks set in a 3D household with partial observability, private information, and constrained communication. Each task is formally verified for solvability and required epistemic depth, and new tasks are generated to increase difficulty as models improve. On the hard split, all seven evaluated frontier models score 0.0% Pass^3 on functional task completion, while averaging 45.0% on literal belief probes. Manual analysis traces 93% of sampled failures to epistemic coordination breakdowns such as withheld information, ignored partner constraints, and misallocated messages, providing a concrete target for future work.
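The abstract reports functional completion as Pass^3 but does not define the metric on this page. A minimal sketch, assuming EnactToM uses pass^k in the sense popularized by τ-bench: a task counts only if it succeeds on all k independent trials, estimated per task as C(c, k) / C(n, k) from c successes in n ≥ k trials and then averaged over tasks. Under that reading a single failed trial out of three zeroes a task's Pass^3 contribution, which makes it a much stricter bar than single-shot accuracy. The function name `pass_hat_k` and the `results` data are hypothetical illustrations, not the benchmark's code or results.

```python
from math import comb
from typing import Dict, List


def pass_hat_k(trials: Dict[str, List[bool]], k: int) -> float:
    """Estimate pass^k: the chance a task succeeds on all k i.i.d. trials.

    Per task with n recorded trials and c successes, the unbiased estimate is
    C(c, k) / C(n, k); the benchmark-level score averages this over tasks.
    Assumes the tau-bench definition of pass^k, not a confirmed EnactToM formula.
    """
    if not trials:
        raise ValueError("need at least one task")
    total = 0.0
    for task_id, outcomes in trials.items():
        n, c = len(outcomes), sum(outcomes)
        if n < k:
            raise ValueError(f"{task_id}: need at least {k} trials, got {n}")
        total += comb(c, k) / comb(n, k)  # comb(c, k) is 0 when c < k
    return total / len(trials)


# Hypothetical outcomes for three tasks, three trials each.
results = {
    "task_a": [True, True, True],     # succeeds every time
    "task_b": [True, False, True],    # a single failure zeroes its pass^3 term
    "task_c": [False, False, False],  # never succeeds
}

print(pass_hat_k(results, k=3))  # only task_a contributes
print(pass_hat_k(results, k=1))  # reduces to mean per-trial success
```

With k=1 the estimator reduces to ordinary average success rate, so comparing k=1 and k=3 on the same runs shows how much of a model's apparent competence is inconsistent across repeated attempts.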

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Each task is formally verified for solvability and required epistemic depth, and new tasks are generated to increase difficulty as models improve. Code availability is…
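The page does not say how "required epistemic depth" is measured. One common reading in the theory-of-mind literature is the order of nested beliefs: "the key is in the drawer" is a plain fact (depth 0), "Bob believes the key is in the drawer" is first-order (depth 1), and "Alice believes Bob believes it" is second-order (depth 2). The sketch below encodes that reading with hypothetical `Fact`, `Believes`, and `epistemic_depth` names; it illustrates the concept only and is not EnactToM's formal verifier.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Fact:
    """A world proposition, e.g. 'key_in_drawer' (hypothetical encoding)."""
    name: str


@dataclass(frozen=True)
class Believes:
    """`agent` believes `content`, which is a fact or another belief."""
    agent: str
    content: "Formula"


Formula = Union[Fact, Believes]


def epistemic_depth(f: Formula) -> int:
    """Count nested belief operators; a bare fact has depth 0."""
    depth = 0
    while isinstance(f, Believes):
        depth += 1
        f = f.content
    return depth


# Second-order belief: Alice believes that Bob believes the key is in the drawer.
goal = Believes("alice", Believes("bob", Fact("key_in_drawer")))
print(epistemic_depth(goal))
```

Under this reading, a task "requires" depth d if no plan succeeds while reasoning only about beliefs of order below d; checking that property formally would need a planner or model checker, which this sketch does not attempt.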

WHY NOW

Agents moved forward this cycle; last verified May 2026. Public score 7.0/10. Production flags indicate code availability.



Analysis summary

A new benchmark for embodied AI agents to test functional theory of mind, revealing significant gaps in current frontier models' ability to coordinate and act on implicit beliefs.


Competitive landscape

A new benchmark for embodied AI agents to test functional theory of mind, revealing significant gaps in current frontier models' ability to coordinate and act on implicit beliefs.

Segment: Agents

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


