Evidence Receipt. Related Resources.
Verification pending
Use This Via API or MCP
Signal Canvas is the citation-first public layer for turning one paper into a structured commercialization narrative. Use it to hand off into REST, MCP, Build Loop, and launch-pack execution without losing source lineage.
Route this paper proof surface into REST, MCP, or developer workflows while preserving the same evidence receipt and related-resource context.
Page Freshness
Canonical route: /signal-canvas/selective-deficits-in-llm-mental-self-modeling-in-a-behavior-based-test-of-theory-of-mind
This page is showing the last landed evidence receipt and score bundle because the latest proof data is outside the freshness window.
Agent Handoff
Canonical ID selective-deficits-in-llm-mental-self-modeling-in-a-behavior-based-test-of-theory-of-mind | Route /signal-canvas/selective-deficits-in-llm-mental-self-modeling-in-a-behavior-based-test-of-theory-of-mind
REST example
curl https://sciencetostartup.com/api/v1/agent-handoff/signal-canvas/selective-deficits-in-llm-mental-self-modeling-in-a-behavior-based-test-of-theory-of-mind
MCP example
{
  "tool": "search_signal_canvas",
  "arguments": {
    "mode": "paper",
    "paper_ref": "selective-deficits-in-llm-mental-self-modeling-in-a-behavior-based-test-of-theory-of-mind",
    "query_text": "Summarize Selective Deficits in LLM Mental Self-Modeling in a Behavior-Based Test of Theory of Mind"
  }
}
source_context
{
  "surface": "signal_canvas",
  "mode": "paper",
  "query": "Selective Deficits in LLM Mental Self-Modeling in a Behavior-Based Test of Theory of Mind",
  "normalized_query": "2603.26089",
  "route": "/signal-canvas/selective-deficits-in-llm-mental-self-modeling-in-a-behavior-based-test-of-theory-of-mind",
  "paper_ref": "selective-deficits-in-llm-mental-self-modeling-in-a-behavior-based-test-of-theory-of-mind",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}
Claims: 12
References: 17
Proof: Verification pending
Freshness state: computing
Source paper: Selective Deficits in LLM Mental Self-Modeling in a Behavior-Based Test of Theory of Mind
PDF: https://arxiv.org/pdf/2603.26089v1
Source count: 3
Coverage: 50%
Last proof check: 2026-03-30T21:55:06.832Z
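The REST and MCP examples above share one canonical ID, so both can be derived from it. The following is a minimal Python sketch, not an official client: the helper names `handoff_url` and `mcp_request` are hypothetical, and the response format of the endpoint is not documented on this page, so the sketch only builds the request URL and the MCP payload without sending anything.

```python
import json
from urllib.parse import quote

# Base path and identifiers copied from the REST and MCP examples on this page.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff/signal-canvas"
PAPER_REF = (
    "selective-deficits-in-llm-mental-self-modeling-"
    "in-a-behavior-based-test-of-theory-of-mind"
)
PAPER_TITLE = (
    "Selective Deficits in LLM Mental Self-Modeling "
    "in a Behavior-Based Test of Theory of Mind"
)


def handoff_url(paper_ref: str) -> str:
    """Build the REST handoff URL for a canonical ID (hypothetical helper)."""
    # Percent-encode defensively; a hyphenated slug passes through unchanged.
    return f"{BASE_URL}/{quote(paper_ref, safe='')}"


def mcp_request(paper_ref: str, title: str) -> dict:
    """Build the MCP tool-call payload in the shape shown above (hypothetical helper)."""
    return {
        "tool": "search_signal_canvas",
        "arguments": {
            "mode": "paper",
            "paper_ref": paper_ref,
            "query_text": f"Summarize {title}",
        },
    }


if __name__ == "__main__":
    print(handoff_url(PAPER_REF))
    print(json.dumps(mcp_request(PAPER_REF, PAPER_TITLE), indent=2))
```

Keeping the canonical ID as the single input means the route, the REST URL, and the MCP `paper_ref` cannot drift apart when the same handoff is repeated for another paper.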
Signal Canvas receipt window
/buildability/selective-deficits-in-llm-mental-self-modeling-in-a-behavior-based-test-of-theory-of-mind
Subject: Selective Deficits in LLM Mental Self-Modeling in a Behavior-Based Test of Theory of Mind
Verdict
Watch
Verdict is Watch because viability or proof quality is intermediate and should be re-evaluated before execution.
Dimensions overall score 7.0
No public code linked for this paper yet.
LLMs released before mid-2025 fail at all of our tasks
This is explicitly stated in the abstract and supported by the findings presented in the figures and text.
partial
more recent LLMs achieve human-level performance on modeling the cognitive states of others
This is explicitly stated in the abstract and supported by the text indicating an upward trend for 'other-modeling' tasks with recent LLMs.
partial
even frontier LLMs fail at our self-modeling task - unless afforded a scratchpad in the form of a reasoning trace
This is explicitly stated in the abstract and supported by the text contrasting performance with and without a scratchpad.
partial
we further demonstrate cognitive load effects on other-modeling tasks, offering suggestive evidence that LLMs are using something akin to limited-capacity working memory to hold these mental representations in mind during a single forward pass
The abstract suggests this based on observed cognitive load effects, indicating suggestive evidence rather than a definitive conclusion.
partial
we show that they readily engage in strategic deception
The abstract states this as a finding from exploring the mechanisms by which reasoning models succeed.
partial
We therefore develop a novel experimental paradigm that requires that subjects form representations of the mental states of themselves and others and act on them strategically rather than merely describe them
The abstract clearly describes the development of a new paradigm with specific requirements.
partial
We test a wide range of leading open and closed source LLMs released since 2024, as well as human subjects, on this paradigm
The abstract explicitly mentions testing human subjects alongside LLMs.
partial
We find that 1) LLMs released before mid-2025 fail at all of our tasks
This is explicitly stated in the abstract and supported by the findings presented in Figure 2 (nonthinking models).
partial
2) more recent LLMs achieve human-level performance on modeling the cognitive states of others
This is explicitly stated in the abstract and supported by the trend shown in Figure 2 (nonthinking models) for other-modeling tasks.
partial
3) and even frontier LLMs fail at our self-modeling task - unless afforded a scratchpad in the form of a reasoning trace.
This is explicitly stated in the abstract and supported by the comparison between 'nonthinking' and 'thinking' models in Figure 3.
partial
We further demonstrate cognitive load effects on other-modeling tasks, offering suggestive evidence that LLMs are using something akin to limited-capacity working memory to hold these mental representations in mind during a single forward pass.
The abstract suggests this as 'suggestive evidence' based on observed cognitive load effects.
partial
Finally, we explore the mechanisms by which reasoning models succeed at the self- and other-modeling tasks, and show that they readily engage in strategic deception.
This is stated in the abstract as a finding from exploring the mechanisms of successful self- and other-modeling.
partial
Estimated $10K - $14K over 6-10 weeks.
Time to first demo
Insufficient data
No first-demo timestamp, owner estimate, or elapsed demo receipt is attached to this surface.
Structured compute envelope
Insufficient data
No data, compute, hardware, memory, latency, dependency, or serving requirement receipt is attached.
Receipt path
/buildability/selective-deficits-in-llm-mental-self-modeling-in-a-behavior-based-test-of-theory-of-mind
Paper ref
selective-deficits-in-llm-mental-self-modeling-in-a-behavior-based-test-of-theory-of-mind
arXiv id
2603.26089
Generated at
2026-03-30T21:55:06.832Z
Evidence freshness
stale
Last verification
2026-03-30T21:55:06.832Z
Sources
3
References
17
Coverage
50%
Lineage hash
a57ed63b2bc4c1a329f31f3b14c560c5c45fea715bd9b307df176d2827b57c2b
Canonical opportunity-kernel lineage hash.
External signature
unsigned_external
No founder, registry, pilot, or production-adoption signature is attached to this receipt.
Verification
not_verified
Verification is blocked until an external signature is provided.
17 refs / 3 sources / Verification pending
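The receipt fields above can be read as a small record with two derived states. The sketch below is one possible reading, under stated assumptions: the `Receipt` class, the function names, and the 7-day window are all hypothetical (this page does not state the length of the freshness window), and the only rule taken from the page is that verification is blocked while the external signature is `unsigned_external`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Receipt:
    """Hypothetical container for the receipt fields listed on this page."""
    last_verification: datetime
    external_signature: str
    sources: int
    references: int
    coverage: float  # 0.5 corresponds to the 50% shown above


def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as 2026-03-30T21:55:06.832Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def freshness(receipt: Receipt, now: datetime, window: timedelta) -> str:
    """Return 'fresh' inside the window, else 'stale'. The window is an assumption."""
    return "fresh" if now - receipt.last_verification <= window else "stale"


def is_verification_blocked(receipt: Receipt) -> bool:
    """Per the page, verification is blocked until an external signature exists."""
    return receipt.external_signature == "unsigned_external"


if __name__ == "__main__":
    receipt = Receipt(
        last_verification=parse_timestamp("2026-03-30T21:55:06.832Z"),
        external_signature="unsigned_external",
        sources=3,
        references=17,
        coverage=0.5,
    )
    assumed_window = timedelta(days=7)  # hypothetical value, not from the page
    print(freshness(receipt, datetime.now(timezone.utc), assumed_window))
    print(is_verification_blocked(receipt))
```

Passing `now` and `window` as arguments rather than reading the clock inside the function keeps the freshness check deterministic, which makes it easy to replay against a stored receipt.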