Proof pending. This topic has not reached the minimum paper threshold yet.
AI agents need to plan to achieve complex goals that involve orchestrating perception, sub-goal decomposition, and execution. These plans consist of ordered steps structured according to a Temporal Ex...
As LLMs achieved breakthroughs in general reasoning, their proficiency in specialized scientific domains reveals pronounced gaps in existing benchmarks due to data contamination, insufficient complexi...
Large Language Models (LLMs) are increasingly serving as personal assistants, where users share complex and diverse preferences over extended interactions. However, assessing how well LLMs can follow ...
Existing benchmarks for AI reasoning provide limited insight into how closely these capabilities resemble human reasoning in naturalistic contexts. We present an adaptation of the Watson & Holmes dete...
Canonical route: /topics
Agent Handoff
Canonical ID benchmark-development | Route /topic/benchmark-development
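The canonical ID appears to be a slug of the topic name: "Benchmark Development" becomes `benchmark-development`, and the route is `/topic/` followed by that slug. The sketch below reproduces that mapping under an assumed rule: lowercase the name, collapse each run of non-alphanumeric characters into one hyphen, and trim hyphens from the ends. The site's actual normalization is not documented here, and the function names `topic_slug` and `topic_route` are hypothetical.

```python
import re


def topic_slug(query: str) -> str:
    """Hypothetical normalization (assumed, not the site's documented rule):
    lowercase, replace each run of non-alphanumerics with one hyphen, trim hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", query.lower())
    return slug.strip("-")


def topic_route(query: str) -> str:
    """Build the topic route from the assumed slug."""
    return f"/topic/{topic_slug(query)}"


if __name__ == "__main__":
    print(topic_slug("Benchmark Development"))   # benchmark-development
    print(topic_route("Benchmark Development"))  # /topic/benchmark-development
```

This rule agrees with the `query` and `normalized_query` pair in the `source_context` shown on this page. Names containing accents or other non-ASCII characters may be normalized differently by the site.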
REST example
curl https://sciencetostartup.com/api/v1/agent-handoff/topic/benchmark-development
MCP example
{
  "tool": "search_papers",
  "arguments": {
    "query": "Benchmark Development",
    "cluster": "Benchmark Development"
  }
}
source_context
{
  "surface": "topic",
  "mode": "topic",
  "query": "Benchmark Development",
  "normalized_query": "benchmark-development",
  "route": "/topic/benchmark-development",
  "paper_ref": null,
  "topic_slug": "benchmark-development",
  "benchmark_ref": null,
  "dataset_ref": null
}
Use This Via API or MCP
Topic pages bundle paper counts, viability trends, author concentration, and top questions into one canonical surface your agents can reference before they open Signal Canvas or create a workspace.
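An agent that wants both access paths can build them from the values shown on this page. The sketch below assembles the REST handoff URL from the curl example and the `search_papers` tool call from the MCP example. Two assumptions apply: the endpoint is fetched with a plain GET, which is curl's default, and `cluster` is set to the topic's display name, as in the example. The helper names `handoff_url` and `search_papers_call` are hypothetical. The sketch sends no request; it only builds the URL and the payload.

```python
import json

# Base path taken from the REST example on this page.
API_BASE = "https://sciencetostartup.com/api/v1"


def handoff_url(slug: str) -> str:
    """Hypothetical helper: agent-handoff URL for a topic slug."""
    return f"{API_BASE}/agent-handoff/topic/{slug}"


def search_papers_call(topic: str) -> dict:
    """Hypothetical helper mirroring the MCP example: query and cluster
    are both set to the topic's display name (assumed convention)."""
    return {"tool": "search_papers", "arguments": {"query": topic, "cluster": topic}}


if __name__ == "__main__":
    print(handoff_url("benchmark-development"))
    print(json.dumps(search_papers_call("Benchmark Development"), indent=2))
```

The REST URL is the simpler path for a one-off fetch. The MCP call suits an agent that already talks to the server through tools.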