Proof pending. This topic has not reached the minimum paper threshold yet.
The rapid advancement of visual generation models has outpaced traditional evaluation approaches, necessitating the adoption of Vision-Language Models as surrogate judges. In this work, we systematica...
Evaluating mathematical reasoning in LLMs is constrained by limited benchmark sizes and inherent model stochasticity, yielding high-variance accuracy estimates and unstable rankings across platforms. ...
Canonical route: /topics
Agent Handoff
Canonical ID evaluation-frameworks | Route /topic/evaluation-frameworks
REST example
curl https://sciencetostartup.com/api/v1/agent-handoff/topic/evaluation-frameworks
MCP example
{
  "tool": "search_papers",
  "arguments": {
    "query": "Evaluation Frameworks",
    "cluster": "Evaluation Frameworks"
  }
}
source_context
{
  "surface": "topic",
  "mode": "topic",
  "query": "Evaluation Frameworks",
  "normalized_query": "evaluation-frameworks",
  "route": "/topic/evaluation-frameworks",
  "paper_ref": null,
  "topic_slug": "evaluation-frameworks",
  "benchmark_ref": null,
  "dataset_ref": null
}
Use This Via API or MCP
Topic pages bundle paper counts, viability trends, author concentration, and top questions into one canonical surface your agents can reference before they open Signal Canvas or create a workspace.
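
As a rough illustration of how an agent might assemble the REST URL and the MCP payload shown above, here is a minimal Python sketch. The helper names (`normalize_query`, `handoff_url`, `search_papers_call`) are hypothetical and not part of any published client. The slug rule (lowercase, whitespace to hyphens) is an assumption inferred from the single `query` / `normalized_query` pair on this page, and the sketch only builds the request values; it does not call the API or assume anything about the response.

```python
import json
from urllib.parse import quote

# Base path copied from the REST example on this page.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff"


def normalize_query(query: str) -> str:
    """Assumed slug rule: lowercase, runs of whitespace become single hyphens."""
    return "-".join(query.lower().split())


def handoff_url(topic_slug: str) -> str:
    """Build the agent-handoff URL for a topic slug (hypothetical helper)."""
    return f"{BASE_URL}/topic/{quote(topic_slug, safe='')}"


def search_papers_call(query: str) -> dict:
    """Mirror the MCP example payload; reusing the query as the cluster is an assumption."""
    return {
        "tool": "search_papers",
        "arguments": {"query": query, "cluster": query},
    }


if __name__ == "__main__":
    slug = normalize_query("Evaluation Frameworks")
    print(handoff_url(slug))
    print(json.dumps(search_papers_call("Evaluation Frameworks"), indent=2))
```

Keeping the URL and payload builders separate from any network code lets an agent log or validate the request before sending it with whatever HTTP or MCP client it already uses.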