Proof pending. This topic has not reached the minimum paper threshold yet.
Large language models (LLMs) are widely used as reference-free evaluators via prompting, but this "LLM-as-a-Judge" paradigm is costly, opaque, and sensitive to prompt design. In this work, we investig...
As comprehensive large model evaluation becomes prohibitively expensive, predicting model performance from limited observations has become essential. However, existing statistical methods struggle wit...
Canonical route: /topics
Agent Handoff
Canonical ID ai-model-evaluation | Route /topic/ai-model-evaluation
REST example
curl https://sciencetostartup.com/api/v1/agent-handoff/topic/ai-model-evaluation
MCP example
{
  "tool": "search_papers",
  "arguments": {
    "query": "AI Model Evaluation",
    "cluster": "AI Model Evaluation"
  }
}
source_context
{
  "surface": "topic",
  "mode": "topic",
  "query": "AI Model Evaluation",
  "normalized_query": "ai-model-evaluation",
  "route": "/topic/ai-model-evaluation",
  "paper_ref": null,
  "topic_slug": "ai-model-evaluation",
  "benchmark_ref": null,
  "dataset_ref": null
}
Use This Via API or MCP
Topic pages bundle paper counts, viability trends, author concentration, and top questions into one canonical surface your agents can reference before they open Signal Canvas or create a workspace.
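For agents that build these requests in code, the REST and MCP examples above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a documented client: the helper names (`slugify`, `handoff_url`, `mcp_search_call`, `fetch_handoff`) are hypothetical, the slug rule is inferred only from the single `query` / `normalized_query` pair shown in source_context, and the endpoint is assumed to return JSON, which the page does not confirm.

```python
import json
import re
import urllib.request

# Base path taken from the REST example on this page.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff"


def slugify(query: str) -> str:
    """Lowercase the query and join alphanumeric runs with hyphens.

    Assumption: this mirrors how "AI Model Evaluation" becomes
    "ai-model-evaluation" in source_context; the real rule may differ.
    """
    return re.sub(r"[^a-z0-9]+", "-", query.lower()).strip("-")


def handoff_url(surface: str, slug: str) -> str:
    """Build the agent-handoff URL for a surface (e.g. "topic") and a slug."""
    return f"{BASE_URL}/{surface}/{slug}"


def mcp_search_call(query: str) -> dict:
    """Build the search_papers tool call shown in the MCP example."""
    return {
        "tool": "search_papers",
        "arguments": {"query": query, "cluster": query},
    }


def fetch_handoff(surface: str, slug: str, timeout: float = 10.0) -> dict:
    """Fetch the handoff document over HTTP.

    Assumption: the response body is JSON; its fields are not described here.
    """
    with urllib.request.urlopen(handoff_url(surface, slug), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `handoff_url("topic", slugify("AI Model Evaluation"))` reproduces the URL from the REST example, and `mcp_search_call("AI Model Evaluation")` reproduces the MCP payload. Only `fetch_handoff` touches the network, so the request-building helpers can be checked offline.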