AI Evaluation Tools

Proof pending

3 papers

6.3 viability

-50% (30d)

Proof pending. This topic has not reached the minimum paper threshold yet.

Topic-linked question coverage is still building for this proof surface.

Papers

Research Paper·Jan 30, 2026

Automating Forecasting Question Generation and Resolution for AI Evaluation

Forecasting future events is highly valuable in decision-making and is a robust measure of general intelligence. As forecasting is probabilistic, developing and evaluating AI forecasters requires gene...

7.0 viability

Research Paper·Jan 14, 2026

What Do LLM Agents Know About Their World? Task2Quiz: A Paradigm for Studying Environment Understanding

Large language model (LLM) agents have demonstrated remarkable capabilities in complex decision-making and tool-use tasks, yet their ability to generalize across varying environments remains an under-e...

5.0 viability
