AI Model Evaluation

Trending · Proof pending

3 papers

5.0 viability

+100% 30d

Proof pending. This topic has not reached the minimum paper threshold yet.

Topic-linked question coverage is still building for this proof surface.

Papers

1-2 of 2

Research Paper · Jan 30, 2026

Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry

Large language models (LLMs) are widely used as reference-free evaluators via prompting, but this "LLM-as-a-Judge" paradigm is costly, opaque, and sensitive to prompt design. In this work, we investig...

5.0 viability

Research Paper · Feb 12, 2026

STAR: Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction

As comprehensive large model evaluation becomes prohibitively expensive, predicting model performance from limited observations has become essential. However, existing statistical methods struggle wit...

5.0 viability
