State of Benchmark Development | Report | ScienceToStartup
4 papers · average viability 5.0
Top papers
MATEO: A Multimodal Benchmark for Temporal Reasoning and Planning in LVLMs (5.0)
SPM-Bench: Benchmarking Large Language Models for Scanning Probe Microscopy (5.0)
Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions (5.0)
Watson & Holmes: A Naturalistic Benchmark for Comparing Human and LLM Reasoning (5.0)