Towards More Standardized AI Evaluation: From Models to Agents | ScienceToStartup