How can AI benchmarks be adapted to evaluate AI systems in s | ScienceToStartup