How can AI benchmarking frameworks be adapted for evaluating | ScienceToStartup