Can Vision-Language Models Solve the Shell Game? | ScienceToStartup