PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models