SokoBench: Evaluating Long-Horizon Planning and Reasoning in Large Language Models | ScienceToStartup