Poly-EPO: Training Exploratory Reasoning Models | ScienceToStartup