From Curiosity to Caution: Mitigating Reward Hacking for Best-of-N with Pessimism