Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation | ScienceToStartup