Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation | Signal Canvas | ScienceToStartup