Mitigating Distribution Sharpening in Math RLVR via Distribution-Aligned Hint Synthesis and Backward Hint Annealing | Signal Canvas | ScienceToStartup