Multi-Reward RL Optimization: GDPO for Language Models | ScienceToStartup