Alternatives to Group Relative Policy Optimization (GRPO) | ScienceToStartup