When Right Meets Wrong: Bilateral Context Conditioning with Reward-Confidence Correction for GRPO | ScienceToStartup