When Right Meets Wrong: Bilateral Context Conditioning with Reward-Confidence Correction for GRPO | Signal Canvas | ScienceToStartup