Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings | ScienceToStartup