Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings | Signal Canvas | ScienceToStartup