Aligning Agents via Planning: A Benchmark for Trajectory-Level Reward Modeling