SiMPO: Measure Matching for Online Diffusion Reinforcement Learning | ScienceToStartup