TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward | ScienceToStartup