FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling | ScienceToStartup