Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages | ScienceToStartup