Towards GUI Agents: Vision-Language Diffusion Models for GUI Grounding | ScienceToStartup