Watch Before You Answer: Learning from Visually Grounded Post-Training | ScienceToStartup