DiT4DiT: Jointly Modeling Video Dynamics and Actions for Generalizable Robot Control | ScienceToStartup