How does early vision-language fusion enhance generative mod | ScienceToStartup