How can multimodal models in generative video improve the understanding of complex scenes?

Answer not yet generated.