How can vision language models be adapted for real-time vide | ScienceToStartup