How can Vision Transformers be fine-tuned for video analysis? | ScienceToStartup