Alternatives to vision-language models | ScienceToStartup