Multimodal Language Models Cannot Spot Spatial Inconsistencies | ScienceToStartup | ScienceToStartup