What are the challenges of real-world navigation tasks for existing large vision-language models?Answer not yet generated.