How can vision-language models be leveraged for explainable AI in visual question answering?

Answer not yet generated.