How can vision-language fusion be leveraged to improve the relevance of distilled data?