How does the architecture of BiCLIP differ from other contrastive learning-based vision-language models?

Answer not yet generated.