Learn Before Represent (LBR) is a two-stage framework designed to overcome the limitations of Large Language Models (LLMs) adapted via contrastive learning (LLM+CL) in specialized, or 'vertical', domains such as chemistry, law, and medicine. While LLM+CL performs well at general representation learning, it often struggles to acquire domain-specific terminology and knowledge. LBR addresses this by first injecting domain knowledge through an Information Bottleneck-Constrained Generative Learning stage, which maximizes knowledge acquisition while compressing semantics and preserving the LLM's causal attention. It then performs Generative-Refined Contrastive Learning on the compressed representations to achieve semantic alignment. This design maintains architectural consistency across both stages and resolves the inherent objective conflict between generative and contrastive learning. LBR matters to researchers and ML engineers building accurate, robust LLM-based systems for specialized applications where deep domain understanding is paramount, enabling LLMs to move beyond general-purpose tasks into highly technical fields.
Learn Before Represent (LBR) is a new method that helps large AI models understand specialized topics better. It works in two steps: first, it teaches the model specific knowledge, and then it refines how the model represents that knowledge. This allows AI models to perform much better in fields like medicine or chemistry where deep, specific understanding is needed.
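The two-stage objective described above can be sketched in miniature. The snippet below is illustrative only, not the paper's implementation: the IB constraint is stood in for by a simple L2 "rate" penalty on the compressed representation, the generative loss is omitted, and the Stage 2 alignment uses a standard InfoNCE contrastive loss over the compressed embeddings. All names (`compression_head`, `ib_penalty`, the dimensions) are hypothetical.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.05):
    """Stage 2 sketch: InfoNCE contrastive loss. Each anchor should match
    its own positive against every other positive in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                 # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))       # diagonal = true pairs

def ib_penalty(compressed):
    """Toy stand-in for the information-bottleneck constraint: an L2 "rate"
    term that discourages the compressed representation from carrying
    excess information. The real constraint is more involved."""
    return float(np.mean(compressed ** 2))

rng = np.random.default_rng(0)
hidden = rng.normal(size=(8, 64))             # hypothetical LLM hidden states
compression_head = rng.normal(size=(64, 16)) / 8.0  # 64 -> 16 dims
compressed = hidden @ compression_head        # Stage 1 output: compressed semantics

# Stage 1 objective (sketch): generative loss (omitted here) + IB rate term
stage1_rate = ib_penalty(compressed)

# Stage 2 objective: contrastive alignment on the compressed representations,
# with a lightly perturbed copy standing in for a positive pair.
positives = compressed + rng.normal(scale=0.01, size=compressed.shape)
stage2_loss = info_nce_loss(compressed, positives)
```

The key point the sketch reflects is the ordering: Stage 2's contrastive loss operates on the representations already compressed in Stage 1, rather than optimizing both objectives on the raw hidden states at once.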
LBR