Orthogonal Subspace Wake-up (OSW) is a technique designed to tackle the stability-plasticity dilemma in continual learning for Large Language Models (LLMs). This dilemma arises when LLMs are trained sequentially on new tasks, which often leads to catastrophic forgetting of previously acquired knowledge. OSW addresses it by first identifying the essential parameter subspaces corresponding to prior tasks through a brief "wake-up" phase. When learning new tasks, OSW then constrains parameter updates to directions orthogonal to these identified subspaces. This core mechanism provides a mathematically grounded "safety guarantee": the structural integrity of established knowledge is preserved, which matters especially in fragile, structured domains like code generation. OSW is particularly relevant for researchers and ML engineers building LLMs that must continuously adapt to new information or tasks without compromising existing, complex capabilities, and it offers a robust alternative to methods like Experience Replay, which can cause negative transfer in such scenarios.
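The core mechanism, projecting new-task gradients onto the orthogonal complement of a protected subspace, can be sketched in a few lines. This is a minimal illustration, not the OSW implementation itself: the subspace here is built from hypothetical gradient samples via SVD, and all names (`old_task_grads`, `project_orthogonal`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "wake-up" phase: gradient samples collected on prior tasks.
# Their column span stands in for the subspace encoding old knowledge.
# Shape: (8 parameters, 3 gradient samples).
old_task_grads = rng.normal(size=(8, 3))

# Orthonormal basis U for the protected subspace via thin SVD.
U, _, _ = np.linalg.svd(old_task_grads, full_matrices=False)

def project_orthogonal(grad, basis):
    """Remove the component of `grad` lying in the protected subspace."""
    return grad - basis @ (basis.T @ grad)

# A new-task gradient is projected before the parameter update,
# so the update cannot disturb the protected directions.
new_grad = rng.normal(size=8)
safe_grad = project_orthogonal(new_grad, U)

# The projected update is orthogonal to every protected direction.
print(np.allclose(U.T @ safe_grad, 0.0))  # True
```

In a real LLM setting the basis would be maintained per layer and the projection applied inside the optimizer step, but the orthogonality guarantee shown here is the same: no component of the new update lies along the protected directions.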
In short: Orthogonal Subspace Wake-up (OSW) lets Large Language Models learn new tasks without forgetting old, complex skills such as coding. It protects the parts of the model responsible for prior knowledge while allowing new learning in separate, "orthogonal" directions, preventing catastrophic forgetting.