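Sparsity-Aware Rejection Sampling is a technique used in reinforcement learning (RL) for large language models (LLMs) that keeps training stable when rollouts are generated sparsely, that is, with a compressed KV cache. Compression makes the policy that samples the rollouts differ from the policy being trained. The technique mitigates this policy mismatch and corrects the resulting off-policy bias, so the memory savings of compression do not come at the cost of performance.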
In plain terms, Sparsity-Aware Rejection Sampling lets large language models be trained with reinforcement learning using less memory. It accounts for the distortions that cache compression introduces into the generated samples, so the model still learns correctly and performance does not suffer.
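The definition above does not spell out the acceptance rule, so the following is only an illustrative sketch of how a rejection step of this kind might look, not the published method. It assumes each rollout carries per-token log-probabilities under both the compressed-cache (sparse) sampler and the full-cache (dense) policy, and it rejects rollouts whose sequence-level log-ratio exceeds a threshold. The function names, the dictionary keys, and the `max_abs_log_ratio` parameter are all hypothetical.

```python
import math


def sequence_log_ratio(dense_logprobs, sparse_logprobs):
    """Sum of per-token log(pi_dense / pi_sparse) over one rollout."""
    if len(dense_logprobs) != len(sparse_logprobs):
        raise ValueError("log-prob lists must have the same length")
    return sum(d - s for d, s in zip(dense_logprobs, sparse_logprobs))


def filter_rollouts(rollouts, max_abs_log_ratio=1.0):
    """Keep rollouts whose dense/sparse mismatch stays within a threshold.

    Assumption (not from the source): mismatch is measured as the absolute
    sequence-level log-ratio, and accepted rollouts receive the ratio as an
    importance weight to correct the remaining off-policy bias.
    """
    kept = []
    for rollout in rollouts:
        log_ratio = sequence_log_ratio(
            rollout["dense_logprobs"], rollout["sparse_logprobs"]
        )
        if abs(log_ratio) <= max_abs_log_ratio:
            kept.append({**rollout, "importance_weight": math.exp(log_ratio)})
    return kept


# Toy data: rollout "a" barely changes under compression, rollout "b" diverges.
rollouts = [
    {"id": "a", "dense_logprobs": [-1.0, -2.0], "sparse_logprobs": [-1.1, -2.1]},
    {"id": "b", "dense_logprobs": [-5.0, -6.0], "sparse_logprobs": [-1.0, -1.0]},
]
accepted = filter_rollouts(rollouts, max_abs_log_ratio=1.0)
print([r["id"] for r in accepted])
```

In this toy run only rollout "a" survives, because compression barely changed its likelihood, while rollout "b" is discarded as too far off-policy. A real system would choose the threshold and the mismatch measure empirically.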