Adaptive pruning is a post-training model compression technique for LLMs that intelligently selects which layers to prune. It uses an AI agent to determine per-layer sparsity ratios based on dynamic sensitivity profiles, aiming to preserve critical knowledge pathways and mitigate factual degradation.
Adaptive pruning is a smart way to make large AI models smaller and faster without losing their key knowledge. Instead of removing parts uniformly, it uses another AI to figure out which parts are least important and can be removed, especially for language models that need to remember facts.
agent-guided pruning, dynamic pruning, intelligent pruning
Was this definition helpful?