On-policy reverse KL divergence is a training objective used in online learning to align a student model's output distribution with a teacher's: the KL term is evaluated on samples drawn from the student's own (on-policy) outputs, often combined with importance-aware weighting that prioritizes critical tokens. It helps bridge representational gaps, such as the acoustic-semantic gap in Large Audio Language Models.
On-policy reverse KL divergence is a training method used in AI models, especially those that combine audio and text, to help the audio side learn correctly from the text side. The model learns from its own outputs, and it addresses cases where audio models struggle to understand and reason by focusing the learning process on the most important information.
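The core computation can be sketched as a Monte Carlo estimate of E over samples y drawn from the student q of log q(y) - log p(y), where p is the teacher. The sketch below is a minimal illustration, not any specific implementation; the per-position `weights` argument is an assumed stand-in for importance-aware token weighting.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def on_policy_reverse_kl(student_logits, teacher_logits, weights=None, seed=0):
    """Monte Carlo estimate of reverse KL: tokens are sampled from the
    student q (on-policy), then scored as log q(y) - log p(y) against the
    teacher p. `weights` (hypothetical) rescales each position to mimic
    importance-aware weighting."""
    rng = random.Random(seed)
    n = len(student_logits)
    weights = weights or [1.0] * n
    total = 0.0
    for t in range(n):
        q = softmax(student_logits[t])                 # student distribution at step t
        p = softmax(teacher_logits[t])                 # teacher distribution at step t
        y = rng.choices(range(len(q)), weights=q)[0]   # sample from the student itself
        total += weights[t] * (math.log(q[y]) - math.log(p[y]))
    return total / n

# Toy example: 2 positions, 3-token vocabulary.
student = [[0.2, 1.5, -0.3], [1.0, 0.0, 0.5]]
teacher = [[0.0, 1.0, 0.0], [1.2, -0.1, 0.4]]
print(on_policy_reverse_kl(student, teacher))
```

Because samples come from the student rather than a fixed dataset, the estimate penalizes the student most where it places probability that the teacher would not, which is the mode-seeking behavior that distinguishes reverse KL from forward KL.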
Also known as: reverse KL, on-policy KL, online reverse KL