on-policy reverse KL divergence | Glossary | ScienceToStartup