Reinforcement Learning with Verifiable Rewards (RLVR) is a training paradigm for LLMs that explicitly rewards abstention ("I don't know") alongside correctness. It employs a ternary reward structure to promote intellectual humility, reduce hallucinations, and enhance model reliability in factual domains.
Reinforcement Learning with Verifiable Rewards (RLVR) trains AI models, especially large language models, to be more honest and reliable. It uses a three-way reward system that scores correct answers, admissions of uncertainty ("I don't know"), and wrong answers differently, so the model learns to abstain when unsure rather than guess. This reduces false information and makes AI more trustworthy in domains where factual accuracy matters.
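As a minimal sketch of the ternary reward structure described above, the following Python shows one way such a reward could be computed. The specific values (+1 for correct, 0 for abstention, -1 for incorrect), the phrase matching used to detect abstention, and the names `ternary_reward` and `expected_answer_reward` are illustrative assumptions, not values or functions taken from a particular RLVR implementation.

```python
# Hypothetical abstention markers; a real verifier would likely be more robust.
ABSTAIN_PHRASES = ("i don't know", "i do not know")


def ternary_reward(response: str, reference: str,
                   r_correct: float = 1.0,
                   r_abstain: float = 0.0,
                   r_wrong: float = -1.0) -> float:
    """Return one of three rewards: correct, abstained, or wrong.

    The default reward values are assumptions chosen for illustration.
    """
    text = response.strip().lower()
    # Abstention is checked first so "I don't know" is never scored as wrong.
    if any(phrase in text for phrase in ABSTAIN_PHRASES):
        return r_abstain
    # Exact match after normalization stands in for a verifiable check.
    if text == reference.strip().lower():
        return r_correct
    return r_wrong


def expected_answer_reward(p_correct: float,
                           r_correct: float = 1.0,
                           r_wrong: float = -1.0) -> float:
    """Expected reward for answering, given probability p_correct of being right."""
    return p_correct * r_correct + (1.0 - p_correct) * r_wrong
```

Under these assumed values, answering has a higher expected reward than abstaining only when the model's chance of being correct exceeds 0.5, which is what makes "I don't know" the better choice for a low-confidence answer. Shifting `r_wrong` or `r_abstain` moves that threshold.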
Verifiable RL, Abstention RL, RL with Intellectual Humility