423 papers - avg viability 4.7
Reinforcement learning (RL) research is moving quickly, with much recent work aimed at improving the reasoning capabilities of models through hierarchical skill management and more efficient reward design. Recent systems such as ARISE and CoUR streamline training by drawing on intrinsic skills and by using large language models to optimize reward functions, which lets models learn from diverse interactions and adapt across tasks. Structured exploration techniques and robust representation methods further improve the efficiency of RL systems and make them easier to apply to real-world scenarios. For builders putting RL into practice, these methods offer tools for developing agents that can handle complex decision-making and problem-solving in dynamic environments.
ARISE enhances mathematical reasoning in language models through a hierarchical reinforcement learning framework that evolves skills over time.
OpenClaw-RL enables agents to learn from user interactions in real-time, enhancing their performance through continuous feedback.
A framework for automatically generating high-performance reinforcement learning environments with minimal engineering effort.
ProgAgent is a continual reinforcement learning agent that learns from unlabeled expert videos and adapts to new tasks, offering a robust solution for lifelong robotic learning.
R2R2 is a regularization method for Self-Predictive Learning that reduces overfitting in data-scarce reinforcement learning domains, improving agent efficiency.
FLAME accelerates RL with one-step flow matching, targeting efficient policies and low latency.
A novel reinforcement learning agent that efficiently learns from all goals simultaneously, significantly outperforming existing methods and offering a >250x speed-up.
NudgeRL provides efficient, strategy-guided exploration for reinforcement learning with verifiable rewards (RLVR), outperforming brute-force scaling and oracle-guided methods on challenging math benchmarks.
A novel framework using LLMs to automate and optimize reward function design in reinforcement learning, reducing evaluation costs and improving performance.
Cobalt enhances code generation in LLMs using a cost-effective hybrid of online and offline RL.