Recent advances in AI optimization focus on making large language models and generative frameworks more efficient and effective. Process-supervised reinforcement learning is being developed to improve retrieval-augmented generation by giving granular feedback on intermediate reasoning, addressing reward sparsity and flawed logic. Multi-agent reinforcement learning is being used to streamline reasoning by penalizing redundancy without sacrificing accuracy, yielding more concise outputs. The exploration-exploitation balance in generative flow networks is being tuned through new frameworks that enable better mode discovery, and emerging grounding methods use large language models to accelerate grounding in planning tasks. Collectively, these efforts target practical deployment challenges such as reducing computational overhead and simplifying the deployment of AI systems, ultimately improving user experience and operational efficiency across applications.
Post-training for long-horizon agentic tasks faces a tension between compute efficiency and generalization. While supervised fine-tuning (SFT) is compute efficient, it often suffers from out-of-domain (...
Agentic Variation Operators (AVO) are a new family of evolutionary variation operators that replace the fixed mutation, crossover, and hand-designed heuristics of classical evolutionary search with au...
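For contrast with what AVO replaces, a minimal sketch of the fixed variation operators of classical evolutionary search (bit-flip mutation and one-point crossover, standard textbook forms, not tied to any specific system in the abstract):

```python
import random

def mutate(genome, rate=0.1, rng=random):
    # Bit-flip mutation: flip each gene independently with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in genome]

def crossover(a, b, rng=random):
    # One-point crossover: splice two parents at a random cut point,
    # producing two children that conserve the parents' genes.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]
```

These operators are hand-designed and problem-agnostic; agentic operators instead let a model propose variations conditioned on the candidates themselves.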
Reinforcement learning (RL) has become a promising paradigm for optimizing Retrieval-Augmented Generation (RAG) in complex reasoning tasks. However, traditional outcome-based RL approaches often suffe...
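To illustrate the reward-sparsity issue the abstract raises, a toy sketch contrasting an outcome-based reward (one scalar at trajectory end) with a process-supervised reward (per-step feedback). The `step_is_sound` verifier is a hypothetical stand-in, not the paper's method:

```python
def outcome_reward(steps, final_correct):
    # Outcome-based RL: a single sparse scalar for the whole trajectory,
    # so flawed intermediate reasoning goes unpenalized if the answer is right.
    return [0.0] * (len(steps) - 1) + [1.0 if final_correct else 0.0]

def process_reward(steps, step_is_sound):
    # Process supervision: a (hypothetical) verifier scores each reasoning
    # step, densifying the signal and flagging flawed intermediate logic.
    return [1.0 if step_is_sound(s) else -1.0 for s in steps]
```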
The inference overhead induced by redundant reasoning undermines the interactive experience and severely bottlenecks the deployment of Large Reasoning Models. Existing reinforcement learning (RL)-base...
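One common shape for a reward that penalizes redundant reasoning without sacrificing accuracy is to gate the length penalty on correctness, so brevity is never bought at the cost of a wrong answer. A minimal sketch under that assumption (the budget and weight are illustrative, not from the abstract):

```python
def concise_reward(correct, num_tokens, budget=256, lam=0.5):
    # Accuracy first: a short but wrong answer earns nothing.
    if not correct:
        return 0.0
    # Bounded penalty for tokens beyond the budget; capped at `lam`
    # so correct answers always outrank incorrect ones.
    overflow = max(0.0, num_tokens - budget) / budget
    return 1.0 - lam * min(overflow, 1.0)
```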
Generative Flow Network (GFlowNet) objectives implicitly fix an equal mixing of forward and backward policies, potentially constraining the exploration-exploitation trade-off during training. By furth...
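As a concrete reference point, the standard trajectory balance objective scores a trajectory by (log Z + Σ log P_F − log R − Σ log P_B)², weighting forward and backward terms equally. The sketch below adds a hypothetical mixing weight `beta` on the backward term to show where such a trade-off could enter; this is an assumption for illustration, not the paper's formulation:

```python
def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward, beta=1.0):
    # Standard trajectory balance at beta=1. A hypothetical beta != 1
    # rebalances the backward-policy term, shifting the implicit mixing
    # of forward and backward policies (illustrative assumption).
    delta = log_Z + sum(log_pf) - log_reward - beta * sum(log_pb)
    return delta ** 2
```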
As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields diminishing ...
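The saturation effect has a simple gradient-level intuition: for cross-entropy, the gradient with respect to the logits is p − onehot(target), which shrinks toward zero as the model grows confident in the correct class. A small numeric demonstration (generic, not the paper's analysis):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ce_grad_norm(logits, target):
    # d(cross-entropy)/d(logits) = p - onehot(target); its norm vanishes
    # as confidence in the target class approaches 1.
    p = softmax(logits)
    g = [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]
    return math.sqrt(sum(gi * gi for gi in g))
```

Raising the correct-class logit from 1 to 5 to 10 monotonically shrinks the gradient signal, which is the diminishing-returns regime the abstract describes.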
Grounding is a critical step in classical planning, yet it often becomes a computational bottleneck due to the exponential growth in grounded actions and atoms as task size increases. Recent advances ...
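The bottleneck is easy to see in a naive grounder, which instantiates an action schema with every type-consistent binding of objects to parameters; the number of groundings grows exponentially in schema arity. A minimal sketch (generic, not any specific planner's implementation):

```python
from itertools import product

def ground(schema_params, objects_by_type):
    # Naive grounding: enumerate every type-consistent assignment of
    # objects to an action schema's parameters. For arity k and n objects
    # per type, this yields O(n^k) grounded actions.
    domains = [objects_by_type[t] for _, t in schema_params]
    return [tuple(zip((name for name, _ in schema_params), binding))
            for binding in product(*domains)]
```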
The choice of activation function is an active area of research, with different proposals aimed at improving optimization while maintaining expressivity. Additionally, the activation function can sig...
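As a concrete example of the design space, ReLU and GELU differ exactly in the optimization-relevant property the abstract mentions: GELU is smooth where ReLU has a hard kink at zero. A sketch using the standard tanh approximation of GELU (Hendrycks and Gimpel):

```python
import math

def relu(x):
    # Piecewise linear with a non-differentiable kink at 0.
    return max(0.0, x)

def gelu(x):
    # Tanh approximation of GELU: smooth near zero, which can ease
    # optimization relative to ReLU, while behaving like identity
    # for large positive inputs.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```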