Recent advances in large language model (LLM) reasoning focus on improving efficiency and accuracy in knowledge-intensive tasks. Reinforcement learning with verifiable rewards is being used to strengthen multi-hop reasoning, helping models navigate complex knowledge graphs more effectively. Confidence-aware self-consistency frameworks optimize reasoning paths by cutting unnecessary computation while preserving accuracy, and approaches that frame reasoning as uncertainty minimization let models select continuations based on internal confidence metrics, improving performance across benchmarks. The field is also probing the dynamics of reasoning errors, which show parallels with human biases, and developing methods that adapt reasoning strategies to input difficulty. Beyond refining LLM reasoning itself, these developments hold commercial promise in automated customer support, medical diagnostics, and complex problem-solving, where nuanced understanding and efficient processing are critical.
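To make the "confidence-aware self-consistency" idea concrete, here is a minimal illustrative sketch (not any specific paper's method): instead of a plain majority vote over sampled reasoning chains, each sampled answer's vote is weighted by a model confidence score. The `(answer, confidence)` pair format and the [0, 1] confidence scale are assumptions for the example; in practice the confidence might come from mean token log-probabilities or a verifier.

```python
from collections import defaultdict

def confidence_weighted_vote(samples):
    """Aggregate sampled answers, weighting each vote by a model
    confidence score in [0, 1] (an illustrative assumption; real
    systems might derive it from token log-probabilities)."""
    scores = defaultdict(float)
    for answer, confidence in samples:
        scores[answer] += confidence
    # Return the answer with the highest total weighted support.
    return max(scores, key=scores.get)

# Plain majority voting would pick "3" here (two votes to one),
# but confidence weighting favors the single high-confidence answer.
samples = [("3", 0.40), ("3", 0.35), ("7", 0.90)]
print(confidence_weighted_vote(samples))  # → 7
```

A natural extension, and one way such frameworks reduce computational overhead, is to stop sampling early once the leading answer's weighted margin exceeds a threshold.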
Chain-of-Thought (CoT) prompting has significantly improved the reasoning capabilities of large language models (LLMs). However, conventional CoT often relies on unstructured, flat reasoning chains th...
While large language models (LLMs) have demonstrated strong performance on complex reasoning tasks such as competitive programming (CP), existing methods predominantly focus on single-attempt settings...
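The contrast between single-attempt and multi-attempt settings is usually quantified with the pass@k metric. The sketch below implements the standard unbiased pass@k estimator from the Codex paper (Chen et al., 2021) — a common evaluation tool, not the method of the abstract above: given n sampled solutions of which c pass the tests, it estimates the probability that at least one of k attempts succeeds.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator (Chen et al., 2021): the probability
    that at least one of k attempts, drawn without replacement from
    n samples of which c are correct, passes.

    Computed as 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer incorrect samples than attempts: success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 sampled programs, 3 correct: one attempt succeeds ~30% of the time,
# while five attempts succeed far more often.
print(pass_at_k(n=10, c=3, k=1))  # ≈ 0.3
print(pass_at_k(n=10, c=3, k=5))  # ≈ 0.92
```

The combinatorial form avoids the high variance of naively averaging over random k-subsets.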
Large Language Models (LLMs) demonstrate impressive natural language capabilities but often struggle with knowledge-intensive reasoning tasks. Knowledge Base Question Answering (KBQA), which leverages...
In-Context Reinforcement Learning (ICRL) enables Large Language Models (LLMs) to learn online from external rewards directly within the context window. However, a central challenge in ICRL is reward e...
Reinforcement learning from verifiable rewards (RLVR) has significantly advanced the reasoning capabilities of large language models. However, standard Group Relative Policy Optimization (GRPO) typica...
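For readers unfamiliar with GRPO, its core move is to replace a learned value function with group-relative normalization: each rollout's verifiable reward is standardized against the mean and standard deviation of its sampling group. A minimal sketch of that advantage computation (details such as population vs. sample standard deviation vary across implementations):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each rollout's reward by its
    group's mean and standard deviation, so no critic is needed.
    Population std is an implementation choice; some variants differ."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# A group of 4 rollouts for one prompt, scored by a verifiable 0/1 reward.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # correct rollouts get positive advantage, incorrect negative
```

Note the failure mode this normalization implies: if every rollout in a group receives the same reward (all correct or all incorrect), the advantages are all zero and the group contributes no learning signal — one motivation for the refinements these abstracts discuss.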
Geometric Problem Solving (GPS) remains at the heart of enhancing mathematical reasoning in large language models because it requires the combination of diagrammatic understanding, symbolic manipulati...
We consider the question: when a large language reasoning model makes a choice, did it think first and then decide, or did it decide first and then think? In this paper, we present evidence that detectabl...

Given a question, a language model (LM) implicitly encodes a distribution over possible answers. In practice, post-training procedures for LMs often collapse this distribution onto a single dominant m...
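The implicit answer distribution this abstract refers to can be estimated empirically by sampling the model repeatedly at nonzero temperature and counting final answers. The sketch below is a hedged illustration: `sample_fn` is a placeholder for a real model call, and the toy stand-in model exists only to make the example runnable.

```python
import random
from collections import Counter

def empirical_answer_distribution(sample_fn, n=1000):
    """Estimate the distribution an LM implicitly encodes over answers
    by repeated sampling. `sample_fn` is a stand-in for one decoded
    model call at nonzero temperature (an assumption of this sketch)."""
    counts = Counter(sample_fn() for _ in range(n))
    return {answer: count / n for answer, count in counts.items()}

# Toy stand-in "model" with a known answer distribution, for illustration.
random.seed(0)
def toy_model():
    return random.choices(["A", "B"], weights=[0.8, 0.2])[0]

dist = empirical_answer_distribution(toy_model, n=1000)
print(dist)  # empirical frequencies near the underlying 0.8 / 0.2 split
```

Against this estimate, the mode-collapse effect of post-training would show up as the sampled distribution concentrating far more mass on a single answer than the base model does.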
Reinforcement learning with verifiable rewards (RLVR) has substantially improved the reasoning capabilities of large language models. While existing analyses identify that RLVR-induced changes are spa...
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs). However, vanilla RLVR suffers from in...