Current research on autonomous agents increasingly targets adaptability and reliability across diverse applications. MetaClaw enables continuous learning and skill evolution without downtime, while CycleRL applies deep reinforcement learning to the control of autonomous bicycles, demonstrating adaptability in a real-world setting. AndroTMem addresses memory management for GUI agents and emphasizes structured interaction memory for long-horizon tasks. Boundary-Aware Policy Optimization aims to make agentic search more reliable by encouraging agents to recognize their limitations. EnterpriseLab streamlines the development and deployment of enterprise agents, balancing capability against cost and data privacy. Together, these efforts point toward agent systems that are more robust, scalable, and user-centric, and that operate in complex environments with lower operational risk.
Large language model (LLM) agents are increasingly used for complex tasks, yet deployed agents often remain static, failing to adapt as user needs evolve. This creates a tension between the need for c...
RL-based agentic search enables LLMs to solve complex questions via dynamic planning and external search. While this approach significantly enhances accuracy with agent policies optimized via large-sc...
Despite advances in multimodal large language models, autonomous web agents still struggle to reliably execute long-horizon tasks on complex and dynamic web interfaces. Existing agents often suffer fr...
LLM-based multi-agent systems (MAS) have emerged as a promising approach to tackle complex tasks that are difficult for individual LLMs. A natural strategy is to scale performance by increasing the nu...
Graphical User Interface (GUI) agents are pivotal to advancing intelligent human-computer interaction paradigms. Constructing powerful GUI agents necessitates the large-scale annotation of high-quality ...
Deploying AI agents in enterprise environments requires balancing capability with data sovereignty and cost constraints. While small language models offer privacy-preserving alternatives to frontier m...
Autonomous bicycles offer a promising agile solution for urban mobility and last-mile logistics; however, conventional control strategies often struggle with their underactuated nonlinear dynamics, su...
Large Language Model (LLM) Agents exhibit inherent reasoning abilities through the collaboration of multiple tools. However, during agent inference, existing methods often suffer from (i) locally myop...
Agent Skills are structured packages of procedural knowledge that augment LLM agents at inference time. Despite rapid adoption, there is no standard way to measure whether they actually help. We prese...
Tool-integrated LLMs can retrieve, compute, and take real-world actions via external tools, but reliability remains a key bottleneck. We argue that failures stem from both tool-use accuracy (how well ...