121 papers - avg viability 5.6
Current research on AI agents focuses on making them more autonomous and more effective across domains, targeting commercial problems such as efficiency in complex workflows and user interaction. Recent frameworks enable proactive agents that anticipate user needs and execute tasks autonomously, with the aim of improving user experience in digital environments. In investment banking, new benchmarks evaluate agents on high-stakes analytical tasks and reveal substantial gaps in current models' capabilities. Other work explores multimodal skills that help visual agents interpret and act on diverse inputs. Across applications ranging from sensor scheduling to scientific discovery, efforts to bridge semantic understanding and physical action indicate a shift toward practical, deployment-ready systems that operate reliably in real-world settings.
CMBEvolve and CosmoEvolve are agentic systems for cosmology that enable autonomous scientific discovery through LLM-guided code evolution and multi-agent research laboratories.
An AI-powered platform for scalable problem reductions that connects diverse computational problems to a range of solvers using a harness-engineering approach.
Robust Agent Compensation (RAC) is an architectural extension for AI agent frameworks that adds a log-based safety net for reliable execution while improving latency and token economy.
An agentic framework for evaluating and assuring prompt-driven behavior in foundation models used for mental health screening, with a focus on stability and auditability.
MolLingo is a multi-agent system that automates molecular design by emulating chemist reasoning with domain-specific tools and a novel molecule representation, outperforming frontier LLMs in therapeutic design.
IoT-Brain bridges LLMs and sensor networks for proactive, intent-driven physical world interaction through semantic-spatial sensor scheduling.
A framework for creating and using reusable multimodal skills for visual agents, enabling them to learn from diverse interaction data and improve decision-making.
An architecture for safe autonomous agents that uses out-of-band metadata channels to enforce governance and security.
An agent framework that autonomously runs Monte Carlo simulations for colloidal packing using a custom Python package and LLM skills.
BankerToolBench is an open-source benchmark that evaluates AI agents on end-to-end investment banking workflows and shows that current frontier models fail to meet professional standards.