Web agents can automate increasingly complex online tasks, but three problems limit their effectiveness: scarce training data, vulnerability to prompt injection attacks, and inadequate evaluation frameworks. AutoSurfer and SnapGuard address the first two by improving trajectory generation and detection of malicious content, respectively. Frameworks such as Region4Web and GTA refine the granularity of observations and task generation, which improves performance in diverse environments. Evaluation tools like WebSP-Eval and TimeWarp make it possible to assess agents on security and on an evolving web. Together, these developments matter to builders who need web agents that stay robust and efficient in real-world web interactions and secure against evolving threats.
Topic-specific paper and score movement from the daily diff ledger.
Recent advances in multimodal large language models (LLMs) have revolutionized web agents that can automate complex tasks on websites. However, their accuracy remains limited by the scarcity of high-q...
Web agents have emerged as an effective paradigm for automating interactions with complex web environments, yet remain vulnerable to prompt injection attacks that embed malicious instructions into web...
Web agents perceive web pages through an observation space, yet its granularity has remained an underexamined design choice. Existing work treats observation at the same element-level granularity as t...
Web agents, which couple language models with browsing and tool-use capabilities, show promise as open web assistants. Yet progress is increasingly limited by the lack of scalable, process-level super...
Web agents can autonomously complete online tasks by interacting with websites, but their exposure to open web environments makes them vulnerable to prompt injection attacks embedded in HTML content o...
Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or s...
The improvement of web agents on current benchmarks raises the question: Do today's agents perform just as well when the web changes? We introduce TimeWarp, a benchmark that emulates the evolving web ...
We introduce WebChain, the largest open-source dataset of human-annotated trajectories on real-world websites, designed to accelerate reproducible research in web agents. It contains 31,725 trajectori...
Web agents based on large language models (LLMs) rely on observations of web pages -- commonly represented as HTML -- as the basis for identifying available actions and planning subsequent steps. Prio...
Canonical route: /topics
Agent Handoff
Canonical ID web-agents | Route /topic/web-agents
REST example
curl https://sciencetostartup.com/api/v1/agent-handoff/topic/web-agents
MCP example
{
  "tool": "search_papers",
  "arguments": {
    "query": "Web Agents",
    "cluster": "Web Agents"
  }
}
source_context
{
  "surface": "topic",
  "mode": "topic",
  "query": "Web Agents",
  "normalized_query": "web-agents",
  "route": "/topic/web-agents",
  "paper_ref": null,
  "topic_slug": "web-agents",
  "benchmark_ref": null,
  "dataset_ref": null
}
Use This Via API or MCP
Topic pages bundle paper counts, viability trends, author concentration, and top questions into one canonical surface your agents can reference before they open Signal Canvas or create a workspace.
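As a rough illustration of how an agent might assemble the REST and MCP handoff shown on this page, here is a minimal Python sketch. It only builds the URL and payloads and makes no network request. The base URL, the `search_papers` tool name, and the field names are copied from the examples on this page. The helper names (`slugify`, `handoff_url`, `mcp_search_call`, `source_context`) are hypothetical, and the slug rule (lowercase, with non-alphanumeric runs collapsed to hyphens) is an assumption inferred from the single "Web Agents" to "web-agents" example; the real normalization may differ.

```python
import json
import re

# Base path taken from the REST example on this page.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff"


def slugify(query: str) -> str:
    """Assumed normalization: lowercase, non-alphanumeric runs become hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", query.lower()).strip("-")


def handoff_url(query: str) -> str:
    """Build the topic handoff URL in the shape of the REST example."""
    return f"{BASE_URL}/topic/{slugify(query)}"


def mcp_search_call(query: str) -> dict:
    """Build a tool-call payload in the shape of the MCP example."""
    return {"tool": "search_papers", "arguments": {"query": query, "cluster": query}}


def source_context(query: str) -> dict:
    """Build a source_context object mirroring the fields shown on this page."""
    slug = slugify(query)
    return {
        "surface": "topic",
        "mode": "topic",
        "query": query,
        "normalized_query": slug,
        "route": f"/topic/{slug}",
        "paper_ref": None,
        "topic_slug": slug,
        "benchmark_ref": None,
        "dataset_ref": None,
    }


if __name__ == "__main__":
    print(handoff_url("Web Agents"))
    print(json.dumps(mcp_search_call("Web Agents"), indent=2))
    print(json.dumps(source_context("Web Agents"), indent=2))
```

Keeping the slug logic in one helper means the URL, the route, and `topic_slug` cannot drift apart if the normalization rule turns out to need adjusting.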