Web Agents

Proof pending

9 papers

6.9 viability

-33% (30d)

Proof pending. Core topic summary fields are still materializing.

State of the Field

Web agents can automate increasingly complex online tasks, but three problems hold them back: scarce high-quality training data, vulnerability to prompt injection attacks, and evaluation frameworks that miss important conditions. Recent work targets each one. AutoSurfer improves trajectory generation for training, while SnapGuard detects prompt injections aimed at screenshot-based agents. Region4Web revisits the granularity of the observation space, and GTA generates long-horizon tasks at scale; both are reported to improve performance across diverse environments. On the evaluation side, WebSP-Eval assesses agents on website security and privacy tasks, and TimeWarp tests whether they still perform as the web changes. For builders, these developments matter because robust, efficient web agents must handle real-world web interactions while staying secure against evolving threats.

Last updated Jun 3, 2026

Topic trend

Topic-specific paper and score movement from the daily diff ledger.

Papers

Research Paper·Apr 29, 2026

AutoSurfer -- Teaching Web Agents through Comprehensive Surfing, Learning, and Modeling

Recent advances in multimodal large language models (LLMs) have revolutionized web agents that can automate complex tasks on websites. However, their accuracy remains limited by the scarcity of high-q...

8.0 viability

Research Paper·Apr 28, 2026

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

Web agents have emerged as an effective paradigm for automating interactions with complex web environments, yet remain vulnerable to prompt injection attacks that embed malicious instructions into web...

7.0 viability

Research Paper·May 8, 2026

Region4Web: Rethinking Observation Space Granularity for Web Agents

Web agents perceive web pages through an observation space, yet its granularity has remained an underexamined design choice. Existing work treats observation at the same element-level granularity as t...

7.0 viability

Research Paper·May 28, 2026·B2B·Consumer

GTA: Generating Long-Horizon Tasks for Web Agents at Scale

Web agents, which couple language models with browsing and tool-use capabilities, show promise as open web assistants. Yet progress is increasingly limited by the lack of scalable, process-level super...

7.0 viability·Has code

Research Paper·May 14, 2026·B2B·Consumer

WARD: Adversarially Robust Defense of Web Agents Against Prompt Injections

Web agents can autonomously complete online tasks by interacting with websites, but their exposure to open web environments makes them vulnerable to prompt injection attacks embedded in HTML content o...

7.0 viability·Has code

Research Paper·Apr 7, 2026

WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks

Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or s...

7.0 viability

Research Paper·Mar 5, 2026

TimeWarp: Evaluating Web Agents by Revisiting the Past

The improvement of web agents on current benchmarks raises the question: Do today's agents perform just as well when the web changes? We introduce TimeWarp, a benchmark that emulates the evolving web ...

7.0 viability

Research Paper·Mar 5, 2026

WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces

We introduce WebChain, the largest open-source dataset of human-annotated trajectories on real-world websites, designed to accelerate reproducible research in web agents. It contains 31,725 trajectori...

7.0 viability

Research Paper·Apr 2, 2026

Read More, Think More: Revisiting Observation Reduction for Web Agents
