ARXIV:2603.26648 · WEB DEVELOPMENT AGENTS · SUBMITTED 31 MAR · 20:30 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification

Zehai He · Wenyi Hong · Zhen Yang · Ziyang Pan · Mingdao Liu · Xiaotao Gu · +1 at arXiv

A hierarchical benchmark and agent verification system for evaluating and improving AI agents in visual website development.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A hierarchical benchmark and agent verification system for evaluating and improving AI agents in visual website development.

Evidence 51 refs | 3 sources | 67% coverage

Blocker Evidence unverified

PROBLEM

A hierarchical benchmark and agent verification system for evaluating and improving AI agents in visual website development. To address this gap, we introduce Vision2Web, a hierarchical benchmark for visual website development, spanning from static…

METHOD

Full abstract

Recent advances in large language models have improved the capabilities of coding agents, yet systematic evaluation of complex, end-to-end website development remains limited. To address this gap, we introduce Vision2Web, a hierarchical benchmark for visual website development that spans three levels: static UI-to-code generation, interactive multi-page frontend reproduction, and long-horizon full-stack website development. The benchmark is constructed from real-world websites and comprises 193 tasks across 16 categories, with 918 prototype images and 1,255 test cases. To support flexible, thorough, and reliable evaluation, we propose a workflow-based agent verification paradigm built on two complementary components: a GUI agent verifier and a VLM-based judge. We evaluate multiple visual language models instantiated under different coding-agent frameworks, revealing substantial performance gaps at all task levels, with state-of-the-art models still struggling on full-stack development.
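As a rough illustration of how results from such a benchmark could be organized, the sketch below models the three task levels named in the abstract and aggregates two kinds of signal per level: test-case outcomes, as a GUI agent verifier might report them, and per-image visual scores, as a VLM-based judge might report them. The abstract does not describe the actual data format or scoring rules, so `Level`, `TaskResult`, `summarize`, the field names, and the choice of a simple pass rate and mean are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from statistics import mean
from typing import Dict, List, Optional


class Level(Enum):
    # The three task levels named in the abstract.
    STATIC_UI_TO_CODE = "static UI-to-code generation"
    INTERACTIVE_FRONTEND = "interactive multi-page frontend reproduction"
    FULL_STACK = "long-horizon full-stack website development"


@dataclass
class TaskResult:
    # Hypothetical record for one evaluated task (field names are assumptions).
    task_id: str
    level: Level
    # One boolean per test case, as a GUI agent verifier might report it.
    test_case_passed: List[bool] = field(default_factory=list)
    # One score in [0, 1] per prototype image, as a VLM-based judge might report it.
    visual_scores: List[float] = field(default_factory=list)


def summarize(results: List[TaskResult]) -> Dict[Level, Dict[str, Optional[float]]]:
    """Aggregate pass rate and mean visual score per level.

    A metric is None when a level has no data for it, so a level without
    test cases is not misreported as a 0% pass rate.
    """
    summary: Dict[Level, Dict[str, Optional[float]]] = {}
    for level in Level:
        subset = [r for r in results if r.level is level]
        passed = [p for r in subset for p in r.test_case_passed]
        scores = [s for r in subset for s in r.visual_scores]
        if not passed and not scores:
            continue  # no results recorded for this level
        summary[level] = {
            "pass_rate": sum(passed) / len(passed) if passed else None,
            "visual_score": mean(scores) if scores else None,
        }
    return summary
```

Keeping the two signals separate, rather than folding them into one number, mirrors the abstract's description of the verifier and the judge as complementary components. Whether the paper combines them further is not stated here.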

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. To support flexible, thorough, and reliable evaluation, we propose a workflow-based agent verification paradigm based on two complementary components: a GUI agent verifier and a…

WHY NOW

Web Development Agents moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain A hierarchical benchmark and agent verification system for evaluating and improving AI agents in visual website development.

Evidence 51 refs | 3 sources | 67% coverage

Blocker No shell-level blocker reported

Analysis summary

A hierarchical benchmark and agent verification system for evaluating and improving AI agents in visual website development.

Competitive landscape

A hierarchical benchmark and agent verification system for evaluating and improving AI agents in visual website development.

Segment: Web Development Agents

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Source: https://arxiv.org/abs/2603.26648 (arXiv 2603.26648) · Published 2026-03-27

Authors: Zehai He, Wenyi Hong, Zhen Yang, Ziyang Pan, Mingdao Liu, Xiaotao Gu, Jie Tang
