ARXIV:2604.27419 · MULTIMODAL AGENTS · SUBMITTED 01 MAY · 15:04 UTC · FRESHNESS STALE

Verified · Source: PDF linked
Verified · PaperPack: citation fields available
Partial · Proof: unverified proof status

InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?

Qiyao Wang · Haoran Hu · Longze Chen · Hongbo Wang · Hamid Alinejad-Rokny · Yuan Lin · +1 at arXiv

InteractWeb-Bench, a new benchmark and interactive environment for evaluating multimodal agents in website generation under realistic, ambiguous user instructions.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain InteractWeb-Bench, a new benchmark and interactive environment for evaluating multimodal agents in website generation under realistic, ambiguous user instructions.

Evidence 0 refs | 4 sources | 67% coverage

Blocker Evidence unverified


PROBLEM

InteractWeb-Bench is a new benchmark and interactive environment for evaluating multimodal agents in website generation under realistic, ambiguous user instructions. Existing benchmarks rely on idealized assumptions: well-structured, information-rich inputs and static execution settings.

METHOD

Full abstract

With the advancement of multimodal large language models (MLLMs) and coding agents, website development has shifted from manual programming to agent-based project-level code synthesis. Existing benchmarks rely on idealized assumptions, especially for well-structured, information-rich inputs and static execution settings. In contrast, real-world development is constrained by a critical bottleneck: the semantic misalignment between ambiguous, low-quality instructions from non-expert users and model understanding, which results in a failure mode that we term blind execution. To address this gap, we introduce InteractWeb-Bench, the first multimodal interactive benchmark for website generation under non-expert low-code user conditions. InteractWeb-Bench introduces four types of user agents and persona-driven instruction perturbations to systematically simulate diverse user behaviors, including ambiguity, redundancy, and contradiction, grounded in requirement engineering defect taxonomies. We develop an interactive execution environment for agents, featuring a unified action space comprising Clarify, Implement, Verify, and Submit, enabling iterative intent refinement, code synthesis, and visual feedback-based validation. Extensive experiments and analysis reveal that frontier MLLM-based agents remain trapped in blind execution, exposing limitations in intent recognition and adaptive interaction.
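To make the action space in the abstract concrete, here is a minimal sketch of an episode loop over Clarify, Implement, Verify, and Submit. Only the four action names come from the abstract. The `Episode` fields, the toy policy in `choose_action`, and the stand-in effects in `step` are assumptions made for illustration and are not taken from the InteractWeb-Bench code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Action(Enum):
    # The four action names are from the abstract; everything else is illustrative.
    CLARIFY = "clarify"
    IMPLEMENT = "implement"
    VERIFY = "verify"
    SUBMIT = "submit"


@dataclass
class Episode:
    """Hypothetical episode state (field names are assumptions)."""
    instruction: str
    open_questions: List[str]            # ambiguities the agent has not resolved yet
    code: Optional[str] = None           # generated site, None until implemented
    verified: bool = False
    trace: List[Action] = field(default_factory=list)


def choose_action(ep: Episode) -> Action:
    """Toy policy: resolve ambiguity first, then build, check, and submit."""
    if ep.open_questions:
        return Action.CLARIFY
    if ep.code is None:
        return Action.IMPLEMENT
    if not ep.verified:
        return Action.VERIFY
    return Action.SUBMIT


def step(ep: Episode, action: Action) -> None:
    """Apply one action to the episode using stand-in effects."""
    ep.trace.append(action)
    if action is Action.CLARIFY:
        question = ep.open_questions.pop(0)
        # A simulated user agent would answer here; this sketch only records the question.
        ep.instruction += f" [clarified: {question}]"
    elif action is Action.IMPLEMENT:
        ep.code = f"<!-- site for: {ep.instruction} -->"
    elif action is Action.VERIFY:
        # Stand-in for visual-feedback validation of the rendered page.
        ep.verified = ep.code is not None


def run(ep: Episode, max_steps: int = 10) -> List[Action]:
    """Run the policy until it submits or the step budget is exhausted."""
    for _ in range(max_steps):
        action = choose_action(ep)
        step(ep, action)
        if action is Action.SUBMIT:
            break
    return ep.trace


if __name__ == "__main__":
    ep = Episode(instruction="make me a shop page", open_questions=["which products?"])
    print([a.value for a in run(ep)])
```

An agent that went straight to Implement and Submit on an ambiguous instruction would never enter the Clarify branch in this loop, which is roughly the behavior the abstract calls blind execution. How the benchmark actually detects or scores that behavior is not stated on this page.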

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. In contrast, real-world development is constrained by a critical bottleneck: the semantic misalignment between ambiguous, low-quality instructions from non-expert users and model understanding, which…

WHY NOW

Multimodal Agents moved forward this cycle; last verified May 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.


Opportunity summary

Score 7.0

Pain InteractWeb-Bench, a new benchmark and interactive environment for evaluating multimodal agents in website generation under realistic, ambiguous user instructions.

Evidence 0 refs | 4 sources | 67% coverage

Blocker no shell-level blocker reported


Competitive landscape

InteractWeb-Bench, a new benchmark and interactive environment for evaluating multimodal agents in website generation under realistic, ambiguous user instructions.

Segment

Multimodal Agents

Adoption evidence

Public code linked for build inspection

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Paper

https://arxiv.org/abs/2604.27419

Published

2026-04-30T04:49:34.000Z

Authors

Qiyao Wang · Haoran Hu · Longze Chen · Hongbo Wang · Hamid Alinejad-Rokny · Yuan Lin · Min Yang

Code repository

https://github.com/AIforIP/InteractWeb-Bench

Commercial readiness

code, repo url
