ARXIV:2603.25864 · GUI AGENTS · SUBMITTED 30 MAR · 21:56 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks

Saelyne Yang · Jaesang Yu · Yi-Hao Peng · Kevin Qinghong Lin · Jae Won Cho · Yale Song · Juho Kim · arXiv

A benchmark and dataset for AI agents that understand user intent in graphical interfaces to provide collaborative assistance.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A benchmark and dataset for AI agents that understand user intent in graphical interfaces to provide collaborative assistance.

Evidence 131 refs | 3 sources | 50% coverage

Blocker Evidence unverified


PROBLEM

A benchmark and dataset for AI agents that understand user intent in graphical interfaces to provide collaborative assistance. While prior research has primarily focused on automating user actions through clicks and keystrokes, this paradigm…

METHOD

Full abstract

Graphical User Interface (GUI) agents have the potential to assist users in interacting with complex software (e.g., PowerPoint, Photoshop). While prior research has primarily focused on automating user actions through clicks and keystrokes, this paradigm overlooks human intention, where users value the ability to explore, iterate, and refine their ideas while maintaining agency. To move beyond automation and toward collaboration, GUI agents must understand what users are doing and why. We introduce GUIDE (GUI User Intent Detection Evaluation), a benchmark that evaluates AI models on their ability to perceive user behavior, infer intent, and provide assistance in open-ended GUI tasks. GUIDE consists of 67.5 hours of screen recordings from 120 novice user demonstrations with think-aloud narrations, across 10 software applications. GUIDE defines three tasks, (i) Behavior State Detection, (ii) Intent Prediction, and (iii) Help Prediction, that test a model's ability to recognize behavior state, reason about goals, and decide when and how to help. Evaluations across eight state-of-the-art multimodal models reveal that all models struggled, achieving only 44.6% and 55.0% accuracy on behavior state and help prediction, respectively. However, providing user context significantly improved performance, raising help prediction by up to 50.2pp and highlighting the critical role of structured user understanding in effective assistance. Our dataset is available at https://guide-bench.github.io.
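To make the accuracy and percentage-point (pp) figures in the abstract concrete, here is a minimal sketch of how such numbers could be computed for a single task. The label names, the toy predictions, and the helpers `accuracy` and `pp_gain` are illustrative assumptions. They are not taken from the GUIDE dataset or its evaluation code.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Return the fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def pp_gain(baseline_acc: float, improved_acc: float) -> float:
    """Return the absolute gain in percentage points between two accuracies in [0, 1]."""
    return (improved_acc - baseline_acc) * 100.0


# Hypothetical help-prediction labels (assumed, not from the benchmark).
gold = ["help", "no_help", "help", "help", "no_help", "no_help", "help", "no_help"]
# Toy predictions from a model that sees only the screen recording (assumed).
without_context = ["no_help", "no_help", "help", "no_help", "help", "no_help", "no_help", "no_help"]
# Toy predictions from the same model given extra user context (assumed).
with_context = ["help", "no_help", "help", "help", "no_help", "no_help", "help", "help"]

acc_base = accuracy(without_context, gold)
acc_ctx = accuracy(with_context, gold)
print(f"without context: {acc_base:.1%}")
print(f"with context:    {acc_ctx:.1%}")
print(f"gain:            {pp_gain(acc_base, acc_ctx):.1f}pp")
```

In this toy data the accuracy moves from 50.0% to 87.5%, a gain of 37.5pp. A pp gain is an absolute difference between two percentages, so it should not be read as a relative improvement.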

RESULT

ScienceToStartup currently rates this paper 7.0/10 on the public viability pass. The authors' dataset is available at https://guide-bench.github.io. Code availability is flagged in the production record; the public repository link still needs proof alignment.

WHY NOW

GUI Agents moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.


Opportunity summary

Score 7.0

Pain A benchmark and dataset for AI agents that understand user intent in graphical interfaces to provide collaborative assistance.

Evidence 131 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

A benchmark and dataset for AI agents that understand user intent in graphical interfaces to provide collaborative assistance.


Competitive landscape

A benchmark and dataset for AI agents that understand user intent in graphical interfaces to provide collaborative assistance.

Segment

GUI Agents

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified


Structured data

Title GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks

Page https://sciencetostartup.com/paper/guide-a-benchmark-for-understanding-and-assisting-users-in-open-ended-gui-tasks

arXiv https://arxiv.org/abs/2603.25864

Published 2026-03-26T19:37:53.000Z

Authors Saelyne Yang · Jaesang Yu · Yi-Hao Peng · Kevin Qinghong Lin · Jae Won Cho · Yale Song · Juho Kim

Access free

Research domain GUI Agents

Viability score 7

Commercial readiness code

