ARXIV:2604.17849 · AI AGENTS · SUBMITTED 21 APR · 02:41 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

On the Reliability of Computer Use Agents

Gonzalo Gonzalez-Pumariega · Saaket Agashe · Jiachen Yang · Ang Li · Xin Eric Wang · arXiv

This paper analyzes the sources of unreliability in computer-use agents, focusing on stochasticity, ambiguity, and behavioral variability.

Blocked on Code · Score 3.0 · Evidence unverified

Opportunity summary

Pain This paper analyzes the sources of unreliability in computer-use agents, focusing on stochasticity, ambiguity, and behavioral variability.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

This paper analyzes the sources of unreliability in computer-use agents, focusing on stochasticity, ambiguity, and behavioral variability. Yet even when the task and model are unchanged, an agent that succeeds once may fail on…

METHOD

Full abstract

Computer-use agents have rapidly improved on real-world tasks such as web navigation, desktop automation, and software interaction, in some cases surpassing human performance. Yet even when the task and model are unchanged, an agent that succeeds once may fail on a repeated execution of the same task. This raises a fundamental question: if an agent can succeed at a task once, what prevents it from doing so reliably? In this work, we study the sources of unreliability in computer-use agents through three factors: stochasticity during execution, ambiguity in task specification, and variability in agent behavior. We analyze these factors on OSWorld using repeated executions of the same task together with paired statistical tests that capture task-level changes across settings. Our analysis shows that reliability depends on both how tasks are specified and how agent behavior varies across executions. These findings suggest the need to evaluate agents under repeated execution, to allow agents to resolve task ambiguity through interaction, and to favor strategies that remain stable across runs.
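The evaluation idea in the abstract (repeated executions of the same task, plus a paired test on task-level changes between settings) can be sketched as follows. This is a minimal illustration, not the authors' code. The abstract does not name the statistical test, so the exact McNemar test here is an assumed stand-in, a common choice for paired binary outcomes. The task ids, the outcome data, the "original" and "clarified" setting names, and the majority-vote rule for marking a task as solved are all hypothetical.

```python
from math import comb


def per_task_success_rate(runs):
    """runs maps task id -> list of bool outcomes from repeated executions."""
    return {task: sum(outcomes) / len(outcomes) for task, outcomes in runs.items()}


def always_succeeds_fraction(runs):
    """Fraction of tasks that succeed on every repeated execution."""
    return sum(all(outcomes) for outcomes in runs.values()) / len(runs)


def majority(runs):
    """Hypothetical binarization: a task counts as solved if most runs succeed."""
    return {task: sum(outcomes) * 2 > len(outcomes) for task, outcomes in runs.items()}


def mcnemar_exact(solved_a, solved_b):
    """Two-sided exact McNemar test on paired per-task binary outcomes.

    Returns (b, c, p_value), where b counts tasks solved only in setting A
    and c counts tasks solved only in setting B. Concordant tasks are ignored.
    """
    tasks = solved_a.keys() & solved_b.keys()
    b = sum(solved_a[t] and not solved_b[t] for t in tasks)
    c = sum(solved_b[t] and not solved_a[t] for t in tasks)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, so no evidence of a change
    k = min(b, c)
    # Binomial(n, 0.5) lower tail at k, doubled for a two-sided p-value.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical outcomes: three repeated runs per task under two settings.
runs_original = {
    "t1": [True, True, True],
    "t2": [True, False, True],
    "t3": [False, False, False],
    "t4": [True, False, False],
}
runs_clarified = {
    "t1": [True, True, True],
    "t2": [True, True, True],
    "t3": [False, True, True],
    "t4": [True, True, False],
}

if __name__ == "__main__":
    print(per_task_success_rate(runs_original))
    print(always_succeeds_fraction(runs_original), always_succeeds_fraction(runs_clarified))
    print(mcnemar_exact(majority(runs_original), majority(runs_clarified)))
```

Reporting the fraction of tasks that succeed on every run, alongside the mean success rate, separates an agent that is occasionally right from one that is reliably right. The paired test only looks at tasks whose outcome flips between settings, which is what makes it a task-level comparison.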

RESULT

ScienceToStartup currently rates this paper 3.0/10 on the public viability pass. The authors' analysis shows that reliability depends on both how tasks are specified and how agent behavior varies across executions.

WHY NOW

AI Agents moved forward this cycle; last verified April 2026. Public score 3.0/10.

Opportunity summary

Score 3.0

Pain This paper analyzes the sources of unreliability in computer-use agents, focusing on stochasticity, ambiguity, and behavioral variability.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Competitive landscape

This paper analyzes the sources of unreliability in computer-use agents, focusing on stochasticity, ambiguity, and behavioral variability.

Segment: AI Agents

Adoption evidence: No public code link in the paper record yet

Commercial read: 3.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Source: https://arxiv.org/abs/2604.17849 · Published 2026-04-20T05:59:04.000Z · Page: https://sciencetostartup.com/paper/on-the-reliability-of-computer-use-agents
