ARXIV:2605.02503 · AGENTS · SUBMITTED 05 MAY · 20:27 UTC · FRESHNESS STALE

Verified Source: PDF linked | Verified PaperPack: citation fields available | Partial Proof: unverified proof status

DataClaw: A Process-Oriented Agent Benchmark for Exploratory Real-World Data Analysis

Qiaohong Zhang · Weihao Ye · Jialong Chen · Yi Luo · BoYuan Li · Bowen Deng · Zibin Zheng · Jianhao Lin · Wei-Shi Zheng · Chuan Chen

A benchmark for evaluating autonomous data analysis agents on real-world data exploration tasks, revealing current limitations and distinct strategies.

Ship in 2-4 weeks | Score 7.0 | Evidence unverified

Opportunity summary

Pain A benchmark for evaluating autonomous data analysis agents on real-world data exploration tasks, revealing current limitations and distinct strategies.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

A benchmark for evaluating autonomous data analysis agents on real-world data exploration tasks, revealing current limitations and distinct strategies. However, many existing benchmarks emphasize final answer accuracy in prior-guided data settings and provide limited…

METHOD

Full abstract

Evaluating autonomous data analysis agents requires testing their ability to perform exploratory analysis in underexplored data environments. However, many existing benchmarks emphasize final answer accuracy in prior-guided data settings and provide limited support for reasoning process evaluation. We introduce DataClaw, a process-oriented benchmark for exploratory real-world data analysis. DataClaw contains approximately 2.06 million real-world records across enterprise, industry and policy domains, with native data noise preserved. It further includes 492 cross-domain tasks derived from think-tank consulting scenarios, each annotated with intermediate milestones for process-level evaluation. These annotations allow DataClaw to measure how far an agent progresses and where its reasoning breaks down. Experiments with eight advanced LLMs show that current agents remain far from reliable in this setting, with seven models achieving below 50% overall accuracy. Process analysis further reveals partial progress hidden behind wrong answers and distinct exploration strategies across models. Overall, DataClaw provides a less data-constrained diagnostic testbed for probing the capability boundaries of autonomous data-analysis agents.
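
The abstract describes milestone annotations that show how far an agent gets and where its reasoning breaks down. The sketch below illustrates one way such process-level scoring could be computed. It is not DataClaw's actual metric: the `TaskResult` layout, the function names, and the equal weighting of milestones are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskResult:
    """One agent run on one task (hypothetical record layout)."""
    task_id: str
    milestones_hit: List[bool]  # ordered intermediate milestones, True if reached
    final_correct: bool         # whether the final answer was judged correct


def progress(result: TaskResult) -> float:
    """Fraction of annotated milestones reached (equal weights assumed)."""
    if not result.milestones_hit:
        return 0.0
    return sum(result.milestones_hit) / len(result.milestones_hit)


def first_breakdown(result: TaskResult) -> Optional[int]:
    """Index of the first missed milestone, or None if all were reached."""
    for i, hit in enumerate(result.milestones_hit):
        if not hit:
            return i
    return None


def summarize(results: List[TaskResult]) -> dict:
    """Compare final-answer accuracy with milestone progress."""
    n = len(results)
    wrong = [r for r in results if not r.final_correct]
    return {
        "accuracy": sum(r.final_correct for r in results) / n,
        "mean_progress": sum(progress(r) for r in results) / n,
        # Partial progress that accuracy alone would hide.
        "progress_on_wrong": (
            sum(progress(r) for r in wrong) / len(wrong) if wrong else 0.0
        ),
    }


runs = [
    TaskResult("t1", [True, True, True], True),
    TaskResult("t2", [True, True, False, False], False),
]
summary = summarize(runs)
```

In this toy input the second run fails the task yet reaches half of its milestones, which is the kind of partial progress that a final-answer-only score would not surface.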

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. However, many existing benchmarks emphasize final answer accuracy in prior-guided data settings and provide limited support for reasoning process evaluation. Code availability is flagged…

WHY NOW

Agents moved forward this cycle; last verified May 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain A benchmark for evaluating autonomous data analysis agents on real-world data exploration tasks, revealing current limitations and distinct strategies.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

A benchmark for evaluating autonomous data analysis agents on real-world data exploration tasks, revealing current limitations and distinct strategies.

Competitive landscape

A benchmark for evaluating autonomous data analysis agents on real-world data exploration tasks, revealing current limitations and distinct strategies.

Segment: Agents

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

{ "@context": "https://schema.org", "@graph": [ { "@type": "WebPage", "@id": "https://sciencetostartup.com/paper/dataclaw-a-process-oriented-agent-benchmark-for-exploratory-real-world-data-analysis#webpage", "url": "https://sciencetostartup.com/paper/dataclaw-a-process-oriented-agent-benchmark-for-exploratory-real-world-data-analysis", "name": "DataClaw: A Process-Oriented Agent Benchmark for Exploratory Real-World Data Analysis", "description": "A benchmark for evaluating autonomous data analysis agents on real-world data exploration tasks, revealing current limitations and distinct strategies.", "isPartOf": { "@id": "https://sciencetostartup.com/#website" } }, { "@type": "ScholarlyArticle", "@id": "https://sciencetostartup.com/paper/dataclaw-a-process-oriented-agent-benchmark-for-exploratory-real-world-data-analysis#scholarlyArticle", "headline": "DataClaw: A Process-Oriented Agent Benchmark for Exploratory Real-World Data Analysis", "description": "A benchmark for evaluating autonomous data analysis agents on real-world data exploration tasks, revealing current limitations and distinct strategies.", "url": "https://sciencetostartup.com/paper/dataclaw-a-process-oriented-agent-benchmark-for-exploratory-real-world-data-analysis", "sameAs": "https://arxiv.org/abs/2605.02503", "identifier": { "@type": "PropertyValue", "propertyID": "arXiv", "value": "2605.02503" }, "isAccessibleForFree": true, "isPartOf": { "@id": "https://sciencetostartup.com/#website" }, "datePublished": "2026-05-04T11:57:09.000Z", "author": [ { "@type": "Person", "name": "Qiaohong Zhang" }, { "@type": "Person", "name": "Weihao Ye" }, { "@type": "Person", "name": "Jialong Chen" }, { "@type": "Person", "name": "Yi Luo" }, { "@type": "Person", "name": "BoYuan Li" }, { "@type": "Person", "name": "Bowen Deng" }, { "@type": "Person", "name": "Zibin Zheng" }, { "@type": "Person", "name": "Jianhao Lin" }, { "@type": "Person", "name": "Wei-Shi Zheng" }, { "@type": "Person", "name": "Chuan Chen" } ], 
"additionalProperty": [ { "@type": "PropertyValue", "propertyID": "viabilityScore", "value": 7 }, { "@type": "PropertyValue", "propertyID": "researchDomain", "value": "Agents" }, { "@type": "PropertyValue", "propertyID": "commercialReadiness", "value": "code" } ] }, { "@type": "BreadcrumbList", "itemListElement": [ { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://sciencetostartup.com" }, { "@type": "ListItem", "position": 2, "name": "Agents", "item": "https://sciencetostartup.com/topics" }, { "@type": "ListItem", "position": 3, "name": "DataClaw: A Process-Oriented Agent Benchmark for Exploratory", "item": "https://sciencetostartup.com/paper/dataclaw-a-process-oriented-agent-benchmark-for-exploratory-real-world-data-analysis" } ] } ] }
