ARXIV:2604.13888 · AGENTS · SUBMITTED 16 APR · 18:19 UTC · FRESHNESS STALE

Verified Source: PDF linked | Verified PaperPack: citation fields available | Partial Proof: unverified proof status

GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis

Bo Yu · Cheng Yang · Dongyang Hou · Chengfu Liu · Jiayao Liu · Chi Wang · +3 at arXiv

A dynamic benchmark and agent architecture for evaluating and improving tool-augmented LLMs in complex spatial analysis tasks.

Ship in 2-4 weeks | Score 7.0 | Evidence unverified

Opportunity summary

Pain A dynamic benchmark and agent architecture for evaluating and improving tool-augmented LLMs in complex spatial analysis tasks.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

Evaluating tool-augmented LLM agents on spatial analysis remains challenging because geospatial workflows are complex and multi-step. Existing benchmarks rely primarily on static text or code matching, neglecting dynamic runtime feedback and the multimodal nature of spatial outputs.

METHOD

Full abstract

The integration of Large Language Models (LLMs) into Geographic Information Systems (GIS) marks a paradigm shift toward autonomous spatial analysis. However, evaluating these LLM-based agents remains challenging due to the complex, multi-step nature of geospatial workflows. Existing benchmarks primarily rely on static text or code matching, neglecting dynamic runtime feedback and the multimodal nature of spatial outputs. To address this gap, we introduce GeoAgentBench (GABench), a dynamic and interactive evaluation benchmark tailored for tool-augmented GIS agents. GABench provides a realistic execution sandbox integrating 117 atomic GIS tools, encompassing 53 typical spatial analysis tasks across 6 core GIS domains. Recognizing that precise parameter configuration is the primary determinant of execution success in dynamic GIS environments, we designed the Parameter Execution Accuracy (PEA) metric, which utilizes a "Last-Attempt Alignment" strategy to quantify the fidelity of implicit parameter inference. Complementing this, a Vision-Language Model (VLM) based verification is proposed to assess data-spatial accuracy and cartographic style adherence. Furthermore, to address the frequent task failures caused by parameter misalignments and runtime anomalies, we developed a novel agent architecture, Plan-and-React, that mimics expert cognitive workflows by decoupling global orchestration from step-wise reactive execution. Extensive experiments with seven representative LLMs demonstrate that the Plan-and-React paradigm significantly outperforms traditional frameworks, achieving the optimal balance between logical rigor and execution robustness, particularly in multi-step reasoning and error recovery. Our findings highlight current capability boundaries and establish a robust standard for assessing and advancing the next generation of autonomous GeoAI.
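The abstract names the Parameter Execution Accuracy (PEA) metric and its "Last-Attempt Alignment" strategy but does not give a formula. The sketch below is one possible reading, not the authors' definition: it assumes that only the final call an agent makes to each tool is scored (earlier calls are treated as retries), and that the score is the fraction of reference parameters reproduced exactly. The `ToolCall` type, the function names, and the `buffer`/`clip` tools are hypothetical, and the sketch further assumes each tool appears at most once in the reference workflow.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation (hypothetical structure, not from the paper)."""
    tool: str
    params: dict[str, Any]


def last_attempts(trace: list[ToolCall]) -> dict[str, ToolCall]:
    """Keep only the final call made to each tool in the agent trace."""
    final: dict[str, ToolCall] = {}
    for call in trace:
        final[call.tool] = call  # later calls overwrite earlier retries
    return final


def parameter_accuracy(trace: list[ToolCall], reference: list[ToolCall]) -> float:
    """Fraction of reference parameters matched by the agent's last attempts.

    Assumption: exact equality per parameter; a tool the agent never called
    contributes zero matches for all of its reference parameters.
    """
    final = last_attempts(trace)
    matched = 0
    total = 0
    for ref in reference:
        got = final.get(ref.tool)
        for name, expected in ref.params.items():
            total += 1
            if got is not None and name in got.params and got.params[name] == expected:
                matched += 1
    return matched / total if total else 0.0


# Hypothetical two-step workflow: buffer a layer, then clip it.
reference = [
    ToolCall("buffer", {"distance": 500, "units": "m"}),
    ToolCall("clip", {"layer": "roads"}),
]
trace = [
    ToolCall("buffer", {"distance": 50, "units": "m"}),   # first try, wrong distance
    ToolCall("buffer", {"distance": 500, "units": "m"}),  # corrected retry
    ToolCall("clip", {"layer": "rivers"}),                # wrong layer, never fixed
]
score = parameter_accuracy(trace, reference)
```

Under these assumptions, the corrected `buffer` retry earns full credit for its two parameters while the uncorrected `clip` call earns none, so the score reflects where the agent ended up rather than how many attempts it needed.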

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Extensive experiments with seven representative LLMs demonstrate that the Plan-and-React paradigm significantly outperforms traditional frameworks, achieving the optimal balance between logical rigor and execution…

WHY NOW

Agents moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain A dynamic benchmark and agent architecture for evaluating and improving tool-augmented LLMs in complex spatial analysis tasks.

Evidence 0 refs | 3 sources | 50% coverage

Blocker No shell-level blocker reported

Analysis summary

A dynamic benchmark and agent architecture for evaluating and improving tool-augmented LLMs in complex spatial analysis tasks.

Competitive landscape

A dynamic benchmark and agent architecture for evaluating and improving tool-augmented LLMs in complex spatial analysis tasks.

Segment: Agents

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Paper record

arXiv: 2604.13888 (https://arxiv.org/abs/2604.13888)
Page: https://sciencetostartup.com/paper/geoagentbench-a-dynamic-execution-benchmark-for-tool-augmented-agents-in-spatial-analysis
Published: 2026-04-15T13:55:34.000Z
Free to access: yes
Research domain: Agents
Viability score: 7
Commercial readiness: code

Authors

Bo Yu, Cheng Yang, Dongyang Hou, Chengfu Liu, Jiayao Liu, Chi Wang, Zhiming Zhang, Haifeng Li (Central South University); Wentao Yang (Hunan University of Science and Technology)

FAQ

What is the startup potential of "GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmen"?
GeoAgentBench offers an interactive AI benchmarking tool for evaluating tool-augmented agents in geospatial analysis.

What products could be built from this research?
GeoAI developers and GIS software companies can integrate this benchmark to test tool-augmented agent performance, improving product quality and offering a competitive edge.

What are the practical use cases?
Develop a commercial GIS tool that integrates GeoAgentBench's framework to improve AI-driven planning and analysis for urban developers and environmental agencies.

What industries could this research disrupt?
This benchmark tool can replace static evaluation methods in GIScience, offering dynamic, real-time interaction that improves AI reliability and effectiveness in spatial analysis.
