ARXIV:2603.17441 · GUI GROUNDING · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified · Source: PDF linked
Partial · PaperPack: 3 of 4 citation fields filled
Missing · Missing fields: authors
Partial · Proof: unverified proof status

AdaZoom-GUI: Adaptive Zoom-based GUI Grounding with Instruction Refinement

arXiv

AdaZoom-GUI enhances GUI grounding accuracy through adaptive zoom and instruction refinement for vision-language models.

Blocked on Code · Score 7.0 · Evidence unverified

Opportunity summary

Pain AdaZoom-GUI enhances GUI grounding accuracy through adaptive zoom and instruction refinement for vision-language models.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

Grounding on GUI screenshots remains challenging due to high-resolution images, small UI elements, and ambiguous user instructions. AdaZoom-GUI addresses these issues with adaptive zoom and instruction refinement for vision-language models.

METHOD

Full abstract

GUI grounding is a critical capability for vision-language models (VLMs) that enables automated interaction with graphical user interfaces by locating target elements from natural language instructions. However, grounding on GUI screenshots remains challenging due to high-resolution images, small UI elements, and ambiguous user instructions. In this work, we propose AdaZoom-GUI, an adaptive zoom-based GUI grounding framework that improves both localization accuracy and instruction understanding. Our approach introduces an instruction refinement module that rewrites natural language commands into explicit and detailed descriptions, allowing the grounding model to focus on precise element localization. In addition, we design a conditional zoom-in strategy that selectively performs a second-stage inference on predicted small elements, improving localization accuracy while avoiding unnecessary computation and context loss on simpler cases. To support this framework, we construct a high-quality GUI grounding dataset and train the grounding model using Group Relative Policy Optimization (GRPO), enabling the model to predict both click coordinates and element bounding boxes. Experiments on public benchmarks demonstrate that our method achieves state-of-the-art performance among models with comparable or even larger parameter sizes, highlighting its effectiveness for high-resolution GUI understanding and practical GUI agent deployment.
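The conditional zoom-in strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no thresholds, crop sizes, or model interface, so `min_area_ratio`, `crop_size`, the `GroundingModel` callable signature, and `stub_model` are all hypothetical names and values chosen for illustration. The sketch runs a first pass on the full screenshot, and only when the predicted bounding box is small relative to the image does it run a second pass on a crop around that box, mapping the result back to full-image coordinates. Instruction refinement is not modeled here; the instruction string is passed through unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Size = Tuple[int, int]                   # (width, height) in pixels


@dataclass
class Prediction:
    click: Tuple[float, float]  # predicted click point
    box: Box                    # predicted element bounding box


# Hypothetical model interface (an assumption, not from the paper): given the
# image region it is shown and the instruction, the model returns a prediction
# in that region's local pixel coordinates.
GroundingModel = Callable[[Box, str], Prediction]


def needs_zoom(box: Box, image_size: Size, min_area_ratio: float) -> bool:
    """True when the predicted box covers less than min_area_ratio of the image."""
    x1, y1, x2, y2 = box
    width, height = image_size
    return (x2 - x1) * (y2 - y1) / (width * height) < min_area_ratio


def zoom_region(box: Box, image_size: Size, crop_size: Size) -> Box:
    """Fixed-size crop centered on the box and clamped to the image bounds."""
    width, height = image_size
    crop_w, crop_h = min(crop_size[0], width), min(crop_size[1], height)
    center_x, center_y = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    left = min(max(center_x - crop_w / 2, 0.0), width - crop_w)
    top = min(max(center_y - crop_h / 2, 0.0), height - crop_h)
    return (left, top, left + crop_w, top + crop_h)


def ground_with_conditional_zoom(
    model: GroundingModel,
    image_size: Size,
    instruction: str,
    min_area_ratio: float = 0.001,   # assumed threshold, not from the paper
    crop_size: Size = (512, 512),    # assumed crop size, not from the paper
) -> Tuple[Prediction, bool]:
    """Return (prediction in full-image coordinates, whether a second pass ran)."""
    full = (0.0, 0.0, float(image_size[0]), float(image_size[1]))
    first = model(full, instruction)
    if not needs_zoom(first.box, image_size, min_area_ratio):
        return first, False  # large element: skip the second pass
    region = zoom_region(first.box, image_size, crop_size)
    local = model(region, instruction)
    dx, dy = region[0], region[1]  # offset of the crop inside the full image
    refined = Prediction(
        click=(local.click[0] + dx, local.click[1] + dy),
        box=(local.box[0] + dx, local.box[1] + dy,
             local.box[2] + dx, local.box[3] + dy),
    )
    return refined, True


def stub_model(region: Box, instruction: str) -> Prediction:
    """Toy stand-in for a VLM: coarse guess on the full image, tighter on a crop."""
    if region[2] - region[0] > 512:
        return Prediction(click=(1010.0, 510.0), box=(1000.0, 500.0, 1020.0, 520.0))
    return Prediction(click=(256.0, 256.0), box=(250.0, 250.0, 262.0, 262.0))


result, zoomed = ground_with_conditional_zoom(
    stub_model, (3840, 2160), "click the save icon"
)
print(zoomed, result.click, result.box)
```

In this toy run the first-pass box covers far less than 0.1% of a 3840x2160 screenshot, so the second pass is triggered. A prediction that covers a large share of the image would be returned directly, which is the sense in which the abstract describes avoiding unnecessary computation and context loss on simpler cases.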

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass.

WHY NOW

GUI Grounding moved forward this cycle; last verified April 2026. Public score 7.0/10.

Opportunity summary

Score 7.0

Pain AdaZoom-GUI enhances GUI grounding accuracy through adaptive zoom and instruction refinement for vision-language models.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

AdaZoom-GUI enhances GUI grounding accuracy through adaptive zoom and instruction refinement for vision-language models.

Competitive landscape

AdaZoom-GUI enhances GUI grounding accuracy through adaptive zoom and instruction refinement for vision-language models.

Segment

GUI Grounding

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Source: https://arxiv.org/abs/2603.17441 (arXiv:2603.17441) · Published 2026-03-18 07:26 UTC · Research domain: GUI Grounding · Viability score: 7 · Free to access
