ARXIV:2604.21268 · GUI GROUNDING · SUBMITTED 24 APR · 20:25 UTC · FRESHNESS STALE

Verified Source: PDF linked | Verified PaperPack: citation fields available | Partial Proof: unverified proof status

Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding

Wenkai Wang · Xiyun Li · Hongcan Guo · Wenhao Yu · Tianqing Fang · Haitao Mi · Dong Yu · Shengyu Zhang

A reinforcement learning framework co-evolves a proposer and visual critic to achieve precise pixel-level localization for natural language instructions in GUIs.

Ship in 2-4 weeks | Score 8.0 | Evidence unverified

Opportunity summary

Pain: A reinforcement learning framework co-evolves a proposer and visual critic to achieve precise pixel-level localization for natural language instructions in GUIs.

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: Evidence unverified

PROBLEM

A reinforcement learning framework co-evolves a proposer and visual critic to achieve precise pixel-level localization for natural language instructions in GUIs. However, due to visually homogeneous elements and dense layouts, models typically grasp semantic…

METHOD

Full abstract

Graphical User Interface (GUI) grounding requires mapping natural language instructions to precise pixel coordinates. However, due to visually homogeneous elements and dense layouts, models typically grasp semantic intent yet struggle with achieving precise localization. While scaling sampling attempts (Pass@k) reveals potential gains, static self-consistency strategies derived from geometric clustering often yield limited improvements, as the model's predictions tend to be spatially dispersed. In this paper, we propose replacing static consistency strategies with a learnable selection mechanism that selects the optimal target by critiquing its own proposals rendered on the screenshot. Given the significant disparity between the model's grounding and critiquing capabilities, we propose a co-evolving Propose-then-Critic framework. To jointly optimize these, we introduce a maturity-aware adaptive co-evolutionary reinforcement learning paradigm. This approach dynamically balances the training objectives of proposer and critic, where the diversity of the proposer's outputs enhances critic robustness, while the critic's maturing discrimination capability conversely unlocks the proposer's potential for extensive spatial exploration, fostering the mutual reinforcement and co-evolution of both capabilities, thereby ensuring generalizability to adapt to diverse and complex interface layouts. Extensive experiments over 6 benchmarks show that our method significantly enhances both grounding accuracy and critic reliability.
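The contrast the abstract draws between Pass@k headroom, static geometric-clustering consistency, and a learned critic can be illustrated with a toy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`pass_at_k`, `cluster_vote`, `critic_select`), the neighbour-count voting rule, the `radius` parameter, and the stub critic are all hypothetical. In the paper the critic judges proposals rendered on the screenshot; here it is reduced to a callable that returns an index.

```python
from math import hypot
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def in_box(p: Point, box: Box) -> bool:
    """Return True if click point p lies inside the target element's box."""
    x, y = p
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def pass_at_k(proposals: Sequence[Point], target: Box) -> bool:
    """Pass@k: at least one of the k sampled clicks hits the target."""
    return any(in_box(p, target) for p in proposals)


def cluster_vote(proposals: Sequence[Point], radius: float) -> Point:
    """Static self-consistency stand-in (assumed rule): pick the proposal
    with the most neighbours within `radius`; ties go to the earliest one."""
    def support(p: Point) -> int:
        return sum(hypot(p[0] - q[0], p[1] - q[1]) <= radius for q in proposals)
    return max(proposals, key=support)


def critic_select(
    proposals: Sequence[Point],
    critic: Callable[[Sequence[Point]], int],
) -> Point:
    """Learnable selection stand-in: the critic inspects all candidates
    and returns the index of the one to click."""
    return proposals[critic(proposals)]


# Toy data: two wrong clicks agree with each other, one click is correct,
# one is an outlier.
target_box: Box = (100.0, 100.0, 120.0, 120.0)
samples = [(40.0, 40.0), (42.0, 41.0), (110.0, 112.0), (300.0, 50.0)]


def oracle_critic(candidates: Sequence[Point]) -> int:
    """Hypothetical stub standing in for a trained critic."""
    return next(i for i, p in enumerate(candidates) if in_box(p, target_box))


voted = cluster_vote(samples, radius=10.0)
chosen = critic_select(samples, oracle_critic)
print(pass_at_k(samples, target_box), in_box(voted, target_box), in_box(chosen, target_box))
```

In this toy case the correct click is present among the samples, yet the neighbour-count vote settles on a pair of mutually consistent wrong clicks, while a selector that evaluates each candidate can still recover the hit. A real critic would be imperfect, which is why the abstract treats critic reliability as something to be trained jointly with the proposer.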

RESULT

ScienceToStartup currently rates this 8.0/10 on the public viability pass. Extensive experiments over 6 benchmarks show that our method significantly enhances both grounding accuracy and critic reliability. Code availability is flagged in the production…

WHY NOW

GUI Grounding moved forward this cycle; last verified April 2026. Public score 8.0/10. Production flags indicate code availability.

Competitive landscape

Segment: GUI Grounding

Adoption evidence: No public code link in the paper record yet

Commercial read: 8.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

