ARXIV:2605.15836 · ROBOTICS · SUBMITTED 18 MAY · 20:28 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

GAP: Geometric Anchor Pre-training for Data-Efficient Visuomotor Learning of Manipulation Tasks

Davide Buoso · Andrea Protopapa · Stefano Di Carlo · Francesca Pistilli · Giuseppe Averta · arXiv

A pre-training method for robotic manipulation that improves data efficiency and robustness by learning geometric anchors from object masks.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A pre-training method for robotic manipulation that improves data efficiency and robustness by learning geometric anchors from object masks.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

A pre-training method for robotic manipulation that improves data efficiency and robustness by learning geometric anchors from object masks. A primary hurdle lies in distilling high-dimensional RGB representations into control-relevant geometry without overfitting.

METHOD

Full abstract

Learning visuomotor policies from scarce expert demonstrations remains a core challenge in robotic manipulation. A primary hurdle lies in distilling high-dimensional RGB representations into control-relevant geometry without overfitting. While using frozen pre-trained Vision Foundation Models (VFMs) improves data efficiency, it also shifts most task adaptation onto a small spatial pooling module, which can latch onto task-irrelevant shortcuts and lose geometric grounding when finetuned with few data samples. More broadly, pre-trained visual representations used for policy learning have been observed to struggle under even minor scene perturbations, highlighting the need for robustness-oriented inductive biases. We propose Geometric Anchor Pre-training (GAP), a simple, action-free warm-up stage that regularizes the spatial adapter before downstream imitation learning. GAP pre-trains the pooling layer on a lightweight simulated proxy task where object masks are available at no cost, encouraging the adapter to produce keypoints that lie on the object, cover its spatial extent, and remain sharp and repeatable over time. This yields stable geometric anchors that provide a reliable coordinate interface for few-shot policy learning, while keeping the VFM frozen. We evaluate GAP on RoboMimic and ManiSkill under severe data scarcity (15-50 demonstrations) and domain shift. A simple adapter regularized with GAP consistently outperforms stronger attention-based poolers and end-to-end fine-tuning, achieving 62% success on RoboMimic Can with 15 demonstrations (+16% over AFA), 63% on the long-horizon high-precision Tool Hang task with 50 demonstrations, and 61% on ManiSkill StackCube with 30 demonstrations (+11% over full fine-tuning). The proxy stage is lightweight and fully decoupled from downstream tasks, making it practical to reuse across environments and manipulation skills.
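The four properties the abstract asks of the keypoints (on the object, covering its extent, sharp, repeatable over time) can be made concrete with a small sketch. The code below is an illustration only and is not taken from the paper: the loss forms, the function names (`spatial_softmax`, `expected_keypoints`, `mask_centroid`, `proxy_losses`), and the `sigma` parameter are all assumptions chosen to show one plausible way such terms could be computed from adapter heatmaps and a free simulator mask.

```python
import numpy as np

def spatial_softmax(logits):
    """Turn (K, H, W) adapter logits into K probability maps that each sum to 1."""
    k, h, w = logits.shape
    flat = logits.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.reshape(k, h, w)

def expected_keypoints(probs):
    """Soft-argmax: expected (x, y) of each map in normalized [0, 1] coordinates."""
    k, h, w = probs.shape
    xs = np.linspace(0.0, 1.0, w)
    ys = np.linspace(0.0, 1.0, h)
    x = (probs.sum(axis=1) * xs).sum(axis=1)  # marginal over rows -> (K, W)
    y = (probs.sum(axis=2) * ys).sum(axis=1)  # marginal over cols -> (K, H)
    return np.stack([x, y], axis=1)

def mask_centroid(mask):
    """Centroid (x, y) of a non-empty binary (H, W) mask, normalized to [0, 1]."""
    h, w = mask.shape
    xs = np.linspace(0.0, 1.0, w)
    ys = np.linspace(0.0, 1.0, h)
    m = mask / mask.sum()
    return np.array([(m.sum(axis=0) * xs).sum(), (m.sum(axis=1) * ys).sum()])

def proxy_losses(logits_t, logits_t1, mask_t, mask_t1, sigma=0.1):
    """Hypothetical loss terms for two consecutive frames (requires K >= 2).

    These forms are assumptions for illustration, not the paper's objectives.
    """
    p_t, p_t1 = spatial_softmax(logits_t), spatial_softmax(logits_t1)
    kp_t, kp_t1 = expected_keypoints(p_t), expected_keypoints(p_t1)

    # On-object: probability mass that falls outside the object mask.
    on_object = 1.0 - (p_t * mask_t).sum(axis=(1, 2)).mean()

    # Sharpness: mean entropy of the maps (low entropy = peaked heatmap).
    sharpness = -(p_t * np.log(p_t + 1e-12)).sum(axis=(1, 2)).mean()

    # Coverage: Gaussian repulsion between distinct keypoints, so they spread out.
    diff = kp_t[:, None, :] - kp_t[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    off_diag = ~np.eye(kp_t.shape[0], dtype=bool)
    coverage = np.exp(-d2[off_diag] / sigma ** 2).mean()

    # Repeatability: each keypoint keeps its offset from the object centroid.
    rel_t = kp_t - mask_centroid(mask_t)
    rel_t1 = kp_t1 - mask_centroid(mask_t1)
    repeatability = ((rel_t - rel_t1) ** 2).sum(axis=1).mean()

    return {
        "on_object": on_object,
        "sharpness": sharpness,
        "coverage": coverage,
        "repeatability": repeatability,
    }
```

In this sketch all four terms are lower-is-better, so a weighted sum could serve as a warm-up objective for the pooling layer while the backbone stays frozen. The weighting, the choice of consecutive frames, and the centroid-relative notion of repeatability are design choices of the sketch, not details confirmed by the abstract.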

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. While using frozen pre-trained Vision Foundation Models (VFMs) improves data efficiency, it also shifts most task adaptation onto a small spatial pooling module, which…

WHY NOW

Robotics moved forward this cycle; last verified May 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain A pre-training method for robotic manipulation that improves data efficiency and robustness by learning geometric anchors from object masks.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

A pre-training method for robotic manipulation that improves data efficiency and robustness by learning geometric anchors from object masks.

Competitive landscape

A pre-training method for robotic manipulation that improves data efficiency and robustness by learning geometric anchors from object masks.

Segment

Robotics

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

{ "@context": "https://schema.org", "@graph": [ { "@type": "WebPage", "@id": "https://sciencetostartup.com/paper/gap-geometric-anchor-pre-training-for-data-efficient-visuomotor-learning-of-manipulation-tasks#webpage", "url": "https://sciencetostartup.com/paper/gap-geometric-anchor-pre-training-for-data-efficient-visuomotor-learning-of-manipulation-tasks", "name": "GAP: Geometric Anchor Pre-training for Data-Efficient Visuomotor Learning of Manipulation Tasks", "description": "A pre-training method for robotic manipulation that improves data efficiency and robustness by learning geometric anchors from object masks.", "isPartOf": { "@id": "https://sciencetostartup.com/#website" } }, { "@type": "ScholarlyArticle", "@id": "https://sciencetostartup.com/paper/gap-geometric-anchor-pre-training-for-data-efficient-visuomotor-learning-of-manipulation-tasks#scholarlyArticle", "headline": "GAP: Geometric Anchor Pre-training for Data-Efficient Visuomotor Learning of Manipulation Tasks", "description": "A pre-training method for robotic manipulation that improves data efficiency and robustness by learning geometric anchors from object masks.", "url": "https://sciencetostartup.com/paper/gap-geometric-anchor-pre-training-for-data-efficient-visuomotor-learning-of-manipulation-tasks", "sameAs": "https://arxiv.org/abs/2605.15836", "identifier": { "@type": "PropertyValue", "propertyID": "arXiv", "value": "2605.15836" }, "isAccessibleForFree": true, "isPartOf": { "@id": "https://sciencetostartup.com/#website" }, "datePublished": "2026-05-15T10:48:30.000Z", "author": [ { "@type": "Person", "name": "Davide Buoso" }, { "@type": "Person", "name": "Andrea Protopapa" }, { "@type": "Person", "name": "Stefano Di Carlo" }, { "@type": "Person", "name": "Francesca Pistilli" }, { "@type": "Person", "name": "Giuseppe Averta" } ], "additionalProperty": [ { "@type": "PropertyValue", "propertyID": "viabilityScore", "value": 7 }, { "@type": "PropertyValue", "propertyID": "researchDomain", "value": 
"Robotics" }, { "@type": "PropertyValue", "propertyID": "commercialReadiness", "value": "code" } ] }, { "@type": "BreadcrumbList", "itemListElement": [ { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://sciencetostartup.com" }, { "@type": "ListItem", "position": 2, "name": "Robotics", "item": "https://sciencetostartup.com/topics" }, { "@type": "ListItem", "position": 3, "name": "GAP: Geometric Anchor Pre-training for Data-Efficient Visuom", "item": "https://sciencetostartup.com/paper/gap-geometric-anchor-pre-training-for-data-efficient-visuomotor-learning-of-manipulation-tasks" } ] } ] }
