ARXIV:2603.26127 · COMPUTER VISION · SUBMITTED 30 MAR · 21:54 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

Finding Distributed Object-Centric Properties in Self-Supervised Transformers

Samyak Rawlekar · Amitabh Swain · Yujun Cai · Yiwei Wang · Ming-Hsuan Yang · Narendra Ahuja · arXiv

A training-free method to extract distributed object-centric information from self-supervised transformers for improved object discovery and grounding in multimodal models.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: A training-free method to extract distributed object-centric information from self-supervised transformers for improved object discovery and grounding in multimodal models.

Evidence: 53 refs | 3 sources | 50% coverage

Blocker: Evidence unverified

PROBLEM

Self-supervised Vision Transformers (ViTs) like DINO show an emergent ability to discover objects, usually read off the final-layer [CLS] token attention maps. These maps often contain spurious activations, which results in poor localization of objects.

METHOD

Full abstract

Self-supervised Vision Transformers (ViTs) like DINO show an emergent ability to discover objects, typically observed in [CLS] token attention maps of the final layer. However, these maps often contain spurious activations resulting in poor localization of objects. This is because the [CLS] token, trained on an image-level objective, summarizes the entire image instead of focusing on objects. This aggregation dilutes the object-centric information existing in the local, patch-level interactions. We analyze this by computing inter-patch similarity using patch-level attention components (query, key, and value) across all layers. We find that: (1) Object-centric properties are encoded in the similarity maps derived from all three components ($q, k, v$), unlike prior work that uses only key features or the [CLS] token. (2) This object-centric information is distributed across the network, not just confined to the final layer. Based on these insights, we introduce Object-DINO, a training-free method that extracts this distributed object-centric information. Object-DINO clusters attention heads across all layers based on the similarities of their patches and automatically identifies the object-centric cluster corresponding to all objects. We demonstrate Object-DINO's effectiveness on two applications: enhancing unsupervised object discovery (+3.6 to +12.4 CorLoc gains) and mitigating object hallucination in Multimodal Large Language Models by providing visual grounding. Our results demonstrate that using this distributed object-centric information improves downstream tasks without additional training.
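The analysis described in the abstract (inter-patch similarity from query, key, and value features, followed by clustering of heads across layers) can be illustrated with a minimal sketch. This is not the authors' implementation: the array layout, the function names `patch_similarity`, `head_descriptors`, and `kmeans`, the use of cosine similarity, and the plain k-means step are all assumptions made for illustration, and random numbers stand in for real DINO features. The step that automatically picks the object-centric cluster is not specified in the abstract and is left out.

```python
import numpy as np

def patch_similarity(feats):
    """Cosine similarity between all patch pairs.
    feats: (num_patches, dim) -> (num_patches, num_patches)."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-8, None)
    return unit @ unit.T

def head_descriptors(qkv):
    """Build one descriptor per attention head from its q, k, v similarity maps.
    qkv: (layers, heads, 3, patches, dim) -> (layers * heads, 3 * patches * patches).
    The tensor layout is a hypothetical choice for this sketch."""
    L, H, C, N, D = qkv.shape
    desc = np.empty((L * H, C * N * N))
    for l in range(L):
        for h in range(H):
            maps = [patch_similarity(qkv[l, h, c]) for c in range(C)]
            desc[l * H + h] = np.concatenate([m.ravel() for m in maps])
    return desc

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means, used here only as a stand-in clustering step."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels

# Toy input: 2 layers, 3 heads, (q, k, v), 5 patches, 4 feature dims.
rng = np.random.default_rng(0)
qkv = rng.normal(size=(2, 3, 3, 5, 4))
desc = head_descriptors(qkv)   # one row per head, across all layers
labels = kmeans(desc, k=2)     # cluster id for each of the 6 heads
```

With real features, each row of `desc` would describe how one head groups patches, so heads from different layers that group patches similarly would land in the same cluster.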

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. The abstract reports +3.6 to +12.4 CorLoc gains on unsupervised object discovery and visual grounding that mitigates object hallucination in Multimodal Large Language Models, both without additional training.
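For readers unfamiliar with CorLoc, the metric behind the gains quoted in the abstract, the sketch below shows how it is commonly computed: an image counts as correctly localized when its predicted box overlaps some ground-truth box with IoU above a threshold, usually 0.5. The function names, the box format `(x1, y1, x2, y2)`, and the toy boxes are assumptions for illustration; the paper's exact evaluation protocol is not given on this page.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def corloc(preds, gts, thr=0.5):
    """Percentage of images whose single predicted box matches any ground-truth box.
    preds: one box per image; gts: a list of ground-truth boxes per image."""
    hits = sum(
        any(iou(p, g) > thr for g in gt_boxes)
        for p, gt_boxes in zip(preds, gts)
    )
    return 100.0 * hits / len(preds)

# Toy example: the first prediction overlaps its ground truth, the second misses.
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [[(0, 0, 10, 8)], [(20, 20, 30, 30)]]
score = corloc(preds, gts)
```

Here the first image has IoU 0.8 and the second has IoU 0, so one of two images is counted as localized.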

WHY NOW

Computer Vision moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Competitive landscape

Segment: Computer Vision

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified
