ARXIV:2603.08898 · VISUAL QUERY SEGMENTATION · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified · Source: PDF linked
Partial · PaperPack: 3 of 4 citation fields filled
Missing · Missing fields: authors
Partial · Proof: unverified proof status

Towards Visual Query Segmentation in the Wild

arXiv

A novel visual query segmentation method that enables precise pixel-level localization of objects in untrimmed videos.

Blocked on Code › Score 7.0 · Evidence unverified

Opportunity summary

Pain A novel visual query segmentation method that enables precise pixel-level localization of objects in untrimmed videos.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

A novel visual query segmentation method that enables precise pixel-level localization of objects in untrimmed videos. Compared to existing VQL locating only the last appearance of a target using bounding boxes, VQS enables more…

METHOD

Full abstract

In this paper, we introduce visual query segmentation (VQS), a new paradigm of visual query localization (VQL) that aims to segment all pixel-level occurrences of an object of interest in an untrimmed video, given an external visual query. Compared to existing VQL locating only the last appearance of a target using bounding boxes, VQS enables more comprehensive (i.e., all object occurrences) and precise (i.e., pixel-level masks) localization, making it more practical for real-world scenarios. To foster research on this task, we present VQS-4K, a large-scale benchmark dedicated to VQS. Specifically, VQS-4K contains 4,111 videos with more than 1.3 million frames and covers a diverse set of 222 object categories. Each video is paired with a visual query defined by a frame outside the search video and its target mask, and annotated with spatial-temporal masklets corresponding to the queried target. To ensure high quality, all videos in VQS-4K are manually labeled with meticulous inspection and iterative refinement. To the best of our knowledge, VQS-4K is the first benchmark specifically designed for VQS. Furthermore, to stimulate future research, we present a simple yet effective method, named VQ-SAM, which extends SAM 2 by leveraging target-specific and background distractor cues from the video to progressively evolve the memory through a novel multi-stage framework with an adaptive memory generation (AMG) module for VQS, significantly improving the performance. In our extensive experiments on VQS-4K, VQ-SAM achieves promising results and surpasses all existing approaches, demonstrating its effectiveness. With the proposed VQS-4K and VQ-SAM, we expect to go beyond the current VQL paradigm and inspire more future research and practical applications on VQS. Our benchmark, code, and results will be made publicly available.
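To make the task definition in the abstract concrete, the following is a minimal sketch of how one VQS sample might be represented, and how its annotation differs from the VQL output (last appearance only, as bounding boxes). All names here (`VQSSample`, `mask_to_box`, `occurrences`, `vql_style_output`) and the nested-list mask encoding are hypothetical illustrations. They are not the VQS-4K file format or the VQ-SAM code, neither of which is publicly linked in this record.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Mask = List[List[int]]           # binary mask as rows of 0/1 (assumed encoding)
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


@dataclass
class VQSSample:
    """Hypothetical container for one VQS example."""
    video_id: str
    num_frames: int
    query_mask: Mask            # target mask on the external query frame
    masklets: Dict[int, Mask]   # frame index -> target mask in the search video


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Tight bounding box around nonzero pixels, or None for an empty mask."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def occurrences(sample: VQSSample) -> List[Tuple[int, int]]:
    """Group frames with a non-empty mask into contiguous (start, end) runs."""
    frames = sorted(f for f, m in sample.masklets.items() if mask_to_box(m))
    runs: List[Tuple[int, int]] = []
    for f in frames:
        if runs and f == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], f)   # extend the current run
        else:
            runs.append((f, f))           # start a new run
    return runs


def vql_style_output(sample: VQSSample) -> Dict[int, Optional[Box]]:
    """Reduce a VQS annotation to a VQL-style answer: boxes for the last run only."""
    runs = occurrences(sample)
    if not runs:
        return {}
    start, end = runs[-1]
    return {f: mask_to_box(sample.masklets[f]) for f in range(start, end + 1)}


# Toy sample: the target appears in frames 0-1 and again in frame 4.
sample = VQSSample(
    video_id="demo",
    num_frames=6,
    query_mask=[[1]],
    masklets={
        0: [[1, 0], [0, 0]],
        1: [[1, 1], [0, 0]],
        4: [[0, 0], [0, 1]],
    },
)
print(occurrences(sample))       # all occurrences, which VQS keeps as masks
print(vql_style_output(sample))  # only the last occurrence, as boxes
```

In this toy case the VQS annotation retains both occurrences with pixel-level masks, while the VQL-style reduction discards the first occurrence and keeps only a box for frame 4.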

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Compared to existing VQL locating only the last appearance of a target using bounding boxes, VQS enables more comprehensive (i.e., all object occurrences) and…

WHY NOW

Visual Query Segmentation moved forward this cycle; last verified April 2026. Public score 7.0/10.

Opportunity summary

Score 7.0

Pain A novel visual query segmentation method that enables precise pixel-level localization of objects in untrimmed videos.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Missing authors

Competitive landscape

A novel visual query segmentation method that enables precise pixel-level localization of objects in untrimmed videos.

Segment

Visual Query Segmentation

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Source: https://arxiv.org/abs/2603.08898
Published: 2026-03-09T20:09:04.000Z
Page: https://sciencetostartup.com/paper/towards-visual-query-segmentation-in-the-wild
