ARXIV:2604.08008 · AUTONOMOUS DRIVING DATASET · SUBMITTED 10 APR · 20:18 UTC · FRESHNESS STALE

Verified · Source: PDF linked · Verified · PaperPack: citation fields available

SearchAD: Large-Scale Rare Image Retrieval Dataset for Autonomous Driving

Felix Embacher · Jonas Uhrig · Marius Cordts · Markus Enzweiler · arXiv

SearchAD is a large-scale dataset designed for rare image retrieval in autonomous driving, enhancing data curation and perception research.

Ship in 2-4 weeks · Score 7.0 · Evidence verified

Opportunity summary

Pain: SearchAD is a large-scale dataset designed for rare image retrieval in autonomous driving, enhancing data curation and perception research.

Evidence: 0 refs | 4 sources | 67% coverage

Blocker: Evidence verified

PROBLEM

SearchAD is a large-scale dataset designed for rare image retrieval in autonomous driving, enhancing data curation and perception research. As dataset sizes continue to grow, the key challenge shifts from collecting more data to…

METHOD

Full abstract

Retrieving rare and safety-critical driving scenarios from large-scale datasets is essential for building robust autonomous driving (AD) systems. As dataset sizes continue to grow, the key challenge shifts from collecting more data to efficiently identifying the most relevant samples. We introduce SearchAD, a large-scale rare image retrieval dataset for AD containing over 423k frames drawn from 11 established datasets. SearchAD provides high-quality manual annotations of more than 513k bounding boxes covering 90 rare categories. It specifically targets the needle-in-a-haystack problem of locating extremely rare classes, with some appearing fewer than 50 times across the entire dataset. Unlike existing benchmarks, which focused on instance-level retrieval, SearchAD emphasizes semantic image retrieval with a well-defined data split, enabling text-to-image and image-to-image retrieval, few-shot learning, and fine-tuning of multi-modal retrieval models. Comprehensive evaluations show that text-based methods outperform image-based ones due to stronger inherent semantic grounding. While models directly aligning spatial visual features with language achieve the best zero-shot results, and our fine-tuning baseline significantly improves performance, absolute retrieval capabilities remain unsatisfactory. With a held-out test set on a public benchmark server, SearchAD establishes the first large-scale dataset for retrieval-driven data curation and long-tail perception research in AD: https://iis-esslingen.github.io/searchad/
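
The text-to-image retrieval setting described in the abstract can be illustrated with a minimal Python sketch. It ranks a handful of images by cosine similarity to a query embedding and scores the ranking with average precision for one rare category. Everything here is an assumption for illustration: the embeddings are made-up 2-D vectors rather than outputs of any model, the image ids and relevance labels are hypothetical, and average precision is a common retrieval metric chosen for the example, not necessarily the metric SearchAD reports.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_images(query_vec, image_vecs):
    """Return image ids sorted by descending similarity to the query (ties broken by id)."""
    scored = [(cosine(query_vec, vec), img_id) for img_id, vec in image_vecs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [img_id for _, img_id in scored]


def average_precision(ranking, relevant):
    """Mean of precision@k taken at each rank k where a relevant image appears."""
    hits = 0
    precision_sum = 0.0
    for rank, img_id in enumerate(ranking, start=1):
        if img_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


# Hypothetical toy data: 2-D stand-ins for image embeddings.
image_vecs = {
    "img_a": [1.0, 0.0],
    "img_b": [0.9, 0.1],
    "img_c": [0.0, 1.0],
    "img_d": [0.6, 0.8],
    "img_e": [0.1, 0.9],
}
# Hypothetical embedding of a text query naming one rare category.
query_vec = [0.0, 1.0]
# Hypothetical ground truth: only two of the five images contain the category.
relevant = {"img_c", "img_d"}

ranking = rank_images(query_vec, image_vecs)
ap = average_precision(ranking, relevant)
print(ranking)
print(round(ap, 4))
```

In this toy ranking one non-relevant image (img_e) is placed above a relevant one (img_d), which lowers the average precision below 1.0. With only a few positives per category, as in the needle-in-a-haystack setting the abstract describes, each such misordering has a large effect on the score.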

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Comprehensive evaluations show that text-based methods outperform image-based ones due to stronger inherent semantic grounding. A public repository is linked, so build verification can…

WHY NOW

Autonomous Driving Dataset moved forward this cycle; last verified April 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.

Competitive landscape

Segment: Autonomous Driving Dataset
Adoption evidence: Public code linked for build inspection
Commercial read: 7.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified

Published: 2026-04-09 · arXiv: https://arxiv.org/abs/2604.08008 · Linked repository (per page metadata): https://github.com/cvpr-org/author-kit
