ARXIV:2606.06615 · MUSIC RETRIEVAL · SUBMITTED 08 JUN · 20:17 UTC · FRESHNESS FRESH

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

FIGMA: Towards FIne-Grained Music retrievAl

Nishit Anand · Ashish Seth · Sreyan Ghosh · Dinesh Manocha · Ramani Duraiswami · arXiv

FIGMA is a multi-view contrastive architecture and dataset for fine-grained music retrieval using detailed natural language descriptions.

Ship in 2-4 weeks · Score 8.0 · Evidence unverified

Opportunity summary

Pain FIGMA is a multi-view contrastive architecture and dataset for fine-grained music retrieval using detailed natural language descriptions.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified


PROBLEM

FIGMA is a multi-view contrastive architecture and dataset for fine-grained music retrieval using detailed natural language descriptions. When descriptions specify fine-grained musical attributes such as tempo, key, chord progression, or rhythmic structure, existing models…

METHOD

Full abstract

Retrieving music using natural language descriptions has improved with contrastive audio-text models such as CLAP, but current systems remain limited to coarse semantic queries. When descriptions specify fine-grained musical attributes such as tempo, key, chord progression, or rhythmic structure, existing models often fail to retrieve the correct audio. We show that this limitation stems from the contrastive learning objective itself: despite being trained on long captions, CLAP-based models effectively utilize only the first few tokens, discarding much of the information encoded in detailed prompts. We then propose FIGMA (FIne-Grained Music RetrievAl), a multi-view contrastive architecture that addresses this limitation by jointly optimizing global audio-text alignment and frame-level, token-wise alignment. This design enables FIGMA to capture both high-level semantic context and fine-grained musical attributes within a unified representation space. Moreover, we formalize the task of Fine-Grained Music Retrieval and construct the Fine-Grained Music Caption dataset (FGMCaps), a large-scale dataset of 380K music-caption pairs for training along with a 10K test set, both annotated with tempo, key, chord progression, and beat count, as well as genre and mood. Extensive experiments demonstrate that FIGMA consistently outperforms existing CLAP-based music retrieval models across multiple music retrieval benchmarks, including out-of-domain evaluations, with relative improvements of up to 73.3%.
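The abstract describes jointly optimizing a global audio-text alignment and a frame-level, token-wise alignment, but it does not give the loss itself. The sketch below is one plausible reading, not the authors' implementation: it assumes a symmetric InfoNCE loss for both views and a late-interaction score (each text token takes its best-matching audio frame) for the fine-grained view. All function names, the temperature of 0.07, and the `fine_weight` parameter are illustrative assumptions.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length; zero vectors are returned unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_entropy_diag(sim, temperature):
    # Mean cross-entropy where the correct column for row i is column i.
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(sim)

def symmetric_info_nce(sim, temperature=0.07):
    # Average the audio-to-text and text-to-audio directions.
    transposed = [list(col) for col in zip(*sim)]
    return 0.5 * (cross_entropy_diag(sim, temperature)
                  + cross_entropy_diag(transposed, temperature))

def global_similarity(audio_vecs, text_vecs):
    # Cosine similarity between one pooled vector per clip and per caption.
    a = [l2_normalize(v) for v in audio_vecs]
    t = [l2_normalize(v) for v in text_vecs]
    return [[dot(ai, tj) for tj in t] for ai in a]

def token_frame_similarity(frames, tokens):
    # Assumed late interaction: best frame per token, averaged over tokens.
    f = [l2_normalize(v) for v in frames]
    k = [l2_normalize(v) for v in tokens]
    return sum(max(dot(tok, fr) for fr in f) for tok in k) / len(k)

def fine_similarity(audio_frames, text_tokens):
    return [[token_frame_similarity(fr, tk) for tk in text_tokens]
            for fr in audio_frames]

def multi_view_loss(audio_vecs, text_vecs, audio_frames, text_tokens,
                    fine_weight=1.0):
    # Sum of the global view and the (weighted) frame/token view.
    g = symmetric_info_nce(global_similarity(audio_vecs, text_vecs))
    f = symmetric_info_nce(fine_similarity(audio_frames, text_tokens))
    return g + fine_weight * f
```

In this sketch, a batch whose captions are paired with the correct clips yields a lower loss than the same batch with captions swapped, which is the behavior a contrastive objective is meant to reward in both views.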

RESULT

ScienceToStartup currently rates this 8.0/10 on the public viability pass. We show that this limitation stems from the contrastive learning objective itself: despite being trained on long captions, CLAP-based models effectively utilize only the…
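The claim that a model uses only the first few caption tokens can be checked with a truncation probe: retrieve with captions cut to the first n tokens and see whether recall changes as n grows. The sketch below is a hypothetical probe, not the paper's protocol. `embed_text` and `score` are placeholders for whatever text encoder and similarity function are under test, and the toy bag-of-words encoder exists only to make the example runnable.

```python
def recall_at_k(sim, k=1):
    # sim[i][j] scores query i against item j; item i is the correct match.
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        hits += i in ranked[:k]
    return hits / len(sim)

def truncate_caption(caption, n_tokens):
    # Whitespace tokenization is an assumption; a real probe would use
    # the model's own tokenizer.
    return " ".join(caption.split()[:n_tokens])

def truncation_curve(captions, audio_vecs, embed_text, score, budgets, k=1):
    # Recall@k for each token budget; a flat curve suggests that later
    # tokens contribute little to retrieval.
    curve = {}
    for n in budgets:
        text_vecs = [embed_text(truncate_caption(c, n)) for c in captions]
        sim = [[score(t, a) for a in audio_vecs] for t in text_vecs]
        curve[n] = recall_at_k(sim, k)
    return curve

# Toy stand-ins (hypothetical): bag-of-words text encoder, dot-product score.
VOCAB = ["rock", "120", "90", "bpm"]

def toy_embed(text):
    words = text.split()
    return [float(words.count(w)) for w in VOCAB]

def toy_score(t, a):
    return sum(x * y for x, y in zip(t, a))

captions = ["rock 120 bpm", "rock 90 bpm"]
audio_vecs = [toy_embed(c) for c in captions]
curve = truncation_curve(captions, audio_vecs, toy_embed, toy_score, [1, 2, 3])
```

With the toy encoder, the one-token budget cannot tell the two clips apart because both captions begin with the same genre word, while longer budgets recover the tempo and separate them. A model that ignored later tokens would show no such gain.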

WHY NOW

Music Retrieval moved forward this cycle; last verified June 2026. Public score 8.0/10. Production flags indicate code availability.


Opportunity summary

Score 8.0

Pain FIGMA is a multi-view contrastive architecture and dataset for fine-grained music retrieval using detailed natural language descriptions.

Evidence 0 refs | 3 sources | 50% coverage

Blocker No shell-level blocker reported


Competitive landscape

FIGMA is a multi-view contrastive architecture and dataset for fine-grained music retrieval using detailed natural language descriptions.

Segment Music Retrieval

Adoption evidence No public code link in the paper record yet

Commercial read 8.0/10 public viability

Direct not classified

Adjacent not classified

Substitute not classified

Unknown not classified


Published 2026-06-04T18:05:39.000Z · https://arxiv.org/abs/2606.06615 · https://sciencetostartup.com/paper/figma-towards-fine-grained-music-retrieval
