ARXIV:2605.03361 · MULTIMODAL RETRIEVAL · SUBMITTED 06 MAY · 20:25 UTC · FRESHNESS STALE

Source: PDF linked (Verified) | PaperPack: citation fields available (Verified) | Proof: unverified proof status (Partial)

ReasonAudio: A Benchmark for Evaluating Reasoning Beyond Matching in Text-Audio Retrieval

Honglei Zhang · Yuting Chen · Chenpeng Hu · Siyue Zhang · Yilei Shi · arXiv

A new benchmark for text-audio retrieval that evaluates advanced reasoning capabilities beyond simple semantic matching.

Ship in 2-4 weeks | Score: 7.0 | Evidence unverified

Opportunity summary

Pain: A new benchmark for text-audio retrieval that evaluates advanced reasoning capabilities beyond simple semantic matching.

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: Evidence unverified


PROBLEM

ReasonAudio is a new benchmark for text-audio retrieval that evaluates advanced reasoning capabilities beyond simple semantic matching. It targets a gap in the field: most existing benchmarks concentrate on semantic matching and fail to capture the fact that real-world queries often demand…

METHOD

Full abstract

As multimodal content continues to expand at a rapid pace, audio retrieval has emerged as a key enabling technology for media search, content organization, and intelligent assistants. However, most existing benchmarks concentrate on semantic matching and fail to capture the fact that real-world queries often demand advanced reasoning abilities, including negation understanding, temporal ordering, concurrent event recognition, and duration discrimination. To address this gap, we introduce ReasonAudio, the first reasoning-intensive benchmark for Text-Audio Retrieval, comprising 1,000 queries and 10,000 composite audio clips across five fundamental reasoning tasks: Negation, Order, Overlap, Duration, and Mix. Despite their intuitive nature for humans and straightforward construction, these tasks pose significant challenges to current models. Our evaluation of ten state-of-the-art models reveals the following findings: All models struggle with reasoning-intensive audio retrieval, performing particularly poorly on Negation and Duration while showing relatively better results on Overlap and Order. Moreover, Multimodal Large Language Model-based embedding models fail to inherit the reasoning capabilities of their backbones through contrastive fine-tuning, suggesting that current training paradigms are insufficient to preserve reasoning capacity in retrieval settings.
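To make the abstract's distinction between matching and reasoning concrete, here is a minimal toy sketch. It is not the benchmark's construction or evaluation code. It assumes a hypothetical clip representation (a list of labeled events with start and end times), and the `Event` class, the helper names, and the sample clips are all invented for illustration. It shows that a scorer that only checks which sounds are present cannot tell apart clips that differ in order, overlap, duration, or the absence of a sound.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Event:
    """One labeled sound inside a composite clip (hypothetical schema)."""
    label: str
    start: float  # seconds from clip start
    end: float    # seconds from clip start


def labels(clip: List[Event]) -> Set[str]:
    """Set of sound labels present in the clip, ignoring timing."""
    return {e.label for e in clip}


def find(clip: List[Event], label: str) -> Optional[Event]:
    """First event with the given label, or None if it is absent."""
    return next((e for e in clip if e.label == label), None)


def matching_score(query_labels: Set[str], clip: List[Event]) -> float:
    """Matching-only baseline: share of query labels that occur in the clip."""
    return len(query_labels & labels(clip)) / len(query_labels)


def negation(clip: List[Event], present: str, absent: str) -> bool:
    """'present' occurs and 'absent' does not."""
    return present in labels(clip) and absent not in labels(clip)


def order(clip: List[Event], a: str, b: str) -> bool:
    """'a' finishes before 'b' begins."""
    ea, eb = find(clip, a), find(clip, b)
    return ea is not None and eb is not None and ea.end <= eb.start


def overlap(clip: List[Event], a: str, b: str) -> bool:
    """'a' and 'b' sound at the same time for some interval."""
    ea, eb = find(clip, a), find(clip, b)
    return (ea is not None and eb is not None
            and ea.start < eb.end and eb.start < ea.end)


def longer(clip: List[Event], a: str, b: str) -> bool:
    """'a' lasts strictly longer than 'b'."""
    ea, eb = find(clip, a), find(clip, b)
    return (ea is not None and eb is not None
            and (ea.end - ea.start) > (eb.end - eb.start))


# Invented sample clips: same two sounds, different temporal structure.
dog_then_siren = [Event("dog bark", 0.0, 2.0), Event("siren", 3.0, 8.0)]
siren_then_dog = [Event("siren", 0.0, 5.0), Event("dog bark", 6.0, 8.0)]
dog_over_siren = [Event("siren", 0.0, 8.0), Event("dog bark", 2.0, 4.0)]
dog_only = [Event("dog bark", 0.0, 2.0)]

if __name__ == "__main__":
    query = {"dog bark", "siren"}
    for name, clip in [("dog_then_siren", dog_then_siren),
                       ("siren_then_dog", siren_then_dog),
                       ("dog_over_siren", dog_over_siren)]:
        print(name,
              "match:", matching_score(query, clip),
              "order:", order(clip, "dog bark", "siren"),
              "overlap:", overlap(clip, "dog bark", "siren"))
```

For the query "a dog bark followed by a siren", all three two-sound clips get the same matching score, while only `dog_then_siren` satisfies the order check. Real systems work on audio embeddings rather than annotated events, so this only illustrates why the four reasoning types are invisible to a presence-based scorer.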

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Our evaluation of ten state-of-the-art models reveals the following findings: All models struggle with reasoning-intensive audio retrieval, performing particularly poorly on Negation and Duration…

WHY NOW

Multimodal Retrieval moved forward this cycle; last verified May 2026. Public score 7.0/10. Production flags indicate code availability.


Opportunity summary

Score: 7.0

Pain: A new benchmark for text-audio retrieval that evaluates advanced reasoning capabilities beyond simple semantic matching.

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: no shell-level blocker reported

Analysis summary

A new benchmark for text-audio retrieval that evaluates advanced reasoning capabilities beyond simple semantic matching.


Competitive landscape

A new benchmark for text-audio retrieval that evaluates advanced reasoning capabilities beyond simple semantic matching.

Segment: Multimodal Retrieval

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


Canonical page: https://sciencetostartup.com/paper/reasonaudio-a-benchmark-for-evaluating-reasoning-beyond-matching-in-text-audio-retrieval | arXiv: https://arxiv.org/abs/2605.03361 | Published: 2026-05-05T04:44:51.000Z | Free to access

