ARXIV:2606.03116 · AUDIO AI · SUBMITTED 03 JUN · 20:33 UTC · FRESHNESS FRESH

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: partial proof status

AnyAudio-Judge: A Dynamic Rubric-Based Benchmark and Evaluator for Audio Instruction Following

Haitao Li · Tian Tan · Yuguang Yang · Shan Yang · Xie Chen · arXiv

An audio instruction following benchmark and evaluator that uses dynamic rubrics to provide precise, interpretable feedback for better alignment.

Ship in 2-4 weeks · Score 7.0 · Evidence partial

Opportunity summary

Pain: An audio instruction following benchmark and evaluator that uses dynamic rubrics to provide precise, interpretable feedback for better alignment.

Evidence: 0 refs | 4 sources | 83% coverage

Blocker: Evidence partial

PROBLEM

An audio instruction following benchmark and evaluator that uses dynamic rubrics to provide precise, interpretable feedback for better alignment. Current automated evaluation methods heavily rely on holistic scoring from general-purpose large language models, which…

METHOD

Full abstract

The rapid advancement of instruction-guided audio generation has highlighted the critical need for robust alignment evaluation. Current automated evaluation methods heavily rely on holistic scoring from general-purpose large language models, which struggle to decouple complex instructions, lack interpretability, and fail to capture fine-grained attribute mismatches. To address this, we introduce a novel dynamic rubric-based evaluation paradigm that adaptively decomposes complex audio captions into a variable number of independent, verifiable binary rubric items. To rigorously benchmark this capability, we propose the AnyAudio-Judge Bench, a comprehensive, bilingual benchmark comprising 7,920 meticulously curated samples across four diverse audio domains (speech, sound, music, and mixed), featuring deliberately constructed hard negatives. Furthermore, we construct a large-scale corpus of 105K samples with explicit Chain-of-Thought (CoT) rationales to train our dedicated evaluator, the AnyAudio-Judge model. By employing a training pipeline that combines Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO), our model successfully aligns its reasoning paths with the rubric-based scoring mechanism. Extensive experiments demonstrate that AnyAudio-Judge not only significantly enhances zero-shot alignment detection compared to state-of-the-art baselines, but also provides precise and interpretable reward signals that substantially improve instruction alignment in downstream reinforcement learning for audio generation.
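The rubric idea in the abstract can be illustrated with a small sketch: a caption is broken into a variable number of independent yes/no items, and the verdict is read off those items. Everything here is an assumption for illustration only. The `RubricItem` type, the example caption, the item wording, and the pass-rate aggregation are hypothetical; the abstract does not say how AnyAudio-Judge combines item verdicts into a score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    """One independent, verifiable yes/no check derived from a caption."""
    description: str
    passed: bool


def rubric_score(items: list[RubricItem]) -> float:
    """Fraction of rubric items judged satisfied (an assumed aggregation)."""
    if not items:
        raise ValueError("a rubric needs at least one item")
    return sum(item.passed for item in items) / len(items)


def failed_items(items: list[RubricItem]) -> list[str]:
    """Descriptions of unmet items: the interpretable part of the verdict."""
    return [item.description for item in items if not item.passed]


# Hypothetical caption: "A woman speaks calmly in Mandarin over soft piano music."
# The number of items varies with the caption; this one yields four.
rubric = [
    RubricItem("speaker is a woman", True),
    RubricItem("speech is in Mandarin", True),
    RubricItem("speaking style is calm", False),
    RubricItem("piano music plays in the background", True),
]

print(rubric_score(rubric))  # 0.75
print(failed_items(rubric))  # ['speaking style is calm']
```

Unlike a single holistic score, the list of failed items points at the specific attribute mismatch, which is the kind of fine-grained feedback the abstract argues holistic scoring misses.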

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Extensive experiments demonstrate that AnyAudio-Judge not only significantly enhances zero-shot alignment detection compared to state-of-the-art baselines, but also provides precise and interpretable reward signals…
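To make "reward signals" concrete, the sketch below turns per-sample scalar rewards (for instance, rubric pass rates) into group-relative advantages, the normalization that gives Group Relative Policy Optimization its name. This is an assumption-laden illustration: the source does not state which algorithm the downstream reinforcement learning uses, the function name and the reward values are hypothetical, and the population standard deviation with a small `eps` is one common choice rather than a confirmed detail of the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and spread of its own group."""
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero spread
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four generations sampled from the same instruction.
rewards = [1.0, 0.75, 0.5, 0.75]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

Samples that satisfy more rubric items than their siblings get a positive advantage and the rest get a zero or negative one, so no separate value model is needed to rank the group.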

WHY NOW

Audio AI moved forward this cycle; last verified June 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.

Competitive landscape

Segment: Audio AI

Adoption evidence: Public code linked for build inspection

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

arXiv: https://arxiv.org/abs/2606.03116

Code repository: https://github.com/CuCl-2/AnyAudio-Judge

Date published (page metadata): 2026-06-02
