ARXIV:2603.08823 · TEXT-TO-SPEECH · SUBMITTED 19 MAR · 21:31 UTC · FRESHNESS STALE

Verified (Source: PDF linked) · Partial (PaperPack: 3 of 4 citation fields filled) · Missing (Missing fields: authors) · Partial (Proof: partial proof status)

Fish Audio S2 Technical Report

arXiv

Fish Audio S2 is an open-sourced text-to-speech system that enables multi-speaker, instruction-following audio generation.

Blocked on Code › Score 9.0 · Evidence partial

Opportunity summary

Pain Fish Audio S2 is an open-sourced text-to-speech system that enables multi-speaker, instruction-following audio generation.

Evidence 0 refs | 0 sources | 33% coverage

Blocker Evidence partial

PROBLEM

Fish Audio S2 is an open-sourced text-to-speech system that enables multi-speaker, instruction-following audio generation. To scale training, we develop a multi-stage training recipe together with a staged data pipeline covering video captioning and speech…

METHOD

Full abstract

We introduce Fish Audio S2, an open-sourced text-to-speech system featuring multi-speaker, multi-turn generation, and, most importantly, instruction-following control via natural-language descriptions. To scale training, we develop a multi-stage training recipe together with a staged data pipeline covering video captioning and speech captioning, voice-quality assessment, and reward modeling. To push the frontier of open-source TTS, we release our model weights, fine-tuning code, and an SGLang-based inference engine. The inference engine is production-ready for streaming, achieving an RTF of 0.195 and a time-to-first-audio below 100 ms. Our code and weights are available on GitHub (https://github.com/fishaudio/fish-speech) and Hugging Face (https://huggingface.co/fishaudio/s2-pro). We highly encourage readers to visit https://fish.audio to try custom voices.
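To put the reported RTF of 0.195 in concrete terms, here is a minimal sketch assuming the conventional definition of real-time factor (synthesis time divided by the duration of the audio produced). The abstract does not spell out its measurement setup, so the function names and the 10-second clip length below are illustrative assumptions, not part of the Fish Audio S2 codebase.

```python
# Illustrative only: assumes RTF = synthesis_seconds / audio_seconds,
# the conventional definition. Not taken from the Fish Audio S2 code.

REPORTED_RTF = 0.195  # figure quoted in the abstract


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return the real-time factor for one synthesis run."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Return the compute time implied by an RTF for a clip of given length."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return rtf * audio_seconds


if __name__ == "__main__":
    clip_seconds = 10.0  # hypothetical clip length
    compute = synthesis_time(REPORTED_RTF, clip_seconds)
    speedup = 1.0 / REPORTED_RTF
    print(f"{clip_seconds:.0f} s of audio needs about {compute:.2f} s of compute")
    print(f"That is roughly {speedup:.1f}x faster than real time")
```

Under that assumption, an RTF below 1.0 means audio is generated faster than it plays back, which is the precondition for streaming without stalls. The time-to-first-audio figure (below 100 ms) is a separate latency measure and is not derived from the RTF.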

RESULT

ScienceToStartup currently rates this 9.0/10 on the public viability pass.

WHY NOW

Text-to-Speech moved forward this cycle; last verified April 2026. Public score 9.0/10.

Opportunity summary

Score 9.0

Pain Fish Audio S2 is an open-sourced text-to-speech system that enables multi-speaker, instruction-following audio generation.

Evidence 0 refs | 0 sources | 33% coverage

Blocker missing authors

Analysis summary

Fish Audio S2 is an open-sourced text-to-speech system that enables multi-speaker, instruction-following audio generation.

Competitive landscape

Fish Audio S2 is an open-sourced text-to-speech system that enables multi-speaker, instruction-following audio generation.

Segment

Text-to-Speech

Adoption evidence

No public code link in the paper record yet

Commercial read

9.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Canonical URL: https://sciencetostartup.com/paper/fish-audio-s2-technical-report

arXiv: https://arxiv.org/abs/2603.08823

Published: 2026-03-09T18:34:33.000Z

