ARXIV:2603.28086 · SPEECH SYNTHESIS · SUBMITTED 31 MAR · 20:21 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

MOSS-VoiceGenerator: Create Realistic Voices with Natural Language Descriptions

Kexin Huang · Liwei Fan · Botian Jiang · Yaozhou Jiang · Qian Tu · Jie Zhu · +8 at arXiv

Generate realistic, expressive voices from natural language descriptions for applications like storytelling and game dubbing.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain Generate realistic, expressive voices from natural language descriptions for applications like storytelling and game dubbing.

Evidence 38 refs | 9 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

Generate realistic, expressive voices from natural language descriptions for applications like storytelling and game dubbing. Such controllable voice creation benefits a wide range of downstream applications, including storytelling, game dubbing, role-play agents, and conversational assistants,…

METHOD

Full abstract

Voice design from natural language aims to generate speaker timbres directly from free-form textual descriptions, allowing users to create voices tailored to specific roles, personalities, and emotions. Such controllable voice creation benefits a wide range of downstream applications, including storytelling, game dubbing, role-play agents, and conversational assistants, making it a significant task for modern Text-to-Speech models. However, existing models are largely trained on carefully recorded studio data, which produces speech that is clean and well-articulated, yet lacks the lived-in qualities of real human voices. To address these limitations, we present MOSS-VoiceGenerator, an open-source instruction-driven voice generation model that creates new timbres directly from natural language prompts. Motivated by the hypothesis that exposure to real-world acoustic variation produces more perceptually natural voices, we train on large-scale expressive speech data sourced from cinematic content. Subjective preference studies demonstrate its superiority in overall performance, instruction-following, and naturalness compared to other voice design models.

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Subjective preference studies demonstrate its superiority in overall performance, instruction-following, and naturalness compared to other voice design models. Code availability is flagged in the…

WHY NOW

Speech Synthesis moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain Generate realistic, expressive voices from natural language descriptions for applications like storytelling and game dubbing.

Evidence 38 refs | 9 sources | 50% coverage

Blocker no shell-level blocker reported

Competitive landscape

Generate realistic, expressive voices from natural language descriptions for applications like storytelling and game dubbing.

Segment: Speech Synthesis

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Paper record

Title: MOSS-VoiceGenerator: Create Realistic Voices with Natural Language Descriptions

arXiv: https://arxiv.org/abs/2603.28086

Page: https://sciencetostartup.com/paper/moss-voicegenerator-create-realistic-voices-with-natural-language-descriptions

Published: 2026-03-30T06:40:59.000Z

Authors: Kexin Huang · Liwei Fan · Botian Jiang · Yaozhou Jiang · Qian Tu · Jie Zhu · Yuqian Zhang · Yiwei Zhao · Chenchen Yang · Zhaoye Fei · Shimin Li · Xiaogui Yang · Qinyuan Cheng · Xipeng Qiu

Research domain: Speech Synthesis

Viability score: 7

Commercial readiness: code

Free to access: yes
