ARXIV:2605.16026 · SPEECH TRANSLATION · SUBMITTED 18 MAY · 20:32 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

From Flat Language Labels to Typological Priors: Structured Language Conditioning for Multilingual Speech-to-Speech Translation

Yu Pan · Yang Hou · Xiongfei Wu · Liang Zhang · Yves Le Traon · Lei Ma · Jianjun Zhao

A multilingual speech-to-speech translation system that uses linguistic typological priors for improved data efficiency.

Blocked on Code › Score 4.0 · Evidence unverified

Opportunity summary

Pain A multilingual speech-to-speech translation system that uses linguistic typological priors for improved data efficiency.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified


PROBLEM

A multilingual speech-to-speech translation system that uses linguistic typological priors for improved data efficiency. However, existing S2ST systems often either neglect source-language information or encode it through a language-as-label paradigm, representing each source language…

METHOD

Full abstract

Compositional speech-to-speech translation (S2ST) systems built upon speech large language models (SpeechLLMs) have recently shown promising performance. However, existing S2ST systems often either neglect source-language information or encode it through a language-as-label paradigm, representing each source language as an independent flat embedding. Such a design overlooks systematic linguistic structure shared across languages, which may limit data-efficient multilingual adaptation when supervised S2ST data are scarce. To address this issue, we propose S2ST-Omni 2, a many-to-one compositional S2ST framework that systematically reformulates multilingual language conditioning from flat language labels to structured typological priors. Specifically, S2ST-Omni 2 revisits language conditioning at three levels: typology-informed hierarchical language encoding for structured source-language representation, dynamically-gated language-aware Dual-CTC for content-adaptive acoustic modulation, and typology-aware LLM prompting for decoder-side linguistic guidance. Experiments on CVSS-C show that S2ST-Omni 2 achieves superior average performance among representative S2ST approaches across BLEU, COMET, ASR-BLEU, and BLASER 2.0 under the adopted evaluation protocol. Ablation studies indicate that the proposed representation-level, acoustic-level, and decoding-level strategies provide complementary benefits. Moreover, controlled data-budget analyses and a Japanese-to-English evaluation using only approximately 3 hours of supervised training data suggest that explicit typological priors provide useful inductive biases for data-efficient multilingual S2ST.
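The contrast the abstract draws between flat language labels and structured typological priors can be illustrated with a toy sketch. This is not the S2ST-Omni 2 implementation: the two typological attributes (`family`, `word_order`), the summation used to compose them, the residual term, the embedding width, and the prompt wording are all assumptions made here for illustration. The sketch only shows how sharing attribute vectors lets a low-resource language reuse parameters learned from other languages, and how the same attributes could be surfaced in a decoder-side prompt.

```python
import random

DIM = 8                 # toy embedding width (assumption)
rng = random.Random(0)  # fixed seed so the sketch is reproducible


def new_vector():
    """Stand-in for a trainable embedding row."""
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


# Hypothetical, coarse typological descriptors; not the paper's feature set.
TYPOLOGY = {
    "fr": {"family": "Indo-European", "word_order": "SVO"},
    "es": {"family": "Indo-European", "word_order": "SVO"},
    "ja": {"family": "Japonic", "word_order": "SOV"},
    "tr": {"family": "Turkic", "word_order": "SOV"},
}

# Flat "language-as-label" baseline: one independent vector per language.
flat_table = {lang: new_vector() for lang in TYPOLOGY}

# Structured alternative: one vector per typological value, shared by every
# language that has that value, plus a small language-specific residual.
feature_table = {}
for feats in TYPOLOGY.values():
    for level, value in feats.items():
        if (level, value) not in feature_table:
            feature_table[(level, value)] = new_vector()
residual_table = {lang: [0.1 * x for x in new_vector()] for lang in TYPOLOGY}


def flat_embedding(lang):
    """Look up the independent per-language vector."""
    return flat_table[lang]


def structured_embedding(lang):
    """Sum the shared attribute vectors and the language residual."""
    parts = [feature_table[(level, value)]
             for level, value in TYPOLOGY[lang].items()]
    parts.append(residual_table[lang])
    return [sum(column) for column in zip(*parts)]


def shared_levels(a, b):
    """Typological attributes on which two languages agree."""
    return sorted(level for level, value in TYPOLOGY[a].items()
                  if TYPOLOGY[b].get(level) == value)


def typology_prompt(lang):
    """Hypothetical prompt text exposing the same attributes to a decoder."""
    feats = TYPOLOGY[lang]
    return (f"Source language: {lang} (family: {feats['family']}, "
            f"basic word order: {feats['word_order']}). "
            "Translate the speech into English.")


if __name__ == "__main__":
    print(shared_levels("fr", "es"))  # ['family', 'word_order']
    print(shared_levels("ja", "tr"))  # ['word_order']
    print(typology_prompt("ja"))
```

In this toy setup the flat table gives Japanese a vector that is unrelated to every other language, whereas the structured embedding for Japanese reuses the `SOV` vector that Turkish examples also update. That kind of sharing is one plausible reading of why typological priors might help when supervised data are scarce; the paper's actual encoder may compose the information quite differently.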

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. Experiments on CVSS-C show that S2ST-Omni 2 achieves superior average performance among representative S2ST approaches across BLEU, COMET, ASR-BLEU, and BLASER 2.0 under the…

WHY NOW

Speech Translation moved forward this cycle; last verified May 2026. Public score 4.0/10.



Competitive landscape

A multilingual speech-to-speech translation system that uses linguistic typological priors for improved data efficiency.

Segment: Speech Translation

Adoption evidence: No public code link in the paper record yet

Commercial read: 4.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

{ "@context": "https://schema.org", "@graph": [ { "@type": "WebPage", "@id": "https://sciencetostartup.com/paper/from-flat-language-labels-to-typological-priors-structured-language-conditioning-for-multilingual-speech-to-speech-trans#webpage", "url": "https://sciencetostartup.com/paper/from-flat-language-labels-to-typological-priors-structured-language-conditioning-for-multilingual-speech-to-speech-trans", "name": "From Flat Language Labels to Typological Priors: Structured Language Conditioning for Multilingual Speech-to-Speech Translation", "description": "A multilingual speech-to-speech translation system that uses linguistic typological priors for improved data efficiency.", "isPartOf": { "@id": "https://sciencetostartup.com/#website" } }, { "@type": "ScholarlyArticle", "@id": "https://sciencetostartup.com/paper/from-flat-language-labels-to-typological-priors-structured-language-conditioning-for-multilingual-speech-to-speech-trans#scholarlyArticle", "headline": "From Flat Language Labels to Typological Priors: Structured Language Conditioning for Multilingual Speech-to-Speech Translation", "description": "A multilingual speech-to-speech translation system that uses linguistic typological priors for improved data efficiency.", "url": "https://sciencetostartup.com/paper/from-flat-language-labels-to-typological-priors-structured-language-conditioning-for-multilingual-speech-to-speech-trans", "sameAs": "https://arxiv.org/abs/2605.16026", "identifier": { "@type": "PropertyValue", "propertyID": "arXiv", "value": "2605.16026" }, "isAccessibleForFree": true, "isPartOf": { "@id": "https://sciencetostartup.com/#website" }, "datePublished": "2026-05-15T15:01:45.000Z", "author": [ { "@type": "Person", "name": "Yu Pan" }, { "@type": "Person", "name": "Yang Hou" }, { "@type": "Person", "name": "Xiongfei Wu" }, { "@type": "Person", "name": "Liang Zhang" }, { "@type": "Person", "name": "Yves Le Traon" }, { "@type": "Person", "name": "Lei Ma" }, { "@type": "Person", "name": 
"Jianjun Zhao" } ], "additionalProperty": [ { "@type": "PropertyValue", "propertyID": "viabilityScore", "value": 4 }, { "@type": "PropertyValue", "propertyID": "researchDomain", "value": "Speech Translation" } ] }, { "@type": "BreadcrumbList", "itemListElement": [ { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://sciencetostartup.com" }, { "@type": "ListItem", "position": 2, "name": "Speech Translation", "item": "https://sciencetostartup.com/topics" }, { "@type": "ListItem", "position": 3, "name": "From Flat Language Labels to Typological Priors: Structured ", "item": "https://sciencetostartup.com/paper/from-flat-language-labels-to-typological-priors-structured-language-conditioning-for-multilingual-speech-to-speech-trans" } ] } ] }
