ARXIV:2604.02881 · LLM TRAINING · SUBMITTED 06 APR · 20:17 UTC · FRESHNESS UNKNOWN

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging

Baban Gain · Asif Ekbal · Trilok Nath Singh · arXiv

This research explains why merging fine-tuned language models fails for multilingual translation, identifying representational divergence as the cause.

Blocked on Code · Score 3.0 · Evidence unverified

Opportunity summary

Pain: This research explains why merging fine-tuned language models fails for multilingual translation, identifying representational divergence as the cause.

Evidence: 0 refs | 0 sources | 0% coverage

Blocker: Evidence unverified

PROBLEM

Weight-space merging succeeds in multitask settings, but its behavior in multilingual contexts remains poorly understood. This paper studies why merging fine-tuned language models degrades multilingual translation and points to representational divergence in the higher layers as a likely cause.
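For readers unfamiliar with weight-space merging, the following is a minimal sketch of two commonly used strategies, uniform averaging and task arithmetic. The source only says "standard merging strategies" and does not name them, so the choice of these two is an assumption. The flat `Weights` dictionary, the parameter name `"w"`, and the toy models `en_de` and `en_hi` are hypothetical stand-ins for real checkpoints.

```python
from typing import Dict, List

# Hypothetical flat parameter store: parameter name -> list of values.
Weights = Dict[str, List[float]]


def average_merge(models: List[Weights]) -> Weights:
    """Uniform weight averaging: element-wise mean of each parameter."""
    n = len(models)
    return {
        name: [sum(m[name][i] for m in models) / n for i in range(len(values))]
        for name, values in models[0].items()
    }


def task_arithmetic_merge(
    base: Weights, models: List[Weights], scale: float = 1.0
) -> Weights:
    """Task arithmetic: base + scale * sum of (fine-tuned - base) deltas."""
    merged: Weights = {}
    for name, base_vals in base.items():
        merged[name] = [
            b + scale * sum(m[name][i] - b for m in models)
            for i, b in enumerate(base_vals)
        ]
    return merged


# Toy example (assumed values, not from the paper).
base = {"w": [0.0, 1.0]}
en_de = {"w": [1.0, 1.0]}  # stand-in for a model fine-tuned on one language pair
en_hi = {"w": [0.0, 3.0]}  # stand-in for a model fine-tuned on another pair

avg = average_merge([en_de, en_hi])
ta = task_arithmetic_merge(base, [en_de, en_hi], scale=1.0)
print(avg)  # {'w': [0.5, 2.0]}
print(ta)   # {'w': [1.0, 3.0]}
```

Both strategies assume that the fine-tuned models stay close enough in weight space for element-wise combination to be meaningful, which is the assumption the paper suggests may not hold in multilingual translation.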

METHOD

Full abstract

Weight-space model merging combines independently fine-tuned models without accessing original training data, offering a practical alternative to joint training. While merging succeeds in multitask settings, its behavior in multilingual contexts remains poorly understood. We systematically study weight-space merging for multilingual machine translation by fully fine-tuning a language model on large-scale bilingual corpora and evaluating standard merging strategies. Our experiments reveal that merging degrades performance, especially when target languages differ. To explain this failure, we analyze internal representations using span-conditioned neuron selectivity and layer-wise centered kernel alignment. We find that language-specific neurons concentrate in embedding layers and upper transformer blocks, while intermediate layers remain largely shared across languages. Critically, fine-tuning redistributes rather than sharpens language selectivity: neurons for supervised and related languages become less exclusive, while those for unsupervised languages grow more isolated. This redistribution increases representational divergence in higher layers that govern generation. These findings suggest that multilingual fine-tuning may reshape geometry in ways that reduce compatibility with standard weight-space merging assumptions. Our work thus provides an explanation for why merging fails in multilingual translation scenarios.
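The abstract mentions layer-wise centered kernel alignment (CKA) as one of its analysis tools. As a rough illustration, here is a minimal sketch of linear CKA between two activation matrices, written in plain Python. Whether the paper uses the linear or a kernel variant is not stated in the abstract, so the linear form is an assumption, and the toy matrices below are made up. A layer-wise analysis would call `linear_cka` once per layer on activations collected from two models for the same inputs; extracting those activations is outside this sketch.

```python
import math
from typing import List

# Rows are examples, columns are features (e.g. neurons in one layer).
Matrix = List[List[float]]


def center_columns(x: Matrix) -> Matrix:
    """Subtract each column's mean so every feature has zero mean."""
    n = len(x)
    means = [sum(col) / n for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]


def cross_gram_sq_norm(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of a^T b, summed over all column pairs."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(u * v for u, v in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data.

    Undefined (division by zero) if either input has only constant columns.
    """
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(xc, yc)
    denominator = math.sqrt(
        cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc)
    )
    return numerator / denominator


# Toy activations (assumed values): 4 examples, 2 features.
x = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
# Rotating by 90 degrees and scaling by 3 should leave linear CKA unchanged.
rotated = [[-3.0 * b, 3.0 * a] for a, b in x]
# A single unrelated feature for comparison.
z = [[1.0], [-1.0], [1.0], [-1.0]]

cka_self = linear_cka(x, x)
cka_rot = linear_cka(x, rotated)
cka_other = linear_cka(x, z)
```

A value near 1 indicates that two layers encode the same geometry up to rotation and isotropic scaling; lower values in the upper layers would correspond to the kind of representational divergence the abstract describes.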

RESULT

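The paper reports that merging degrades translation performance, especially when target languages differ, and attributes the failure to increased representational divergence in the higher layers that govern generation. ScienceToStartup currently rates this 3.0/10 on the public viability pass.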

WHY NOW

LLM Training moved forward this cycle; last verified April 2026. Public score 3.0/10.

Competitive landscape

This research explains why merging fine-tuned language models fails for multilingual translation, identifying representational divergence as the cause.

Segment

LLM Training

Adoption evidence

No public code link in the paper record yet

Commercial read

3.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

arXiv record: https://arxiv.org/abs/2604.02881 · datePublished (page metadata): 2026-04-03T08:45:26.000Z · Free to access
