ARXIV:2604.00421 · LLM TRAINING · SUBMITTED 02 APR · 20:56 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

Self-Routing: Parameter-Free Expert Routing from Hidden States

Jama Hussein Mohamud · Drew Wagner · Mirco Ravanelli · arXiv

A parameter-free routing mechanism for Mixture-of-Experts models that eliminates the need for a learned router.

Blocked on Code › Score 3.0 · Evidence unverified

Opportunity summary

Pain A parameter-free routing mechanism for Mixture-of-Experts models that eliminates the need for a learned router.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified


PROBLEM

Mixture-of-Experts (MoE) layers typically rely on a learned router to map hidden states to expert assignments. In this work, we ask whether a dedicated learned router is strictly necessary in the MoE settings we study.

METHOD

Full abstract

Mixture-of-Experts (MoE) layers increase model capacity by activating only a small subset of experts per token, and typically rely on a learned router to map hidden states to expert assignments. In this work, we ask whether a dedicated learned router is strictly necessary in the MoE settings we study. We propose Self-Routing, a parameter-free routing mechanism that uses a designated subspace of the token hidden state directly as expert logits, eliminating the router projection entirely while leaving the rest of the MoE layer unchanged. We evaluate Self-Routing on GPT-2-scale language modeling and ImageNet-1K classification by comparing it against a standard learned router, random-routing baselines, and dense non-MoE baselines. Our results show that Self-Routing remains competitive with the learned-router baseline while removing all dedicated routing parameters, and yields more balanced expert utilization, with about 17% higher average normalized routing entropy and no explicit load-balancing loss. On ImageNet-1K with DeiT-S/16, Self-Routing also slightly improves over the corresponding learned-router MoE. These findings suggest that effective MoE routing can emerge from the hidden representation itself without requiring a separate learned router module.
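The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: no public code is linked, so several details here are assumptions. In particular, the "designated subspace" is assumed to be the first `num_experts` coordinates of the hidden state, gating is assumed to be a softmax over the top-k selected logits, and normalized routing entropy is assumed to be the entropy of the expert-usage distribution divided by log(num_experts). The function names `self_route` and `normalized_routing_entropy` are hypothetical.

```python
import math
from collections import Counter


def self_route(hidden, num_experts, top_k):
    """Route one token using a slice of its hidden state as expert logits.

    Assumption: the designated subspace is the leading `num_experts`
    coordinates; the paper may designate a different slice.
    """
    logits = hidden[:num_experts]  # no router projection, no routing parameters
    # Indices of the top_k largest logits (ties broken by lower index).
    chosen = sorted(range(num_experts), key=lambda e: (-logits[e], e))[:top_k]
    # Softmax over the selected logits only, shifted for numerical stability.
    peak = max(logits[e] for e in chosen)
    exps = [math.exp(logits[e] - peak) for e in chosen]
    total = sum(exps)
    gates = [x / total for x in exps]
    return chosen, gates


def normalized_routing_entropy(assignments, num_experts):
    """Entropy of expert usage divided by log(num_experts); 1.0 means uniform."""
    counts = Counter(assignments)
    n = len(assignments)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(num_experts)


# Toy hidden states (6-dimensional) routed over 4 experts with top-2 selection.
tokens = [
    [2.0, 0.5, -1.0, 0.1, 0.7, 0.3],
    [-0.3, 1.5, 1.4, -2.0, 0.0, 0.9],
    [0.1, 0.2, 0.3, 3.0, -0.5, 0.4],
    [0.0, -1.0, 2.5, 2.4, 0.8, -0.2],
]
routed = [self_route(h, num_experts=4, top_k=2) for h in tokens]
used = [e for experts, _ in routed for e in experts]
balance = normalized_routing_entropy(used, num_experts=4)
print(routed[0], balance)
```

In a standard learned-router MoE the logits would instead come from a trained projection of the hidden state; the sketch replaces that projection with a plain slice, which is why it adds no routing parameters.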

RESULT

ScienceToStartup currently rates this 3.0/10 on the public viability pass. Our results show that Self-Routing remains competitive with the learned-router baseline while removing all dedicated routing parameters, and yields more balanced expert utilization, with…

WHY NOW

LLM Training moved forward this cycle; last verified April 2026. Public score 3.0/10.


Opportunity summary

Score 3.0

Pain A parameter-free routing mechanism for Mixture-of-Experts models that eliminates the need for a learned router.

Evidence 0 refs | 0 sources | 17% coverage

Blocker no shell-level blocker reported

Analysis summary

A parameter-free routing mechanism for Mixture-of-Experts models that eliminates the need for a learned router.


Competitive landscape

A parameter-free routing mechanism for Mixture-of-Experts models that eliminates the need for a learned router.

Segment

LLM Training

Adoption evidence

No public code link in the paper record yet

Commercial read

3.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified


arXiv: https://arxiv.org/abs/2604.00421 · Published 2026-04-01T03:05:20.000Z · Domain: LLM Training · Viability score: 3

