ARXIV:2605.20936 · LLM ARCHITECTURE SEARCH · SUBMITTED 21 MAY 2026 · 20:29 UTC


DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU

Weizhe Chen · Miao Zhang · Junpeng Jiang · Yaping Li · Weili Guan · Liqiang Nie · arXiv

A fast, differentiable framework for designing efficient hybrid attention architectures in Large Language Models within minutes.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: Hybrid attention design still relies on manual empirical rules or proxy-based selector signals, while automated search such as Jet-Nemotron's PostNAS costs 200B tokens.

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: Evidence unverified


PROBLEM

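Hybrid attention LLMs place different attention operators (for example, linear attention alongside standard attention) in different layers, so the central design question is which operator each layer should use. Existing designs often rely on manual empirical rules or proxy-based selector signals for this layer-wise operator allocation, and automated NAS-style pipelines are too expensive to run routinely.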

METHOD

Full abstract

Hybrid attention architectures are becoming an increasingly important paradigm for improving LLM inference efficiency while preserving model quality, making hybrid architecture design a central problem. Existing designs often rely on manual empirical rules or proxy-based selector signals for layer-wise operator allocation. Recent NAS-style systems such as Jet-Nemotron demonstrate the promise of automated hybrid architecture search. However, Jet-Nemotron's PostNAS search stages alone use 200B tokens, making such search pipelines difficult to use as routine methods for hybrid architecture design. We introduce DASH, a fast differentiable search framework for hybrid attention architecture design, which relaxes discrete layer-wise attention operator placement into continuous architecture logits, prepares reusable teacher-aligned linear candidates, and performs architecture-only search with model and operator weights frozen to significantly enhance search efficiency. On Qwen2.5-3B-Instruct, DASH consistently outperforms a comprehensive suite of existing selector-style hybrid attention design baselines, showing that direct differentiable search can discover stronger hybrid architectures. Moreover, DASH achieves stronger RULER performance than released Jet-Nemotron models while remaining competitive on overlapping short-context and general benchmarks. Notably, each DASH search run uses only 12.3M tokens and takes about 20 minutes on a single RTX Pro 6000 GPU, corresponding to merely 0.006% of the PostNAS search tokens reported by Jet-Nemotron. These results suggest that high-quality hybrid attention architectures can be obtained through minutes-level differentiable search, providing a promising direction for hybrid architecture design.
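The core idea in the abstract (relax discrete per-layer operator placement into continuous architecture logits, then train only those logits while model and operator weights stay frozen) can be illustrated with a minimal DARTS-style sketch. This is not the authors' implementation: the abstract does not give DASH's search objective, so the per-layer teacher-matching loss, the cost penalty `lam`, the operator names in `OPS`, the costs in `COST`, and the toy data are all hypothetical.

```python
import math
from typing import List

Vector = List[float]

# Hypothetical candidate operators and relative per-layer costs (assumptions,
# not values from the paper).
OPS = ["full_attention", "linear_attention"]
COST = [1.0, 0.2]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one layer's architecture logits."""
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def search(cand_out: List[List[Vector]], teacher_out: List[Vector],
           steps: int = 500, lr: float = 0.5, lam: float = 0.05) -> List[List[float]]:
    """Architecture-only gradient descent on per-layer logits.

    cand_out[l][k] is the precomputed output of frozen candidate k at layer l;
    teacher_out[l] is the teacher's output at layer l. Only `alpha` changes.
    Toy per-layer loss (an assumption): ||y - t||^2 + lam * sum_k w_k * COST[k].
    """
    n_layers, n_ops = len(cand_out), len(OPS)
    alpha = [[0.0] * n_ops for _ in range(n_layers)]
    for _ in range(steps):
        for l in range(n_layers):
            w = softmax(alpha[l])
            t = teacher_out[l]
            # Continuous relaxation: softmax-weighted mix of frozen outputs.
            y = [sum(w[k] * cand_out[l][k][i] for k in range(n_ops))
                 for i in range(len(t))]
            resid = [y[i] - t[i] for i in range(len(t))]
            # Gradient of the loss w.r.t. each mixing weight w_k.
            g = [2.0 * sum(r * o for r, o in zip(resid, cand_out[l][k])) + lam * COST[k]
                 for k in range(n_ops)]
            g_bar = sum(w[k] * g[k] for k in range(n_ops))
            # Chain rule through softmax: dL/dalpha_j = w_j * (g_j - g_bar).
            for j in range(n_ops):
                alpha[l][j] -= lr * w[j] * (g[j] - g_bar)
    return alpha


def discretize(alpha: List[List[float]]) -> List[str]:
    """Keep the highest-logit operator in each layer."""
    return [OPS[max(range(len(a)), key=a.__getitem__)] for a in alpha]


# Toy data: at layer 0 the cheap linear candidate nearly matches the teacher;
# at layer 1 it does not.
teacher = [[1.0, 0.0], [0.0, 1.0]]
cands = [
    [[1.0, 0.0], [0.95, 0.05]],
    [[0.0, 1.0], [0.6, 0.4]],
]
arch = discretize(search(cands, teacher))
print(arch)  # ['linear_attention', 'full_attention']
```

Because the candidate outputs are fixed, each step only updates a handful of logits per layer, which mirrors the abstract's point that freezing model and operator weights is what keeps the search cheap. For scale, the abstract's token comparison is arithmetically consistent: 12.3 × 10⁶ / (200 × 10⁹) ≈ 6.15 × 10⁻⁵, or about 0.006% of the PostNAS search tokens.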

RESULT

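On Qwen2.5-3B-Instruct, DASH outperforms a suite of selector-style hybrid attention baselines and beats the released Jet-Nemotron models on RULER, with each search run using 12.3M tokens and about 20 minutes on a single RTX Pro 6000 GPU. ScienceToStartup currently rates this paper 7.0/10 on its public viability pass. Code availability is flagged in the production record; the public…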

WHY NOW

LLM Architecture Search moved forward this cycle; last verified May 2026. Public score 7.0/10. Production flags indicate code availability.



Competitive landscape


Segment: LLM Architecture Search

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


