ARXIV:2605.06885 · LLM TRAINING · SUBMITTED 11 MAY · 20:36 UTC · FRESHNESS STALE

Verified source: PDF linked · Verified PaperPack: citation fields available

Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment

Fred Zhangzhi Peng · Alexis Fox · Anru R. Zhang · Alexander Tong · arXiv

Accelerate diffusion language model training by aligning representations with existing autoregressive models, reducing training time by up to 4x.

Ship in 2-4 weeks · Score 7.0 · Evidence verified

Opportunity summary

Pain: Accelerate diffusion language model training by aligning representations with existing autoregressive models, reducing training time by up to 4x.

Evidence: 0 refs | 4 sources | 83% coverage

Blocker: Evidence verified


PROBLEM

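Recent work has shown that pretrained autoregressive checkpoints can be converted into diffusion language models, but existing recipes primarily transfer parameters through continued denoising training with objective- and attention-level modifications. The paper instead asks whether the internal representation geometry learned by next-token prediction can be explicitly preserved during AR-to-DLM conversion.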

METHOD

Full abstract

Diffusion language models (DLMs) have recently demonstrated capabilities that complement standard autoregressive (AR) models, particularly in non-sequential generation and bidirectional editing. Although recent work has shown that pretrained autoregressive checkpoints can be converted into diffusion language models, existing recipes primarily transfer parameters through continued denoising training with objective- and attention-level modifications. We instead ask whether the internal representation geometry learned by next-token prediction can be explicitly preserved during AR-to-DLM conversion. We hypothesize that much of the semantic structure learned by AR pretraining can transfer across generation orders, and thus DLM training should be viewed as relearning the decoding path rather than relearning language representations. To investigate this, we introduce REPR-ALIGN, a representation alignment objective that adapts a bidirectional masked diffusion model to reuse representations from a pretrained AR model of identical architecture. Concretely, we align the hidden states of the DLM to the frozen AR model at every layer using cosine similarity, while optimizing the standard masked denoising objective. This simple alignment, with no adapters and no architectural changes beyond the attention mask, yields up to 4x training acceleration in our setting and is particularly effective in low-data regimes. Our results suggest that linguistic representations can transfer across generation order, and that representation alignment provides a simple and effective technique for training diffusion language models. Code is available at https://github.com/pengzhangzhi/Open-dLLM.
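The objective described in the abstract can be sketched in a few lines. The following is a minimal, framework-free illustration and not the authors' implementation (the linked repository is the reference for that). It assumes hidden states are nested lists indexed as [layer][position][feature], that the alignment term is the mean of 1 minus cosine similarity over all layers and positions, and that the two terms are summed with a hypothetical weight `align_weight`. The abstract specifies none of these details, and all function names here are illustrative.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def alignment_loss(dlm_hidden, ar_hidden):
    """Mean (1 - cosine similarity) over every layer and token position.

    Both arguments are indexed as [layer][position][feature]. The AR hidden
    states come from the frozen teacher and are treated as constants.
    """
    total, count = 0.0, 0
    for dlm_layer, ar_layer in zip(dlm_hidden, ar_hidden):
        for h_dlm, h_ar in zip(dlm_layer, ar_layer):
            total += 1.0 - cosine_similarity(h_dlm, h_ar)
            count += 1
    return total / count


def masked_denoising_loss(token_probs, targets, is_masked):
    """Mean negative log-likelihood of the true token at masked positions.

    Assumption: no noise-level weighting is applied; real masked diffusion
    objectives often reweight this term by the masking rate.
    """
    nll = [
        -math.log(probs[t])
        for probs, t, m in zip(token_probs, targets, is_masked)
        if m
    ]
    return sum(nll) / max(len(nll), 1)


def total_loss(token_probs, targets, is_masked, dlm_hidden, ar_hidden,
               align_weight=1.0):
    """Denoising loss plus weighted alignment loss (the weight is hypothetical)."""
    denoise = masked_denoising_loss(token_probs, targets, is_masked)
    align = alignment_loss(dlm_hidden, ar_hidden)
    return denoise + align_weight * align


# Toy data: 2 layers, 2 positions, 2 features.
ar_hidden = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, -1.0]]]
flipped = [[[-x for x in h] for h in layer] for layer in ar_hidden]
token_probs = [[0.5, 0.5], [0.25, 0.75]]  # predicted distribution per position
targets = [0, 1]                          # true token ids
is_masked = [True, False]                 # only position 0 was masked

loss_aligned = total_loss(token_probs, targets, is_masked, ar_hidden, ar_hidden)
loss_flipped = total_loss(token_probs, targets, is_masked, flipped, ar_hidden)
print(loss_aligned, loss_flipped)
```

In this toy run the denoising term is identical in both calls, so the gap between the two printed values comes entirely from the alignment term: hidden states that match the teacher contribute roughly zero, while sign-flipped hidden states contribute roughly the maximum of 2.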

RESULT

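ScienceToStartup currently rates this 7.0/10 on the public viability pass. The paper reports up to 4x training acceleration in its setting, with the alignment objective being particularly effective in low-data regimes. The authors conclude that linguistic representations can transfer across generation order, and that representation alignment provides a simple and effective technique for training diffusion language models.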

WHY NOW

LLM Training moved forward this cycle; last verified May 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.




Competitive landscape

Segment: LLM Training

Adoption evidence: Public code linked for build inspection

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified



