ARXIV:2604.08826 · LLM TRAINING EFFICIENCY · SUBMITTED 13 APR · 20:28 UTC · FRESHNESS STALE

Verified Source: PDF linked | Verified PaperPack: citation fields available | Partial Proof: unverified proof status

HiFloat4 Format for Language Model Pre-training on Ascend NPUs

Mehran Taghian · Yunke Peng · Xing Huang · Yao Wang · Yaoyuan Wang · Wei Guo · +19 at arXiv

Investigating HiFloat4 format for language model pre-training on Ascend NPUs to improve computational and memory efficiency.

Ship in 2-4 weeks | Score 3.0 | Evidence unverified

Opportunity summary

Pain Investigating HiFloat4 format for language model pre-training on Ascend NPUs to improve computational and memory efficiency.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified


PROBLEM

Training and deploying large foundation models incurs substantial computational and memory costs, motivating the development of low-precision training…

METHOD

Full abstract

Large foundation models have become central to modern machine learning, with performance scaling predictably with model size and data. However, training and deploying such models incur substantial computational and memory costs, motivating the development of low-precision training techniques. Recent work has demonstrated that 4-bit floating-point (FP4) formats--such as MXFP4 and NVFP4--can be successfully applied to linear GEMM operations in large language models (LLMs), achieving up to 4x improvements in compute throughput and memory efficiency compared to higher-precision baselines. In this work, we investigate the recently proposed HiFloat4 FP4 format for Huawei Ascend NPUs and systematically compare it with MXFP4 in large-scale training settings. All experiments are conducted on Ascend NPU clusters, with linear and expert GEMM operations performed entirely in FP4 precision. We evaluate both dense architectures (e.g., Pangu and LLaMA-style models) and mixture-of-experts (MoE) models, where both standard linear layers and expert-specific GEMMs operate in FP4. Furthermore, we explore stabilization techniques tailored to FP4 training that significantly reduce numerical degradation, maintaining relative error within 1% of full-precision baselines while preserving the efficiency benefits of 4-bit computation. Our results provide a comprehensive empirical study of FP4 training on NPUs and highlight the practical trade-offs between FP4 formats in large-scale dense and MoE models.

RESULT

ScienceToStartup currently rates this 3.0/10 on the public viability pass. Our results provide a comprehensive empirical study of FP4 training on NPUs and highlight the practical trade-offs between FP4 formats in large-scale dense and…

WHY NOW

LLM Training Efficiency moved forward this cycle; last verified April 2026. Public score 3.0/10. Production flags indicate code availability.


Opportunity summary

Score 3.0

Pain Investigating HiFloat4 format for language model pre-training on Ascend NPUs to improve computational and memory efficiency.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

Investigating HiFloat4 format for language model pre-training on Ascend NPUs to improve computational and memory efficiency.


Competitive landscape

Investigating HiFloat4 format for language model pre-training on Ascend NPUs to improve computational and memory efficiency.

Segment

LLM Training Efficiency

Adoption evidence

No public code link in the paper record yet

Commercial read

3.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified


Authors (full list) Mehran Taghian · Yunke Peng · Xing Huang · Yao Wang · Yaoyuan Wang · Wei Guo · Yuanyong Luo · Tianchi Hu · Junsong Wang · Xin Wang · Hu Liu · Yu Cheng · Ziwei Yu · Hongliang Li · Mehdi Rahimifar · Lei Yan · Xuefei Wang · Zhuang Ma · Lei Liu · Hui Yu · Anandharaju Durai Raju · Hoang Le · Hei Yi Mak · Tanzila Rahman · Shadan Golestan

Published 2026-04-09 23:50:56 UTC | arXiv https://arxiv.org/abs/2604.08826 | Free to access


