ARXIV:2604.17789 · LLM QUANTIZATION · SUBMITTED 21 APR · 02:40 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization

Haokun Lin · Xinle Jia · Haobo Xu · Bingchen Yao · Xianglong Guo · Yichen Wu · +4 at arXiv

Optimize LLM inference with DuQuant++, enabling efficient quantization for hardware acceleration.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain Optimize LLM inference with DuQuant++, enabling efficient quantization for hardware acceleration.

Evidence 0 refs | 5 sources | 67% coverage

Blocker Evidence unverified


PROBLEM

The MXFP4 format partitions tensors into blocks of 32 elements that share a single scaling factor. Activation outliers pose a unique challenge under this format: a single outlier inflates the shared block scale, compressing the effective dynamic range…
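The scale inflation described above can be reproduced in a few lines of plain Python. This is a minimal sketch, not the paper's code: it assumes the FP4 E2M1 magnitude grid, a power-of-two shared scale picked as 2^(floor(log2(max|x|)) - 2), and simple nearest-value rounding. The function names and the toy activations are hypothetical.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (sign is stored separately).
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_EMAX = 2      # largest E2M1 exponent: 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32    # MXFP4 group size stated in the abstract


def shared_scale(block):
    """Power-of-two scale derived from the block's largest magnitude (assumed rule)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0
    return 2.0 ** (math.floor(math.log2(amax)) - E2M1_EMAX)


def fake_quantize(block):
    """Quantize to the FP4 grid under one shared scale, then dequantize."""
    scale = shared_scale(block)
    out = []
    for x in block:
        mag = min(abs(x) / scale, FP4_E2M1_GRID[-1])        # saturate at 6.0
        q = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))  # nearest grid point
        out.append(math.copysign(q, x) * scale)
    return scale, out


def make_block(outlier=None, outlier_index=5):
    """Hypothetical activations in [-0.1, 0.1], optionally with one outlier."""
    block = [0.1 * ((i % 7) - 3) / 3 for i in range(BLOCK_SIZE)]
    if outlier is not None:
        block[outlier_index] = outlier
    return block


clean_scale, clean_deq = fake_quantize(make_block())
spiky_scale, spiky_deq = fake_quantize(make_block(outlier=48.0))

print("scale without outlier:", clean_scale)
print("scale with outlier:   ", spiky_scale)
print("small values surviving next to the outlier:",
      sum(1 for i, v in enumerate(spiky_deq) if i != 5 and v != 0.0))
```

With the outlier present, the shared scale jumps from 2^-6 to 8, and every other element in the block rounds to zero, which is the dynamic-range compression the paper describes.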

METHOD

Full abstract

The MXFP4 microscaling format, which partitions tensors into blocks of 32 elements sharing an E8M0 scaling factor, has emerged as a promising substrate for efficient LLM inference, backed by native hardware support on NVIDIA Blackwell Tensor Cores. However, activation outliers pose a unique challenge under this format: a single outlier inflates the shared block scale, compressing the effective dynamic range of the remaining elements and causing significant quantization error. Existing rotation-based remedies, including randomized Hadamard and learnable rotations, are data-agnostic and therefore unable to specifically target the channels where outliers concentrate. We propose DuQuant++, which adapts the outlier-aware fine-grained rotation of DuQuant to the MXFP4 format by aligning the rotation block size with the microscaling group size (B = 32). Because each MXFP4 group possesses an independent scaling factor, the cross-block variance issue that necessitates dual rotations and a zigzag permutation in the original DuQuant becomes irrelevant, enabling DuQuant++ to replace the entire pipeline with a single outlier-aware rotation, which halves the online rotation cost while simultaneously smoothing the weight distribution. Extensive experiments on the LLaMA-3 family under MXFP4 W4A4 quantization show that DuQuant++ consistently achieves state-of-the-art performance. Our code is available at https://github.com/Hsu1023/DuQuant++.
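The idea of aligning the rotation block size with the group size can be illustrated with a block-diagonal rotation in plain Python. This is a sketch under stated assumptions: it uses an orthonormal Walsh-Hadamard transform as a stand-in rotation, which is data-agnostic and is not the outlier-aware rotation DuQuant++ constructs. The scale rule, helper names, and toy data are hypothetical.

```python
import math

GROUP = 32  # rotation block size aligned with the MXFP4 group size (B = 32)


def hadamard_rotate(block):
    """Orthonormal Walsh-Hadamard transform of one block (power-of-two length)."""
    v = list(block)
    h = 1
    while h < len(v):
        for start in range(0, len(v), 2 * h):
            for i in range(start, start + h):
                a, b = v[i], v[i + h]
                v[i], v[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(len(v))
    return [x / norm for x in v]


def rotate_per_group(x, group=GROUP):
    """Block-diagonal rotation: each group of `group` elements is rotated on its own."""
    out = []
    for start in range(0, len(x), group):
        out.extend(hadamard_rotate(x[start:start + group]))
    return out


def group_scales(x, group=GROUP, emax=2):
    """Assumed power-of-two shared scale per group, from the group's largest magnitude."""
    scales = []
    for start in range(0, len(x), group):
        amax = max(abs(v) for v in x[start:start + group])
        scales.append(2.0 ** (math.floor(math.log2(amax)) - emax) if amax else 1.0)
    return scales


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Hypothetical data: two groups of small activations, one outlier in group 0.
x = [0.1 * ((i % 7) - 3) / 3 for i in range(2 * GROUP)]
x[5] = 48.0
w = [((i % 5) - 2) / 4 for i in range(2 * GROUP)]  # hypothetical weights

x_rot = rotate_per_group(x)
w_rot = rotate_per_group(w)

print("peak |x| in group 0 before/after:",
      max(abs(v) for v in x[:GROUP]), max(abs(v) for v in x_rot[:GROUP]))
print("shared scales before:", group_scales(x))
print("shared scales after: ", group_scales(x_rot))
print("dot product before/after:", dot(x, w), dot(x_rot, w_rot))
```

In this toy case the outlier's energy is spread across the 32 elements of its own group, so the group-0 scale drops from 8 to 2 and no outlier energy leaks into group 1. Rotating activations and weights with the same orthogonal block leaves the dot product unchanged up to floating-point error. Whether the end-to-end quantization error improves depends on the data and on the rotation used.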

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. According to the abstract, extensive experiments on the LLaMA-3 family under MXFP4 W4A4 quantization show that DuQuant++ consistently achieves state-of-the-art performance.

WHY NOW

LLM Quantization moved forward this cycle; last verified April 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.


Opportunity summary

Score 7.0

Pain Optimize LLM inference with DuQuant++, enabling efficient quantization for hardware acceleration.

Evidence 0 refs | 5 sources | 67% coverage

Blocker no shell-level blocker reported

Analysis summary

Optimize LLM inference with DuQuant++, enabling efficient quantization for hardware acceleration.


Competitive landscape

Optimize LLM inference with DuQuant++, enabling efficient quantization for hardware acceleration.

Segment

LLM Quantization

Adoption evidence

Public code linked for build inspection

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified


Published

2026-04-20T04:27:28.000Z · arXiv 2604.17789 · https://arxiv.org/abs/2604.17789

Authors

Haokun Lin (CASIA) · Xinle Jia (NJU) · Haobo Xu (THU) · Bingchen Yao (ZJU) · Xianglong Guo (CASIA) · Yichen Wu (Harvard) · Zhichao Lu (CityU) · Ying Wei (ZJU) · Qingfu Zhang (CityU) · Zhenan Sun (CASIA)

Code repository

https://github.com/Hsu1023/DuQuant

Commercial readiness

code, repo url

FAQ

What is the startup potential of "DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Q"?

Optimize LLM inference with DuQuant++, enabling efficient quantization for hardware acceleration.

What products could be built from this research?

DuQuant++ can be packaged as a backend service or library for optimizing inference in AI deployments, ideally suited for technology firms deploying AI solutions across various industries that require low latency and efficient compute.

What are the practical use cases?

Integrate DuQuant++ into a SaaS platform for optimizing machine learning model deployments, targeting companies that need high-efficiency AI model operations on edge devices or data centers with limited GPU capabilities.

What industries could this research disrupt?

DuQuant++ can replace less efficient quantization techniques and the need for large infrastructure investments in AI deployments, offering a more software-centric solution to performance optimization.

