ARXIV:2603.28342 · AI FOR SYSTEM OPTIMIZATION · SUBMITTED 31 MAR · 20:16 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization

He Du · Qiming Ge · Jiakai Hu · Aijun Yang · Zheng Cai · Zixian Huang · +15 at arXiv

Kernel-Smith optimizes GPU kernels for enhanced performance using an evolutionary approach, surpassing state-of-the-art methods.

Ship in 2-4 weeks · Score 8.0 · Evidence unverified

Opportunity summary

Pain Kernel-Smith optimizes GPU kernels for enhanced performance using an evolutionary approach, surpassing state-of-the-art methods.

Evidence 64 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

Kernel-Smith optimizes GPU kernels for enhanced performance using an evolutionary approach, surpassing state-of-the-art methods. On the agent side, Kernel-Smith maintains a population of executable candidates and iteratively improves them using an archive of top-performing…

METHOD

Full abstract

We present Kernel-Smith, a framework for high-performance GPU kernel and operator generation that combines a stable evaluation-driven evolutionary agent with an evolution-oriented post-training recipe. On the agent side, Kernel-Smith maintains a population of executable candidates and iteratively improves them using an archive of top-performing and diverse programs together with structured execution feedback on compilation, correctness, and speedup. To make this search reliable, we build backend-specific evaluation services for Triton on NVIDIA GPUs and MACA on MetaX GPUs. On the training side, we convert long-horizon evolution trajectories into step-centric supervision and reinforcement learning signals by retaining correctness-preserving, high-gain revisions, so that the model is optimized as a strong local improver inside the evolutionary loop rather than as a one-shot generator. Under a unified evolutionary protocol, Kernel-Smith-235B-RL achieves state-of-the-art overall performance on KernelBench with the NVIDIA Triton backend, attaining the best average speedup ratio and outperforming frontier proprietary models including Gemini-3.0-pro and Claude-4.6-opus. We further validate the framework on the MetaX MACA backend, where our Kernel-Smith-MACA-30B surpasses large-scale counterparts such as DeepSeek-V3.2-think and Qwen3-235B-2507-think, highlighting potential for seamless adaptation across heterogeneous platforms. Beyond benchmark results, the same workflow produces upstream contributions to production systems including SGLang and LMDeploy, demonstrating that LLM-driven kernel optimization can transfer from controlled evaluation to practical deployment.
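As a rough illustration of the two ideas in the abstract (an archive-driven evolutionary loop with structured feedback, and filtering trajectories down to correctness-preserving, high-gain steps), here is a minimal Python sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the `evaluate` function is a toy stand-in for a backend evaluation service, `propose` replaces the LLM revision step with a random perturbation of a hypothetical `tile` parameter, the archive keeps one best candidate per bucket as a simple way to stay both strong and diverse, and the `min_gain` threshold is arbitrary.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    compiled: bool
    correct: bool
    speedup: float  # baseline time / candidate time; 0.0 when not correct


@dataclass(frozen=True)
class Candidate:
    tile: int  # hypothetical tuning knob standing in for kernel source code
    feedback: Feedback


def evaluate(tile: int) -> Feedback:
    """Toy evaluation service (assumption): compile, check, then time."""
    if tile <= 0 or tile > 256:
        return Feedback(compiled=False, correct=False, speedup=0.0)
    if tile % 8 != 0:
        return Feedback(compiled=True, correct=False, speedup=0.0)
    # Synthetic speedup curve that peaks at tile == 128.
    return Feedback(True, True, 2.0 - abs(tile - 128) / 128)


def propose(parent: Candidate, rng: random.Random) -> int:
    """Stand-in for the model's revision step: perturb the parent."""
    return parent.tile + rng.choice([-32, -8, -4, 4, 8, 32])


def evolve(seed_tile: int = 16, steps: int = 200, seed: int = 0):
    rng = random.Random(seed)
    first = Candidate(seed_tile, evaluate(seed_tile))
    # Archive: best correct candidate per bucket (top-performing and diverse).
    archive = {seed_tile // 64: first}
    trajectory = []  # every (parent, child) revision, accepted or not
    for _ in range(steps):
        parent = rng.choice(list(archive.values()))
        tile = propose(parent, rng)
        child = Candidate(tile, evaluate(tile))
        trajectory.append((parent, child))
        if not child.feedback.correct:
            continue  # failed compilation or correctness: never archived
        key = tile // 64
        best = archive.get(key)
        if best is None or child.feedback.speedup > best.feedback.speedup:
            archive[key] = child
    return archive, trajectory


def retain_steps(trajectory, min_gain: float = 0.05):
    """Keep only correctness-preserving revisions with a clear speedup gain."""
    return [
        (parent, child)
        for parent, child in trajectory
        if child.feedback.correct
        and child.feedback.speedup - parent.feedback.speedup >= min_gain
    ]


if __name__ == "__main__":
    archive, trajectory = evolve()
    best = max(archive.values(), key=lambda c: c.feedback.speedup)
    print("best tile:", best.tile, "speedup:", best.feedback.speedup)
    print("retained steps:", len(retain_steps(trajectory)), "of", len(trajectory))
```

In this sketch the retained `(parent, child)` pairs play the role of step-centric training examples: each one shows a single local improvement rather than a full one-shot generation.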

RESULT

ScienceToStartup currently rates this 8.0/10 on the public viability pass.

WHY NOW

AI for System Optimization moved forward this cycle; last verified April 2026. Public score 8.0/10. Production flags indicate code availability.

Opportunity summary

Score 8.0

Pain Kernel-Smith optimizes GPU kernels for enhanced performance using an evolutionary approach, surpassing state-of-the-art methods.

Evidence 64 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Competitive landscape

Segment

AI for System Optimization

Adoption evidence

No public code link in the paper record yet

Commercial read

8.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Authors and affiliations

He Du (Shanghai AI Laboratory) · Qiming Ge (Shanghai AI Laboratory) · Jiakai Hu (Shanghai AI Laboratory) · Aijun Yang (Shanghai AI Laboratory) · Zheng Cai (Shanghai AI Laboratory) · Zixian Huang (Shanghai AI Laboratory) · Sheng Yuan (Shanghai AI Laboratory) · Qinxiu Cheng (Shanghai AI Laboratory) · Xinchen Xie (Shanghai AI Laboratory) · Yicheng Chen (Fudan University) · Yining Li (Shanghai AI Laboratory) · Jiaxing Xie (MetaX) · Huanan Dong (MetaX) · Yaguang Wu (MetaX) · Xiangjun Huang (MetaX) · Jian Yang (MetaX) · Hui Wang (Shanghai AI Laboratory) · Bowen Zhou (Shanghai AI Laboratory) · Bowen Li (Shanghai AI Laboratory) · Qipeng Guo (Shanghai AI Laboratory) · Kai Chen (Shanghai AI Laboratory)

Published 2026-03-30 12:12:49 UTC · arXiv: https://arxiv.org/abs/2603.28342 · Free to access

FAQ

What is the startup potential of "Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optim"?

Kernel-Smith optimizes GPU kernels for enhanced performance using an evolutionary approach, surpassing state-of-the-art methods.

What products could be built from this research?

Productize as an API or integrated module that can automatically optimize and deploy efficient GPU kernels for cloud service providers and enterprise AI users.

What are the practical use cases?

A platform for optimizing GPU operations in data centers, enabling faster AI model training and execution, thereby reducing energy consumption and operational costs.

What industries could this research disrupt?

Kernel-Smith could replace traditional manual optimization techniques and be incorporated into existing AI model development workflows, improving efficiency and performance.
