ARXIV:2604.01489 · GPU KERNEL GENERATION · SUBMITTED 03 APR · 20:17 UTC · FRESHNESS STALE

Source: PDF linked (verified) · PaperPack: citation fields available (verified) · Proof: unverified proof status (partial)

CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe

Tara Saba · Anne Ouyang · Xujie Si · Fan Long · arXiv

An LLM-based agentic framework that automates the generation and optimization of high-performance GPU kernels through iterative refinement and execution-based validation.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: An LLM-based agentic framework that automates the generation and optimization of high-performance GPU kernels through iterative refinement and execution-based validation.

Evidence: 0 refs | 0 sources | 17% coverage

Blocker: Evidence unverified

PROBLEM

An LLM-based agentic framework that automates the generation and optimization of high-performance GPU kernels through iterative refinement and execution-based validation. Recent work has explored using large language models (LLMs) to generate GPU kernels automatically,…

METHOD

Full abstract

High-performance GPU kernels are critical to modern machine learning systems, yet developing efficient implementations remains a challenging, expert-driven process due to the tight coupling between algorithmic structure, memory hierarchy usage, and hardware-specific optimizations. Recent work has explored using large language models (LLMs) to generate GPU kernels automatically, but generated implementations often struggle to maintain correctness and achieve competitive performance across iterative refinements. We present CuTeGen, an agentic framework for automated generation and optimization of GPU kernels that treats kernel development as a structured generate–test–refine workflow. Unlike approaches that rely on one-shot generation or large-scale search over candidate implementations, CuTeGen focuses on progressive refinement of a single evolving kernel through execution-based validation, structured debugging, and staged optimization. A key design choice is to generate kernels using the CuTe abstraction layer, which exposes performance-critical structures such as tiling and data movement while providing a more stable representation for iterative modification. To guide performance improvement, CuTeGen incorporates workload-aware optimization prompts and delayed integration of profiling feedback. Experimental results on matrix multiplication and activation workloads demonstrate that the framework produces functionally correct kernels and achieves competitive performance relative to optimized library implementations.
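The generate–test–refine workflow in the abstract can be pictured, under heavy assumptions, as a small host-side driver. In this hypothetical sketch, plain Python callables stand in for compiled CuTe kernels, `revise` stands in for the LLM call that rewrites the current kernel from accumulated feedback, and `profile_after` models delayed integration of profiling feedback by withholding timing for the first few iterations. All names (`refine_loop`, `validate`, `profile`, `Attempt`) are illustrative only; this is not the authors' implementation.

```python
"""Hypothetical generate-test-refine driver (sketch, not CuTeGen's code)."""
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Assumption: a "kernel" is modeled as a Python callable on a flat vector.
Kernel = Callable[[List[float]], List[float]]


@dataclass
class Attempt:
    iteration: int
    correct: bool
    runtime_s: Optional[float]  # None until profiling feedback is enabled
    feedback: str               # text that would be fed back to the model


# Assumption: `revise` stands in for the LLM call that edits the current kernel.
Reviser = Callable[[Kernel, List[Attempt]], Kernel]


def validate(kernel: Kernel, reference: Kernel,
             inputs: List[List[float]], tol: float = 1e-6) -> Tuple[bool, str]:
    """Execution-based check: run candidate and reference on identical inputs."""
    for x in inputs:
        try:
            out = kernel(x)
        except Exception as exc:  # crashes become structured debugging feedback
            return False, f"runtime error: {exc!r}"
        expected = reference(x)
        if len(out) != len(expected):
            return False, f"length mismatch: {len(out)} vs {len(expected)}"
        for i, (got, want) in enumerate(zip(out, expected)):
            if abs(got - want) > tol:
                return False, f"mismatch at index {i}: {got} vs {want}"
    return True, "ok"


def profile(kernel: Kernel, inputs: List[List[float]], repeats: int = 3) -> float:
    """Best-of-N host wall-clock time over all inputs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for x in inputs:
            kernel(x)
        best = min(best, time.perf_counter() - start)
    return best


def refine_loop(initial: Kernel, revise: Reviser, reference: Kernel,
                inputs: List[List[float]], max_iters: int = 8,
                profile_after: int = 2) -> Tuple[Optional[Kernel], List[Attempt]]:
    """Refine one evolving kernel; return the last validated version and the log."""
    kernel = initial
    best: Optional[Kernel] = None
    history: List[Attempt] = []
    for it in range(max_iters):
        ok, msg = validate(kernel, reference, inputs)
        runtime = None
        if ok:
            best = kernel
            # Delayed profiling: timing feedback is withheld for the first
            # `profile_after` iterations so early edits focus on correctness.
            if it >= profile_after:
                runtime = profile(kernel, inputs)
                msg = f"correct; best-of-3 time {runtime:.6f}s"
        history.append(Attempt(it, ok, runtime, msg))
        kernel = revise(kernel, history)
    return best, history


if __name__ == "__main__":
    relu_ref: Kernel = lambda x: [max(0.0, v) for v in x]
    buggy: Kernel = lambda x: list(x)  # forgets to clamp negatives

    def toy_revise(k: Kernel, hist: List[Attempt]) -> Kernel:
        # Toy stand-in for the model: swap in a fix after a failed attempt.
        return relu_ref if not hist[-1].correct else k

    best, log = refine_loop(buggy, toy_revise, relu_ref, [[-1.0, 0.5, 2.0]], max_iters=4)
    for a in log:
        print(a.iteration, a.correct, a.feedback)
```

Keeping a single evolving `kernel` and returning the last version that passed validation mirrors the single-candidate refinement the abstract contrasts with large-scale search. On a real GPU, `profile` would need device-side timing (for example CUDA events) rather than `time.perf_counter`, which only measures host wall-clock time.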

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Recent work has explored using large language models (LLMs) to generate GPU kernels automatically, but generated implementations often struggle to maintain correctness and achieve…

WHY NOW

GPU Kernel Generation moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Competitive landscape

An LLM-based agentic framework that automates the generation and optimization of high-performance GPU kernels through iterative refinement and execution-based validation.

Segment: GPU Kernel Generation

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified
