ARXIV:2606.03489 · SECURE LLMS · SUBMITTED 03 JUN · 20:32 UTC · FRESHNESS FRESH

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: partial proof status

Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs

Wenqi Chen · Ziyan Zhang · Bing Wang · Lin Liu · Hengheng Zhang · Zhengsu Chen · arXiv

A tree-like self-play framework that trains LLMs to discriminate against their own localized security errors for robust code generation.

Ship in 2-4 weeks · Score 8.0 · Evidence partial

Opportunity summary

Pain: A tree-like self-play framework that trains LLMs to discriminate against their own localized security errors for robust code generation.

Evidence: 0 refs | 4 sources | 83% coverage

Blocker: Evidence partial


PROBLEM

A tree-like self-play framework that trains LLMs to discriminate against their own localized security errors for robust code generation. Current alignment techniques, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), typically apply coarse-grained…
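To make the granularity contrast concrete, here is a minimal sketch (not the authors' code) of how credit differs between a sequence-level reward and a signal localized at the token where a vulnerable completion branches away from a secure one. The helper names `first_divergence`, `sequence_level_credit`, and `localized_credit`, the token lists, and the SQL-injection example are all hypothetical illustrations.

```python
from typing import List, Sequence


def first_divergence(secure: Sequence[str], vulnerable: Sequence[str]) -> int:
    """Return the index of the first token at which the two trajectories differ."""
    for i, (a, b) in enumerate(zip(secure, vulnerable)):
        if a != b:
            return i
    # One sequence is a prefix of the other: they diverge where the shorter one ends.
    return min(len(secure), len(vulnerable))


def sequence_level_credit(vulnerable: Sequence[str], reward: float) -> List[float]:
    """Sequence-level signal: the same scalar reward is broadcast to every token."""
    return [reward] * len(vulnerable)


def localized_credit(secure: Sequence[str], vulnerable: Sequence[str], penalty: float) -> List[float]:
    """Localized signal: only the token where the vulnerable path branches off is penalized."""
    credit = [0.0] * len(vulnerable)
    k = first_divergence(secure, vulnerable)
    if k < len(vulnerable):
        credit[k] = penalty
    return credit


# Hypothetical tokenization of a parameterized query vs. string-formatted (injectable) SQL.
secure = ["cur", ".execute", "(", "sql", ",", "params", ")"]
vulnerable = ["cur", ".execute", "(", "sql", "%", "params", ")"]

print(sequence_level_credit(vulnerable, -1.0))  # [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0]
print(localized_credit(secure, vulnerable, -1.0))  # [0.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0]
```

In the sequence-level case every token, including the harmless shared prefix, absorbs the penalty; in the localized case only the `%` that turns a parameterized call into string formatting does.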

METHOD

Full abstract

While Large Language Models (LLMs) excel in code generation, they remain prone to replicating subtle yet critical vulnerabilities endemic to their training data. Current alignment techniques, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), typically apply coarse-grained optimization at the sequence level. This approach often fails to address the localized nature of security flaws, where a single incorrect token choice can compromise an entire program. To bridge this gap, we introduce Tree-like Self-Play (TSP), a framework that reframes secure code generation as a fine-grained sequential decision process. Unlike standard methods that blindly maximize likelihood, TSP constructs a decision tree where the model explores branching trajectories, generating both secure "golden paths" and vulnerable variants. By treating code generation as a self-play game, the model learns to strictly discriminate against its own localized errors. This provides a dense, on-policy learning signal that forces self-correction precisely at the critical decision nodes where vulnerabilities typically emerge. Our experiments demonstrate that TSP fundamentally enhances model reliability. In Python security benchmarks, TSP boosts CodeLlama-7B's pass rate (SPR@1) to 75.8%, significantly outperforming SFT (57.0%) and unstructured self-play baselines. Crucially, TSP induces robust out-of-distribution generalization: the model not only reduces vulnerabilities in unseen categories (CWEs) by 24.5% but also successfully transfers security principles learned from C/C++ to diverse languages, including Python, Go, and JavaScript. This suggests that TSP does not merely memorize patches, but internalizes abstract, language-agnostic security logic.
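One way to picture the decision tree described above is a prefix tree over sampled trajectories, each labeled secure or vulnerable by some external checker. The sketch below is an assumption-laden illustration, not the TSP implementation: it merges trajectories that share a prefix, then reports every node where a branch leading only to secure completions sits beside a branch leading only to vulnerable ones. Each such node yields (prefix, preferred token, rejected token) triples that could feed a token-level preference signal. `Node`, `build_tree`, `critical_pairs`, and the labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Trajectory = Tuple[Sequence[str], bool]  # (tokens, is_secure); label from a hypothetical checker
Pair = Tuple[Tuple[str, ...], str, str]  # (shared prefix, preferred token, rejected token)


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)
    n_secure: int = 0      # secure trajectories passing through this node
    n_vulnerable: int = 0  # vulnerable trajectories passing through this node

    def count(self, is_secure: bool) -> None:
        if is_secure:
            self.n_secure += 1
        else:
            self.n_vulnerable += 1


def build_tree(trajectories: Sequence[Trajectory]) -> Node:
    """Merge trajectories into a prefix tree, counting secure/vulnerable completions below each node."""
    root = Node()
    for tokens, is_secure in trajectories:
        node = root
        node.count(is_secure)
        for tok in tokens:
            node = node.children.setdefault(tok, Node())
            node.count(is_secure)
    return root


def critical_pairs(node: Node, prefix: Tuple[str, ...] = ()) -> List[Pair]:
    """Collect nodes where a purely secure child and a purely vulnerable child branch apart."""
    secure_kids = [t for t, c in node.children.items() if c.n_vulnerable == 0]
    vuln_kids = [t for t, c in node.children.items() if c.n_secure == 0]
    pairs: List[Pair] = [(prefix, good, bad) for good in secure_kids for bad in vuln_kids]
    for tok, child in node.children.items():
        pairs.extend(critical_pairs(child, prefix + (tok,)))
    return pairs


# Hypothetical sampled completions for one prompt.
trajectories = [
    (["cur.execute(", "sql", ",", "params", ")"], True),
    (["cur.execute(", "sql", "%", "params", ")"], False),
    (["cur.execute(", "sql", "+", "name", ")"], False),
]
for prefix, good, bad in critical_pairs(build_tree(trajectories)):
    print(prefix, "prefer", repr(good), "over", repr(bad))
```

With these three hypothetical trajectories, both pairs sit at the prefix `cur.execute( sql`: the parameterized `,` is preferred over `%` and over `+`. The shared prefix itself produces no pair, which is the sense in which the signal is concentrated at the branching point rather than spread over the whole sequence.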

RESULT

ScienceToStartup currently rates this 8.0/10 on the public viability pass. Our experiments demonstrate that TSP fundamentally enhances model reliability. A public repository is linked, so build verification can inspect implementation evidence instead of treating…
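The headline result is CodeLlama-7B's SPR@1 of 75.8% under TSP versus 57.0% under SFT. The abstract does not define SPR@k, so the sketch below assumes it follows the familiar unbiased pass@k estimator, with a sample counted as a success only if it is both functionally correct and secure. `spr_at_k` and `mean_spr_at_k` are hypothetical names, and the counts in the example are made up.

```python
from math import comb
from typing import Sequence, Tuple


def spr_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k-style estimate for one task.

    n: samples drawn; c: samples that are both correct and secure (assumed success criterion).
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one success
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_spr_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average the per-task estimate over (n, c) pairs."""
    return sum(spr_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts for three tasks, 10 samples each.
print(mean_spr_at_k([(10, 10), (10, 5), (10, 0)], k=1))  # 0.5
```

With k = 1 the estimate reduces to c/n, the fraction of secure passes per task. On the reported numbers, TSP's margin over SFT is 75.8 - 57.0 = 18.8 percentage points.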

WHY NOW

Secure LLMs moved forward this cycle; last verified June 2026. Public score 8.0/10. Implementation evidence is present through a linked repository.


Opportunity summary


Blocker: no shell-level blocker reported


Competitive landscape


Segment: Secure LLMs

Adoption evidence: Public code linked for build inspection

Commercial read: 8.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


Links

arXiv: https://arxiv.org/abs/2606.03489

Code repository: https://github.com/Easonnoway/TSP

