ARXIV:2605.09959 · LLM TRAINING/ALIGNMENT · SUBMITTED 12 MAY · 20:16 UTC · FRESHNESS FRESH

Verified Source: PDF linked | Verified PaperPack: citation fields available | Partial Proof: unverified proof status

G-Zero: Self-Play for Open-Ended Generation from Zero Data

Chengsong Huang · Haolin Liu · Tong Zheng · Runpeng Dai · Langlin Huang · Jinyuan Li · +4 at arXiv

A framework for self-evolving LLMs in open-ended tasks without external judges, using intrinsic rewards for continuous improvement.

Ship in 2-4 weeks | Score 3.0 | Evidence unverified

Opportunity summary

Pain A framework for self-evolving LLMs in open-ended tasks without external judges, using intrinsic rewards for continuous improvement.

Evidence 0 refs | 0 sources | 0% coverage

Blocker Evidence unverified

PROBLEM

Self-evolving LLMs excel in verifiable domains but struggle in open-ended tasks, where reliance on proxy LLM judges introduces capability bottlenecks and reward hacking. To overcome this, the paper introduces G-Zero, a verifier-free, co-evolutionary framework for autonomous self-improvement.

METHOD

Full abstract

Self-evolving LLMs excel in verifiable domains but struggle in open-ended tasks, where reliance on proxy LLM judges introduces capability bottlenecks and reward hacking. To overcome this, we introduce G-Zero, a verifier-free, co-evolutionary framework for autonomous self-improvement. Our core innovation is Hint-$\delta$, an intrinsic reward that quantifies the predictive shift between a Generator model's unassisted response and its response conditioned on a self-generated hint. Using this signal, a Proposer model is trained via GRPO to continuously target the Generator's blind spots by synthesizing challenging queries and informative hints. The Generator is concurrently optimized via DPO to internalize these hint-guided improvements. Theoretically, we prove a best-iterate suboptimality guarantee for an idealized standard-DPO version of G-Zero, provided that the Proposer induces sufficient exploration coverage and the data filtering keeps pseudo-label score noise low. By deriving supervision entirely from internal distributional dynamics, G-Zero bypasses the capability ceilings of external judges, providing a scalable, robust pathway for continuous LLM self-evolution across unverifiable domains.
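The abstract names three ingredients: a Hint-$\delta$ reward, GRPO for the Proposer, and DPO for the Generator. The sketch below is a rough illustration of how such pieces might fit together, not the authors' implementation. The `hint_delta` function is a hypothetical stand-in (mean per-token log-probability gain when the hint is in context); the abstract does not give the actual definition. The group-normalized advantage and the DPO loss follow their commonly published forms, and all function names, the `eps` constant, and the `beta` default are assumptions made for this example.

```python
import math
from statistics import mean, pstdev


def hint_delta(logps_with_hint, logps_without_hint):
    """Hypothetical stand-in for Hint-delta (NOT the paper's formula).

    Takes per-token log-probabilities of the same response, scored once
    with the hint in context and once without, and returns the mean gain.
    """
    if len(logps_with_hint) != len(logps_without_hint):
        raise ValueError("both score lists must cover the same tokens")
    return mean(a - b for a, b in zip(logps_with_hint, logps_without_hint))


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair: -log(sigmoid(margin))."""
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # Numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Toy numbers only: three proposed (query, hint) pairs scored by the stand-in reward.
    rewards = [
        hint_delta([-1.0, -2.0], [-2.0, -4.0]),
        hint_delta([-1.0, -1.0], [-1.0, -1.0]),
        hint_delta([-3.0, -1.0], [-3.5, -1.5]),
    ]
    print("proposer advantages:", group_relative_advantages(rewards))
    # Hint-guided response treated as "chosen", unassisted response as "rejected".
    print("generator DPO loss:", dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=1.0))
```

In this toy setup the Proposer is pushed toward query and hint pairs that shift the Generator's predictions most, and the Generator is pushed toward its own hint-guided responses. Whether the paper pairs responses this way is an assumption of the sketch.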

RESULT

ScienceToStartup currently rates this 3.0/10 on the public viability pass. By deriving supervision entirely from internal distributional dynamics, G-Zero bypasses the capability ceilings of external judges, providing a scalable, robust pathway for continuous LLM…

WHY NOW

LLM Training/Alignment moved forward this cycle; last verified May 2026. Public score 3.0/10. Implementation evidence is present through a linked repository.

Opportunity summary

Score 3.0

Pain A framework for self-evolving LLMs in open-ended tasks without external judges, using intrinsic rewards for continuous improvement.

Evidence 0 refs | 0 sources | 0% coverage

Blocker no shell-level blocker reported

Competitive landscape

Segment: LLM Training/Alignment

Adoption evidence: Public code linked for build inspection

Commercial read: 3.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


Paper: https://arxiv.org/abs/2605.09959

Code: https://github.com/Chengsong-Huang/G-Zero

Date published (page metadata): 2026-05-11

Authors: Chengsong Huang, Haolin Liu, Tong Zheng, Runpeng Dai, Langlin Huang, Jinyuan Li, Zongxia Li, Zhepei Wei, Yu Meng, Jiaxin Huang
