ARXIV:2605.23491 · LLM CODE GENERATION · SUBMITTED 25 MAY · 20:33 UTC · FRESHNESS STALE

Verified Source: PDF linked | Verified PaperPack: citation fields available | Partial Proof: unverified proof status

CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test

Zhangyi Hu · Chenhui Liu · Tian Huang · Jindong Li · Yang Yang · Jiemin Wu · Zining Zhong · Menglin Yang · Yutao Yue

A framework that jointly improves LLM-generated code and unit tests through cooperative self-play, enabling competitive code generation without ground-truth data.

Ship in 2-4 weeks | Score 7.0 | Evidence unverified

Opportunity summary

Pain: A framework that jointly improves LLM-generated code and unit tests through cooperative self-play, enabling competitive code generation without ground-truth data.

Evidence: 0 refs | 4 sources | 67% coverage

Blocker: Evidence unverified

PROBLEM

Ground-Truth Unit Tests (GT UTs) remain a bottleneck for LLM code generation: SOTA RLVR methods require them…

METHOD

Full abstract

Recently, Reinforcement Learning with Verifiable Rewards (RLVR) and Test-Time Scaling (TTS) have advanced LLM code generation through executable verification. Yet Ground-Truth Unit Tests (GT UTs) remain a bottleneck: SOTA RLVR methods require them for costly training, while existing TTS methods lose competitiveness without them. This motivates GT-free TTS, where existing methods directly use self-generated UTs to refine and select code candidates. Yet such UTs are often noisy or spuriously coupled with wrong code, and UT quality in turn cannot be validated without reliable code. The key challenge is therefore to jointly improve both. To this end, we present CoSPlay, a GT-free, training-free framework that jointly improves codes and UTs through cooperative self-play. It first explores diverse solution ideas and identifies their potential failure modes to produce discriminative UT ideas. It then uses bidirectional pass-count signals from the Code-UT execution matrix to iteratively prune or fix weak codes and refresh or replace unreliable UTs, letting the two pools co-evolve. Finally, when multiple codes remain tied at the highest pass count, it picks the final code from the largest output-consensus cluster, since correct codes agree on the same inputs while wrong codes diverge. Experiments on four challenging benchmarks show that CoSPlay on Qwen2.5-7B-Instruct improves average BoN from 22.1% to 33.2% and UT accuracy from 14.6% to 78.3%, matching or surpassing the RLVR model CURE-7B. When applied to CURE-7B, it further improves BoN by 5.7%. CoSPlay also generalizes across diverse backbones and outperforms GT-free TTS baselines under comparable token budgets, with continued gains as the budget scales up. These results suggest a scalable inference strategy for competitive code generation without any GT data.
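The two signals the abstract describes, bidirectional pass counts from the Code-UT execution matrix and the output-consensus tie-break, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pass_counts` and `select_by_consensus`, the boolean-matrix layout, and the toy data are all assumptions made for illustration, and the LLM-driven prune, fix, and refresh steps are left out.

```python
from collections import defaultdict


def pass_counts(matrix):
    """Return (per-code pass counts, per-UT pass counts).

    Assumed layout: matrix[i][j] is True when code candidate i
    passes self-generated unit test j.
    """
    code_scores = [sum(row) for row in matrix]         # row sums
    ut_scores = [sum(col) for col in zip(*matrix)]     # column sums
    return code_scores, ut_scores


def select_by_consensus(matrix, outputs):
    """Pick one code among those tied at the highest pass count.

    Assumed layout: outputs[i] is a tuple of code i's outputs on a
    shared set of probe inputs. Tied codes are grouped by identical
    outputs, and the first member of the largest group is returned.
    """
    code_scores, _ = pass_counts(matrix)
    best = max(code_scores)
    tied = [i for i, score in enumerate(code_scores) if score == best]

    clusters = defaultdict(list)
    for i in tied:
        clusters[outputs[i]].append(i)

    largest = max(clusters.values(), key=len)
    return largest[0]


# Toy data (hypothetical): 4 code candidates x 4 unit tests.
matrix = [
    [True, True, True, False],    # code 0
    [True, True, True, False],    # code 1
    [True, True, False, True],    # code 2
    [False, True, False, False],  # code 3
]
# Outputs of each code on two shared probe inputs (hypothetical).
outputs = [(3, 7), (3, 7), (3, 8), (0, 0)]

code_scores, ut_scores = pass_counts(matrix)
chosen = select_by_consensus(matrix, outputs)
print(code_scores, ut_scores, chosen)
```

In this toy run, codes 0, 1, and 2 tie on pass count, and the tie is broken in favor of the pair that agrees on the probe outputs. A low row sum would flag a code as a candidate for pruning or fixing, and an outlying column sum would flag a UT for refreshing or replacing; the actual criteria used by CoSPlay are not given in the abstract.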

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. A public repository is linked, so build verification can inspect implementation evidence instead of treating…

WHY NOW

LLM Code Generation moved forward this cycle; last verified May 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.

Opportunity summary

Score: 7.0

Pain: A framework that jointly improves LLM-generated code and unit tests through cooperative self-play, enabling competitive code generation without ground-truth data.

Evidence: 0 refs | 4 sources | 67% coverage

Blocker: no shell-level blocker reported

Competitive landscape

Segment: LLM Code Generation
Adoption evidence: Public code linked for build inspection
Commercial read: 7.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified

arXiv: https://arxiv.org/abs/2605.23491
Page: https://sciencetostartup.com/paper/cosplay-cooperative-self-play-at-test-time-with-self-generated-code-and-unit-test
Date published: 2026-05-22T10:53:17.000Z
Authors: Zhangyi Hu · Chenhui Liu · Tian Huang · Jindong Li · Yang Yang · Jiemin Wu · Zining Zhong · Menglin Yang · Yutao Yue
Code repository: https://github.com/sanae-ai/CosPlay
Research domain: LLM Code Generation
Viability score: 7
Commercial readiness: code, repo url
Free to access: yes
