ARXIV:2604.27977 · AI AGENTS · SUBMITTED 01 MAY · 15:04 UTC · FRESHNESS STALE

Verified · Source: PDF linked · Verified · PaperPack: citation fields available · Partial · Proof: unverified proof status

D3-Gym: Constructing Real-World Verifiable Environments for Data-Driven Discovery

Hanane Nour Moussa · Yifei Li · Zhuoyang Li · Yankai Yang · Cheng Tang · Tianshu Zhang · +4 at arXiv

D3-Gym provides verifiable environments and a dataset for training AI agents on real-world scientific discovery tasks; training on its trajectories improves Qwen3-32B by 7.8 absolute points on ScienceAgentBench.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: Progress in language models and agents for scientific data-driven discovery is held back by the absence of verifiable environments representing real-world scientific tasks.

Evidence: 0 refs | 4 sources | 67% coverage

Blocker: Evidence unverified


PROBLEM

Language models and agents for scientific data-driven discovery lack verifiable environments that represent real-world scientific tasks, which holds back further progress. D3-Gym addresses this gap with 565 tasks sourced from 239 real scientific repositories across four disciplines.

METHOD

Full abstract

Despite recent progress in language models and agents for scientific data-driven discovery, further advancing their capabilities is held back by the absence of verifiable environments representing real-world scientific tasks. To fill this gap, we introduce D3-Gym, the first automatically constructed dataset with verifiable environments for scientific Data-Driven Discovery. D3-Gym comprises (1) 565 tasks sourced from 239 real scientific repositories across four disciplines where (2) each task is equipped with a natural language instruction, an executable environment with pre-installed dependencies, input dataset and artifact previews, a reference code solution, and an automatically synthesized evaluation script. Rigorous evaluation of the quality of the verification signal in D3-Gym confirms that our evaluation scripts achieve 87.5% agreement with human-annotated gold standards and strong alignment in domain-specific evaluation logic, showing their scientific soundness. Further, training on trajectories sampled from D3-Gym yields consistent and substantial gains across Qwen3 models of varying sizes on ScienceAgentBench, boosting Qwen3-32B by 7.8 absolute points and substantially shrinking the gap with strong proprietary models. All D3-Gym artifacts (environments, creation workflow, trajectories, and models) can be found at https://github.com/OSU-NLP-Group/D3-Gym.
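To make the per-task components listed in the abstract more concrete, the following is a minimal sketch of what a task record and its verification step could look like. It is not the D3-Gym schema or API: the `DiscoveryTask` class, its field names, the `verify` helper, and the convention that an evaluation script exits with status 0 on success are all assumptions made for illustration.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DiscoveryTask:
    """Hypothetical task record; fields follow the components named in the abstract."""
    instruction: str         # natural language instruction
    environment: str         # executable environment identifier (assumed: an image tag)
    dataset_preview: str     # short preview of the input dataset
    reference_solution: str  # reference code solution (source text)
    evaluation_script: str   # evaluation script (source text)


def verify(task: DiscoveryTask, candidate_output: str) -> bool:
    """Run the task's evaluation script on a candidate output.

    Assumed convention (not taken from the paper): the script receives the
    output file path as argv[1] and exits with status 0 if the output passes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        out_path = Path(tmp) / "output.txt"
        script_path = Path(tmp) / "evaluate.py"
        out_path.write_text(candidate_output)
        script_path.write_text(task.evaluation_script)
        result = subprocess.run(
            [sys.executable, str(script_path), str(out_path)],
            capture_output=True,
            timeout=60,
        )
    return result.returncode == 0


# Toy task, invented for this sketch; it is not one of the 565 D3-Gym tasks.
toy_task = DiscoveryTask(
    instruction="Report the mean of column x to two decimals.",
    environment="python:3.11",
    dataset_preview="x\n1\n2\n3\n",
    reference_solution="print('2.00')",
    evaluation_script=(
        "import sys\n"
        "value = float(open(sys.argv[1]).read().strip())\n"
        "sys.exit(0 if abs(value - 2.0) < 1e-2 else 1)\n"
    ),
)

print(verify(toy_task, "2.00"))
print(verify(toy_task, "3.50"))
```

Running the evaluation script in a separate process keeps the verification signal independent of the agent's own code, which is one plausible reason to ship each task with its own executable environment.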

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. In the paper's evaluation, the automatically synthesized evaluation scripts reach 87.5% agreement with human-annotated gold standards, and training on trajectories sampled from D3-Gym raises Qwen3-32B by 7.8 absolute points on ScienceAgentBench.
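One way to read an agreement figure such as the 87.5% reported for the evaluation scripts is as a per-task match rate between automated verdicts and human gold verdicts. The sketch below assumes binary pass/fail verdicts and a simple match rate; the summary here does not describe the paper's exact protocol, and the `agreement_rate` function and the toy verdict lists are hypothetical.

```python
from typing import Sequence


def agreement_rate(script_verdicts: Sequence[bool], human_verdicts: Sequence[bool]) -> float:
    """Fraction of items on which the automated verdict equals the human verdict."""
    if len(script_verdicts) != len(human_verdicts):
        raise ValueError("verdict lists must have the same length")
    if not script_verdicts:
        raise ValueError("verdict lists must not be empty")
    matches = sum(s == h for s, h in zip(script_verdicts, human_verdicts))
    return matches / len(script_verdicts)


# Invented toy verdicts (not data from the paper): the two sources disagree on one of five items.
script = [True, True, False, True, False]
human = [True, True, False, False, False]

rate = agreement_rate(script, human)
print(f"{rate:.1%}")
```

A plain match rate does not correct for chance agreement; if the verdicts are heavily skewed toward pass or fail, a chance-corrected statistic may be a more informative complement.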

WHY NOW

AI Agents moved forward this cycle; last verified May 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.




Competitive landscape

Segment: AI Agents

Adoption evidence: Public code linked for build inspection

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


Paper metadata

arXiv: https://arxiv.org/abs/2604.27977

Published: 2026-04-30T15:06:56.000Z

Authors: Hanane Nour Moussa, Yifei Li, Zhuoyang Li, Yankai Yang, Cheng Tang, Tianshu Zhang, Nesreen K. Ahmed, Ali Payani, Ziru Chen, Huan Sun

Code repository: https://github.com/OSU-NLP-Group/D3-Gym

Research domain: AI Agents
