ARXIV:2605.13171 · AI FOR MATHEMATICS · SUBMITTED 14 MAY · 20:10 UTC · FRESHNESS FRESH

Verified · Source: PDF linked | Verified · PaperPack: citation fields available | Partial · Proof: unverified proof status

Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics

Moritz Firsching · Paul Lezeau · Salvatore Mercuri · Miklós Z. Horváth · Yaël Dillies · Calle Sönne · +5 at arXiv

An open and evolving benchmark of formalized mathematical problems in Lean 4, enabling evaluation of automated reasoning systems and driving new mathematical discoveries.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain Rapidly advancing automated reasoning systems need research-level formal mathematical problems to evaluate their capabilities accurately.

Evidence 0 refs | 0 sources | 0% coverage

Blocker Evidence unverified

PROBLEM

As automated reasoning systems advance rapidly, there is a growing need for research-level formal mathematical problems to accurately evaluate their capabilities. To address this, we present Formal Conjectures, an evolving benchmark of…

METHOD

Full abstract

As automated reasoning systems advance rapidly, there is a growing need for research-level formal mathematical problems to accurately evaluate their capabilities. To address this, we present Formal Conjectures, an evolving benchmark of currently 2615 mathematical problem statements formalized in Lean 4. Sourced from areas of active mathematical research, the dataset features 1029 open research conjectures providing a zero-contamination benchmark for mathematical proof discovery, and 836 solved problems for proof autoformalization. Notably, the repository provides a structured interface connecting mathematicians who formalize and clarify problems with the AI systems and humans attempting to solve them. Demonstrating its immediate utility, the benchmark has already been leveraged to make new mathematical discoveries, including the resolution of open research conjectures. We describe our approach to ensuring the correctness of these formalizations in a collaborative open-source project where contributions stem from an active community. In this framework, AI-generated proofs and disproofs serve as a valuable auditing mechanism to iteratively improve the fidelity of the benchmark. Finally, we provide a standardized evaluation setup and report baseline results on frozen evaluation subsets, demonstrating a climbable signal that measures the current frontier of automated reasoning on research-level mathematics.
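To make "a problem statement formalized in Lean 4" concrete, here is a minimal sketch of what such an entry could look like. It is an illustration only and is not taken from the Formal Conjectures repository: the names `IsPrime` and `goldbach_statement` are hypothetical, the example uses core Lean 4 without Mathlib, and the repository's own conventions may differ. Goldbach's conjecture is used simply because it is a well-known open problem.

```lean
-- Hypothetical illustration; not the repository's actual code or conventions.

/-- A natural number is prime if it is at least 2 and its only divisors are 1 and itself.
    (Assumed helper definition so the example needs nothing beyond core Lean 4.) -/
def IsPrime (n : Nat) : Prop :=
  2 ≤ n ∧ ∀ d : Nat, d ∣ n → d = 1 ∨ d = n

/-- Goldbach's conjecture: every even natural number greater than 2
    is the sum of two primes. The statement is fully formal; the proof is open. -/
theorem goldbach_statement (n : Nat) (hn : 2 < n) (heven : n % 2 = 0) :
    ∃ p q : Nat, IsPrime p ∧ IsPrime q ∧ p + q = n := by
  sorry -- placeholder: no proof is known
```

In a sketch like this, the statement is checked by Lean even though the proof is missing, and a human or automated system attempting the problem would need to replace `sorry` with a proof that Lean accepts.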

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Within the benchmark's collaborative framework, AI-generated proofs and disproofs serve as an auditing mechanism that iteratively improves the fidelity of the formalizations. A public repository…

WHY NOW

AI for Mathematics moved forward this cycle; last verified May 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.

Competitive landscape

Segment: AI for Mathematics

Adoption evidence: Public code linked for build inspection

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Paper details

arXiv: https://arxiv.org/abs/2605.13171

Date published (page metadata): 2026-05-13T08:33:15.000Z

Authors: Moritz Firsching, Paul Lezeau, Salvatore Mercuri, Miklós Z. Horváth, Yaël Dillies, Calle Sönne, Eric Wieser, Fred Zhang, Thomas Hubert, Blaise Agüera y Arcas, Pushmeet Kohli

Code repository: https://github.com/google-deepmind/formal-conjectures

Research domain: AI for Mathematics

Free to access: yes
