ARXIV:2605.14061 · AI FOR MATHEMATICS · SUBMITTED 15 MAY · 20:14 UTC

Verified: Source (PDF linked) · Verified: PaperPack (citation fields available) · Partial: Proof (unverified proof status)

MathAtlas: A Benchmark for Autoformalization in the Wild

Nilay Patel · Noah Arias · Davit Babayan · Victoria Cochran · Timothy Libman · Hafsah Mahmood · Liam McCarty · Soli Munoz · Laurel Willey · Jeffrey Flanigan

MathAtlas is a large-scale benchmark for autoformalization of graduate-level mathematics that reveals significant challenges for current models.

Ship in 2-4 weeks · Score 4.0 · Evidence unverified

Opportunity summary

Pain: MathAtlas, a large-scale benchmark for autoformalization of graduate-level mathematics, reveals significant challenges for current models.

Evidence: 0 refs | 0 sources | 0% coverage

Blocker: evidence unverified

PROBLEM

MathAtlas is a large-scale benchmark for autoformalization of graduate-level mathematics that reveals significant challenges for current models. In this paper, we introduce MathAtlas, the first large-scale autoformalization benchmark of in-the-wild graduate-level mathematics, containing ~52k…

METHOD

Full abstract

Current autoformalization benchmarks focus largely on olympiad or undergraduate mathematics, while graduate and research-level mathematics remains underexplored. In this paper, we introduce MathAtlas, the first large-scale autoformalization benchmark of in-the-wild graduate-level mathematics, containing ~52k theorems, definitions, exercises, examples, and proofs extracted from 103 graduate mathematics textbooks. MathAtlas is enriched with a mathematical dependency graph containing ~178k relations, making it the first autoformalization benchmark to include such relations and facilitating the evaluation and development of dependency-aware autoformalization systems. Our extensive experiments show that MathAtlas is high quality but extremely challenging: strong baselines achieve at most 9.8% correctness on theorem statements and 16.7% on definitions. Furthermore, we find that the performance of state-of-the-art models degrades substantially with dependency depth: on MA-Hard, a subset of 700 entities with the deepest dependency trees, the best model achieves only 2.6% correctness. We release MathAtlas to the community as a benchmark for large-scale autoformalization of graduate-level mathematics in the wild.
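The abstract ties difficulty to dependency depth and defines MA-Hard as the 700 entities with the deepest dependency trees, but it does not say how depth is computed. The sketch below is a minimal illustration under stated assumptions: the graph maps each entity to the entities it depends on, the graph is acyclic, and depth is the length of the longest dependency chain beneath an entity. The function names (`dependency_depths`, `deepest_subset`) and the toy entity labels are hypothetical and do not come from the MathAtlas release.

```python
from typing import Dict, List

# Hypothetical representation: each entity maps to the entities it depends on.
# Assumes the graph is acyclic; the actual MathAtlas schema may differ.
DependencyGraph = Dict[str, List[str]]


def dependency_depths(graph: DependencyGraph) -> Dict[str, int]:
    """Assumed definition: depth = longest dependency chain below an entity.

    Entities with no dependencies (including ones referenced but never listed
    as keys) get depth 0. Results are memoized so each entity is computed once.
    """
    memo: Dict[str, int] = {}

    def depth(node: str) -> int:
        if node not in memo:
            deps = graph.get(node, [])
            memo[node] = 0 if not deps else 1 + max(depth(d) for d in deps)
        return memo[node]

    for node in graph:
        depth(node)
    return memo


def deepest_subset(graph: DependencyGraph, k: int) -> List[str]:
    """Return the k deepest entities (ties broken alphabetically), MA-Hard style."""
    depths = dependency_depths(graph)
    ranked = sorted(depths, key=lambda n: (-depths[n], n))
    return ranked[:k]


# Toy example with invented labels, for illustration only.
graph: DependencyGraph = {
    "def:group": [],
    "def:subgroup": ["def:group"],
    "def:normal_subgroup": ["def:subgroup"],
    "thm:first_isomorphism": ["def:normal_subgroup", "def:group"],
    "ex:quotient": ["thm:first_isomorphism"],
}

if __name__ == "__main__":
    print(dependency_depths(graph))
    print(deepest_subset(graph, 2))  # the two deepest toy entities
```

The recursive helper keeps the idea readable; on a graph as large as the ~178k relations described above, an iterative topological-order pass would avoid Python's recursion limit.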

RESULT

ScienceToStartup currently rates this paper 4.0/10 on its public viability pass. The authors report that MathAtlas is high quality but extremely challenging: strong baselines achieve at most 9.8% correctness on theorem statements and 16.7%…

WHY NOW

AI for Mathematics moved forward this cycle; last verified May 2026. Public score 4.0/10. Production flags indicate code availability.

Competitive landscape

Segment: AI for Mathematics

Adoption evidence: No public code link in the paper record yet

Commercial read: 4.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified
