ARXIV:2603.18652 · DOCUMENT AI · SUBMITTED 20 MAR · 21:29 UTC · FRESHNESS STALE

Verified source: PDF linked · Verified PaperPack: citation fields available

Benchmarking PDF Parsers on Table Extraction with LLM-based Semantic Evaluation

Pius Horn · Janis Keuper · arXiv

An LLM-based evaluation framework for PDF table extraction whose scores correlate more closely with human judgment (Pearson r=0.93) than TEDS (r=0.68) or GriTS (r=0.70), with practical guidance for parser selection.

Ship in 2-4 weeks · Score 7.0 · Evidence verified

Opportunity summary

Pain Existing evaluation approaches for PDF table extraction rely on rule-based metrics that fail to capture semantic equivalence of table content.

Evidence 0 refs | 0 sources | 50% coverage

Blocker Evidence verified

PROBLEM

A new LLM-based evaluation framework for PDF table extraction that significantly outperforms existing metrics, providing practical guidance for parser selection. We present a benchmarking framework based on synthetically generated PDFs with precise LaTeX ground…

METHOD

Full abstract

Reliably extracting tables from PDFs is essential for large-scale scientific data mining and knowledge base construction, yet existing evaluation approaches rely on rule-based metrics that fail to capture semantic equivalence of table content. We present a benchmarking framework based on synthetically generated PDFs with precise LaTeX ground truth, using tables sourced from arXiv to ensure realistic complexity and diversity. As our central methodological contribution, we apply LLM-as-a-judge for semantic table evaluation, integrated into a matching pipeline that accommodates inconsistencies in parser outputs. Through a human validation study comprising over 1,500 quality judgments on extracted table pairs, we show that LLM-based evaluation achieves substantially higher correlation with human judgment (Pearson r=0.93) compared to Tree Edit Distance-based Similarity (TEDS, r=0.68) and Grid Table Similarity (GriTS, r=0.70). Evaluating 21 contemporary PDF parsers across 100 synthetic documents containing 451 tables reveals significant performance disparities. Our results offer practical guidance for selecting parsers for tabular data extraction and establish a reproducible, scalable evaluation methodology for this critical task.

Code and data: https://github.com/phorn1/pdf-parse-bench

Metric study and human evaluation: https://github.com/phorn1/table-metric-study
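The abstract describes a matching pipeline feeding an LLM judge but does not spell out either step. The sketch below is one possible reading, not the authors' implementation: ground-truth and extracted tables are paired greedily by token overlap so that reordered or missing tables are tolerated, and each matched pair is then passed to a judge function. Every name here (`cell_tokens`, `match_tables`, `evaluate`, `overlap_judge`, the `min_overlap` threshold, and the choice to score unmatched tables as 0.0) is an assumption made for illustration. The judge is a plain-Python stand-in where a real setup would call an LLM.

```python
import re
from typing import Callable, Dict, List, Optional, Set


def cell_tokens(table_text: str) -> Set[str]:
    """Lowercased alphanumeric tokens; markup such as & or | is ignored."""
    return set(re.findall(r"[a-z0-9.]+", table_text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match_tables(
    ground_truth: List[str], extracted: List[str], min_overlap: float = 0.3
) -> Dict[int, Optional[int]]:
    """Greedy one-to-one matching; unmatched ground-truth tables map to None."""
    gt_tokens = [cell_tokens(t) for t in ground_truth]
    ex_tokens = [cell_tokens(t) for t in extracted]
    candidates = sorted(
        (
            (jaccard(g, e), gi, ei)
            for gi, g in enumerate(gt_tokens)
            for ei, e in enumerate(ex_tokens)
        ),
        reverse=True,
    )
    matches: Dict[int, Optional[int]] = {gi: None for gi in range(len(ground_truth))}
    used: Set[int] = set()
    for score, gi, ei in candidates:
        if score < min_overlap:
            break  # remaining candidates overlap too little to be the same table
        if matches[gi] is None and ei not in used:
            matches[gi] = ei
            used.add(ei)
    return matches


def evaluate(
    ground_truth: List[str],
    extracted: List[str],
    judge: Callable[[str, str], float],
) -> List[float]:
    """One score per ground-truth table; a missing table scores 0.0 (assumption)."""
    matches = match_tables(ground_truth, extracted)
    return [
        0.0 if ei is None else judge(ground_truth[gi], extracted[ei])
        for gi, ei in matches.items()
    ]


def overlap_judge(reference: str, candidate: str) -> float:
    """Placeholder for an LLM call: scores a pair by token overlap only."""
    return jaccard(cell_tokens(reference), cell_tokens(candidate))


# Toy input: LaTeX ground truth, Markdown parser output in a different order.
gt = [r"a & b \\ 1 & 2", r"x & y \\ 3 & 4"]
extracted = ["| x | y |\n| 3 | 4 |", "| a | b |\n| 1 | 2 |"]
scores = evaluate(gt, extracted, overlap_judge)
```

Passing the judge as a callable keeps the matching step testable without network access; swapping `overlap_judge` for a function that prompts a model and parses its rating would leave the rest unchanged.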

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Through a human validation study comprising over 1,500 quality judgments on extracted table pairs, we show that LLM-based evaluation achieves substantially higher correlation with…
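The agreement figures quoted for each metric are Pearson correlations between metric scores and human ratings of the same table pairs. A minimal sketch of that computation follows, using only the standard library. The score lists are made-up toy values chosen so the result is easy to check by hand; they are not data from the paper, and `pearson_r` is a helper name introduced here.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Toy ratings for five table pairs (illustrative only, NOT from the paper).
human = [1.0, 3.0, 5.0, 7.0, 9.0]
metric_a = [1.5, 3.5, 5.5, 7.5, 9.5]  # human shifted by 0.5: perfectly aligned
metric_b = [9.0, 7.0, 5.0, 3.0, 1.0]  # human reversed: perfectly opposed

r_a = pearson_r(human, metric_a)  # 1.0
r_b = pearson_r(human, metric_b)  # -1.0
```

A metric whose scores rise and fall with the human ratings gets a value near 1, which is the sense in which r=0.93 indicates closer agreement than r=0.68 or r=0.70.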

WHY NOW

Document AI moved forward this cycle; last verified April 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.

Competitive landscape

Segment: Document AI

Adoption evidence: Public code linked for build inspection

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified
