ARXIV:2604.25231 · VISUAL REASONING BENCHMARK · SUBMITTED 29 APR · 02:44 UTC · FRESHNESS STALE

Verified · Source: PDF linked
Verified · PaperPack: citation fields available
Partial · Proof: unverified proof status

DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams

Anirudh Iyengar Kaniyar Narayana Iyengar · Tampu Ravi Kumar · Gaurav Najpande · Manan Suri · Dinesh Manocha · Puneet Mathur · Vivek Gupta

DRAGON is a new benchmark and dataset for evaluating evidence-grounded visual reasoning in diagrams, addressing limitations of current vision-language models.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain DRAGON is a new benchmark and dataset for evaluating evidence-grounded visual reasoning in diagrams, addressing limitations of current vision-language models.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified


PROBLEM

DRAGON is a new benchmark and dataset for evaluating evidence-grounded visual reasoning in diagrams, addressing limitations of current vision-language models. Recent vision-language models (VLMs) often achieve high answer accuracy on these tasks, yet correct…

METHOD

Full abstract

Diagram question answering (DQA) requires models to interpret structured visual representations such as charts, maps, infographics, circuit schematics, and scientific diagrams. Recent vision-language models (VLMs) often achieve high answer accuracy on these tasks, yet correct answers do not guarantee that models ground their reasoning in the diagram regions that support the prediction. Models may instead rely on textual correlations or dataset artifacts without identifying the visual evidence required to verify the answer. This limitation prevents reliable evaluation of diagram reasoning and reduces interpretability. We introduce DRAGON, a benchmark for evaluating evidence-grounded visual reasoning in diagrams. Given a diagram, a question, and the correct answer, a model must predict bounding boxes that correspond to the visual elements required to justify the answer. These evidence regions may include answer-bearing components, textual labels, legends, axes, connectors, and other supporting structures involved in the reasoning process. The DRAGON dataset contains 11,664 annotated question instances collected from six diagram QA datasets: ChartQA, Circuit-VQA, InfographicsVQA, MapIQ, MapWise, and AI2D. We release a 2,445-instance benchmark test set with human-verified reasoning evidence annotations and a standardized evaluation framework. We evaluate eight recent VLMs and analyze their ability to localize reasoning evidence across diverse diagram domains. DRAGON enables systematic evaluation of diagram reasoning and supports future research on models that ground their predictions in visual evidence.
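The task described in the abstract (predict bounding boxes for the evidence that justifies a known answer) can be illustrated with a small scoring sketch. The abstract does not describe the benchmark's actual metric or data schema, so everything here is an assumption made for illustration: the `EvidenceInstance` record and its field names are hypothetical, the `(x_min, y_min, x_max, y_max)` box format is assumed, and greedy IoU matching at a 0.5 threshold is a common detection-style convention rather than the paper's standardized evaluation framework.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class EvidenceInstance:
    """Hypothetical record for one benchmark item; field names are illustrative."""
    question: str
    answer: str
    gold_boxes: List[Box]  # annotated evidence regions


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_boxes(pred: List[Box], gold: List[Box], thr: float = 0.5) -> Tuple[float, float]:
    """Greedy one-to-one matching; returns (precision, recall) at IoU >= thr.

    This is an assumed scoring rule, not the benchmark's official metric.
    """
    unmatched = list(gold)
    hits = 0
    for p in pred:
        # Pick the still-unmatched gold box that overlaps this prediction most.
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= thr:
            unmatched.remove(best)
            hits += 1
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Made-up example: two gold evidence regions (a bar and its axis label),
# one good prediction and one spurious prediction.
item = EvidenceInstance(
    question="Which bar is tallest?",
    answer="2019",
    gold_boxes=[(10, 10, 50, 90), (0, 95, 100, 110)],
)
predicted = [(12, 12, 50, 88), (200, 200, 220, 220)]
precision, recall = match_boxes(predicted, item.gold_boxes)
print(precision, recall)  # 0.5 0.5
```

Reporting precision and recall separately keeps two failure modes apart: a model that boxes the answer-bearing element but misses supporting structure such as legends or axes loses recall, while a model that boxes large irrelevant areas loses precision.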

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Recent vision-language models (VLMs) often achieve high answer accuracy on these tasks, yet correct answers do not guarantee that models ground their reasoning in…

WHY NOW

Visual Reasoning Benchmark moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.


Opportunity summary

Score 7.0

Pain DRAGON is a new benchmark and dataset for evaluating evidence-grounded visual reasoning in diagrams, addressing limitations of current vision-language models.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

DRAGON is a new benchmark and dataset for evaluating evidence-grounded visual reasoning in diagrams, addressing limitations of current vision-language models.


Competitive landscape

DRAGON is a new benchmark and dataset for evaluating evidence-grounded visual reasoning in diagrams, addressing limitations of current vision-language models.

Segment: Visual Reasoning Benchmark
Adoption evidence: No public code link in the paper record yet
Commercial read: 7.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified


Paper record

arXiv ID: 2604.25231 (https://arxiv.org/abs/2604.25231)
Page: https://sciencetostartup.com/paper/dragon-a-benchmark-for-evidence-grounded-visual-reasoning-over-diagrams
Published: 2026-04-28T05:24:05.000Z
Authors: Anirudh Iyengar Kaniyar Narayana Iyengar, Tampu Ravi Kumar, Gaurav Najpande, Manan Suri, Dinesh Manocha, Puneet Mathur, Vivek Gupta
Research domain: Visual Reasoning Benchmark
Viability score: 7
Commercial readiness: code
Free to access: yes

