ARXIV:2605.20525 · MEDICAL AI · SUBMITTED 21 MAY · 20:30 UTC · FRESHNESS STALE

Verified · Source: PDF linked · Verified · PaperPack: citation fields available · Partial · Proof: unverified proof status

NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding

Mohammad H. Abbasi · Favour Nerrise · Shaurnav Ghosh · Ridvan Yesiloglu · Yuncong Mao · Bailey Trang · +9 at arXiv

A large-scale benchmark for 3D brain MRI understanding with a focus on clinically grounded visual question answering.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: A large-scale benchmark for 3D brain MRI understanding with a focus on clinically grounded visual question answering.

Evidence: 0 refs | 5 sources | 67% coverage

Blocker: Evidence unverified


PROBLEM

A large-scale benchmark for 3D brain MRI understanding with a focus on clinically grounded visual question answering. It spans ages 5-104 and five clinical domains: Alzheimer's, Parkinson's, tumors, white matter disease, and neurodevelopment.

METHOD

Full abstract

We present NeuroQA, a large-scale benchmark for visual question answering in 3D brain magnetic resonance imaging (MRI), with 56,953 QA pairs from 12,977 subjects across 12 datasets. It spans ages 5-104 and five clinical domains: Alzheimer's, Parkinson's, tumors, white matter disease, and neurodevelopment. Unlike prior medical Visual Question Answering (VQA) efforts that operate on 2D slices or rely on narrow diagnostic labels, NeuroQA pairs every item with a full 3D volume. It evaluates 11 clinically grounded reasoning skills across Yes/No, multiple-choice, and open-ended formats. Of the 203 templates, 131 are image-grounded (answerable from a 3-plane viewer) and 72 are image-informed (ground truth from quantitative volumetry or clinical instruments). To remove text-only shortcuts, we apply answer-distribution refinement, reducing closed-format text-only accuracy from >80% to 44.6%; image necessity is assessed separately through an image-grounding protocol released with the benchmark. A 38-rule deterministic pipeline and two rounds of expert review verify every QA pair against FreeSurfer measurements, metadata, or radiology report fields, with zero same-subject contradictions across templates. We conduct a clinician evaluation in which two clinicians independently assess 100 frozen test items on a three-plane viewer. On closed-format (Yes/No + multiple-choice) test-public items, the best zero-shot vision-language model and a supervised 3D CNN baseline reach 47.5% and 43.7% accuracy respectively, both below the 49.4% text-only majority-template floor. NeuroQA adopts a two-tier release with public QA pairs for open-access datasets and reproducible generation scripts for datasets restricted by data use agreements (DUAs), plus subject-level splits, a held-out private test set, and an online leaderboard.
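The abstract compares model accuracy against a text-only majority-template floor but does not spell out how that floor is computed. The following is a minimal sketch of one plausible reading: for each question template, always predict the answer seen most often in training, then score that on the test items without looking at any image. The field names `template_id` and `answer`, the function name, and the toy data are hypothetical and are not taken from the NeuroQA release.

```python
from collections import Counter, defaultdict


def majority_template_floor(train_items, test_items):
    """Accuracy of always predicting each template's most common training answer.

    Items are dicts with hypothetical keys "template_id" and "answer".
    No image is consulted, so this is a text-only baseline.
    """
    # Count how often each answer occurs per template in the training split.
    counts = defaultdict(Counter)
    for item in train_items:
        counts[item["template_id"]][item["answer"]] += 1

    # Pick the single most frequent answer for every template.
    majority = {t: c.most_common(1)[0][0] for t, c in counts.items()}

    if not test_items:
        return 0.0

    # A test item is correct when its answer equals its template's majority answer.
    # Templates unseen in training yield None and are counted as wrong.
    correct = sum(
        1 for item in test_items
        if majority.get(item["template_id"]) == item["answer"]
    )
    return correct / len(test_items)


# Toy data for illustration only.
train = [
    {"template_id": "t1", "answer": "yes"},
    {"template_id": "t1", "answer": "yes"},
    {"template_id": "t1", "answer": "no"},
    {"template_id": "t2", "answer": "B"},
    {"template_id": "t2", "answer": "B"},
    {"template_id": "t2", "answer": "A"},
]
test = [
    {"template_id": "t1", "answer": "yes"},
    {"template_id": "t1", "answer": "no"},
    {"template_id": "t2", "answer": "B"},
    {"template_id": "t2", "answer": "C"},
]

floor = majority_template_floor(train, test)
print(f"majority-template floor: {floor:.1%}")
```

Under this reading, a model that scores below the floor (as the reported 47.5% and 43.7% do against 49.4%) is doing no better than a lookup table keyed on the question template, which is why the abstract treats the floor as the reference point for closed-format items.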

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. NeuroQA adopts a two-tier release with public QA pairs for open-access datasets and reproducible generation scripts for datasets restricted by data use agreements (DUAs),…

WHY NOW

Medical AI moved forward this cycle; last verified May 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.



Competitive landscape

Segment: Medical AI

Adoption evidence: Public code linked for build inspection

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


Authors: Mohammad H. Abbasi, Favour Nerrise, Shaurnav Ghosh, Ridvan Yesiloglu, Yuncong Mao, Bailey Trang, Mohammad Asadi, Merryn Daniel, Gustavo Chau Loo Kung, Ken Chang, Pavan Pinkesh Shah, Adam Turnbull, Kyan Younes, Seena Dehkharghani, Ehsan Adeli

Published: 2026-05-19T21:54:12.000Z

arXiv: https://arxiv.org/abs/2605.20525

Code: https://github.com/mhabbasiit/neuroqa

