ARXIV:2603.11410 · VISION-LANGUAGE BENCHMARKING · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified · Source: PDF linked
Partial · PaperPack: 3 of 4 citation fields filled
Missing · Missing fields: authors
Partial · Proof: unverified proof status

Seeing Isn't Orienting: A Cognitively Grounded Benchmark Reveals Systematic Orientation Failures in MLLMs Supplementary

arXiv

DORI is a benchmark that isolates object orientation reasoning to improve multimodal AI understanding.

Blocked on Code › Score 4.0 · Evidence unverified

Opportunity summary

Pain DORI is a benchmark that isolates object orientation reasoning to improve multimodal AI understanding.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified


PROBLEM

DORI is a benchmark that isolates object orientation reasoning to improve multimodal AI understanding. Current vision-language benchmarks largely conflate orientation with position and general scene understanding.

METHOD

Full abstract

Humans learn object orientation progressively, from recognizing which way an object faces, to mentally rotating it, to reasoning about orientations between objects. Current vision-language benchmarks largely conflate orientation with position and general scene understanding. We introduce Discriminative Orientation Reasoning Intelligence (DORI), a cognitively grounded hierarchical benchmark that makes object orientation the primary target. Inspired by stages of human orientation cognition, DORI decomposes orientation into four dimensions, each evaluated at coarse (categorical) and granular (metric) levels. Composed from 13,652 images across 14 sources, DORI provides 33,656 multiple-choice questions covering 67 object categories in real-world and synthetic settings. Its coarse-to-granular design isolates orientation from confounds such as object recognition difficulty, scene clutter, and linguistic ambiguity via bounding-box isolation, standardized spatial reference frames, and structured prompts. Evaluating 24 state-of-the-art vision-language models shows a clear pattern: models that perform well on general spatial benchmarks are near-random on object-centric orientation tasks. The best models reach only 54.2% on coarse and 45.0% on granular judgments, with largest failures on compound rotations and shifts in inter-object reference frames. Large coarse-to-granular gaps reveal reliance on categorical heuristics rather than geometric reasoning, a limitation hidden by existing benchmarks. These results identify orientation understanding as an unsolved challenge for multimodal systems, with implications for robotic manipulation, 3D scene reconstruction, and human-AI interaction.
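The coarse-to-granular comparison described in the abstract can be illustrated with a small scoring sketch. The record layout (`level`, `predicted`, `answer`), the function names, and the toy data below are hypothetical and are not taken from the DORI release; the sketch only shows how per-level accuracy and the gap between levels might be computed for multiple-choice answers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def accuracy_by_level(records: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Return the fraction of correctly answered questions for each level."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        level = rec["level"]
        total[level] += 1
        if rec["predicted"] == rec["answer"]:
            correct[level] += 1
    return {level: correct[level] / total[level] for level in total}


def coarse_to_granular_gap(acc: Mapping[str, float]) -> float:
    """Accuracy drop when moving from categorical to metric questions."""
    return acc["coarse"] - acc["granular"]


# Hypothetical toy predictions (not DORI data): four questions per level.
toy_records = [
    {"level": "coarse", "predicted": "A", "answer": "A"},
    {"level": "coarse", "predicted": "B", "answer": "B"},
    {"level": "coarse", "predicted": "C", "answer": "A"},
    {"level": "coarse", "predicted": "D", "answer": "D"},
    {"level": "granular", "predicted": "A", "answer": "B"},
    {"level": "granular", "predicted": "C", "answer": "C"},
    {"level": "granular", "predicted": "B", "answer": "D"},
    {"level": "granular", "predicted": "D", "answer": "D"},
]

acc = accuracy_by_level(toy_records)
gap = coarse_to_granular_gap(acc)
print(acc, gap)
```

A large positive gap for a single model would be consistent with the abstract's reading that models rely on categorical heuristics rather than geometric reasoning. The reported 54.2% and 45.0% are best-model figures for each level and may not come from the same model, so subtracting them does not necessarily give one model's gap.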

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. Evaluating 24 state-of-the-art vision-language models shows a clear pattern: models that perform well on general spatial benchmarks are near-random on object-centric orientation tasks.

WHY NOW

Vision-Language Benchmarking moved forward this cycle; last verified April 2026. Public score 4.0/10.


Opportunity summary

Score 4.0

Pain DORI is a benchmark that isolates object orientation reasoning to improve multimodal AI understanding.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

DORI is a benchmark that isolates object orientation reasoning to improve multimodal AI understanding.


Competitive landscape

DORI is a benchmark that isolates object orientation reasoning to improve multimodal AI understanding.

Segment: Vision-Language Benchmarking

Adoption evidence: No public code link in the paper record yet

Commercial read: 4.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


Source: https://arxiv.org/abs/2603.11410 · Published: 2026-03-12T00:52:16.000Z · Research domain: Vision-Language Benchmarking · Viability score: 4


