ARXIV:2603.26653 · VIDEO REASONING BENCHMARK · SUBMITTED 30 MAR · 21:51 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Shaoxuan Li · Zhixuan Zhao · Hanze Deng · Zirun Ma · Shulin Tian · Zuyan Liu · +6 at arXiv

A new benchmark for complex video reasoning that pushes the limits of current multimodal LLMs, creating an opportunity for specialized perception-centric AI solutions.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A new benchmark for complex video reasoning that pushes the limits of current multimodal LLMs, creating an opportunity for specialized perception-centric AI solutions.

Evidence 20 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

A new benchmark for complex video reasoning that pushes the limits of current multimodal LLMs, creating an opportunity for specialized perception-centric AI solutions. PerceptionComp is designed so that no single moment is sufficient: answering…

METHOD

Full abstract

We introduce PerceptionComp, a manually annotated benchmark for complex, long-horizon, perception-centric video reasoning. PerceptionComp is designed so that no single moment is sufficient: answering each question requires multiple temporally separated pieces of visual evidence and compositional constraints under conjunctive and sequential logic, spanning perceptual subtasks such as objects, attributes, relations, locations, actions, and events, and requiring skills including semantic recognition, visual correspondence, temporal reasoning, and spatial reasoning. The benchmark contains 1,114 highly complex questions on 279 videos from diverse domains including city walk tours, indoor villa tours, video games, and extreme outdoor sports, with 100% manual annotation. Human studies show that PerceptionComp requires substantial test-time thinking and repeated perception steps: participants take much longer than on prior benchmarks, and accuracy drops to near chance (18.97%) when rewatching is disallowed. State-of-the-art MLLMs also perform substantially worse on PerceptionComp than on existing benchmarks: the best model in our evaluation, Gemini-3-Flash, reaches only 45.96% accuracy in the five-choice setting, while open-source models remain below 40%. These results suggest that perception-centric long-horizon video reasoning remains a major bottleneck, and we hope PerceptionComp will help drive progress in perceptual reasoning.
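The figures quoted in the abstract can be put side by side with the five-choice chance baseline. The snippet below is a minimal sketch, not the authors' evaluation code: the function names are hypothetical, and the standard error assumes the best model was scored on all 1,114 questions and that questions are independent (questions sharing a video are probably correlated, so the true uncertainty is likely larger).

```python
from math import sqrt

# Figures quoted in the abstract.
N_QUESTIONS = 1114
N_VIDEOS = 279
N_CHOICES = 5
REPORTED = {
    "human, no rewatching": 0.1897,
    "Gemini-3-Flash": 0.4596,
}


def chance_accuracy(n_choices: int) -> float:
    """Expected accuracy of uniform random guessing."""
    return 1.0 / n_choices


def standard_error(accuracy: float, n_items: int) -> float:
    """Binomial standard error. ASSUMPTION: items are independent."""
    return sqrt(accuracy * (1.0 - accuracy) / n_items)


def summarize() -> list[str]:
    """Build a short text report from the abstract's numbers."""
    chance = chance_accuracy(N_CHOICES)
    lines = [
        f"questions per video: {N_QUESTIONS / N_VIDEOS:.2f}",
        f"chance baseline: {chance:.2%}",
    ]
    for name, acc in REPORTED.items():
        lines.append(f"{name}: {acc:.2%} ({acc - chance:+.2%} vs chance)")
    # ASSUMPTION: the model was evaluated on the full question set.
    se = standard_error(REPORTED["Gemini-3-Flash"], N_QUESTIONS)
    lines.append(f"Gemini-3-Flash standard error if n={N_QUESTIONS}: {se:.2%}")
    return lines


if __name__ == "__main__":
    print("\n".join(summarize()))
```

Read this way, the no-rewatching human accuracy sits about one point below the 20% guessing rate, which is what the abstract means by "near chance", while the best model is well above chance but still answers fewer than half of the questions correctly.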

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Human studies show that PerceptionComp requires substantial test-time thinking and repeated perception steps: participants take much longer than on prior benchmarks, and accuracy drops…

WHY NOW

Video Reasoning Benchmark moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Blocker no shell-level blocker reported

Competitive landscape

Segment

Video Reasoning Benchmark

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Authors Shaoxuan Li · Zhixuan Zhao · Hanze Deng · Zirun Ma · Shulin Tian · Zuyan Liu · Yushi Hu · Haoning Wu · Yuhao Dong · Benlin Liu · Ziwei Liu · Ranjay Krishna

Published 2026-03-27T17:54:36.000Z

arXiv https://arxiv.org/abs/2603.26653

Page https://sciencetostartup.com/paper/perceptioncomp-a-video-benchmark-for-complex-perception-centric-reasoning

Free to access yes

Viability score 7

Research domain Video Reasoning Benchmark

Commercial readiness code
