PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Page Freshness

Signal Canvas proof surface

Canonical route: /signal-canvas/perceptioncomp-a-video-benchmark-for-complex-perception-centric-reasoning

Proof freshness: stale
Proof status: unverified
Display score: 7/10
Last proof check: 2026-03-30
Score updated: 2026-04-02
Score fresh until: 2026-05-02
References: 20
Source count: 3
Coverage: 50%

This page is showing the last landed evidence receipt and score bundle because the latest proof data is outside the freshness window.
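
The score dates listed above can be checked with a few lines of Python. This is a minimal sketch covering only the score window: the page does not say how the proof freshness window is computed, and treating the "fresh until" date as inclusive is an assumption. The function name `is_score_fresh` is hypothetical.

```python
from datetime import date

# Dates copied from the Page Freshness block.
score_updated = date(2026, 4, 2)
score_fresh_until = date(2026, 5, 2)


def is_score_fresh(as_of: date, fresh_until: date) -> bool:
    """Return True while the score is still inside its window.

    Assumption: the fresh-until date itself still counts as fresh.
    """
    return as_of <= fresh_until


# Length of the score window implied by the two dates.
window_days = (score_fresh_until - score_updated).days
print(window_days)
print(is_score_fresh(date(2026, 5, 3), score_fresh_until))
```

Under these assumptions the score window spans 30 days, and any date after 2026-05-02 falls outside it.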

Agent Handoff

Canonical ID: perceptioncomp-a-video-benchmark-for-complex-perception-centric-reasoning
Route: /signal-canvas/perceptioncomp-a-video-benchmark-for-complex-perception-centric-reasoning

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/signal-canvas/perceptioncomp-a-video-benchmark-for-complex-perception-centric-reasoning
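
The same request can be prepared from Python with the standard library. This is a sketch rather than a documented client: the helper names `handoff_url` and `build_handoff_request` are hypothetical, and the `Accept: application/json` header is an assumption, since the page does not describe the response format. The request is built but not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Base path taken from the curl example above.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff/signal-canvas"


def handoff_url(canonical_id: str) -> str:
    """Build the agent-handoff URL for a Signal Canvas canonical ID."""
    # Percent-encode the ID so unexpected characters cannot alter the path.
    return f"{BASE_URL}/{quote(canonical_id, safe='')}"


def build_handoff_request(canonical_id: str) -> Request:
    """Prepare (but do not send) a GET request for the handoff endpoint."""
    # Assumption: the endpoint returns JSON; the page does not confirm this.
    return Request(
        handoff_url(canonical_id),
        headers={"Accept": "application/json"},
        method="GET",
    )


request = build_handoff_request(
    "perceptioncomp-a-video-benchmark-for-complex-perception-centric-reasoning"
)
print(request.full_url)
```

To actually send it, pass `request` to `urllib.request.urlopen` and handle network errors as appropriate for your environment.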

MCP example

{
  "tool": "search_signal_canvas",
  "arguments": {
    "mode": "paper",
    "paper_ref": "perceptioncomp-a-video-benchmark-for-complex-perception-centric-reasoning",
    "query_text": "Summarize PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning"
  }
}
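
The MCP payload above can also be assembled programmatically. The sketch below only reproduces the JSON shape shown on this page; the helper name `build_search_call` is hypothetical, and how the payload is delivered to an MCP server depends on the client in use, which the page does not specify.

```python
import json


def build_search_call(paper_ref: str, title: str) -> dict:
    """Assemble a search_signal_canvas call in the shape shown on this page."""
    return {
        "tool": "search_signal_canvas",
        "arguments": {
            "mode": "paper",
            "paper_ref": paper_ref,
            # The page's example prefixes the paper title with "Summarize".
            "query_text": f"Summarize {title}",
        },
    }


call = build_search_call(
    "perceptioncomp-a-video-benchmark-for-complex-perception-centric-reasoning",
    "PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning",
)
print(json.dumps(call, indent=2))
```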

source_context

{
  "surface": "signal_canvas",
  "mode": "paper",
  "query": "PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning",
  "normalized_query": "2603.26653",
  "route": "/signal-canvas/perceptioncomp-a-video-benchmark-for-complex-perception-centric-reasoning",
  "paper_ref": "perceptioncomp-a-video-benchmark-for-complex-perception-centric-reasoning",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}

Evidence Receipt

Route status: building

Claims: 12

References: 20

Proof: Verification pending

Freshness state: computing

Source paper: PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

PDF: https://arxiv.org/pdf/2603.26653v1

Source count: 3

Coverage: 50%

Last proof check: 2026-03-30T21:51:37.101Z

Signal Canvas receipt window

Watch and verify: PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

/buildability/perceptioncomp-a-video-benchmark-for-complex-perception-centric-reasoning

Subject: PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Verdict: Watch

The verdict is Watch because viability or proof quality is intermediate, so the paper should be re-evaluated before execution.

GitHub Code Pulse

No public code linked for this paper yet.

Claim map

Strong: 12
Mixed: 0
Weak: 0

Evidence (partial): We introduce PerceptionComp, a manually annotated benchmark for complex, long-horizon, perception-centric video reasoning.
Implication (partial): This is explicitly stated in the first sentence of the abstract.
Verification: partial

Evidence (partial): PerceptionComp is designed so that no single moment is sufficient: answering each question requires multiple temporally separated pieces of visual evidence and compositional constraints under conjunctive and sequential logic, spanning perceptual subtasks such as objects, attributes, relations, locations, actions, and events, and requiring skills including semantic recognition, visual correspondence, temporal reasoning, and spatial reasoning.
Implication (partial): This is directly stated in the abstract, detailing the complexity of the benchmark.
Verification: partial

Evidence (partial): The benchmark contains 1,114 highly complex questions on 279 videos from diverse domains including city walk tours, indoor villa tours, video games, and extreme outdoor sports, with 100% manual annotation.
Implication (partial): The abstract provides the exact number of questions and videos, along with the diversity of domains.
Verification: partial

Evidence (partial): Human studies show that PerceptionComp requires substantial test-time thinking and repeated perception steps: participants take much longer than on prior benchmarks, and accuracy drops to near chance (18.97%) when rewatching is disallowed.
Implication (partial): The abstract explicitly states the human performance metric under a specific condition.
Verification: partial

Evidence (partial): State-of-the-art MLLMs also perform substantially worse on PerceptionComp than on existing benchmarks: the best model in our evaluation, Gemini-3-Flash, reaches only 45.96% accuracy in the five-choice setting, while open-source models remain below 40%.
Implication (partial): The abstract provides a specific accuracy score for a state-of-the-art model.
Verification: partial

Evidence (partial): These results suggest that perception-centric long-horizon video reasoning remains a major bottleneck, and we hope PerceptionComp will help drive progress in perceptual reasoning.
Implication (partial): This is a concluding statement in the abstract summarizing the implications of the results.
Verification: partial

Evidence (partial): All videos are sourced from real recordings rather than synthetic renderings; while some categories (e.g., game livestreams) are screen-captured, the videos still exhibit rich, naturally occurring dynamics and clutter that make the tasks challenging and practically relevant.
Implication (partial): The text states that videos are sourced from real recordings and explains the rationale behind this choice.
Verification: partial

Evidence (partial): We introduce PerceptionComp, a manually annotated benchmark for complex, long-horizon, perception-centric video reasoning.
Implication (partial): This is a direct statement from the abstract defining the benchmark.
Verification: partial

Evidence (partial): PerceptionComp is designed so that no single moment is sufficient: answering each question requires multiple temporally separated pieces of visual evidence and compositional constraints under conjunctive and sequential logic
Implication (partial): This is a direct statement from the abstract describing the nature of the questions within the benchmark.
Verification: partial

Evidence (partial): The benchmark contains 1,114 highly complex questions on 279 videos from diverse domains
Implication (partial): This is a direct statement from the abstract providing quantitative details about the benchmark's content.
Verification: partial

Evidence (partial): accuracy drops to near chance (18.97%) when rewatching is disallowed.
Implication (partial): This is a direct result reported in the abstract from human studies.
Verification: partial

Evidence (partial): the best model in our evaluation, Gemini-3-Flash, reaches only 45.96% accuracy in the five-choice setting
Implication (partial): This is a direct result reported in the abstract regarding MLLM performance.
Verification: partial
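
The figures quoted in the claims can be put in context with a little arithmetic. The sketch below uses only numbers stated above; it assumes that answers are uniformly distributed across the five choices (so chance is 20%) and that the human no-rewatch study used the same five-choice format, neither of which this page confirms.

```python
# Figures quoted in the claim map.
NUM_QUESTIONS = 1114
NUM_VIDEOS = 279
NUM_CHOICES = 5  # "five-choice setting"

human_no_rewatch_pct = 18.97  # human accuracy when rewatching is disallowed
best_model_pct = 45.96        # Gemini-3-Flash

# Assumption: uniform random guessing over five options.
chance_pct = 100 / NUM_CHOICES

# Average number of questions per video, rounded to two decimals.
questions_per_video = round(NUM_QUESTIONS / NUM_VIDEOS, 2)

# Gaps relative to chance, in percentage points.
human_gap = round(human_no_rewatch_pct - chance_pct, 2)
model_gap = round(best_model_pct - chance_pct, 2)

print(f"chance: {chance_pct:.2f}%")
print(f"questions per video: {questions_per_video}")
print(f"human (no rewatch) vs chance: {human_gap:+.2f} points")
print(f"best model vs chance: {model_gap:+.2f} points")
```

Under these assumptions, the no-rewatch human accuracy sits about one point below chance, which matches the abstract's "near chance" wording, while the best model is roughly 26 points above it.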

Author intelligence and commercialization panels stay hidden until the proof receipt is verified, cites at least 3 references, includes at least 2 sources, and clears 50% coverage. The paper narrative and citation surfaces remain public while verification is pending.
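
The visibility rule above can be expressed as a single predicate. This is a sketch under stated assumptions, not the site's implementation: the `ProofReceipt` type and `panels_visible` function are hypothetical names, and "clears 50% coverage" is read here as "at least 50%".

```python
from dataclasses import dataclass


@dataclass
class ProofReceipt:
    """Hypothetical container for the receipt fields shown on this page."""
    verified: bool
    references: int
    sources: int
    coverage_pct: float


def panels_visible(receipt: ProofReceipt) -> bool:
    """Apply the four conditions from the visibility rule."""
    return (
        receipt.verified
        and receipt.references >= 3
        and receipt.sources >= 2
        # Assumption: exactly 50% coverage counts as clearing the bar.
        and receipt.coverage_pct >= 50.0
    )


# Values from this page: unverified, 20 references, 3 sources, 50% coverage.
current = ProofReceipt(verified=False, references=20, sources=3, coverage_pct=50.0)
print(panels_visible(current))
```

With the values on this page the predicate is false solely because the proof status is unverified; the reference and source counts already meet their thresholds.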
