ARXIV:2606.03988 · MULTIMODAL REASONING · SUBMITTED 03 JUN · 20:32 UTC · FRESHNESS FRESH

Verified Source: PDF linked · Verified PaperPack: citation fields available

Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

Mahtab Bigverdi · Lindsey Li · Weikai Huang · Yiming Liu · Jaemin Cho · Jieyu Zhang · +5 at arXiv

This research introduces a novel token-based approach to enhance spatial reasoning in vision-language models by externalizing imaginative perceptions, with demonstrated improvements on specific spatial tasks.

Ship in 2-4 weeks · Score 7.0 · Evidence verified

Opportunity summary

Pain This research introduces a novel token-based approach to enhance spatial reasoning in vision-language models by externalizing imaginative perceptions, with demonstrated improvements on specific spatial tasks.

Evidence 0 refs | 4 sources | 67% coverage

Blocker Evidence verified

PROBLEM

METHOD

Full abstract

Vision language models (VLMs) excel at many tasks but still struggle with spatial reasoning when critical information is not directly observable. Many such problems require imaginative perception: inferring what would be seen from an unseen viewpoint, tracing paths through occluded spaces, or integrating partial observations into a coherent spatial representation. We introduce Imaginative Perception Tokens (IPT), intermediate perceptual representations that externalize what a VLM would perceive under alternative spatial configurations while remaining consistent with the observed input. To study this capability, we formulate three tasks, Perspective Taking (PET), Path Tracing (PT), and Multiview Counting (MVC), and construct datasets of approximately 20K examples with ground truth imaginations, answers, and evaluation benchmarks. Using the unified VLM BAGEL as the backbone, IPT supervision consistently improves spatial reasoning and often outperforms textual chain of thought training, even without generating images at inference time. On MVC, IPT improves accuracy by 3.4% and achieves competitive performance with strong closed-source models on PT. We further find that combining IPT and label-only supervision yields additional gains, whereas textual chain of thought can substantially degrade performance, suggesting a modality mismatch when spatial computation is forced through language. Overall, IPT provides a principled supervision signal for reasoning about unobserved spatial structure, improving generalization while producing interpretable intermediate representations.

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Using the unified VLM BAGEL as the backbone, IPT supervision consistently improves spatial reasoning and often outperforms textual chain of thought training, even without…

WHY NOW

Multimodal Reasoning moved forward this cycle; last verified June 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.

Blocker: no shell-level blocker reported

Paper Pack

10.48550/arXiv.2606.03988

Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

Abstract

Source availability

PDF linked

The paper record includes a public PDF URL.

Extraction status

Parse run linked

A document parse run is attached to this paper.

Proof status

verified

0 refs; 4 sources; 67% coverage.

What was readable

linked · on file · 18 anchors · 1 extracted · 37 indexed · not indexed

Derived fallback: Estimated from adjacent evidence; not verified from source.

Viability

7.0

Time to MVP

MVP estimate missing

Commercial

code, repo url

Claim map

Strong 1 · Mixed 0 · Weak 0

Evidence: partial
{"file name": "input.pdf", "number of pages": 29, "author": "Mahtab Bigverdi; Lindsey Li; Weikai Huang; Yiming Liu; Jaemin Cho; Jieyu Zhang; Tuhin Kundu; Chris Dangjoo Kim; Zelun Luo; Linda Shapiro; Ranjay Krishna"
Implication: missing
Implication not extracted yet.
Verification: partial

Constellation map

Paper-native neighborhood for concepts, methods, materials, markets, and competitors. Missing lanes stay labeled instead of disappearing behind commercialization gates.

Concepts

not indexed

Methods

Materials

PDF linked · Document parse run

Markets

Multimodal Reasoning

Competitors

not indexed

Competitive landscape

Segment

Multimodal Reasoning

Adoption evidence

Public code linked for build inspection

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Buzz

No indexed public discussion is attached to 2606.03988 yet. That is a visibility signal, not a blank module: the monitor is watching the public channels below.

Hacker News

Not indexed yet

Bluesky

Not indexed yet

References (37)

Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs

2026 · Chun-Hsiao Yeh, Chenyu Wang et al.

Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

2026 · Pingyue Zhang, Zihan Huang et al.

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

2026 · Christopher Clark, Jieyu Zhang et al.

Mull-Tokens: Modality-Agnostic Latent Thinking

2025 · Arijit Ray, A. Abdelkader et al.

Qwen3-VL Technical Report

2025 · Shuai Bai, Yuxuan Cai et al.

Visual Spatial Tuning

2025 · Rui Yang, Ziyu Zhu et al.

Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts

2025 · Ellis Brown, Jihan Yang et al.

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

2025 · Jiawei Gu, Yunzhuo Hao et al.

Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens

2025 · Zeyuan Yang, Xueyang Yu et al.

Show-o2: Improved Native Unified Multimodal Models

2025 · Jinheng Xie, Zhenheng Yang et al.

MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

2025 · Sihan Yang, Runsen Xu et al.

Emerging Properties in Unified Multimodal Pretraining

2025 · Chaorui Deng, Deyao Zhu et al.

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

2025 · Michael Tschannen, Alexey Gritsenko et al.

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

2025 · Xiaokang Chen, Zhiyu Wu et al.

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought

2025 · Chengzu Li, Wenshan Wu et al.

Spatial Mental Modeling from Limited Views

2025 · Baiqiao Yin, Qineng Wang et al.

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

2024 · Jihan Yang, Shusheng Yang et al.

3DSRBENCH: A Comprehensive 3D Spatial Reasoning Benchmark

2024 · Wufei Ma, Haoyu Chen et al.

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models

2024 · Mahtab Bigverdi, Zelun Luo et al.

Counting Stacked Objects

2024 · Corentin Dumery, Noa Etté et al.

Showing 20 of 37 references

CITED BY

No citing papers are indexed in the public S2S graph yet. This is an explicit zero-signal state, not a hidden lookup.

Foundation

Prior Work · Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation

7.0

Prior Work · 3ViewSense: Spatial and Mental Perspective Reasoning from Orthographic Views in Vision-Language Models

7.0

Prior Work · Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

7.0

Prior Work · MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model

7.0

Extension

Builds On This · The Dual Mechanisms of Spatial Reasoning in Vision-Language Models

3.0

Builds On This · GazeVLM: Active Vision via Internal Attention Control for Multimodal Reasoning

6.0

Builds On This · RieMind: Geometry-Grounded Spatial Agent for Scene Understanding

3.0

Builds On This · ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models

4.0

Commercially relevant

Higher Viability · Perception-Aware Multimodal Spatial Reasoning from Monocular Images

8.0

Higher Viability · Cognitively-Inspired Tokens Overcome Egocentric Bias in Multimodal Models

8.0

Conflicting

none indexed

Agent drawer

5 surfaces preserved for agents. Humans can ignore.

Developer contracts, payload previews, evidence maps, and run controls stay here instead of the Read, Build, and Track workspace.

Run context

Paper: 2606.03988
Route: /paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models
Active tab: read
Artifact: imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models

Available agents

Read extractor
Build planner
Track monitor
Competitive mapper
Related-paper scout

API/MCP endpoints

REST paper pack API: /api/v1/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models/paper-pack
REST build passport API: /api/v1/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models/build-passport
REST OpenAPI: /api/openapi.json
MCP descriptor: /api/mcp
MCP resource: sciencetostartup://surfaces/paper-workspace

Tool contracts

paper_pack, build_passport, opportunity_kernel, foresight, source_proof, evidence_state

Payload preview

{
  "contract_version": "paper-r2",
  "paper_id": "03edbad1-8ff9-4783-8951-9ad574ed8470",
  "arxiv_id": "2606.03988",
  "canonical_route": "/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models",
  "active_tab": "synced from current hash by the drawer client",
  "selected_artifact": "imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models",
  "endpoints": {
    "paper_pack": "/api/v1/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models/paper-pack",
    "build_passport": "/api/v1/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models/build-passport",
    "mcp_resource": "sciencetostartup://surfaces/paper-workspace"
  }
}
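A consumer of this payload may want to confirm that the endpoint paths agree with the canonical route before calling them. The following is a minimal sketch under that assumption; `check_payload` is a hypothetical helper, not part of any published ScienceToStartup client, and the rule it applies (every REST endpoint embeds the route's slug) is inferred only from the preview above.

```python
import json

# Payload fields copied from the preview above (paper_id and UI-only fields omitted).
PAYLOAD = json.loads("""
{
  "contract_version": "paper-r2",
  "arxiv_id": "2606.03988",
  "canonical_route": "/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models",
  "endpoints": {
    "paper_pack": "/api/v1/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models/paper-pack",
    "build_passport": "/api/v1/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models/build-passport",
    "mcp_resource": "sciencetostartup://surfaces/paper-workspace"
  }
}
""")


def check_payload(payload):
    """Return a list of problems; an empty list means the payload looks consistent.

    Hypothetical check: assumes REST endpoints embed the slug of canonical_route.
    """
    problems = []
    if payload.get("contract_version") != "paper-r2":
        problems.append("unexpected contract_version")
    # The slug is the last path segment of the canonical route.
    slug = payload["canonical_route"].rsplit("/", 1)[-1]
    for name, path in payload["endpoints"].items():
        if name == "mcp_resource":
            continue  # MCP resources use a URI scheme, not a slugged REST path
        if f"/paper/{slug}/" not in path:
            problems.append(f"endpoint {name!r} does not reference slug {slug!r}")
    return problems


print(check_payload(PAYLOAD))
```

An empty list here only means the preview is internally consistent; it says nothing about whether the endpoints respond.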

Schema validation

paper-r2 contract: present
JSON-LD twin: SSR emitted
OpenAPI path parity: /api/openapi.json
MCP resource parity: paper-workspace

Job trace

queued: drawer opened by user action
running: inspect or copy payload
succeeded: payload available in SSR
failed: route errors appear in evidence cards

Evidence map

sources used: page freshness, source proof anchors, JSON-LD
missing sources: exposed by PaperPack and EvidenceState chips
derived fallbacks: marked unverified before handoff

Page Freshness

Canonical route, proof status, last verified, refs, sources, and coverage.

Paper proof surface

Canonical route: /paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models

ready

Proof freshness: fresh
Proof status: verified
Display score: 7/10
Last proof check: 2026-06-03
Score updated: 2026-06-03
Score fresh until: 2026-07-03
References: 0
Source count: 4
Coverage: 67%

Page-specific freshness sourced from this paper's evidence receipt and score bundle.
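The two score dates above are 30 days apart, which suggests (but does not confirm) a 30-day freshness window. A minimal sketch of a client-side freshness check under that assumption follows; `ASSUMED_WINDOW` and `is_fresh` are hypothetical names, and the page does not document how "Score fresh until" is actually computed.

```python
from datetime import date, timedelta

# Dates copied from the freshness fields above.
SCORE_UPDATED = date(2026, 6, 3)
SCORE_FRESH_UNTIL = date(2026, 7, 3)

# Assumption: the window is a fixed 30 days, inferred from the two dates only.
ASSUMED_WINDOW = timedelta(days=30)


def is_fresh(today, updated=SCORE_UPDATED, window=ASSUMED_WINDOW):
    """True while `today` falls inside [updated, updated + window]."""
    return updated <= today <= updated + window


# The inferred window reproduces the published expiry date.
print(SCORE_UPDATED + ASSUMED_WINDOW == SCORE_FRESH_UNTIL)
print(is_fresh(date(2026, 6, 20)), is_fresh(date(2026, 7, 4)))
```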

OpenAlex: pending — this preprint is not yet indexed by OpenAlex.

Agent Handoff

Endpoint list, payload shape, route context, and copyable handoff data.

Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

Canonical ID imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models | Route /paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models
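The same request can be issued from Python with the standard library. This is a sketch rather than an official client: `handoff_url` and `fetch_handoff` are hypothetical helpers, and the assumption that the endpoint returns JSON is not stated anywhere on this page.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://sciencetostartup.com"
PAPER_REF = "imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models"


def handoff_url(paper_ref):
    """Build the agent-handoff URL used in the curl example above."""
    return f"{BASE_URL}/api/v1/agent-handoff/paper/{quote(paper_ref, safe='')}"


def fetch_handoff(paper_ref, timeout=10.0):
    """GET the handoff document.

    Assumption: the response body is JSON. Not called at import time,
    so the snippet runs without network access.
    """
    request = Request(handoff_url(paper_ref), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


print(handoff_url(PAPER_REF))
```

Keeping URL construction separate from the network call makes the route easy to verify offline before any request is sent.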

MCP example

{
  "tool": "get_paper",
  "arguments": {
    "arxiv_id": "2606.03988"
  }
}

source_context

{
  "surface": "paper",
  "mode": "paper",
  "query": "Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models",
  "normalized_query": "2606.03988",
  "route": "/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models",
  "paper_ref": "imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}

Buildability Receipt

Verdict, compute envelope, blockers, signature state, and receipt links.

Paper proof page receipt window

Ready for execution: Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

/buildability/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models

Build Now · ready

Subject: Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

Verdict

Build Now

Verdict is Build Now because viability and implementation proof cleared the Wave 1 scaffold thresholds.

Time to first demo

Insufficient data

No first-demo timestamp, owner estimate, or elapsed demo receipt is attached to this surface.

Compute envelope

Structured compute envelope

Insufficient data

No data, compute, hardware, memory, latency, dependency, or serving requirement receipt is attached.

Evidence ids

Receipt path

/buildability/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models

Paper ref

imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models

arXiv id

2606.03988

Freshness

Generated at

2026-06-03T20:32:57.805Z

Evidence freshness

fresh

Last verification

2026-06-03T20:32:57.805Z

Sources

References

Coverage

67%

Hash state

Lineage hash

9fdad02d8801a34a4cfa1abb7e6c53101f55a15e7a41371dffea125255eee465

Canonical opportunity-kernel lineage hash.

Signature state

External signature

unsigned_external

No founder, registry, pilot, or production-adoption signature is attached to this receipt.

Verification

not_verified

Verification is blocked until an external signature is provided.

Blockers

Missing: references
Missing: paper_extraction_scorecards

Pending verification refs / 4 sources / Verification pending

Missing proof, requirement, signature, approval, adoption, or telemetry fields are blockers and must not be inferred.

Source Proof anchors

Visual citations from the paper document graph.

18 anchors

proof block · Page 5 · 68%

This equation captures one of the core mathematical components of the system. $\mathcal{L}_{lm} = -\sum_{i=1}^{|A|} \log P(a_i \mid C, U_{gt}, G_{gt}, a_{<i})$ Inference. At inference time, the mo

Page and bbox are available; crop image is pending.

proof block · Page 5 · 68%

This equation captures one of the core mathematical components of the system. $\sum_{i=1}^{|A|} \log P(a_i \mid C, U_{gt}, G_{gt}, a_{<i})$ Inference. At inference time, the model o

Page and bbox are available; crop image is pending.

proof block · Page 5 · 76%

This equation defines the loss the model is optimizing during training.

Page and bbox are available; crop image is pending.
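The loss quoted in the proof anchors above is a negative log-likelihood summed over answer tokens. The sketch below computes that quantity for a list of per-token probabilities supplied directly; in the paper those probabilities would come from the model conditioned on C, U_gt, G_gt and the preceding answer tokens, whose definitions are not included in this excerpt. `answer_nll` is a hypothetical helper name and not the authors' code.

```python
import math


def answer_nll(token_probs):
    """Return -sum_i log P(a_i | context, a_<i) for the given probabilities.

    token_probs[i] stands in for the model's probability of the i-th
    ground-truth answer token (an assumption: supplied directly here).
    """
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("each probability must lie in (0, 1]")
    return -sum(math.log(p) for p in token_probs)


# Two answer tokens predicted with probability 0.5 and 0.25:
# -(log 0.5 + log 0.25) = log 8
loss = answer_nll([0.5, 0.25])
print(math.isclose(loss, math.log(8)))
```

Confident predictions (probabilities near 1) contribute almost nothing to the sum, while a single low-probability answer token dominates it.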

JSON-LD twin

The application/ld+json payload rendered for agents.

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "WebPage",
      "@id": "https://sciencetostartup.com/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models#webpage",
      "url": "https://sciencetostartup.com/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models",
      "name": "Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models",
      "description": "This research introduces a novel token-based approach to enhance spatial reasoning in vision-language models by externalizing imaginative perceptions, with demonstrated improvements on specific spatial tasks.",
      "isPartOf": {
        "@id": "https://sciencetostartup.com/#website"
      }
    },
    {
      "@type": "ScholarlyArticle",
      "@id": "https://sciencetostartup.com/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models#scholarlyArticle",
      "headline": "Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models",
      "description": "This research introduces a novel token-based approach to enhance spatial reasoning in vision-language models by externalizing imaginative perceptions, with demonstrated improvements on specific spatial tasks.",
      "url": "https://sciencetostartup.com/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models",
      "sameAs": "https://arxiv.org/abs/2606.03988",
      "identifier": {
        "@type": "PropertyValue",
        "propertyID": "arXiv",
        "value": "2606.03988"
      },
      "isAccessibleForFree": true,
      "isPartOf": {
        "@id": "https://sciencetostartup.com/#website"
      },
      "datePublished": "2026-06-02T17:59:17.000Z",
      "author": [
        {
          "@type": "Person",
          "name": "Mahtab Bigverdi"
        },
        {
          "@type": "Person",
          "name": "Lindsey Li"
        },
        {
          "@type": "Person",
          "name": "Weikai Huang"
        },
        {
          "@type": "Person",
          "name": "Yiming Liu"
        },
        {
          "@type": "Person",
          "name": "Jaemin Cho"
        },
        {
          "@type": "Person",
          "name": "Jieyu Zhang"
        },
        {
          "@type": "Person",
          "name": "Tuhin Kundu"
        },
        {
          "@type": "Person",
          "name": "Chris Dangjoo Kim"
        },
        {
          "@type": "Person",
          "name": "Zelun Luo"
        },
        {
          "@type": "Person",
          "name": "Linda Shapiro"
        },
        {
          "@type": "Person",
          "name": "Ranjay Krishna"
        }
      ],
      "citation": [
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "22cb569525194d67e964f61fa102506429cf85f4"
          },
          "url": "https://www.semanticscholar.org/paper/22cb569525194d67e964f61fa102506429cf85f4"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "349a6bbf00dcf13be0904a8253afd13dc36e8895"
          },
          "url": "https://www.semanticscholar.org/paper/349a6bbf00dcf13be0904a8253afd13dc36e8895"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "435cac4e656a585c209a229b2ef2fb7f49f4d702"
          },
          "url": "https://www.semanticscholar.org/paper/435cac4e656a585c209a229b2ef2fb7f49f4d702"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "59aa5ef698deb60834ce98ad3c3a90b2100a1f5a"
          },
          "url": "https://www.semanticscholar.org/paper/59aa5ef698deb60834ce98ad3c3a90b2100a1f5a"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "15538df854dd33351dfb5cefd7e8f3340c8936c3"
          },
          "url": "https://www.semanticscholar.org/paper/15538df854dd33351dfb5cefd7e8f3340c8936c3"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "5c93b01196374e36912c16cec0b3943ba4467559"
          },
          "url": "https://www.semanticscholar.org/paper/5c93b01196374e36912c16cec0b3943ba4467559"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "2154ab7d1741e093df8ed0361f63f636185d213f"
          },
          "url": "https://www.semanticscholar.org/paper/2154ab7d1741e093df8ed0361f63f636185d213f"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "45740e3f9a1ed8e358d4da7b859dc55127e06540"
          },
          "url": "https://www.semanticscholar.org/paper/45740e3f9a1ed8e358d4da7b859dc55127e06540"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "b36c3b79d678805b74025497bf3abc6cbe0ee1eb"
          },
          "url": "https://www.semanticscholar.org/paper/b36c3b79d678805b74025497bf3abc6cbe0ee1eb"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "3cdcb4dfc6b64ec9af9a419691070827217052d0"
          },
          "url": "https://www.semanticscholar.org/paper/3cdcb4dfc6b64ec9af9a419691070827217052d0"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "054b40c7582366866f0a35160469ead3750fcab1"
          },
          "url": "https://www.semanticscholar.org/paper/054b40c7582366866f0a35160469ead3750fcab1"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "406a577425c36e05163ee3bf448d65a6eb480ab3"
          },
          "url": "https://www.semanticscholar.org/paper/406a577425c36e05163ee3bf448d65a6eb480ab3"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "5bbb904e85ced452f44d9ed9559297334298c407"
          },
          "url": "https://www.semanticscholar.org/paper/5bbb904e85ced452f44d9ed9559297334298c407"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "3ef9dbd95a174d4c175fcd603f62b8cc197e8d7e"
          },
          "url": "https://www.semanticscholar.org/paper/3ef9dbd95a174d4c175fcd603f62b8cc197e8d7e"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "2d0b030d314a5aa8feaa03695e8471270130bdf9"
          },
          "url": "https://www.semanticscholar.org/paper/2d0b030d314a5aa8feaa03695e8471270130bdf9"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "376461a2c049f6fa51a4303853fdc672e4d07a0d"
          },
          "url": "https://www.semanticscholar.org/paper/376461a2c049f6fa51a4303853fdc672e4d07a0d"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "2e893960793a803d67cd70ec2cf100fdbf59654d"
          },
          "url": "https://www.semanticscholar.org/paper/2e893960793a803d67cd70ec2cf100fdbf59654d"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "b940d65aabe04876349c75498487d83db66a4081"
          },
          "url": "https://www.semanticscholar.org/paper/b940d65aabe04876349c75498487d83db66a4081"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "f9a6e79eefe0d14fb587bcbd167540fe10d812d5"
          },
          "url": "https://www.semanticscholar.org/paper/f9a6e79eefe0d14fb587bcbd167540fe10d812d5"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "3ea3e3153e1a4576676e85ac69bac8090c00a912"
          },
          "url": "https://www.semanticscholar.org/paper/3ea3e3153e1a4576676e85ac69bac8090c00a912"
        }
      ],
      "codeRepository": "https://github.com/cvpr-org/author-kit",
      "additionalProperty": [
        {
          "@type": "PropertyValue",
          "propertyID": "viabilityScore",
          "value": 7
        },
        {
          "@type": "PropertyValue",
          "propertyID": "researchDomain",
          "value": "Multimodal Reasoning"
        },
        {
          "@type": "PropertyValue",
          "propertyID": "commercialReadiness",
          "value": "code, repo url"
        }
      ]
    },
    {
      "@type": "SoftwareSourceCode",
      "@id": "https://sciencetostartup.com/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models#software",
      "name": "Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models - Source Code",
      "description": "This research introduces a novel token-based approach to enhance spatial reasoning in vision-language models by externalizing imaginative perceptions, with demonstrated improvements on specific spatial tasks.",
      "codeRepository": "https://github.com/cvpr-org/author-kit",
      "url": "https://github.com/cvpr-org/author-kit"
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        {
          "@type": "ListItem",
          "position": 1,
          "name": "Home",
          "item": "https://sciencetostartup.com"
        },
        {
          "@type": "ListItem",
          "position": 2,
          "name": "Multimodal Reasoning",
          "item": "https://sciencetostartup.com/topics"
        },
        {
          "@type": "ListItem",
          "position": 3,
          "name": "Imaginative Perception Tokens Enhance Spatial Reasoning in M",
          "item": "https://sciencetostartup.com/paper/imaginative-perception-tokens-enhance-spatial-reasoning-in-multimodal-language-models"
        }
      ]
    }
  ]
}
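An agent reading the JSON-LD twin typically needs only a few fields from the top-level `ScholarlyArticle` node. The following sketch shows one way to pull them out with the standard library; `summarize_article` is a hypothetical helper, and `SAMPLE` is a trimmed stand-in for the payload above (two authors, no citations) rather than the full document.

```python
import json

# Trimmed stand-in for the JSON-LD above; same shape, fewer entries.
SAMPLE = """
{
  "@context": "https://schema.org",
  "@graph": [
    {"@type": "WebPage", "name": "Imaginative Perception Tokens"},
    {
      "@type": "ScholarlyArticle",
      "sameAs": "https://arxiv.org/abs/2606.03988",
      "identifier": {"@type": "PropertyValue", "propertyID": "arXiv", "value": "2606.03988"},
      "author": [
        {"@type": "Person", "name": "Mahtab Bigverdi"},
        {"@type": "Person", "name": "Ranjay Krishna"}
      ],
      "citation": []
    }
  ]
}
"""


def summarize_article(jsonld_text):
    """Extract arXiv id, author names, and citation count from the @graph.

    Only top-level graph nodes are scanned, so nested citation entries
    (also typed ScholarlyArticle) are not mistaken for the main article.
    """
    graph = json.loads(jsonld_text)["@graph"]
    article = next(node for node in graph if node.get("@type") == "ScholarlyArticle")
    return {
        "arxiv_id": article["identifier"]["value"],
        "authors": [person["name"] for person in article.get("author", [])],
        "citation_count": len(article.get("citation", [])),
    }


print(summarize_article(SAMPLE))
```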
