ARXIV:2604.13993 · VISION-LANGUAGE MODELS · SUBMITTED 16 APR · 18:19 UTC · FRESHNESS STALE

Verified · Source: PDF linked | Verified · PaperPack: citation fields available | Partial · Proof: unverified proof status

Reward Design for Physical Reasoning in Vision-Language Models

Derek Lilienthal · Manisha Mukherjee · Sameera Horawalavithana · arXiv

This research systematically investigates reward design for improving physical reasoning in vision-language models, demonstrating accuracy gains through targeted reward signals.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain This research systematically investigates reward design for improving physical reasoning in vision-language models, demonstrating accuracy gains through targeted reward signals.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

This research systematically investigates reward design for improving physical reasoning in vision-language models, demonstrating accuracy gains through targeted reward signals. Yet even state-of-the-art Vision Language Models (VLMs) fall far short of human performance on…

METHOD

Full abstract

Physical reasoning over visual inputs demands tight integration of visual perception, domain knowledge, and multi-step symbolic inference. Yet even state-of-the-art Vision Language Models (VLMs) fall far short of human performance on physics benchmarks. While post-training algorithms such as Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) have demonstrated strong reasoning gains in language models, how reward design shapes VLM physical reasoning behavior remains poorly understood. We present a systematic reward ablation study for GRPO-based VLM training on physical reasoning. We compare four reward signals of increasing semantic richness: format compliance, answer accuracy, a composite rubric reward (answer correctness, physics principle identification, and unit consistency), and a novel internal reward derived from model attention weights over input image regions. We evaluate on PhyX, a 3,000-problem benchmark spanning six physics domains and six reasoning types across multiple-choice and open-ended formats, using IBM Granite Vision 3.3 (2B). Across both formats, GRPO with accuracy-based rewards outperforms SFT on most domains, though gains vary substantially by reward type and domain. Reward design does not uniformly improve performance. Instead, it induces domain-specific reasoning behaviors. Accuracy-based rewards provide the strongest overall gains. Rubric rewards improve structured reasoning quality without consistent accuracy improvements. Attention-based rewards enhance spatial reasoning while degrading performance in symbolic domains. Our internal attention-weight reward requires no spatial annotations and improves spatial relation accuracy from 0.27 to 0.50, suggesting that supervising where the model attends during generation is a promising direction for visually grounded physical reasoning.
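
The reward signals named in the abstract, and the group-relative normalization that GRPO applies to them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `<think>`/`<answer>` template, the equal rubric weights, the substring checks, and the "share of attention mass on image tokens" proxy are all hypothetical choices made for the example, and every function name here is invented for it.

```python
import re
from statistics import mean, pstdev

# Hypothetical output template: reasoning in <think>, final answer in <answer>.
FORMAT_RE = re.compile(r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion: str) -> float:
    """1.0 if the whole completion follows the assumed template, else 0.0."""
    return 1.0 if FORMAT_RE.fullmatch(completion.strip()) else 0.0


def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the extracted answer equals the gold answer (case-insensitive)."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == gold.strip().lower() else 0.0


def rubric_reward(completion: str, gold: str, principle: str, unit: str) -> float:
    """Average of answer correctness, principle mention, and unit mention.

    Equal weights and plain substring checks are illustrative assumptions.
    """
    text = completion.lower()
    parts = [
        accuracy_reward(completion, gold),
        1.0 if principle.lower() in text else 0.0,
        1.0 if unit.lower() in text else 0.0,
    ]
    return mean(parts)


def attention_reward(attn_weights: list[float], image_token_mask: list[bool]) -> float:
    """Share of attention mass on image tokens (an assumed proxy, not the paper's formula)."""
    total = sum(attn_weights)
    if total == 0:
        return 0.0
    on_image = sum(w for w, is_img in zip(attn_weights, image_token_mask) if is_img)
    return on_image / total


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize each reward within its sampled group.

    Uses the population standard deviation; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: score a group of sampled completions for one prompt, then normalize.
group = [
    "<think>Newton's second law: a = F/m = 6 N / 2 kg</think><answer>3 m/s^2</answer>",
    "<answer>3 m/s^2</answer>",
    "the answer is 5",
    "<think>guess</think><answer>12 m/s^2</answer>",
]
rewards = [accuracy_reward(c, "3 m/s^2") for c in group]
advantages = group_relative_advantages(rewards)
```

In this sketch, completions that score above the group mean receive positive advantages and the rest receive negative ones, so swapping `accuracy_reward` for another reward function changes which behaviors the policy update reinforces.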

RESULT

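ScienceToStartup currently rates this 7.0/10 on the public viability pass. Reward design does not uniformly improve performance; it induces domain-specific reasoning behaviors. Accuracy-based rewards give the strongest overall gains, while the attention-based reward raises spatial relation accuracy from 0.27 to 0.50 but degrades performance in symbolic domains. Code availability is flagged in the production record; the public repository link still needs proof alignment.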

WHY NOW

Vision-Language Models moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain This research systematically investigates reward design for improving physical reasoning in vision-language models, demonstrating accuracy gains through targeted reward signals.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Competitive landscape

Segment: Vision-Language Models

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Published 2026-04-15T15:36:26.000Z · https://arxiv.org/abs/2604.13993 · Free to access · Commercial readiness: code
