ARXIV:2603.28730 · ROBOTICS RL WITH VLMS · SUBMITTED 31 MAR · 20:16 UTC · FRESHNESS STALE

Verified · Source: PDF linked · Verified · PaperPack: citation fields available · Partial · Proof: unverified proof status

SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning

Philip Schroeder · Thomas Weng · Karl Schmeckpeper · Eric Rosen · Stephen Hart · Ondrej Biza · arXiv

A video-language reasoning model that acts as the sole reward signal for robots, enabling them to learn new manipulation tasks without human supervision or ground-truth rewards.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: A video-language reasoning model that acts as the sole reward signal for robots, enabling them to learn new manipulation tasks without human supervision or ground-truth rewards.

Evidence: 96 refs | 4 sources | 50% coverage

Blocker: Evidence unverified


PROBLEM

A video-language reasoning model that acts as the sole reward signal for robots, enabling them to learn new manipulation tasks without human supervision or ground-truth rewards. However, when used as evaluators in reinforcement…

METHOD

Full abstract

Vision-language models (VLMs) have shown impressive capabilities across diverse tasks, motivating efforts to leverage these models to supervise robot learning. However, when used as evaluators in reinforcement learning (RL), today's strongest models often fail under partial observability and distribution shift, enabling policies to exploit perceptual errors rather than solve the task. To address this limitation, we introduce SOLE-R1 (Self-Observing LEarner), a video-language reasoning model explicitly designed to serve as the sole reward signal for online RL. Given only raw video observations and a natural-language goal, SOLE-R1 performs per-timestep spatiotemporal chain-of-thought (CoT) reasoning and produces dense estimates of task progress that can be used directly as rewards. To train SOLE-R1, we develop a large-scale video trajectory and reasoning synthesis pipeline that generates temporally grounded CoT traces aligned with continuous progress supervision. This data is combined with foundational spatial and multi-frame temporal reasoning, and used to train the model with a hybrid framework that couples supervised fine-tuning with RL from verifiable rewards. Across four different simulation environments and a real-robot setting, SOLE-R1 enables zero-shot online RL from random initialization: robots learn previously unseen manipulation tasks without ground-truth rewards, success indicators, demonstrations, or task-specific tuning. SOLE-R1 succeeds on 24 unseen tasks and substantially outperforms strong vision-language rewarders, including GPT-5 and Gemini-3-Pro, while exhibiting markedly greater robustness to reward hacking.
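The abstract describes the reward interface only at a high level: SOLE-R1 reads raw video plus a natural-language goal and emits a dense, per-timestep progress estimate that is used directly as the RL reward. The Python sketch below illustrates that interface under stated assumptions. `ProgressModel`, `progress_rewards`, and `toy_model` are hypothetical names, the model is a stub, and SOLE-R1's chain-of-thought reasoning and training pipeline are not reproduced. Clipping progress to [0, 1] and the `"delta"` mode (reward as the change in progress) are added assumptions for comparison, not details from the abstract.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a video-language progress model such as SOLE-R1:
# given the video observed so far and a natural-language goal, return an
# estimate of task progress. The real model's interface is not specified here.
ProgressModel = Callable[[Sequence[object], str], float]


def progress_rewards(
    frames: Sequence[object],
    goal: str,
    model: ProgressModel,
    mode: str = "direct",
) -> List[float]:
    """Turn per-timestep progress estimates into one reward per transition.

    frames holds observations o_0..o_T, so there are T transitions.
    "direct": r_t = p_t, progress estimated after reaching o_t (t >= 1).
    "delta":  r_t = p_t - p_{t-1}, a progress-difference variant (assumption).
    """
    if len(frames) < 2:
        raise ValueError("need at least two observations for one transition")
    if mode not in ("direct", "delta"):
        raise ValueError(f"unknown mode: {mode!r}")

    # Query the model on each video prefix o_0..o_t; clipping to [0, 1]
    # is an assumption of this sketch.
    progress = [
        min(max(model(frames[: t + 1], goal), 0.0), 1.0)
        for t in range(len(frames))
    ]
    if mode == "direct":
        return progress[1:]
    return [progress[t] - progress[t - 1] for t in range(1, len(progress))]


# Toy stub (hypothetical): each "frame" is a scalar distance-to-goal in [0, 1],
# so progress is simply 1 minus the latest distance.
def toy_model(frames: Sequence[object], goal: str) -> float:
    return 1.0 - frames[-1]


frames = [1.0, 0.75, 0.5, 0.5, 0.0]
print(progress_rewards(frames, "put the block in the bowl", toy_model))
# [0.25, 0.5, 0.5, 1.0]
print(progress_rewards(frames, "put the block in the bowl", toy_model, mode="delta"))
# [0.25, 0.25, 0.0, 0.5]
```

Because the model is queried on every video prefix, an episode with T transitions costs T + 1 model calls in this sketch; a practical system would likely batch or cache those queries.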

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Across four different simulation environments and a real-robot setting, SOLE-R1 enables zero-shot online RL from random initialization: robots learn previously unseen manipulation tasks without…

WHY NOW

Robotics RL with VLMs moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.


Opportunity summary

Score: 7.0

Blocker: no shell-level blocker reported


Competitive landscape


Segment: Robotics RL with VLMs

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


Source: https://arxiv.org/abs/2603.28730 · Published 2026-03-30 17:46 UTC

