ARXIV:2603.28053 · REINFORCEMENT LEARNING · SUBMITTED 31 MAR · 20:53 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

Reducing Oracle Feedback with Vision-Language Embeddings for Preference-Based RL

Udita Ghosh · Dripta S. Raychaudhuri · Jiachen Li · Konstantinos Karydis · Amit Roy-Chowdhury · arXiv

A hybrid framework that reduces the cost of learning from human feedback in reinforcement learning by intelligently combining cheap vision-language embeddings with targeted expert queries.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain Oracle feedback is costly, which constrains the scalability of preference-based reinforcement learning; lightweight vision-language embeddings are cheaper but too noisy to serve as standalone reward generators.

Evidence 33 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

Preference-based reinforcement learning can learn effective reward functions from comparisons, but its scalability is constrained by the high cost of oracle feedback. Lightweight vision-language embedding (VLE) models provide a cheaper alternative, but their noisy outputs limit their effectiveness as standalone reward generators.

METHOD

Full abstract

Preference-based reinforcement learning can learn effective reward functions from comparisons, but its scalability is constrained by the high cost of oracle feedback. Lightweight vision-language embedding (VLE) models provide a cheaper alternative, but their noisy outputs limit their effectiveness as standalone reward generators. To address this challenge, we propose ROVED, a hybrid framework that combines VLE-based supervision with targeted oracle feedback. Our method uses the VLE to generate segment-level preferences and defers to an oracle only for samples with high uncertainty, identified through a filtering mechanism. In addition, we introduce a parameter-efficient fine-tuning method that adapts the VLE with the obtained oracle feedback in order to improve the model over time in a synergistic fashion. This ensures the retention of the scalability of embeddings and the accuracy of oracles, while avoiding their inefficiencies. Across multiple robotic manipulation tasks, ROVED matches or surpasses prior preference-based methods while reducing oracle queries by up to 80%. Remarkably, the adapted VLE generalizes across tasks, yielding cumulative annotation savings of up to 90%, highlighting the practicality of combining scalable embeddings with precise oracle supervision for preference-based RL.
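The routing idea in the abstract (use the VLE's preference when it is confident, defer to the oracle when it is uncertain) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: the abstract does not specify the filtering mechanism, so this sketch substitutes a simple confidence margin on a Bradley-Terry style probability. The names `preference_prob`, `route_queries`, `margin`, and the toy `oracle` callable are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def preference_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry style probability that segment A is preferred over B.

    The scores stand in for VLE-derived segment scores (an assumption).
    """
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def route_queries(
    pairs: List[Tuple[float, float]],
    oracle: Callable[[int], int],
    margin: float = 0.2,
) -> Tuple[List[int], int]:
    """Label each segment pair, deferring to the oracle when the VLE is unsure.

    A pair counts as uncertain when its preference probability lies within
    `margin` of 0.5. This threshold rule is a stand-in for the paper's
    filtering mechanism, which the abstract does not describe.
    Label 1 means the first segment is preferred, 0 the second.
    """
    labels: List[int] = []
    oracle_calls = 0
    for i, (score_a, score_b) in enumerate(pairs):
        p = preference_prob(score_a, score_b)
        if abs(p - 0.5) < margin:
            # Uncertain: pay for an oracle label.
            labels.append(oracle(i))
            oracle_calls += 1
        else:
            # Confident: trust the cheap VLE preference.
            labels.append(1 if p > 0.5 else 0)
    return labels, oracle_calls


# Toy usage with made-up scores and a made-up ground-truth oracle.
toy_pairs = [(2.0, 0.0), (0.1, 0.0), (0.0, 3.0), (0.5, 0.6)]
toy_truth = [1, 0, 0, 1]
labels, oracle_calls = route_queries(toy_pairs, oracle=lambda i: toy_truth[i])
print(labels, oracle_calls)
```

In this toy run only the two near-tied pairs are sent to the oracle; the oracle labels collected this way are what the abstract says are then used to adapt the VLE over time, a step this sketch leaves out.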

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. According to the abstract, across multiple robotic manipulation tasks ROVED matches or surpasses prior preference-based methods while reducing oracle queries by up to 80%, and the adapted VLE generalizes across tasks, yielding cumulative annotation savings of up to 90%.

WHY NOW

Reinforcement Learning moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Competitive landscape

Segment: Reinforcement Learning
Adoption evidence: No public code link in the paper record yet
Commercial read: 7.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified
