ARXIV:2601.22701 · AGENTS · SUBMITTED 19 MAR · 18:48 UTC · FRESHNESS STALE

Verified (Source: PDF linked) | Partial (PaperPack: 3 of 4 citation fields filled) | Missing (Missing fields: authors) | Partial (Proof: unverified proof status)

Best-of-Q: Improving VLM agents with Q-function Action Ranking at Inference

arXiv

Enhance existing VLM-based agent policies at inference without retraining through Q-function action ranking.

Blocked on Code | Score 7.0 | Evidence unverified

Opportunity summary

Pain Enhance existing VLM-based agent policies at inference without retraining through Q-function action ranking.

Evidence 0 refs | 0 sources | 33% coverage

Blocker Evidence unverified


PROBLEM

Enhance existing VLM-based agent policies at inference without retraining through Q-function action ranking. However, VLMs adapt poorly to fast-changing environments like the web, which can be alleviated by fine-tuning requiring expansive model…

METHOD

Full abstract

Vision-Language Models (VLMs) have become powerful backbones for agents to autonomously operate in digital environments like the web and operating systems. However, these models suffer from inadaptability to fast-changing environments like the web, which can be alleviated by fine-tuning requiring expansive model training and data collection. In this work, we introduce a novel paradigm for enhancing agentic VLM policies at inference without policy retraining. Fundamentally, our approach decouples the VLM's role as a high-capacity action proposer from the final action selection mechanism. We keep the VLM policy frozen and use it to generate a set of candidate actions for a given state. Then, a lightweight, offline-trained Q-function reranks these candidates, and the agent executes the action with the highest estimated value. The main contribution is to apply the Q-function directly during inference for immediate policy improvement, and not offline to relabel data for policy retraining. We demonstrate on the academic WebVoyager benchmark that our method significantly boosts agent success rates, improving a Qwen2.5-VL-7B agent from 38.8% to 55.7% and a proprietary GPT-4.1 agent from 82.4% to 88.8%.
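The selection step described in the abstract (a frozen policy proposes candidates, a Q-function scores them, the agent executes the top-scoring one) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `best_of_q`, `propose`, `q_value`, `n_candidates`, the string-typed `State` and `Action`, and the toy scores are all hypothetical names and values introduced here. A real setup would replace the toy functions with calls to a VLM and to a trained Q-network.

```python
from typing import Callable, List, Tuple

State = str   # hypothetical stand-in for a screenshot plus page context
Action = str  # hypothetical stand-in for a serialized UI action


def best_of_q(
    state: State,
    propose: Callable[[State, int], List[Action]],
    q_value: Callable[[State, Action], float],
    n_candidates: int = 4,
) -> Tuple[Action, List[Tuple[Action, float]]]:
    """Return the highest-scoring candidate plus every (action, score) pair."""
    # The frozen policy only proposes; it is never updated here.
    candidates = propose(state, n_candidates)
    if not candidates:
        raise ValueError("the policy proposed no candidate actions")
    # Score each candidate with the separately trained Q-function.
    scored = [(action, q_value(state, action)) for action in candidates]
    # Pick the candidate with the highest estimated value (first one wins ties).
    best_action, _ = max(scored, key=lambda pair: pair[1])
    return best_action, scored


# Toy stand-ins so the sketch runs end to end (assumed values, not paper data).
def toy_propose(state: State, n: int) -> List[Action]:
    return ["scroll(down)", "click(search_box)", "go_back()"][:n]


TOY_Q = {"scroll(down)": 0.21, "click(search_box)": 0.74, "go_back()": 0.05}


def toy_q_value(state: State, action: Action) -> float:
    return TOY_Q[action]


chosen, scored = best_of_q("homepage", toy_propose, toy_q_value, n_candidates=3)
print(chosen)  # click(search_box)
```

Keeping the proposer and the scorer as separate callables mirrors the decoupling the abstract describes: either side can be swapped without touching the other, and the policy weights stay frozen throughout.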

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. We demonstrate on the academic WebVoyager benchmark that our method significantly boosts agent success rates, improving a Qwen2.5-VL-7B agent from 38.8% to 55.7% and…

WHY NOW

Agents moved forward this cycle; last verified April 2026. Public score 7.0/10.


Opportunity summary

Score 7.0

Pain Enhance existing VLM-based agent policies at inference without retraining through Q-function action ranking.

Evidence 0 refs | 0 sources | 33% coverage

Blocker missing authors

Analysis summary

Enhance existing VLM-based agent policies at inference without retraining through Q-function action ranking.


Competitive landscape

Enhance existing VLM-based agent policies at inference without retraining through Q-function action ranking.

Segment

Agents

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified


arXiv: https://arxiv.org/abs/2601.22701 | Published: 2026-01-30 08:22:18 UTC | Research domain: Agents | Viability score: 7

