ARXIV:2603.26666 · AI-ENHANCED ROBOTICS · SUBMITTED 30 MAR · 21:51 UTC · FRESHNESS STALE

Source: PDF linked (verified) · PaperPack: citation fields available (verified) · Proof: unverified proof status (partial)

VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation

Q: What products could be built from this research?

A cloud service that offers VLA-OPD post-training to robotics companies, using their existing expert models as teachers to improve policy learning on new robots.

Q: What are the practical use cases?

A robotics software platform that uses VLA-OPD to streamline post-training across tasks, with the aim of reducing training time and improving adaptability and efficiency.

Q: What industries could this research disrupt?

The method could reduce the current reliance on large supervised datasets and sample-inefficient RL, offering a more streamlined way to develop robust robotic behaviors.

Zhide Zhong · Haodong Yan · Junfeng Li · Junjie He · Tianran Zhang · Haoang Li · arXiv

VLA-OPD improves robotic model training by combining efficient fine-tuning with the robustness of RL using on-policy distillation.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: VLA-OPD improves robotic model training by combining efficient fine-tuning with the robustness of RL using on-policy distillation.

Evidence: 26 refs | 3 sources | 50% coverage

Blocker: evidence unverified


PROBLEM

Post-training remains crucial for reliable deployment of pre-trained Vision-Language-Action (VLA) models. However, standard offline Supervised Fine-Tuning (SFT) suffers from distribution shifts and catastrophic forgetting of pre-trained capabilities, while…

METHOD

Full abstract

Although pre-trained Vision-Language-Action (VLA) models exhibit impressive generalization in robotic manipulation, post-training remains crucial to ensure reliable performance during deployment. However, standard offline Supervised Fine-Tuning (SFT) suffers from distribution shifts and catastrophic forgetting of pre-trained capabilities, while online Reinforcement Learning (RL) struggles with sparse rewards and poor sample efficiency. In this paper, we propose On-Policy VLA Distillation (VLA-OPD), a framework bridging the efficiency of SFT with the robustness of RL. Instead of relying on sparse environmental rewards, VLA-OPD leverages an expert teacher to provide dense, token-level supervision on the student's self-generated trajectories. This enables active error correction on policy-induced states while preserving pre-trained general capabilities through gentle alignment. Crucially, we formulate VLA-OPD via a Reverse-KL objective. Unlike standard Forward-KL that induces mode-covering entropy explosion, or Hard-CE that causes premature entropy collapse, our bounded mode-seeking objective ensures stable policy learning by filtering out the teacher's epistemic uncertainty while maintaining action diversity. Experiments on LIBERO and RoboTwin2.0 benchmarks demonstrate that VLA-OPD significantly improves sample efficiency over RL and robustness over SFT, while effectively mitigating catastrophic forgetting during post-training.
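The abstract contrasts three per-token objectives: Forward-KL, Reverse-KL, and Hard-CE. The toy comparison below is a minimal sketch, not the authors' code: it scores a few hypothetical student distributions against one made-up bimodal teacher distribution over three action tokens, and it takes Hard-CE to mean cross-entropy against the teacher's argmax token. It illustrates the textbook behavior the abstract invokes when the student cannot match the teacher exactly.

```python
import math

def forward_kl(p, q):
    """KL(p || q), expectation under the teacher p (mode-covering for a restricted q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def reverse_kl(p, q):
    """KL(q || p), expectation under the student q (mode-seeking for a restricted q).
    Assumes the teacher p is nonzero wherever q is."""
    return sum(qi * math.log(qi / pi) for pi, qi in zip(p, q) if qi > 0)

def hard_ce(p, q):
    """Cross-entropy of q against the teacher's argmax token used as a one-hot label."""
    target = max(range(len(p)), key=lambda i: p[i])
    return -math.log(q[target])

def entropy(q):
    return -sum(qi * math.log(qi) for qi in q if qi > 0)

# Made-up bimodal teacher over three action tokens: two good actions, one bad one.
teacher = [0.49, 0.02, 0.49]
# Hypothetical student outputs to compare under each objective.
candidates = {
    "match": [0.49, 0.02, 0.49],            # reproduces the teacher exactly
    "broad": [1 / 3, 1 / 3, 1 / 3],         # spreads mass, including on the bad token
    "one_mode": [0.96, 0.02, 0.02],         # commits to one good mode
    "near_one_hot": [0.998, 0.001, 0.001],  # almost deterministic
}

for name, q in candidates.items():
    print(f"{name:>12}  fwdKL={forward_kl(teacher, q):.3f}  "
          f"revKL={reverse_kl(teacher, q):.3f}  hardCE={hard_ce(teacher, q):.3f}  "
          f"H={entropy(q):.3f}")
```

On these numbers, Forward-KL ranks `broad` ahead of `one_mode`, Reverse-KL ranks `one_mode` ahead of both `broad` and `near_one_hot`, and Hard-CE keeps improving as the output sharpens toward one-hot (lower entropy). Both KL terms reach zero only for `match`. This shows the qualitative behavior only; it does not model VLA-OPD's actual action tokenizer, teacher, or bounding scheme.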

RESULT

ScienceToStartup currently rates this paper 7.0/10 on the public viability pass. According to the abstract, VLA-OPD enables active error correction on policy-induced states while preserving pre-trained general capabilities through gentle alignment. Code availability is flagged in the production record;…

WHY NOW

AI-Enhanced Robotics moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.


Paper Pack

10.48550/arXiv.2603.26666


Source availability

PDF linked

The paper record includes a public PDF URL.

Extraction status

Parse run linked

A document parse run is attached to this paper.

Proof status

unverified

26 refs; 3 sources; 50% coverage.

What was readable

linked · on file · 20 anchors · 7 extracted · 30 indexed · not indexed

Derived fallback: Estimated from adjacent evidence; not verified from source.

Viability

7.0

Time to MVP

MVP estimate missing

Commercial

code


Claim map

Strong 7 · Mixed 0 · Weak 0

Evidence (partial): In this paper, we propose On-Policy VLA Distillation (VLA-OPD), a framework bridging the efficiency of SFT with the robustness of RL. Instead of relying on sparse environmental rewards, VLA-OPD leverages an expert teacher to provide dense, token-level supervision on the student's self-generated trajectories.
Implication (partial): This is a core statement of the proposed method, clearly articulated in the abstract and introduction.
Verification: partial

Evidence (partial): Crucially, we formulate VLA-OPD via a Reverse-KL objective. Unlike standard Forward-KL that induces mode-covering entropy explosion, or Hard-CE that causes premature entropy collapse, our bounded mode-seeking objective ensures stable policy learning by filtering out the teacher's epistemic uncertainty while maintaining action diversity.
Implication (partial): The abstract and introduction explicitly detail the use of Reverse-KL and its benefits compared to other objectives.
Verification: partial

Evidence (partial): Experiments on LIBERO and RoboTwin2.0 benchmarks demonstrate that VLA-OPD significantly improves sample efficiency over RL and robustness over SFT, while effectively mitigating catastrophic forgetting during post-training.
Implication (partial): The abstract and analysis section explicitly state the experimental results on these benchmarks.
Verification: partial

Evidence (partial): Experiments on LIBERO and RoboTwin2.0 benchmarks demonstrate that VLA-OPD significantly improves sample efficiency over RL and robustness over SFT, while effectively mitigating catastrophic forgetting during post-training.
Implication (partial): This is a key benefit highlighted in the abstract and introduction.
Verification: partial

Evidence (partial): Unlike standard Forward-KL that induces mode-covering entropy explosion, or Hard-CE that causes premature entropy collapse, our bounded mode-seeking objective ensures stable policy learning by filtering out the teacher's epistemic uncertainty while maintaining action diversity.
Implication (partial): The abstract and introduction explain the mechanism and benefits of the Reverse-KL objective.
Verification: partial

Evidence (partial): This enables active error correction on policy-induced states while preserving pre-trained general capabilities through gentle alignment.
Implication (partial): This describes the functional outcome of the proposed method, as stated in the abstract.
Verification: partial

Evidence (partial): Potential limitations include dependency on the availability of high-performing expert models and the applicability of VLA-OPD in highly dynamic or novel environments.
Implication (partial): This is explicitly mentioned as a caveat in the provided analysis.
Verification: partial
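The claims above describe the data flow of on-policy distillation: the student generates its own trajectories, and the teacher supervises every step the student actually visits, with no environment reward. The toy below is a minimal sketch of that loop under heavy simplifying assumptions, not the paper's training code: a 5-state corridor stands in for the robot task, a tabular softmax stands in for the VLA's action-token head, and `teacher_probs` is a hypothetical hand-written expert. Each update is a plain gradient step on the per-state Reverse-KL, using its closed-form gradient with respect to the student logits.

```python
import math
import random

N_STATES, ACTIONS = 5, (-1, 0, +1)  # toy 1-D corridor; actions move left / stay / right

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(a, b):
    """KL(a || b) for discrete distributions."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def teacher_probs(state):
    # Hypothetical expert: mostly "move right", keeping some residual uncertainty;
    # at the goal (last state) it mostly "stays".
    return [0.05, 0.15, 0.80] if state < N_STATES - 1 else [0.10, 0.80, 0.10]

def rollout(logits, rng, horizon=8):
    """The student acts; returns the states *it* visits (on-policy data)."""
    state, visited = 0, []
    for _ in range(horizon):
        visited.append(state)
        a = rng.choices(range(len(ACTIONS)), weights=softmax(logits[state]))[0]
        state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
    return visited

def distill_step(logits, visited, lr=0.5):
    """Sequential gradient steps on Reverse-KL(student || teacher) at each visited state."""
    for s in visited:
        q, p = softmax(logits[s]), teacher_probs(s)
        rkl = kl(q, p)
        # d KL(q || p) / d z_j = q_j * (log q_j - log p_j - KL(q || p))
        grad = [qj * (math.log(qj) - math.log(pj) - rkl) for qj, pj in zip(q, p)]
        logits[s] = [z - lr * g / len(visited) for z, g in zip(logits[s], grad)]

def mean_reverse_kl(logits, states):
    return sum(kl(softmax(logits[s]), teacher_probs(s)) for s in states) / len(states)

rng = random.Random(0)
logits = [[0.0, 0.0, 0.0] for _ in range(N_STATES)]  # uniform initial student
initial = mean_reverse_kl(logits, range(N_STATES))
for _ in range(300):
    distill_step(logits, rollout(logits, rng))
final = mean_reverse_kl(logits, range(N_STATES))
print(f"mean reverse-KL over states: {initial:.3f} -> {final:.3f}")
print("student P(right | state 0):", round(softmax(logits[0])[2], 3))
```

The design point the sketch tries to capture is that supervision lands on states the student reaches itself, so the teacher can correct the student where its own mistakes lead, rather than only on states from a fixed demonstration set.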

Constellation map

Paper-native neighborhood for concepts, methods, materials, markets, and competitors. Missing lanes stay labeled instead of disappearing behind commercialization gates.


Concepts

not indexed

Methods

Materials

PDF linked; document parse run

Markets

AI-Enhanced Robotics

Competitors

not indexed

Competitive landscape


Segment

AI-Enhanced Robotics

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Buzz

No indexed public discussion is attached to 2603.26666 yet. That is a visibility signal, not a blank module: the monitor is watching the public channels below.

Hacker News

Not indexed yet

Bluesky

Not indexed yet


References (30)

STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models
Feng Xu, Guangyao Zhai et al., 2025

π*0.6: a VLA That Learns From Experience
Physical Intelligence, A. Amin et al., 2025

The Path Not Taken: RLVR Provably Learns Off the Principals
Hanqing Zhu, Zhenyu (Allen) Zhang et al., 2025

Self-Improving Vision-Language-Action Models with Data Generation via Residual RL
Wenli Xiao, Haotian Lin et al., 2025

Towards a Unified Understanding of Robot Manipulation: A Comprehensive Survey
Shuanghao Bai, Wenxuan Song et al., 2025

VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
Hengtao Li, Pengxiang Ding et al., 2025

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Haozhan Li, Yuxin Zuo et al., 2025

RL's Razor: Why Online Reinforcement Learning Forgets Less
Idan Shenfeld, Jyothish Pari et al., 2025

Reinforcement Fine-Tuning Naturally Mitigates Forgetting in Continual Post-Training
Song Lai, Haohan Zhao et al., 2025

RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation
Tianxing Chen, Zanxin Chen et al., 2025

What Can RL Bring to VLA Generalization? An Empirical Study
Jijia Liu, Feng Gao et al., 2025

VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
Guanxing Lu, Wenkai Guo et al., 2025

Interactive Post-Training for Vision-Language-Action Models
Shuhan Tan, Kairan Dou et al., 2025

NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
Chia-Yu Hung, Qi Sun et al., 2025

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Nvidia, Johan Bjorck et al., 2025

PD-VLA: Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
Wenxuan Song, Jiayi Chen et al., 2025

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
Moo Jin Kim, Chelsea Finn et al., 2025

ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
Yuhui Chen, Shuai Tian et al., 2025

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu, Yuexiang Zhai et al., 2025

FAST: Efficient Action Tokenization for Vision-Language-Action Models
Karl Pertsch, Kyle Stachowicz et al., 2025

Showing 20 of 30 references

CITED BY

No citing papers are indexed in the public S2S graph yet. This is an explicit zero-signal state, not a hidden lookup.

Foundation

Prior work: Offline Semantic Guidance for Efficient Vision-Language-Action Policy Distillation · 7.0

Prior work: Jump-Start Reinforcement Learning with Vision-Language-Action Regularization · 7.0

Prior work: AtomVLA: Scalable Post-Training for Robotic Manipulation via Predictive Latent World Models · 7.0

Prior work: Learning from Mistakes: Post-Training for Driving VLA with Takeover Data · 7.0

Prior work: Self-Distilled RLVR · 7.0

Prior work: BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models · 7.0

Prior work: Scaling Sim-to-Real Reinforcement Learning for Robot VLAs with Generative 3D Worlds · 7.0

Extension

Builds on this: RL-VLA$^3$: Reinforcement Learning VLA Accelerating via Full Asynchronism · 5.0

Builds on this: LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models · 3.0

Builds on this: EXPO-FT: Sample-Efficient Reinforcement Learning Finetuning for Vision-Language-Action Models · 0.0

Commercially relevant

none indexed

Conflicting

none indexed


Agent drawer

5 surfaces preserved for agents. Humans can ignore.

Developer contracts, payload previews, evidence maps, and run controls stay here instead of the Read, Build, and Track workspace.

Run context

Paper: 2603.26666
Route: /paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation
Active tab: read
Artifact: vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation

Available agents

Read extractor
Build planner
Track monitor
Competitive mapper
Related-paper scout

API/MCP endpoints

REST paper pack API: /api/v1/paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation/paper-pack
REST build passport API: /api/v1/paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation/build-passport
REST OpenAPI: /api/openapi.json
MCP descriptor: /api/mcp
MCP resource: sciencetostartup://surfaces/paper-workspace

Tool contracts

paper_pack, build_passport, opportunity_kernel, foresight, source_proof, evidence_state

Payload preview


{
  "contract_version": "paper-r2",
  "paper_id": "25ef629d-db93-4c6c-90d9-6c305c08447c",
  "arxiv_id": "2603.26666",
  "canonical_route": "/paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation",
  "active_tab": "synced from current hash by the drawer client",
  "selected_artifact": "vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation",
  "endpoints": {
    "paper_pack": "/api/v1/paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation/paper-pack",
    "build_passport": "/api/v1/paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation/build-passport",
    "mcp_resource": "sciencetostartup://surfaces/paper-workspace"
  }
}

Schema validation

paper-r2 contract: present
JSON-LD twin: SSR emitted
OpenAPI path parity: /api/openapi.json
MCP resource parity: paper-workspace

Job trace

queued: drawer opened by user action
running: inspect or copy payload
succeeded: payload available in SSR
failed: route errors appear in evidence cards

Evidence map

sources used: page freshness, source proof anchors, JSON-LD
missing sources: exposed by PaperPack and EvidenceState chips
derived fallbacks: marked unverified before handoff

Page Freshness

Canonical route, proof status, last verified, refs, sources, and coverage.


Paper proof surface

Canonical route: /paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation


Proof freshness: stale
Proof status: unverified
Display score: 7/10
Last proof check: 2026-03-30
Score updated: 2026-04-02
Score fresh until: 2026-05-02
References: 26
Source count: 3
Coverage: 50%

This page is showing the last landed evidence receipt and score bundle because the latest proof data is outside the freshness window.

OpenAlex: pending — this preprint is not yet indexed by OpenAlex.

Agent Handoff

Endpoint list, payload shape, route context, and copyable handoff data.


VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation

Canonical ID vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation | Route /paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation
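The curl call above can be reproduced from Python with only the standard library. This is a minimal sketch, not an official client: it assumes the relative `paper-pack` and `build-passport` paths listed in the agent drawer are served from the same host as the curl example, and it assumes nothing about the response schema beyond it being JSON. `endpoint` and `fetch_json` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://sciencetostartup.com"  # host taken from the curl example
SLUG = "vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation"

def endpoint(kind: str, slug: str = SLUG) -> str:
    """Build one of the REST URLs listed on this page (same-host assumption)."""
    paths = {
        "agent_handoff": f"/api/v1/agent-handoff/paper/{slug}",
        "paper_pack": f"/api/v1/paper/{slug}/paper-pack",
        "build_passport": f"/api/v1/paper/{slug}/build-passport",
    }
    return BASE_URL + paths[kind]

def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the body as JSON; the payload shape is not documented here."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(endpoint("agent_handoff"))
    # Network call; inspect the returned object before relying on any field names.
    # data = fetch_json(endpoint("paper_pack"))
```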

MCP example

{
  "tool": "get_paper",
  "arguments": {
    "arxiv_id": "2603.26666"
  }
}

source_context

{
  "surface": "paper",
  "mode": "paper",
  "query": "VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation",
  "normalized_query": "2603.26666",
  "route": "/paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation",
  "paper_ref": "vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}

Buildability Receipt

Verdict, compute envelope, blockers, signature state, and receipt links.

Paper proof page receipt window

Watch and verify: VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation

/buildability/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation


Subject: VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation

Verdict

Watch

Verdict is Watch because viability or proof quality is intermediate and should be re-evaluated before execution.

Time to first demo

Insufficient data

No first-demo timestamp, owner estimate, or elapsed demo receipt is attached to this surface.

Compute envelope

Structured compute envelope

Insufficient data

No data, compute, hardware, memory, latency, dependency, or serving requirement receipt is attached.

Evidence ids

Receipt path

/buildability/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation

Paper ref

vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation

arXiv id

2603.26666

Freshness

Generated at

2026-03-30T21:51:27.011Z

Evidence freshness

stale

Last verification

2026-03-30T21:51:27.011Z

Sources

3

References

26

Coverage

50%

Hash state

Lineage hash

757aca3e3d18cead10fbf5715c7c62689e1bcf0ef65f470c388bc25bb7f46da5

Canonical opportunity-kernel lineage hash.

Signature state

External signature

unsigned_external

No founder, registry, pilot, or production-adoption signature is attached to this receipt.

Verification

not_verified

Verification is blocked until an external signature is provided.

Blockers

Missing: repo_url
Missing: proof_status
Missing: distribution_readiness_scores
Unknown: distribution readiness has not been computed yet
Unknown: proof verification has not been recorded yet

26 refs / 3 sources / Verification pending


Missing proof, requirement, signature, approval, adoption, or telemetry fields are blockers and must not be inferred.


Source Proof anchors

Visual citations from the paper document graph.


20 anchors

proof block · Page 3 · 82%

This equation captures one of the core mathematical components of the system. Standard VLA training typically starts with SFT on a static dataset of expert demonstrations $\mathcal{D}_{\text{demo}} = \{(\tau_i)\}$.

Page and bbox are available; crop image is pending.

proof block · Page 3 · 82%

This equation captures one of the core mathematical components of the system. $\mathcal{L}_{\text{SFT}}(\theta) = -\mathbb{E}_{(s,a)\sim\mathcal{D}_{\text{demo}}}\left[\log \pi_\theta(a \mid s)\right]$.

Page and bbox are available; crop image is pending.

proof block · Page 3 · 82%

This equation captures one of the core mathematical components of the system. $J_{\text{RL}}(\theta) = \mathbb{E}_{s\sim\mathcal{D},\,\tau\sim\pi_{\theta_{\text{old}}}} \ldots$

Page and bbox are available; crop image is pending.
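The SFT objective among the source anchors is the negative log-likelihood of expert actions over a fixed demonstration set $\mathcal{D}_{\text{demo}}$. A minimal sketch, assuming a hypothetical tabular softmax policy over discrete action tokens (a real VLA would produce these logits from images and language), estimates that loss by the sample mean and takes gradient steps on it. It also makes the abstract's distribution-shift point concrete: a state the demonstrations never contain (`off_path`, a made-up label) receives no gradient at all, so SFT alone teaches nothing about recovering from it.

```python
import math

def log_softmax(logits):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def sft_loss(policy, demos):
    """L_SFT(theta) = -E_{(s,a)~D_demo}[log pi_theta(a|s)], estimated by the sample mean."""
    return -sum(log_softmax(policy[s])[a] for s, a in demos) / len(demos)

def sft_step(policy, demos, lr=1.0):
    """One gradient step; for softmax logits, d(-log pi(a|s))/dz = pi(.|s) - onehot(a)."""
    grads = {s: [0.0] * len(policy[s]) for s in policy}
    for s, a in demos:
        probs = [math.exp(lp) for lp in log_softmax(policy[s])]
        for j, pj in enumerate(probs):
            grads[s][j] += (pj - (1.0 if j == a else 0.0)) / len(demos)
    for s in policy:
        policy[s] = [z - lr * g for z, g in zip(policy[s], grads[s])]

# Hypothetical tabular policy: 3 discrete states x 3 action tokens, uniform at start.
policy = {s: [0.0, 0.0, 0.0] for s in ("start", "grasp", "off_path")}
# Static expert demonstrations as (state, action index); "off_path" never appears.
demos = [("start", 2), ("start", 2), ("grasp", 1), ("grasp", 1), ("start", 2)]

before = sft_loss(policy, demos)
for _ in range(50):
    sft_step(policy, demos)
after = sft_loss(policy, demos)
print(f"SFT loss: {before:.3f} -> {after:.3f}")
print("off_path logits (never in D_demo):", policy["off_path"])
```

The loss starts at $\ln 3$ for the uniform policy and falls on the demonstrated states, while the `off_path` logits stay at their initial values; on-policy methods such as the one this paper proposes target exactly those self-induced states.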

JSON-LD twin

The application/ld+json payload rendered for agents.

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "WebPage",
      "@id": "https://sciencetostartup.com/paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation#webpage",
      "url": "https://sciencetostartup.com/paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation",
      "name": "VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation",
      "description": "VLA-OPD improves robotic model training by combining efficient fine-tuning with the robustness of RL using on-policy distillation.",
      "isPartOf": {
        "@id": "https://sciencetostartup.com/#website"
      }
    },
    {
      "@type": "ScholarlyArticle",
      "@id": "https://sciencetostartup.com/paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation#scholarlyArticle",
      "headline": "VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation",
      "description": "VLA-OPD improves robotic model training by combining efficient fine-tuning with the robustness of RL using on-policy distillation.",
      "url": "https://sciencetostartup.com/paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation",
      "sameAs": "https://arxiv.org/abs/2603.26666",
      "identifier": {
        "@type": "PropertyValue",
        "propertyID": "arXiv",
        "value": "2603.26666"
      },
      "isAccessibleForFree": true,
      "isPartOf": {
        "@id": "https://sciencetostartup.com/#website"
      },
      "datePublished": "2026-03-27T17:59:33.000Z",
      "author": [
        {
          "@type": "Person",
          "name": "Zhide Zhong",
          "affiliation": {
            "@type": "Organization",
            "name": "HKUST (GZ)"
          }
        },
        {
          "@type": "Person",
          "name": "Haodong Yan",
          "affiliation": {
            "@type": "Organization",
            "name": "HKUST (GZ)"
          }
        },
        {
          "@type": "Person",
          "name": "Junfeng Li",
          "affiliation": {
            "@type": "Organization",
            "name": "HKUST (GZ)"
          }
        },
        {
          "@type": "Person",
          "name": "Junjie He",
          "affiliation": {
            "@type": "Organization",
            "name": "HKUST (GZ)"
          }
        },
        {
          "@type": "Person",
          "name": "Tianran Zhang",
          "affiliation": {
            "@type": "Organization",
            "name": "HKUST (GZ)"
          }
        },
        {
          "@type": "Person",
          "name": "Haoang Li",
          "affiliation": {
            "@type": "Organization",
            "name": "HKUST (GZ)"
          }
        }
      ],
      "citation": [
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "0df9124fcf22c3c62769a7cd20fc5cbb45e5c843"
          },
          "url": "https://www.semanticscholar.org/paper/0df9124fcf22c3c62769a7cd20fc5cbb45e5c843"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "319e005858f32a7b1eddb05a62b1652ca8ea4611"
          },
          "url": "https://www.semanticscholar.org/paper/319e005858f32a7b1eddb05a62b1652ca8ea4611"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "4e4f6095b9757504bdfde4c0187a45263f4ac334"
          },
          "url": "https://www.semanticscholar.org/paper/4e4f6095b9757504bdfde4c0187a45263f4ac334"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "14c584af2fda0b9b5851b1e8573c4e2dd2c8fd0b"
          },
          "url": "https://www.semanticscholar.org/paper/14c584af2fda0b9b5851b1e8573c4e2dd2c8fd0b"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "6229363b28da88fd978e4ef658d77fbf01c7dc40"
          },
          "url": "https://www.semanticscholar.org/paper/6229363b28da88fd978e4ef658d77fbf01c7dc40"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "492a99e7c25ed0f96ad65c2cdc88cbb2b3dfafe3"
          },
          "url": "https://www.semanticscholar.org/paper/492a99e7c25ed0f96ad65c2cdc88cbb2b3dfafe3"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "bca7f4dd4db559d6772b470d9fe5391e3608cc8c"
          },
          "url": "https://www.semanticscholar.org/paper/bca7f4dd4db559d6772b470d9fe5391e3608cc8c"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "ad74032137b4129b54c5c06be208843d73792e7f"
          },
          "url": "https://www.semanticscholar.org/paper/ad74032137b4129b54c5c06be208843d73792e7f"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "0632657e9c329f8fbf8eb2d9f733aefa09b0292b"
          },
          "url": "https://www.semanticscholar.org/paper/0632657e9c329f8fbf8eb2d9f733aefa09b0292b"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "1002f84d1b487e49d3bc2d353dfb77e1495d3c58"
          },
          "url": "https://www.semanticscholar.org/paper/1002f84d1b487e49d3bc2d353dfb77e1495d3c58"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "1d92c345ff78631aa5509c84e6ff91d446e2c712"
          },
          "url": "https://www.semanticscholar.org/paper/1d92c345ff78631aa5509c84e6ff91d446e2c712"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "4a3e88d203564e547f5fb3f3d816a0b381492eae"
          },
          "url": "https://www.semanticscholar.org/paper/4a3e88d203564e547f5fb3f3d816a0b381492eae"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "bcad9241b7aa734e7bd2381177adaa909aa67a40"
          },
          "url": "https://www.semanticscholar.org/paper/bcad9241b7aa734e7bd2381177adaa909aa67a40"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "0801ed93c26dcb72e2bddf756f971e10357942a2"
          },
          "url": "https://www.semanticscholar.org/paper/0801ed93c26dcb72e2bddf756f971e10357942a2"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "731c50b0d6af4c1cb8d95f506541681ea487973b"
          },
          "url": "https://www.semanticscholar.org/paper/731c50b0d6af4c1cb8d95f506541681ea487973b"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "fac0a69b27aa365f45b338f013039dc933069de4"
          },
          "url": "https://www.semanticscholar.org/paper/fac0a69b27aa365f45b338f013039dc933069de4"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "5088b84507462bf725bb4898623d1dead4c6a206"
          },
          "url": "https://www.semanticscholar.org/paper/5088b84507462bf725bb4898623d1dead4c6a206"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "84fc714f22535f82c3fbfa28f2883292d2a02167"
          },
          "url": "https://www.semanticscholar.org/paper/84fc714f22535f82c3fbfa28f2883292d2a02167"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "a99dee9602e21a71526b9681d8dba37c55b66941"
          },
          "url": "https://www.semanticscholar.org/paper/a99dee9602e21a71526b9681d8dba37c55b66941"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "3880f5bad862ba1b18f4f8ec060038b326b118ed"
          },
          "url": "https://www.semanticscholar.org/paper/3880f5bad862ba1b18f4f8ec060038b326b118ed"
        }
      ],
      "additionalProperty": [
        {
          "@type": "PropertyValue",
          "propertyID": "viabilityScore",
          "value": 7
        },
        {
          "@type": "PropertyValue",
          "propertyID": "researchDomain",
          "value": "AI-Enhanced Robotics"
        },
        {
          "@type": "PropertyValue",
          "propertyID": "commercialReadiness",
          "value": "code"
        }
      ]
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        {
          "@type": "ListItem",
          "position": 1,
          "name": "Home",
          "item": "https://sciencetostartup.com"
        },
        {
          "@type": "ListItem",
          "position": 2,
          "name": "AI-Enhanced Robotics",
          "item": "https://sciencetostartup.com/topics"
        },
        {
          "@type": "ListItem",
          "position": 3,
          "name": "VLA-OPD: Bridging Offline SFT and Online RL for Vision-Langu",
          "item": "https://sciencetostartup.com/paper/vla-opd-bridging-offline-sft-and-online-rl-for-vision-language-action-models-via-on-policy-distillation"
        }
      ]
    },
    {
      "@type": "FAQPage",
      "mainEntity": [
        {
          "@type": "Question",
          "name": "What is the startup potential of \"VLA-OPD: Bridging Offline SFT and Online RL for Vision-Langu\"?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "VLA-OPD improves robotic model training by combining efficient fine-tuning with the robustness of RL using on-policy distillation."
          }
        },
        {
          "@type": "Question",
          "name": "What products could be built from this research?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "A cloud-based solution providing VLA-OPD as a service for robotics companies to improve their training protocols, leveraging existing expert models to enhance policy learning in new robots."
          }
        },
        {
          "@type": "Question",
          "name": "What are the practical use cases?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "Develop a robotics software platform that uses VLA-OPD to streamline training of robots for various tasks, reducing training time and improving adaptability and efficiency."
          }
        },
        {
          "@type": "Question",
          "name": "What industries could this research disrupt?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "This method could replace existing reliance on extensive supervised datasets and inefficient RL processes, offering a more streamlined approach to developing robust robotic behaviors."
          }
        }
      ]
    }
  ]
}
