ARXIV:2604.04539 · ROBOT CONTROL RL · SUBMITTED 07 APR · 20:12 UTC · FRESHNESS UNKNOWN

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control

Donghu Kim · Youngdo Lee · Minho Park · Kinam Kim · I Made Aswin Nahendra · Takuma Seno · Sehee Min · Daniel Palenicek · Florian Vogt · Danica Kragic · Jan Peters · Jaegul Choo · Hojoon Lee

A faster and more stable off-policy reinforcement learning algorithm for high-dimensional robot control, significantly reducing sim-to-real training time.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A faster and more stable off-policy reinforcement learning algorithm for high-dimensional robot control, significantly reducing sim-to-real training time.

Evidence 0 refs | 0 sources | 0% coverage

Blocker Evidence unverified

PROBLEM

A faster and more stable off-policy reinforcement learning algorithm for high-dimensional robot control, significantly reducing sim-to-real training time. On-policy methods such as Proximal Policy Optimization (PPO) are widely used for their stability, but their…

METHOD

Full abstract

Reinforcement learning (RL) is a core approach for robot control when expert demonstrations are unavailable. On-policy methods such as Proximal Policy Optimization (PPO) are widely used for their stability, but their reliance on narrowly distributed on-policy data limits accurate policy evaluation in high-dimensional state and action spaces. Off-policy methods can overcome this limitation by learning from a broader state-action distribution, yet suffer from slow convergence and instability, as fitting a value function over diverse data requires many gradient updates, causing critic errors to accumulate through bootstrapping. We present FlashSAC, a fast and stable off-policy RL algorithm built on Soft Actor-Critic. Motivated by scaling laws observed in supervised learning, FlashSAC sharply reduces gradient updates while compensating with larger models and higher data throughput. To maintain stability at increased scale, FlashSAC explicitly bounds weight, feature, and gradient norms, curbing critic error accumulation. Across over 60 tasks in 10 simulators, FlashSAC consistently outperforms PPO and strong off-policy baselines in both final performance and training efficiency, with the largest gains on high-dimensional tasks such as dexterous manipulation. In sim-to-real humanoid locomotion, FlashSAC reduces training time from hours to minutes, demonstrating the promise of off-policy RL for sim-to-real transfer.
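The abstract says FlashSAC "explicitly bounds weight, feature, and gradient norms" but does not say how. The sketch below shows one common way each kind of bound can be enforced, using only the Python standard library. The specific mechanisms (rescaling weights into a norm ball, RMS-normalizing features, global-norm gradient clipping) and every name here (`clip_to_norm`, `rms_normalize`, `max_norm`, `eps`) are illustrative assumptions, not the paper's confirmed method.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def clip_to_norm(vec, max_norm=1.0):
    """Rescale vec so its L2 norm does not exceed max_norm.

    Assumed mechanism: applied to weights after an update it bounds the
    weight norm; applied to gradients before an update it is global-norm
    gradient clipping. Vectors already inside the bound are unchanged.
    """
    norm = l2_norm(vec)
    if norm <= max_norm:
        return list(vec)
    scale = max_norm / norm
    return [x * scale for x in vec]


def rms_normalize(features, eps=1e-8):
    """Divide features by their root-mean-square (assumed feature bound).

    After this step the RMS of the output is approximately 1 no matter how
    large the inputs were, so downstream layers see bounded activations.
    """
    rms = math.sqrt(sum(f * f for f in features) / len(features) + eps)
    return [f / rms for f in features]


# Hypothetical values, chosen only to exercise the three bounds.
weights = clip_to_norm([3.0, 4.0], max_norm=1.0)    # norm 5 shrinks to about 1
grads = clip_to_norm([0.3, 0.4], max_norm=1.0)      # norm 0.5, left unchanged
features = rms_normalize([2.0, -2.0, 2.0, -2.0])    # RMS 2 rescaled to about 1
```

All three bounds cap how much any single update or activation can grow, which is the property the abstract credits with curbing critic error accumulation through bootstrapping.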

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. In sim-to-real humanoid locomotion, FlashSAC reduces training time from hours to minutes, demonstrating the promise of off-policy RL for sim-to-real transfer. Code availability is…

WHY NOW

Robot Control RL moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain A faster and more stable off-policy reinforcement learning algorithm for high-dimensional robot control, significantly reducing sim-to-real training time.

Evidence 0 refs | 0 sources | 0% coverage

Blocker no shell-level blocker reported

Analysis summary

A faster and more stable off-policy reinforcement learning algorithm for high-dimensional robot control, significantly reducing sim-to-real training time.

Competitive landscape

A faster and more stable off-policy reinforcement learning algorithm for high-dimensional robot control, significantly reducing sim-to-real training time.

Segment

Robot Control RL

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

{ "@context": "https://schema.org", "@graph": [ { "@type": "WebPage", "@id": "https://sciencetostartup.com/paper/flashsac-fast-and-stable-off-policy-reinforcement-learning-for-high-dimensional-robot-control#webpage", "url": "https://sciencetostartup.com/paper/flashsac-fast-and-stable-off-policy-reinforcement-learning-for-high-dimensional-robot-control", "name": "FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control", "description": "A faster and more stable off-policy reinforcement learning algorithm for high-dimensional robot control, significantly reducing sim-to-real training time.", "isPartOf": { "@id": "https://sciencetostartup.com/#website" } }, { "@type": "ScholarlyArticle", "@id": "https://sciencetostartup.com/paper/flashsac-fast-and-stable-off-policy-reinforcement-learning-for-high-dimensional-robot-control#scholarlyArticle", "headline": "FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control", "description": "A faster and more stable off-policy reinforcement learning algorithm for high-dimensional robot control, significantly reducing sim-to-real training time.", "url": "https://sciencetostartup.com/paper/flashsac-fast-and-stable-off-policy-reinforcement-learning-for-high-dimensional-robot-control", "sameAs": "https://arxiv.org/abs/2604.04539", "identifier": { "@type": "PropertyValue", "propertyID": "arXiv", "value": "2604.04539" }, "isAccessibleForFree": true, "isPartOf": { "@id": "https://sciencetostartup.com/#website" }, "datePublished": "2026-04-06T09:03:41.000Z", "author": [ { "@type": "Person", "name": "Donghu Kim" }, { "@type": "Person", "name": "Youngdo Lee" }, { "@type": "Person", "name": "Minho Park" }, { "@type": "Person", "name": "Kinam Kim" }, { "@type": "Person", "name": "I Made Aswin Nahendra" }, { "@type": "Person", "name": "Takuma Seno" }, { "@type": "Person", "name": "Sehee Min" }, { "@type": "Person", "name": "Daniel Palenicek" }, { "@type": "Person", 
"name": "Florian Vogt" }, { "@type": "Person", "name": "Danica Kragic" }, { "@type": "Person", "name": "Jan Peters" }, { "@type": "Person", "name": "Jaegul Choo" }, { "@type": "Person", "name": "Hojoon Lee" } ], "additionalProperty": [ { "@type": "PropertyValue", "propertyID": "viabilityScore", "value": 7 }, { "@type": "PropertyValue", "propertyID": "researchDomain", "value": "Robot Control RL" }, { "@type": "PropertyValue", "propertyID": "commercialReadiness", "value": "code" } ] }, { "@type": "BreadcrumbList", "itemListElement": [ { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://sciencetostartup.com" }, { "@type": "ListItem", "position": 2, "name": "Robot Control RL", "item": "https://sciencetostartup.com/topics" }, { "@type": "ListItem", "position": 3, "name": "FlashSAC: Fast and Stable Off-Policy Reinforcement Learning ", "item": "https://sciencetostartup.com/paper/flashsac-fast-and-stable-off-policy-reinforcement-learning-for-high-dimensional-robot-control" } ] } ] }
