ARXIV:2605.11678 · LLM OPTIMIZATION · SUBMITTED 13 MAY · 21:03 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

OOM-Free Alpamayo via CPU-GPU Memory Swapping for Vision-Language-Action Models

Seungwoo Roh · Huiyeong Kim · Jong-Chan Kim · arXiv

A framework for efficient Vision-Language-Action model inference on memory-constrained GPUs through system-level optimization.

Blocked on Code · Score 4.0 · Evidence unverified

Opportunity summary

Pain A framework for efficient Vision-Language-Action model inference on memory-constrained GPUs through system-level optimization.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified


PROBLEM

End-to-end Vision-Language-Action (VLA) models for autonomous driving require 20-60GB of GPU memory, far more than the 12-16GB available on commodity GPUs. The paper presents a framework that enables memory-efficient VLA inference on VRAM-constrained GPUs through system-level optimization alone, without model modification.

METHOD

Full abstract

End-to-end Vision-Language-Action (VLA) models for autonomous driving unify perception, reasoning, and control in a single neural network, achieving strong driving performance but requiring 20-60GB of GPU memory, far exceeding the 12-16GB available on commodity GPUs. We present a framework that enables memory-efficient VLA inference on VRAM-constrained GPUs through system-level optimization alone, without model modification. Our work proceeds in three stages: (1) Sequential Demand Layering reduces VRAM usage from model-level to layer-level granularity; (2) Pipelined Demand Layering hides parameter transfer time within layer execution time via transfer-compute overlap; and (3) a GPU-Resident Layer Decision Policy, informed by per-module residency benefit analysis, eliminates the residual transfer overhead that pipelining cannot hide. We further propose a performance prediction model that determines the optimal configuration (both the number and placement of resident layers) from a single profiling run, with less than 1.3% prediction error across all configurations. Applied to NVIDIA's Alpamayo-R1-10B (21.52GB) on an RTX 5070Ti (16GB), our work achieves up to 3.55x speedup over Accelerate offloading while maintaining full BF16 precision.
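The three stages in the abstract can be illustrated with a toy latency model. The sketch below is not the paper's prediction model; it is a minimal illustration under stated assumptions: one layer of prefetch lookahead, a resident layer costing zero transfer time, and hypothetical function names (`sequential_latency`, `pipelined_latency`, `best_resident_set`) and timings chosen only for this example.

```python
from itertools import combinations


def sequential_latency(transfer, compute, resident=()):
    """Load each non-resident layer, then run it, strictly one after another."""
    return sum((0.0 if i in resident else transfer[i]) + compute[i]
               for i in range(len(compute)))


def pipelined_latency(transfer, compute, resident=()):
    """Prefetch layer i+1 on a copy stream while layer i computes."""
    n = len(compute)
    # Assumption: resident layers already live in VRAM, so they cost no transfer.
    t = [0.0 if i in resident else transfer[i] for i in range(n)]
    total = t[0]  # the first layer's transfer cannot hide behind any compute
    for i in range(n):
        next_transfer = t[i + 1] if i + 1 < n else 0.0
        # The next layer starts once this compute and its own copy both finish.
        total += max(compute[i], next_transfer)
    return total


def best_resident_set(transfer, compute, sizes, budget):
    """Brute-force the resident set that fits in `budget` and minimizes latency."""
    n = len(compute)
    best = ((), pipelined_latency(transfer, compute))
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            if sum(sizes[i] for i in subset) > budget:
                continue  # this placement would not fit in the spare VRAM
            latency = pipelined_latency(transfer, compute, subset)
            if latency < best[1]:
                best = (subset, latency)
    return best


# Hypothetical per-layer timings (ms) and sizes; not measurements from the paper.
transfer = [4.0, 4.0, 4.0, 4.0]
compute = [2.0, 5.0, 1.0, 3.0]
sizes = [1, 1, 1, 1]

print(sequential_latency(transfer, compute))                   # 27.0
print(pipelined_latency(transfer, compute))                    # 20.0
print(best_resident_set(transfer, compute, sizes, budget=2))   # ((0, 3), 13.0)
```

In this toy setting, overlap alone cuts latency from 27 to 20 ms, and keeping layers 0 and 3 resident brings it to 13 ms, while other two-layer placements do worse (for example, layers 1 and 2 give 18 ms). That is why placement matters as well as count. For scale, a 21.52GB model on a 16GB card leaves at least 21.52 - 16 = 5.52GB of weights that cannot be resident, before accounting for activations.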

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. The paper reports up to 3.55x speedup over Accelerate offloading for NVIDIA's Alpamayo-R1-10B (21.52GB) on an RTX 5070Ti (16GB) at full BF16 precision, and a performance prediction model with less than 1.3% error across all configurations.

WHY NOW

LLM Optimization moved forward this cycle; last verified May 2026. Public score 4.0/10.


Opportunity summary

Score 4.0

Pain A framework for efficient Vision-Language-Action model inference on memory-constrained GPUs through system-level optimization.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

A framework for efficient Vision-Language-Action model inference on memory-constrained GPUs through system-level optimization.


Competitive landscape

A framework for efficient Vision-Language-Action model inference on memory-constrained GPUs through system-level optimization.

Segment

LLM Optimization

Adoption evidence

No public code link in the paper record yet

Commercial read

4.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified


