ARXIV:2604.00136 · LLM SERVING · SUBMITTED 03 APR · 20:30 UTC · FRESHNESS STALE

Verified source: PDF linked · Verified PaperPack: citation fields available

ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving

Annette Taberner-Miller · arXiv

An adaptive LLM serving router that enforces budgets, adapts to non-stationary conditions, and allows hot-swapping of models.

Cooling · Score 7.0 · Evidence verified

Opportunity summary

Pain: An adaptive LLM serving router that enforces budgets, adapts to non-stationary conditions, and allows hot-swapping of models.

Evidence: 73 refs | 4 sources | 83% coverage

Blocker: Evidence verified


PROBLEM

Production LLM serving must trade off quality against cost across multi-model portfolios, and this trade-off is non-stationary: providers revise pricing, model quality can regress silently, and new models must be…

METHOD

Full abstract

Production LLM serving often relies on multi-model portfolios spanning a ~530× cost range, where routing decisions trade off quality against cost. This trade-off is non-stationary: providers revise pricing, model quality can regress silently, and new models must be integrated without downtime. We present ParetoBandit, an open-source adaptive router built on cost-aware contextual bandits that is the first to simultaneously enforce dollar-denominated budgets, adapt online to such shifts, and onboard new models at runtime. ParetoBandit closes these gaps through three mechanisms. An online primal-dual budget pacer enforces a per-request cost ceiling over an open-ended stream, replacing offline penalty tuning with closed-loop control. Geometric forgetting on sufficient statistics enables rapid adaptation to price and quality shifts while bootstrapping from offline priors. A hot-swap registry lets operators add or remove models at runtime, with a brief forced-exploration phase for each newcomer, after which UCB selection discovers its quality-cost niche from live traffic alone. We evaluate ParetoBandit across four deployment scenarios on 1,824 prompts routed through a three-model portfolio. Across seven budget ceilings, mean per-request cost never exceeds the target by more than 0.4%. When conditions shift, the system adapts: an order-of-magnitude price cut on the costliest model yields up to +0.071 quality lift, and a silent quality regression is detected and rerouted within budget. A cold-started model reaches meaningful adoption within ~142 steps without breaching the cost ceiling. The router discriminates rather than blindly adopting: expensive models are budget-gated and low-quality models rejected after bounded exploration. End-to-end routing latency is 9.8 ms on CPU (less than 0.4% of typical inference time), with the routing decision itself taking just 22.5 µs.
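The budget pacer and hot-swap registry described in the abstract can be illustrated with a minimal sketch. It assumes a simplified, non-contextual setting: each model has a fixed, known per-request cost, rewards are quality scores in [0, 1], and a plain UCB index stands in for the paper's cost-aware contextual bandit. The names `BudgetPacedRouter`, `ModelArm`, `eta`, `explore_pulls`, and `c` are hypothetical and are not taken from the ParetoBandit codebase. The dual price `lam` is updated by projected gradient ascent on the normalized cost gap, which is one standard way to realize online primal-dual pacing, not necessarily the paper's exact update rule.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelArm:
    """Running statistics for one model (hypothetical, non-contextual)."""
    cost: float               # assumed known dollar cost per request
    pulls: int = 0
    reward_sum: float = 0.0

    def mean(self) -> float:
        return self.reward_sum / self.pulls if self.pulls else 0.0


class BudgetPacedRouter:
    """UCB over models with a dual price on cost (illustrative sketch)."""

    def __init__(self, ceiling: float, eta: float = 0.5,
                 explore_pulls: int = 5, c: float = 1.0):
        self.ceiling = ceiling              # target mean cost per request
        self.eta = eta                      # dual step size
        self.explore_pulls = explore_pulls  # forced pulls for each newcomer
        self.c = c                          # UCB exploration weight
        self.lam = 0.0                      # dual price on normalized cost
        self.t = 0
        self.arms: dict[str, ModelArm] = {}

    # Hot-swap registry: models can be added or removed between requests.
    def add_model(self, name: str, cost: float) -> None:
        self.arms[name] = ModelArm(cost=cost)

    def remove_model(self, name: str) -> None:
        self.arms.pop(name, None)

    def select(self) -> str:
        self.t += 1
        # Forced exploration: route to any model that has too few observations.
        for name, arm in self.arms.items():
            if arm.pulls < self.explore_pulls:
                return name

        def score(arm: ModelArm) -> float:
            bonus = self.c * math.sqrt(math.log(self.t) / arm.pulls)
            # Lagrangian index: optimistic quality minus priced cost.
            return arm.mean() + bonus - self.lam * arm.cost / self.ceiling

        return max(self.arms, key=lambda name: score(self.arms[name]))

    def update(self, name: str, reward: float) -> None:
        arm = self.arms[name]
        arm.pulls += 1
        arm.reward_sum += reward
        # Projected dual ascent: raise lam when this request overshot the
        # ceiling, lower it (never below zero) when it undershot.
        gap = arm.cost / self.ceiling - 1.0
        self.lam = max(0.0, self.lam + self.eta * gap)
```

Because `lam` is clipped at zero, each update satisfies `lam_next >= lam + eta * gap`, so over T requests the summed normalized overshoot is at most `lam_T / eta`. The mean cost is therefore at most `ceiling * (1 + lam_T / (eta * T))`, which approaches the ceiling as long as `lam` stays bounded. Calling `add_model` mid-stream gives the newcomer `explore_pulls` forced requests before it competes on the UCB index.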

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Geometric forgetting on sufficient statistics enables rapid adaptation to price and quality shifts while bootstrapping from offline priors. A public repository is linked, so…
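Geometric forgetting on sufficient statistics can be sketched for a per-model ridge-regression reward model of the kind used in LinUCB-style contextual bandits. This is an illustration under stated assumptions, not the paper's implementation: the class `ForgettingRidgeArm` and its parameters `gamma`, `lam`, `prior_A`, and `prior_b` are hypothetical. Each update discounts past evidence by `gamma`, so an observation from k updates ago carries weight `gamma**k` and the effective memory is roughly `1 / (1 - gamma)` requests; the ridge term is restored every step so `A` stays invertible, while any offline prior decays like ordinary evidence.

```python
from typing import Optional

import numpy as np


class ForgettingRidgeArm:
    """Ridge-regression reward model for one LLM, with geometric forgetting.

    A = lam * I + sum_k gamma**k x x^T and b = sum_k gamma**k r x, where k is
    how many updates ago an observation arrived; an offline prior, if given,
    decays the same way (illustrative sketch).
    """

    def __init__(self, d: int, gamma: float = 0.98, lam: float = 1.0,
                 prior_A: Optional[np.ndarray] = None,
                 prior_b: Optional[np.ndarray] = None):
        self.gamma = gamma
        self.reg = lam * np.eye(d)
        # Optional offline statistics (hypothetical) seed the estimates.
        self.A = self.reg + (prior_A if prior_A is not None else np.zeros((d, d)))
        self.b = prior_b.astype(float) if prior_b is not None else np.zeros(d)

    def update(self, x: np.ndarray, reward: float) -> None:
        # Discount all past evidence (including the prior) by gamma, then
        # restore the ridge term so A stays positive definite.
        self.A = self.gamma * self.A + (1.0 - self.gamma) * self.reg + np.outer(x, x)
        self.b = self.gamma * self.b + reward * x

    def theta(self) -> np.ndarray:
        return np.linalg.solve(self.A, self.b)

    def ucb(self, x: np.ndarray, alpha: float = 1.0) -> float:
        # LinUCB-style index: point estimate plus alpha * confidence width.
        width = float(np.sqrt(x @ np.linalg.solve(self.A, x)))
        return float(x @ self.theta()) + alpha * width
```

With `gamma = 1.0` the update reduces to ordinary cumulative ridge statistics, which is the non-adaptive baseline that forgetting is meant to improve on: after a silent quality regression, the cumulative estimate stays anchored to stale data, while the discounted one tracks the new reward level within a few multiples of `1 / (1 - gamma)` updates.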

WHY NOW

LLM Serving moved forward this cycle; last verified April 2026. Public score 7.0/10. Implementation evidence is present through a linked repository (https://github.com/ParetoBandit/ParetoBandit).



Competitive landscape

Segment: LLM Serving
Adoption evidence: Public code linked for build inspection
Commercial read: 7.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified
