Efficient LLM Serving for Agentic Workflows: A Data Systems Perspective presents Helium, which optimizes LLM serving for agentic workflows by integrating proactive caching with cache-aware scheduling. Commercial viability score: 7/10 in LLM Serving.
Estimated ROI:
- 6-month ROI: 0.5-1x
- 3-year ROI: 6-15x

GPU-heavy products have higher costs but premium pricing. Expect break-even by 12 months, then 40%+ margins at scale.
Signal summary:
- High Potential: 2/4 signals
- Quick Build: 1/4 signals
- Series A Potential: 0/4 signals
Sources used for this analysis:
- arXiv Paper: full-text PDF analysis of the research paper
- GitHub Repository: code availability, stars, and contributor activity
- Citation Network: Semantic Scholar citations and co-citation patterns
- Community Predictions: crowd-sourced unicorn probability assessments
Analysis model: GPT-4o · Last scored: 4/2/2026
This research matters commercially because agentic workflows are becoming central to AI applications, but current serving systems waste significant computational resources by treating LLM calls in isolation, leading to high costs and latency that scale poorly with workflow complexity. By optimizing across entire workflows, this approach can dramatically reduce the operational expenses of running AI agents, making them more viable for production use at scale.
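To make the waste concrete, consider a minimal sketch (hypothetical, not Helium's actual implementation) of workflow-level prefix caching: agent steps in the same workflow typically share a long system prompt, so only the first request should pay the cost of computing that prefix, while later steps reuse the cached state.

```python
import hashlib

class PrefixCache:
    """Toy workflow-level prefix cache. Reuses state for repeated
    prompt prefixes across agent steps (illustrative sketch only;
    real systems cache KV tensors, not strings)."""

    def __init__(self):
        self.cache = {}   # prefix hash -> simulated cached state
        self.hits = 0
        self.misses = 0

    def _key(self, prefix: str) -> str:
        return hashlib.sha256(prefix.encode()).hexdigest()

    def lookup(self, prompt: str, prefix_len: int) -> str:
        """Return cached state for the prompt's shared prefix,
        computing and storing it on a miss."""
        key = self._key(prompt[:prefix_len])
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        state = f"kv-state-for:{key[:8]}"  # stand-in for real KV tensors
        self.cache[key] = state
        return state

# In an agentic workflow, every step shares the same system prompt,
# so only the first call pays the prefix-computation cost.
SYSTEM = "You are a support agent. Follow the refund policy..."
cache = PrefixCache()
for step_input in ["classify ticket", "draft reply", "verify policy"]:
    cache.lookup(SYSTEM + step_input, prefix_len=len(SYSTEM))

print(cache.hits, cache.misses)  # -> 2 1
```

Serving each call in isolation would recompute the shared prefix three times; caching across the workflow computes it once, which is the redundancy this line of research exploits.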
Now is the time because agentic workflows are moving from prototypes to production, exposing inefficiencies that become costly at scale, and the market lacks specialized serving systems that optimize across workflows rather than single calls.
This approach could displace less efficient general-purpose serving stacks and reduce reliance on costly manual tuning of agent pipelines.
AI platform companies and enterprises deploying LLM-based agents would pay for this, as it directly lowers their cloud compute bills and improves response times for end-users, translating to better ROI on AI investments.
A customer service automation platform that uses LLM agents to handle multi-step support tickets could integrate this framework to cache common prompt patterns and intermediate reasoning across thousands of concurrent conversations, cutting inference costs by 30% while speeding up resolution times.
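The caching benefit in that scenario also depends on routing: a cache-aware scheduler must send requests that share a prompt prefix to the worker that already holds the cached state. A hypothetical sketch of such routing (the function and placement map are illustrative, not part of the paper's API):

```python
import hashlib

def route(prompt: str, prefix_len: int, workers: int, placement: dict) -> int:
    """Cache-aware routing sketch (hypothetical): pin each distinct
    prompt prefix to one worker so its cached prefix state is reused
    instead of recomputed on a cold worker."""
    key = hashlib.sha256(prompt[:prefix_len].encode()).hexdigest()
    if key not in placement:
        # New prefix: fall back to simple hash-based placement.
        placement[key] = int(key, 16) % workers
    return placement[key]

placement = {}
SYSTEM = "You are a support agent. Follow the refund policy..."

# Two tickets sharing the same system prompt land on the same worker,
# so the second one hits that worker's warm prefix cache.
w1 = route(SYSTEM + "ticket A", len(SYSTEM), workers=4, placement=placement)
w2 = route(SYSTEM + "ticket B", len(SYSTEM), workers=4, placement=placement)
assert w1 == w2  # same prefix -> same worker -> cache hit
```

A prefix-oblivious load balancer would scatter these requests across workers and forfeit most of the cache reuse, which is why scheduling and caching have to be co-designed.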
Risks and caveats:
- Requires deep integration into existing LLM serving stacks
- Performance gains depend heavily on workflow redundancy patterns
- May add complexity to debugging and monitoring