ARXIV:2603.11504 · KV CACHE OPTIMIZATION · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified: source PDF linked · Partial: PaperPack, 3 of 4 citation fields filled · Missing: authors · Partial: proof status unverified

LongFlow: Efficient KV Cache Compression for Reasoning M

arXiv

LongFlow is a KV cache compression method for reasoning models that targets long-output generation, reducing memory use and deployment cost.

Blocked on code · Score 3.0 · Evidence unverified

Opportunity summary

Pain: Reasoning models generate long outputs that need large KV caches, raising memory consumption, attention bandwidth pressure, and deployment cost.

Evidence: 0 refs | 0 sources | 17% coverage

Blocker: evidence unverified


PROBLEM

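Recent reasoning models such as OpenAI-o1 and DeepSeek-R1 perform strongly on complex tasks, but this performance gain comes with substantially longer output sequences. Long outputs require large KV caches, which drive up memory consumption, put severe bandwidth pressure on attention computation, and significantly increase deployment costs. Most existing KV cache optimization methods target long-input, short-output workloads and are ineffective in this long-output setting.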
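To make the memory pressure concrete, the standard size estimate for a decoder's KV cache multiplies the two cached tensors (K and V) by the number of layers, KV heads, head dimension, cached tokens, batch size, and bytes per element. The sketch below uses a hypothetical 32-layer model with 8 KV heads of dimension 128 stored in fp16. These numbers are illustrative assumptions and are not taken from the paper. The helper name `kv_cache_bytes` is also hypothetical.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size in bytes for a decoder-only transformer.

    The leading 2 counts both the key and the value tensor;
    bytes_per_elem=2 corresponds to fp16/bf16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


if __name__ == "__main__":
    # Hypothetical config (illustrative only, not from the paper).
    layers, kv_heads, head_dim = 32, 8, 128
    for tokens in (1_024, 8_192, 32_768):
        gib = kv_cache_bytes(layers, kv_heads, head_dim, tokens) / 2**30
        print(f"{tokens:>6} cached tokens -> {gib:.3f} GiB per sequence")
```

With these assumptions, each generated token adds 128 KiB of cache. A single 32,768-token reasoning trace therefore holds 4 GiB of keys and values, and uncompressed attention reads all of it at every decode step.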

METHOD

Full abstract

Recent reasoning models such as OpenAI-o1 and DeepSeek-R1 have shown strong performance on complex tasks including mathematical reasoning and code generation. However, this performance gain comes with substantially longer output sequences, leading to significantly increased deployment costs. In particular, long outputs require large KV caches, resulting in high memory consumption and severe bandwidth pressure during attention computation. Most existing KV cache optimization methods are designed for long-input, short-output scenarios and are ineffective for the long-output setting of reasoning models. Moreover, importance estimation in prior work is computationally expensive and becomes prohibitive when continuous re-evaluation is required during long generation. To address these challenges, we propose LongFlow, a KV cache compression method with an efficient importance estimation metric derived from an intermediate result of attention computation using only the current query. This design introduces negligible computational overhead and requires no auxiliary storage. We further develop a custom kernel that fuses FlashAttention, importance estimation, and token eviction into a single optimized operator, improving system-level efficiency. Experiments show that LongFlow achieves up to an 11.8 times throughput improvement with 80% KV cache compression with minimal impact on model accuracy.
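The abstract does not define LongFlow's metric beyond "an intermediate result of attention computation using only the current query". The sketch below is a minimal illustration under one assumption: a token's importance is the current query's softmax weight for that token. That weight can be recovered from scores the attention pass has already computed, together with FlashAttention-style running max and sum statistics, so no extra pass over the keys and no auxiliary storage across steps is needed. The function names `attend_and_score` and `evict_to_budget` and the `budget` parameter are hypothetical. This is not the authors' metric or their fused GPU kernel. It is plain sequential Python on lists.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attend_and_score(q: Vector, keys: List[Vector],
                     values: List[Vector]) -> Tuple[Vector, List[float]]:
    """Single-query attention with a streaming (online) softmax.

    The raw scores q.k / sqrt(d) are computed once for the output and kept,
    so per-token importance needs no second pass over the keys.
    """
    if not keys:
        raise ValueError("KV cache is empty")
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf                 # running max of scores
    l = 0.0                       # running sum of exp(score - m)
    acc = [0.0] * len(values[0])  # running unnormalized output
    scores: List[float] = []
    for k, v in zip(keys, values):
        s = scale * sum(qi * ki for qi, ki in zip(q, k))
        scores.append(s)
        m_new = max(m, s)
        correction = math.exp(m - m_new)  # rescales earlier partial sums
        p = math.exp(s - m_new)
        l = l * correction + p
        acc = [a * correction + p * vi for a, vi in zip(acc, v)]
        m = m_new
    out = [a / l for a in acc]
    # Hypothetical importance metric: the current query's softmax weight,
    # recovered from the stored scores and the final max/sum statistics.
    importance = [math.exp(s - m) / l for s in scores]
    return out, importance


def evict_to_budget(keys: List[Vector], values: List[Vector],
                    importance: List[float],
                    budget: int) -> Tuple[List[Vector], List[Vector]]:
    """Keep the `budget` most important tokens, preserving their order."""
    if len(keys) <= budget:
        return keys, values
    ranked = sorted(range(len(keys)), key=lambda i: importance[i], reverse=True)
    keep = sorted(ranked[:budget])
    return [keys[i] for i in keep], [values[i] for i in keep]


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    out, importance = attend_and_score(q, keys, values)
    keys, values = evict_to_budget(keys, values, importance, budget=2)
    print(out, importance, keys)
```

In a decode loop under these assumptions, each step would append the new token's K and V, call `attend_and_score`, and then evict if the cache exceeds the budget. Re-evaluating importance at every step therefore costs little beyond the attention pass itself. The abstract describes fusing the equivalent work into a single optimized operator.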

RESULT

ScienceToStartup currently rates this paper 3.0/10 on its public viability pass. The authors report up to an 11.8× throughput improvement at 80% KV cache compression, with minimal impact on model accuracy.
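As a rough illustration of what the 80% KV cache compression figure means for cache size, assume it means 80% of cached token entries are evicted and 20% are retained. The paper may define the ratio differently, for example per layer or per head. The helper `kv_budget` is a hypothetical name.

```python
def kv_budget(seq_len: int, compression_pct: int) -> int:
    """Number of cached tokens kept after evicting `compression_pct` percent.

    Assumes compression means the share of token entries evicted; integer
    ceil-division avoids floating-point rounding surprises.
    """
    if not 0 <= compression_pct < 100:
        raise ValueError("compression_pct must be in [0, 100)")
    return -(-seq_len * (100 - compression_pct) // 100)  # ceil division


if __name__ == "__main__":
    for tokens in (8_192, 32_768):
        kept = kv_budget(tokens, 80)
        print(f"{tokens} tokens -> keep {kept} ({tokens / kept:.2f}x smaller)")
```

Under this reading, the retained cache is one fifth of the uncompressed size. That is roughly a 5× cut in KV memory and in the bytes attention reads per decode step. The reported throughput figure is a separate end-to-end measurement and does not follow from this arithmetic alone.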

WHY NOW

KV Cache Optimization moved forward this cycle; last verified April 2026. Public score 3.0/10.

Competitive landscape

Segment: KV Cache Optimization
Adoption evidence: No public code link in the paper record yet
Commercial read: 3.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified

arXiv listing: https://arxiv.org/abs/2603.11504 · Published 2026-03-12