ARXIV:2602.05728 · RAG OPTIMIZATION · SUBMITTED 19 MAR · 21:31 UTC · FRESHNESS STALE

Verified · Source: PDF linked
Partial · PaperPack: 3 of 4 citation fields filled
Missing · Missing fields: authors
Error · Proof: failed

CompactRAG: Reducing LLM Calls and Token Overhead in Multi-Hop Question Answering

arXiv

CompactRAG reduces LLM calls and token overhead in multi-hop question answering by restructuring the corpus offline, so that online inference invokes the LLM only twice regardless of the number of reasoning hops.

Blocked on Code · Score 8.0 · Evidence failed

Opportunity summary

Pain Existing multi-hop RAG systems alternate between retrieval and reasoning at each step, resulting in repeated LLM calls, high token consumption, and unstable entity grounding across hops.

Evidence 0 refs | 0 sources | 33% coverage

Blocker Evidence failed


PROBLEM

Retrieval-augmented generation (RAG) has become a key paradigm for knowledge-intensive question answering. However, existing multi-hop RAG systems remain inefficient, as they alternate between retrieval and reasoning at…

METHOD

Full abstract

Retrieval-augmented generation (RAG) has become a key paradigm for knowledge-intensive question answering. However, existing multi-hop RAG systems remain inefficient, as they alternate between retrieval and reasoning at each step, resulting in repeated LLM calls, high token consumption, and unstable entity grounding across hops. We propose CompactRAG, a simple yet effective framework that decouples offline corpus restructuring from online reasoning. In the offline stage, an LLM reads the corpus once and converts it into an atomic QA knowledge base, which represents knowledge as minimal, fine-grained question-answer pairs. In the online stage, complex queries are decomposed and carefully rewritten to preserve entity consistency, and are resolved through dense retrieval followed by RoBERTa-based answer extraction. Notably, during inference, the LLM is invoked only twice in total - once for sub-question decomposition and once for final answer synthesis - regardless of the number of reasoning hops. Experiments on HotpotQA, 2WikiMultiHopQA, and MuSiQue demonstrate that CompactRAG achieves competitive accuracy while substantially reducing token consumption compared to iterative RAG baselines, highlighting a cost-efficient and practical approach to multi-hop reasoning over large knowledge corpora. The implementation is available at GitHub.
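The online stage described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `CountingLLM`, `KB`, `retrieve`, and `answer` are hypothetical names, the decomposition is hard-coded, bag-of-words cosine similarity stands in for dense retrieval, and reading the stored answer stands in for RoBERTa-based answer extraction. The point it shows is that the LLM is called twice no matter how many sub-questions are resolved.

```python
from collections import Counter
from dataclasses import dataclass
import math


@dataclass
class CountingLLM:
    """Hypothetical stand-in for an LLM client; it only counts invocations."""
    calls: int = 0

    def decompose(self, question: str) -> list[str]:
        self.calls += 1
        # Assumed output format: "#1" refers to the answer of sub-question 1.
        return ["Who directed Inception?", "Where was #1 born?"]

    def synthesize(self, question: str, facts: list[tuple[str, str]]) -> str:
        self.calls += 1
        return facts[-1][1]  # toy synthesis: return the last hop's answer


# Toy atomic QA knowledge base; in the paper's framing this is built offline.
KB = [
    ("Who directed Inception?", "Christopher Nolan"),
    ("Where was Christopher Nolan born?", "London"),
    ("Who directed Titanic?", "James Cameron"),
]


def bow(text: str) -> Counter:
    """Bag-of-words vector (a crude substitute for a dense embedding)."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(sub_question: str) -> tuple[str, str]:
    """Return the KB pair whose question is most similar to the sub-question."""
    query = bow(sub_question)
    return max(KB, key=lambda qa: cosine(query, bow(qa[0])))


def answer(question: str, llm: CountingLLM) -> str:
    sub_questions = llm.decompose(question)  # LLM call 1
    facts: list[tuple[str, str]] = []
    for sub_q in sub_questions:
        # Rewrite placeholders with earlier answers to keep entities consistent.
        for i, (_, previous_answer) in enumerate(facts, start=1):
            sub_q = sub_q.replace(f"#{i}", previous_answer)
        _, extracted = retrieve(sub_q)  # no LLM involved in this step
        facts.append((sub_q, extracted))
    return llm.synthesize(question, facts)  # LLM call 2
```

With this sketch, `answer("Where was the director of Inception born?", CountingLLM())` resolves two hops through retrieval alone, and the counter on the LLM stub ends at 2.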

RESULT

ScienceToStartup currently rates this 8.0/10 on the public viability pass. Experiments on HotpotQA, 2WikiMultiHopQA, and MuSiQue demonstrate that CompactRAG achieves competitive accuracy while substantially reducing token consumption compared to iterative RAG baselines, highlighting a…
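To make the call-count argument concrete, the snippet below compares a simple model of an iterative baseline with the constant two calls stated in the abstract. The iterative model (one LLM call per hop plus one final call) is an assumption chosen for illustration, not a figure from the paper, and both function names are hypothetical.

```python
def iterative_llm_calls(hops: int, calls_per_hop: int = 1, final_calls: int = 1) -> int:
    """Assumed model: the baseline invokes the LLM at every hop, then once more."""
    return hops * calls_per_hop + final_calls


def compactrag_llm_calls(hops: int) -> int:
    """Per the abstract: one decomposition call plus one synthesis call."""
    return 2


for hops in (2, 3, 4):
    print(hops, iterative_llm_calls(hops), compactrag_llm_calls(hops))
# prints:
# 2 3 2
# 3 4 2
# 4 5 2
```

Under this assumed model the gap grows linearly with the number of hops; actual token savings depend on prompt lengths and are reported in the paper itself.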

WHY NOW

RAG Optimization moved forward this cycle; last verified April 2026. Public score 8.0/10.


Opportunity summary

Score 8.0

Pain Existing multi-hop RAG systems alternate between retrieval and reasoning at each step, resulting in repeated LLM calls, high token consumption, and unstable entity grounding across hops.

Evidence 0 refs | 0 sources | 33% coverage

Blocker missing authors


Competitive landscape

CompactRAG targets multi-hop question answering, cutting LLM calls and token overhead relative to iterative RAG baselines.

Segment

RAG Optimization

Adoption evidence

No public code link in the paper record yet

Commercial read

8.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

References (19)

SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression

2025 · Yiqiao Jin, Kartik Sharma et al.

RADIANT: Retrieval AugmenteD entIty-context AligNmenT - Introducing RAG-ability and Entity-Context Divergence

2025 · Vipula Rawte, Rajarshi Roy et al.

LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers

2025 · Zhuocheng Zhang, Yang Feng et al.

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

2024 · Yixuan Tang, Yi Yang

Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts When Knowledge Conflicts?

2024 · Hexiang Tan, Fei Sun et al.

MEQA: A Benchmark for Multi-hop Event-centric Question Answering with Explanations

2024 · Ruosen Li, Zimu Wang et al.

Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy

2023 · Zhihong Shao, Yeyun Gong et al.

MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions

2023 · Zexuan Zhong, Zhengxuan Wu et al.

Dr.ICL: Demonstration-Retrieved In-context Learning

2023 · Man Luo, Xin Xu et al.

Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

2022 · H. Trivedi, Niranjan Balasubramanian et al.

Counterfactual Multihop QA: A Cause-Effect Approach for Reducing Disconnected Reasoning

2022 · Wangzhen Guo, Qinkang Gong et al.

Measuring and Narrowing the Compositionality Gap in Language Models

2022 · Ofir Press, Muru Zhang et al.

Unsupervised Dense Information Retrieval with Contrastive Learning

2021 · Gautier Izacard, Mathilde Caron et al.

Improving language models by retrieving from trillions of tokens

2021 · Sebastian Borgeaud, Arthur Mensch et al.

♫ MuSiQue: Multihop Questions via Single-hop Question Composition

2021 · H. Trivedi, Niranjan Balasubramanian et al.

Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps

2020 · Xanh Ho, A. Nguyen et al.

Dense Passage Retrieval for Open-Domain Question Answering

2020 · Vladimir Karpukhin, Barlas Oğuz et al.

Unsupervised Question Decomposition for Question Answering

2020 · Ethan Perez, Patrick Lewis et al.

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

2018 · Zhilin Yang, Peng Qi et al.


Authors

Hao Yang (State Key Laboratory for Novel Software Technology, Nanjing University)
Zhiyu Yang (Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas)
Xupeng Zhang (Isoftstone Information Technology (Group) Co.,Ltd.)
Wei Wei (College of Electronic and Information Engineering, Tongji University)
Yunjie Zhang (School of Electronic Information, Central South University)
Lin Yang (State Key Laboratory for Novel Software Technology, Nanjing University)

Published

2026-02-05 · arXiv 2602.05728 · https://arxiv.org/abs/2602.05728

FAQ

What products could be built from this research?

CompactRAG can be productized into an API or SaaS platform that offers efficient multi-hop question answering services for industries that rely on large knowledge corpora, like legal, academic, or medical sectors.

What are the practical use cases?

Develop an enterprise-level customer support system using CompactRAG to efficiently answer multi-step customer inquiries while minimizing costs.

What industries could this research disrupt?

CompactRAG can replace existing RAG systems in multi-hop question answering by offering a more token-efficient, scalable, and cost-effective solution, thus disrupting standard RAG practices.

