ARXIV:2604.00344 · LLM AGENTS · SUBMITTED 02 APR · 21:00 UTC · FRESHNESS STALE

Verified · Source: PDF linked | Verified · PaperPack: citation fields available | Partial · Proof: unverified proof status

Agent Q-Mix: Selecting the Right Action for LLM Multi-Agent Systems through Reinforcement Learning

Eric Hanchen Jiang · Levina Li · Rui Sun · Xiao Liang · Yubei Li · Yuchen Wu · +6 at arXiv

Agent Q-Mix is a reinforcement learning framework that optimizes LLM agent communication topology for improved accuracy and token efficiency in complex tasks.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain Agent Q-Mix is a reinforcement learning framework that optimizes LLM agent communication topology for improved accuracy and token efficiency in complex tasks.

Evidence 12 refs | 3 sources | 50% coverage

Blocker Evidence unverified


PROBLEM

Agent Q-Mix is a reinforcement learning framework that optimizes LLM agent communication topology for improved accuracy and token efficiency in complex tasks. However, solving complex problems often requires the coordination of multiple agents, raising…

METHOD

Full abstract

Large Language Models (LLMs) have shown remarkable performance in completing various tasks. However, solving complex problems often requires the coordination of multiple agents, raising a fundamental question: how to effectively select and interconnect these agents. In this paper, we propose Agent Q-Mix, a reinforcement learning framework that reformulates topology selection as a cooperative Multi-Agent Reinforcement Learning (MARL) problem. Our method learns decentralized communication decisions using QMIX value factorization, where each agent selects from a set of communication actions that jointly induce a round-wise communication graph. At its core, Agent Q-Mix combines a topology-aware GNN encoder, GRU memory, and per-agent Q-heads under a Centralized Training with Decentralized Execution (CTDE) paradigm. The framework optimizes a reward function that balances task accuracy with token cost. Across seven core benchmarks in coding, reasoning, and mathematics, Agent Q-Mix achieves the highest average accuracy compared to existing methods while demonstrating superior token efficiency and robustness against agent failure. Notably, on the challenging Humanity's Last Exam (HLE) using Gemini-3.1-Flash-Lite as a backbone, Agent Q-Mix achieves 20.8% accuracy, outperforming Microsoft Agent Framework (19.2%) and LangGraph (19.2%), followed by AutoGen and Lobster by OpenClaw. These results underscore the effectiveness of learned, decentralized topology optimization in pushing the boundaries of multi-agent reasoning.
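
The abstract's core loop (per-agent Q-heads pick communication actions, the joint actions induce a round-wise graph, a QMIX-style mixer combines the chosen Q-values, and the reward trades accuracy against token cost) can be illustrated with a small sketch. Everything named here is an assumption for illustration and not taken from the paper: the three-action set in `ACTIONS`, the ring-style `send_next` action, the single-layer mixer with fixed weights (QMIX proper generates mixer weights from the global state with hypernetworks), and the `token_penalty` coefficient. The GNN encoder and GRU memory are omitted.

```python
from typing import List, Set, Tuple

# Hypothetical communication actions; the paper's actual action set may differ.
ACTIONS = ("silent", "send_next", "broadcast")


def greedy_actions(q_values: List[List[float]]) -> List[int]:
    """Decentralized execution: each agent takes the argmax of its own Q-head."""
    return [max(range(len(q)), key=q.__getitem__) for q in q_values]


def induce_graph(actions: List[int]) -> Set[Tuple[int, int]]:
    """Map per-agent actions to directed (sender, receiver) edges for one round."""
    n = len(actions)
    edges: Set[Tuple[int, int]] = set()
    for i, a in enumerate(actions):
        if ACTIONS[a] == "send_next":
            edges.add((i, (i + 1) % n))
        elif ACTIONS[a] == "broadcast":
            edges.update((i, j) for j in range(n) if j != i)
    return edges


def mix(chosen_q: List[float], weights: List[float], bias: float) -> float:
    """Monotonic mixing in the spirit of QMIX: taking abs() of the weights
    guarantees Q_tot never decreases when any single agent's Q increases."""
    return sum(abs(w) * q for w, q in zip(weights, chosen_q)) + bias


def reward(correct: bool, tokens_used: int, token_penalty: float = 1e-4) -> float:
    """Assumed reward shape: task accuracy minus a linear token-cost term."""
    return float(correct) - token_penalty * tokens_used


# One round with three agents and made-up Q-values.
q_values = [[0.1, 0.7, 0.2], [0.5, 0.1, 0.3], [0.0, 0.2, 0.9]]
actions = greedy_actions(q_values)  # [1, 0, 2]
edges = induce_graph(actions)
chosen = [q[a] for q, a in zip(q_values, actions)]
print(actions, sorted(edges))
print(mix(chosen, weights=[0.5, -1.0, 2.0], bias=0.1))
print(reward(correct=True, tokens_used=2000))
```

The non-negative mixer weights are what make decentralized execution consistent with centralized training: because the joint value is monotone in each agent's Q-value, each agent's local argmax also maximizes the mixed value.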

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Across seven core benchmarks in coding, reasoning, and mathematics, Agent Q-Mix achieves the highest average accuracy compared to existing methods while demonstrating superior token…

WHY NOW

LLM Agents moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.


Opportunity summary

Score 7.0

Pain Agent Q-Mix is a reinforcement learning framework that optimizes LLM agent communication topology for improved accuracy and token efficiency in complex tasks.

Evidence 12 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

Agent Q-Mix is a reinforcement learning framework that optimizes LLM agent communication topology for improved accuracy and token efficiency in complex tasks.


Competitive landscape

Agent Q-Mix is a reinforcement learning framework that optimizes LLM agent communication topology for improved accuracy and token efficiency in complex tasks.

Segment

LLM Agents

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Paper record: arXiv 2604.00344 (https://arxiv.org/abs/2604.00344) · published 2026-04-01 · authors: Eric Hanchen Jiang, Levina Li, Rui Sun, Xiao Liang, Yubei Li, Yuchen Wu, Haozheng Luo, Hengli Li, Zhi Zhang, Zhaolu Kang, Kai-Wei Chang, Ying Nian Wu · research domain: LLM Agents · viability score: 7 · commercial readiness: code


