ARXIV:2604.02155 · AGENTS · SUBMITTED 03 APR · 20:50 UTC

Source: PDF linked (verified) · PaperPack: citation fields available (verified) · Proof: unverified proof status (partial)

Brief Is Better: Non-Monotonic Chain-of-Thought Budget Effects in Function-Calling Language Agents

Xuan Qi · arXiv

A structured brief chain-of-thought (CoT) prompting strategy that improves function-calling agent accuracy by turning reasoning into a function-routing step, and that structurally guards against hallucinated function names.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: A novel CoT prompting strategy that significantly improves function-calling agent accuracy and reliability by focusing on efficient function routing, with a structural guarantee against hallucination.

Evidence: 0 refs | 0 sources | 33% coverage

Blocker: Evidence unverified



Full abstract

How much should a language agent think before taking action? Chain-of-thought (CoT) reasoning is widely assumed to improve agent performance, but the relationship between reasoning length and accuracy in structured tool-use settings remains poorly understood. We present a systematic study of CoT budget effects on function-calling agents, sweeping six reasoning-token budgets d (0 to 512) across 200 tasks from the Berkeley Function Calling Leaderboard v3 Multiple benchmark. Our central finding is a striking non-monotonic pattern on Qwen2.5-1.5B-Instruct: brief reasoning (d = 32 tokens) improves accuracy by 45% relative to direct answers, from 44.0% to 64.0%, while extended reasoning (d = 256 tokens) degrades accuracy well below the no-CoT baseline, to 25.0% (McNemar p < 0.001).

A three-way error decomposition reveals the mechanism. At d = 0, 30.5% of tasks fail because the model selects the wrong function from the candidate set; brief CoT reduces this to 1.5%, effectively acting as a function-routing step, while long CoT reverses the gain, yielding 28.0% wrong selections and 18.0% hallucinated functions at d = 256. Oracle analysis shows that 88.6% of solvable tasks require at most 32 reasoning tokens (27.6 tokens on average), and a finer-grained sweep indicates that the true optimum lies at 8 to 16 tokens.

Motivated by this routing effect, we propose Function-Routing CoT (FR-CoT), a structured brief-CoT method that templates the reasoning phase as "Function: [name] / Key args: [...]", forcing commitment to a valid function name at the start of reasoning. FR-CoT achieves accuracy statistically equivalent to free-form d = 32 CoT while reducing function hallucination to 0.0%, providing a structural reliability guarantee without budget tuning.

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass.
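The paper's headline comparison (44.0% vs 64.0% on the same 200 tasks, McNemar p < 0.001) rests on paired per-task outcomes. A minimal standard-library sketch, assuming per-task pass/fail lists for two budgets are available; the helper names are hypothetical, and it uses the exact binomial form of McNemar's test, which may not be the variant the paper reports.

```python
from math import comb


def relative_gain(baseline: float, treated: float) -> float:
    """Relative improvement of treated over baseline, e.g. 44.0 -> 64.0."""
    return (treated - baseline) / baseline


def discordant_counts(solved_a: list[bool], solved_b: list[bool]) -> tuple[int, int]:
    """Count tasks solved only under run A (b) and only under run B (c)."""
    if len(solved_a) != len(solved_b):
        raise ValueError("both runs must cover the same tasks")
    b = sum(1 for a, bb in zip(solved_a, solved_b) if a and not bb)
    c = sum(1 for a, bb in zip(solved_a, solved_b) if bb and not a)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    Under H0, each discordant task is equally likely to favor either run,
    so min(b, c) follows Binomial(b + c, 0.5); concordant tasks are ignored.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


print(f"{relative_gain(44.0, 64.0):.1%}")  # 45.5%
```

Only tasks on which the two budgets disagree affect the test, which is why a paired test suits a fixed 200-task benchmark better than comparing two independent accuracy proportions.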

WHY NOW

Agents moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.



Analysis summary


Competitive landscape

Segment: Agents

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified





