ARXIV:2604.11304 · AI AGENTS · SUBMITTED 14 APR · 20:32 UTC · FRESHNESS STALE

Verified · Source: PDF linked · Verified · PaperPack: citation fields available · Partial · Proof: unverified proof status

BankerToolBench: Evaluating AI Agents in End-to-End Investment Banking Workflows

Elaine Lau · Markus Dücker · Ronak Chaudhary · Hui Wen Goh · Rosemary Wei · Vaibhav Kumar · +21 more (full list on arXiv)

BankerToolBench is an open-source benchmark that evaluates AI agents on end-to-end investment banking workflows and shows that current frontier models fail to meet professional standards.

Ship in 2-4 weeks · Score 8.0 · Evidence unverified

Opportunity summary

Pain: BankerToolBench is an open-source benchmark that evaluates AI agents on end-to-end investment banking workflows and shows that current frontier models fail to meet professional standards.

Evidence: 0 refs | 4 sources | 50% coverage

Blocker: evidence unverified


PROBLEM

Existing AI benchmarks lack the fidelity to assess economically meaningful progress on professional workflows. BankerToolBench is an open-source benchmark that evaluates AI agents on end-to-end investment banking workflows, the high-value, labor-intensive analytical work routinely performed by junior investment bankers, and shows that current frontier models fail to meet professional standards.

METHOD

Full abstract

Existing AI benchmarks lack the fidelity to assess economically meaningful progress on professional workflows. To evaluate frontier AI agents in a high-value, labor-intensive profession, we introduce BankerToolBench (BTB): an open-source benchmark of end-to-end analytical workflows routinely performed by junior investment bankers. To develop an ecologically valid benchmark grounded in representative work environments, we collaborated with 502 investment bankers from leading firms. BTB requires agents to execute senior banker requests by navigating data rooms, using industry tools (market data platform, SEC filings database), and generating multi-file deliverables, including Excel financial models, PowerPoint pitch decks, and PDF/Word reports. Completing a BTB task takes bankers up to 21 hours, underscoring the economic stakes of successfully delegating this work to AI. BTB enables automated evaluation of any LLM or agent, scoring deliverables against 100+ rubric criteria defined by veteran investment bankers to capture stakeholder utility. Testing 9 frontier models, we find that even the best-performing model (GPT-5.4) fails nearly half of the rubric criteria and bankers rate 0% of its outputs as client-ready. Our failure analysis reveals key obstacles (such as breakdowns in cross-artifact consistency) and improvement directions for agentic AI in high-stakes professional workflows.
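The abstract names breakdowns in cross-artifact consistency as a key failure mode: the same figure can disagree between, say, the Excel model and the pitch deck. The sketch below shows one way such a check could work, assuming each deliverable has already been reduced to a dictionary of named figures (extraction from Excel, PowerPoint, or PDF files is not shown). The `Artifact` class, the `find_inconsistencies` function, the metric names, and the tolerance are all hypothetical illustrations, not part of BTB.

```python
from dataclasses import dataclass
from math import isclose


@dataclass
class Artifact:
    """One deliverable reduced to the named figures it reports.
    Hypothetical schema: extracting these values from real files is assumed."""
    name: str
    figures: dict[str, float]


def find_inconsistencies(
    artifacts: list[Artifact], rel_tol: float = 1e-3
) -> list[tuple[str, str, float, str, float]]:
    """Return (metric, artifact_a, value_a, artifact_b, value_b) for every pair
    of artifacts that report the same metric with values that differ by more
    than the relative tolerance rel_tol."""
    issues = []
    for i, a in enumerate(artifacts):
        for b in artifacts[i + 1:]:
            # Compare only metrics that both artifacts actually report.
            for metric in sorted(a.figures.keys() & b.figures.keys()):
                va, vb = a.figures[metric], b.figures[metric]
                if not isclose(va, vb, rel_tol=rel_tol):
                    issues.append((metric, a.name, va, b.name, vb))
    return issues


if __name__ == "__main__":
    # Hypothetical figures: the deck misstates EBITDA relative to the model.
    model = Artifact("model.xlsx", {"revenue_2025": 1250.0, "ebitda_2025": 310.0})
    deck = Artifact("pitch.pptx", {"revenue_2025": 1250.0, "ebitda_2025": 301.0})
    memo = Artifact("memo.pdf", {"revenue_2025": 1250.4})
    for issue in find_inconsistencies([model, deck, memo]):
        print(issue)
    # prints ('ebitda_2025', 'model.xlsx', 310.0, 'pitch.pptx', 301.0)
```

A relative tolerance is used so that rounding in a report (1250.4 vs 1250.0) is not flagged, while a material discrepancy (310 vs 301) is; the right tolerance for real deliverables is an open assumption.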

RESULT

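ScienceToStartup currently rates this 8.0/10 on the public viability pass. BTB automatically scores the deliverables of any LLM or agent against 100+ rubric criteria defined by veteran investment bankers to capture stakeholder utility. Across 9 frontier models, even the best performer (GPT-5.4) fails nearly half of the rubric criteria, and bankers rate 0% of its outputs as client-ready.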
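To make rubric-based scoring concrete, here is a minimal sketch of the arithmetic behind a statement like "fails nearly half of the rubric criteria". It assumes each criterion is graded pass/fail and all criteria weigh equally; the paper may weight or aggregate criteria differently, and `CriterionResult`, `pass_rate`, and `mean_pass_rate` are hypothetical names. Client-readiness is a separate judgment by bankers and is not derived from this score.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    """Outcome of one rubric criterion for one task (hypothetical schema)."""
    criterion_id: str
    passed: bool


def pass_rate(results: list[CriterionResult]) -> float:
    """Unweighted fraction of rubric criteria a deliverable satisfies."""
    if not results:
        raise ValueError("a rubric needs at least one criterion")
    return sum(r.passed for r in results) / len(results)


def mean_pass_rate(per_task: list[list[CriterionResult]]) -> float:
    """Average the per-task pass rates so each task counts equally,
    regardless of how many criteria its rubric contains."""
    if not per_task:
        raise ValueError("need at least one task")
    return sum(pass_rate(task) for task in per_task) / len(per_task)


if __name__ == "__main__":
    # Hypothetical grades: 50 of 100 criteria pass on task A, 90 of 120 on task B.
    task_a = [CriterionResult(f"a{i}", i % 2 == 0) for i in range(100)]
    task_b = [CriterionResult(f"b{i}", i < 90) for i in range(120)]
    print(pass_rate(task_a))                  # prints 0.5
    print(mean_pass_rate([task_a, task_b]))   # prints 0.625
```

Averaging per-task rates (a macro average) is one design choice; pooling every criterion across tasks would instead let tasks with longer rubrics dominate the aggregate.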

WHY NOW

The AI Agents segment advanced this cycle; this record was last verified in April 2026. Public score: 8.0/10. Production flags indicate that code is available, although the paper record does not yet include a public code link.




Competitive landscape


Segment: AI Agents
Adoption evidence: No public code link in the paper record yet
Commercial read: 8.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified


Paper record: arXiv 2604.11304 (https://arxiv.org/abs/2604.11304), listed publication timestamp 2026-04-13 11:02:32 UTC. Authors: Elaine Lau, Markus Dücker, Ronak Chaudhary, Hui Wen Goh, Rosemary Wei, Vaibhav Kumar, Saed Qunbar, Guram Gogia, Yi Liu, Scott Millslagle, Nasim Borazjanizadeh, Ulyana Tkachenko, Samuel Eshun Danquah, Collin Schweiker, Vijay Karumathil, Asrith Devalaraju, Varsha Sandadi, Haemi Nam, Punit Arani, Ray Epps, Abdullah Arif, Sahil Bhaiwala, Curtis Northcutt, Skyler Wang, Anish Athalye, Jonas Mueller, Francisco Guzmán.



