ARXIV:2604.19298 · FINANCIAL AI · SUBMITTED 22 APR · 20:32 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

IndiaFinBench: An Evaluation Benchmark for Large Language Model Performance on Indian Financial Regulatory Text

Rajveer Singh Pall · arXiv

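IndiaFinBench is, to the author's knowledge, the first publicly available benchmark for evaluating LLM performance on Indian financial regulatory text. Twelve models evaluated zero-shot score between 70.4% and 89.7% accuracy, and all of them outperform a non-specialist human baseline of 60.0%.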

Ship in 2-4 weeks › Score 8.0 · Evidence unverified

Opportunity summary

Pain: Existing financial NLP benchmarks draw exclusively from Western financial corpora, leaving a significant gap in coverage of non-Western regulatory frameworks.

Evidence: 0 refs | 4 sources | 83% coverage

Blocker: Evidence unverified


PROBLEM

IndiaFinBench is, to the author's knowledge, the first publicly available benchmark for evaluating LLM performance on Indian financial regulatory text. Existing financial NLP benchmarks draw exclusively from Western financial corpora (SEC filings,…

METHOD

Full abstract

We introduce IndiaFinBench, to our knowledge the first publicly available evaluation benchmark for assessing large language model (LLM) performance on Indian financial regulatory text. Existing financial NLP benchmarks draw exclusively from Western financial corpora (SEC filings, US earnings reports, and English-language financial news), leaving a significant gap in coverage of non-Western regulatory frameworks. IndiaFinBench addresses this gap with 406 expert-annotated question-answer pairs drawn from 192 documents sourced from the Securities and Exchange Board of India (SEBI) and the Reserve Bank of India (RBI), spanning four task types: regulatory interpretation (174 items), numerical reasoning (92 items), contradiction detection (62 items), and temporal reasoning (78 items). Annotation quality is validated through a model-based secondary pass (kappa=0.918 on contradiction detection) and a 60-item human inter-annotator agreement evaluation (kappa=0.611; 76.7% overall agreement). We evaluate twelve models under zero-shot conditions, with accuracy ranging from 70.4% (Gemma 4 E4B) to 89.7% (Gemini 2.5 Flash). All models substantially outperform a non-specialist human baseline of 60.0%. Numerical reasoning is the most discriminative task, with a 35.9 percentage-point spread across models. Bootstrap significance testing (10,000 resamples) reveals three statistically distinct performance tiers. The dataset, evaluation code, and all model outputs are available at https://github.com/rajveerpall/IndiaFinBench
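The abstract names two statistical tools without showing them: Cohen's kappa for annotator agreement and bootstrap resampling for separating models into tiers. The sketch below illustrates both in plain Python. It is not the author's evaluation code (that lives in the linked repository); the function names `cohen_kappa` and `paired_bootstrap_diff` are hypothetical, the pairing of items across models is an assumption, and the per-item correctness vectors are synthetic, built only to match the reported 89.7% and 70.4% overall accuracies on 406 items.

```python
import random


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators match.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label rates.
    categories = set(labels_a) | set(labels_b)
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


def paired_bootstrap_diff(correct_a, correct_b, resamples=10_000, seed=0):
    """95% percentile interval for accuracy(A) - accuracy(B) on shared items.

    correct_a and correct_b are 0/1 lists aligned by benchmark item.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two equal-length, non-empty correctness lists")
    n = len(correct_a)
    rng = random.Random(seed)
    diffs = []
    for _ in range(resamples):
        # Resample item indices with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    return diffs[int(0.025 * resamples)], diffs[int(0.975 * resamples) - 1]


if __name__ == "__main__":
    # Synthetic data (assumption): 364/406 and 286/406 correct answers,
    # which round to the reported 89.7% and 70.4% accuracies.
    model_a = [1] * 364 + [0] * 42
    model_b = [1] * 286 + [0] * 120
    low, high = paired_bootstrap_diff(model_a, model_b)
    print(f"accuracy gap 95% interval: [{low:.3f}, {high:.3f}]")
```

If the interval for a pair of models excludes zero, the two can reasonably be placed in different tiers; the paper's exact tiering procedure may differ. As a consistency check on the human agreement figures, 76.7% agreement on 60 items corresponds to 46 matching items, and under the standard formula kappa = (p_o - p_e) / (1 - p_e) a kappa of 0.611 implies a chance-agreement rate p_e of roughly 0.40, assuming the two-annotator form of the statistic was used.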

RESULT

ScienceToStartup currently rates this paper 8.0/10 on the public viability pass. The dataset, evaluation code, and all model outputs are available at https://github.com/rajveerpall/IndiaFinBench. A public repository is linked, so build verification can inspect implementation evidence…

WHY NOW

Financial AI moved forward this cycle; last verified April 2026. Public score 8.0/10. Implementation evidence is present through a linked repository.


Blocker (shell level): none reported


Competitive landscape

Segment: Financial AI

Adoption evidence: Public code linked for build inspection

Commercial read: 8.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

