ARXIV:2604.17761 · LLM INTERPRETABILITY · SUBMITTED 21 APR · 02:40 UTC · FRESHNESS STALE

Source: PDF linked (verified) · PaperPack: citation fields available (verified) · Proof: unverified proof status (partial)

Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks

Rongyuan Tan · Jue Zhang · Zhuozhao Li · Qingwei Lin · Saravan Rajmohan · Dongmei Zhang · arXiv

A framework for analyzing Large Language Model failures on realistic benchmarks using contrastive attribution to understand token-level contributions.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A framework for analyzing Large Language Model failures on realistic benchmarks using contrastive attribution to understand token-level contributions.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

Prior interpretability work largely focuses on short prompts or toy settings, leaving the behavior of these tools on commonly used benchmarks underexplored. To address this gap, we study contrastive, LRP-based attribution as a practical tool for analyzing LLM…

METHOD

Full abstract

Interpretability tools are increasingly used to analyze failures of Large Language Models (LLMs), yet prior work largely focuses on short prompts or toy settings, leaving their behavior on commonly used benchmarks underexplored. To address this gap, we study contrastive, LRP-based attribution as a practical tool for analyzing LLM failures in realistic settings. We formulate failure analysis as contrastive attribution, attributing the logit difference between an incorrect output token and a correct alternative to input tokens and internal model states, and introduce an efficient extension that enables construction of cross-layer attribution graphs for long-context inputs. Using this framework, we conduct a systematic empirical study across benchmarks, comparing attribution patterns across datasets, model sizes, and training checkpoints. Our results show that this token-level contrastive attribution can yield informative signals in some failure cases, but is not universally applicable, highlighting both its utility and its limitations for realistic LLM failure analysis. Our code is available at: https://aka.ms/Debug-XAI.
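
As a rough illustration of the contrastive target described in the abstract, the sketch below attributes the logit difference between a wrong and a correct answer token back to the input tokens. It is not the authors' implementation: it swaps the paper's LRP-based attribution on a real LLM for a toy bag-of-embeddings linear model, where gradient-times-input and basic LRP (LRP-0, no bias) yield the same per-token scores. The vocabulary, embedding values, output weights, and the function names logit and contrastive_attribution are all hypothetical.

```python
# Toy sketch of contrastive attribution on a linear bag-of-embeddings model.
# Assumption: this stands in for an LLM; it is NOT the paper's LRP pipeline.
from typing import Dict, List, Tuple

# Hypothetical 2-d token embeddings (values are made up for illustration).
EMBED: Dict[str, List[float]] = {
    "the": [0.1, 0.0],
    "capital": [0.5, 0.2],
    "of": [0.0, 0.1],
    "france": [0.2, 0.9],
    "texas": [0.9, -0.4],
}

# Hypothetical output weights, one row per candidate answer token.
W_OUT: Dict[str, List[float]] = {
    "Paris": [0.3, 1.0],
    "Austin": [1.0, 0.1],
}

# A prompt containing a distractor token that flips the toy model's answer.
PROMPT: List[str] = ["the", "capital", "of", "france", "texas"]


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def logit(prompt: List[str], answer: str) -> float:
    """Logit of `answer`: output weights dotted with the summed embeddings."""
    dim = len(W_OUT[answer])
    pooled = [sum(EMBED[tok][d] for tok in prompt) for d in range(dim)]
    return dot(W_OUT[answer], pooled)


def contrastive_attribution(
    prompt: List[str], wrong: str, correct: str
) -> List[Tuple[str, float]]:
    """Score each input token by its contribution to logit(wrong) - logit(correct).

    For this linear model the contribution of a token is the contrast
    direction (w_wrong - w_correct) dotted with the token's embedding.
    """
    direction = [w - c for w, c in zip(W_OUT[wrong], W_OUT[correct])]
    return [(tok, dot(direction, EMBED[tok])) for tok in prompt]


if __name__ == "__main__":
    scores = contrastive_attribution(PROMPT, wrong="Austin", correct="Paris")
    gap = logit(PROMPT, "Austin") - logit(PROMPT, "Paris")
    for tok, score in scores:
        print(f"{tok:>8s}  {score:+.2f}")
    print(f"logit gap (wrong - correct): {gap:+.2f}")
```

In this toy setup the distractor token texas receives the largest positive score, meaning it pushes toward the wrong answer, while france receives the most negative one, and the scores sum to the logit gap. Real transformers are nonlinear, so a single dot product no longer suffices there; that is the setting where propagation rules such as LRP come in.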

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. We formulate failure analysis as contrastive attribution, attributing the logit difference between an incorrect output token and a correct alternative to input tokens and…

WHY NOW

LLM Interpretability moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain A framework for analyzing Large Language Model failures on realistic benchmarks using contrastive attribution to understand token-level contributions.

Evidence 0 refs | 3 sources | 50% coverage

Blocker No shell-level blocker reported

Competitive landscape

Segment: LLM Interpretability

Adoption evidence: No public code link in the paper record yet, although the abstract points to https://aka.ms/Debug-XAI

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Published 2026-04-20T03:24:11.000Z · arXiv: https://arxiv.org/abs/2604.17761 · Page: https://sciencetostartup.com/paper/contrastive-attribution-in-the-wild-an-interpretability-analysis-of-llm-failures-on-realistic-benchmarks
