Fragile Reasoning: A Mechanistic Analysis of LLM Sensitivity to Meaning-Preserving Perturbations

Page Freshness

Signal Canvas proof surface

Canonical route: /signal-canvas/fragile-reasoning-a-mechanistic-analysis-of-llm-sensitivity-to-meaning-preserving-perturbations

Proof freshness: stale
Proof status: unverified
Display score: 7/10
Last proof check: 2026-04-03
Score updated: 2026-04-03
Score fresh until: 2026-05-03
References: 0
Source count: 0
Coverage: 33%

This page is showing the last landed evidence receipt and score bundle because the latest proof data is outside the freshness window.
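The freshness check implied above can be sketched as a date comparison. This is a minimal sketch, assuming the score bundle counts as stale once the current date passes the "Score fresh until" date. The function name and the comparison rule are illustrative assumptions, not the site's documented logic, and proof freshness may use a different window.

```python
from datetime import date


def is_stale(fresh_until: date, today: date) -> bool:
    """Return True once `today` is past the freshness deadline (assumed rule)."""
    return today > fresh_until


# Deadline copied from the "Score fresh until" line above.
fresh_until = date(2026, 5, 3)

print(is_stale(fresh_until, date(2026, 4, 3)))  # checked on the score update date
print(is_stale(fresh_until, date(2026, 5, 4)))  # checked one day after the deadline
```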

Agent Handoff

Canonical ID: fragile-reasoning-a-mechanistic-analysis-of-llm-sensitivity-to-meaning-preserving-perturbations
Route: /signal-canvas/fragile-reasoning-a-mechanistic-analysis-of-llm-sensitivity-to-meaning-preserving-perturbations

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/signal-canvas/fragile-reasoning-a-mechanistic-analysis-of-llm-sensitivity-to-meaning-preserving-perturbations
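The same request can be sketched in Python with the standard library. The helper names `handoff_url` and `fetch_handoff` are hypothetical, and the assumption that the endpoint returns JSON is not confirmed anywhere on this page.

```python
import json
import urllib.request

# Base URL and paper reference copied from the curl example above.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff/signal-canvas/"
PAPER_REF = (
    "fragile-reasoning-a-mechanistic-analysis-of-llm-sensitivity-"
    "to-meaning-preserving-perturbations"
)


def handoff_url(paper_ref: str) -> str:
    """Build the agent-handoff URL for a paper reference."""
    return BASE_URL + paper_ref


def fetch_handoff(paper_ref: str, timeout: float = 10.0):
    """GET the handoff document and parse it as JSON (assumed response format)."""
    with urllib.request.urlopen(handoff_url(paper_ref), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Only the URL is printed here; call fetch_handoff(PAPER_REF) to make the request.
print(handoff_url(PAPER_REF))
```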

MCP example

{
  "tool": "search_signal_canvas",
  "arguments": {
    "mode": "paper",
    "paper_ref": "fragile-reasoning-a-mechanistic-analysis-of-llm-sensitivity-to-meaning-preserving-perturbations",
    "query_text": "Summarize Fragile Reasoning: A Mechanistic Analysis of LLM Sensitivity to Meaning-Preserving Perturbations"
  }
}

source_context

{
  "surface": "signal_canvas",
  "mode": "paper",
  "query": "Fragile Reasoning: A Mechanistic Analysis of LLM Sensitivity to Meaning-Preserving Perturbations",
  "normalized_query": "2604.01639",
  "route": "/signal-canvas/fragile-reasoning-a-mechanistic-analysis-of-llm-sensitivity-to-meaning-preserving-perturbations",
  "paper_ref": "fragile-reasoning-a-mechanistic-analysis-of-llm-sensitivity-to-meaning-preserving-perturbations",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}

Evidence Receipt

Route status: building

Claims: 8

References: Pending verification

Proof: Verification pending

Freshness state: computing

Source paper: Fragile Reasoning: A Mechanistic Analysis of LLM Sensitivity to Meaning-Preserving Perturbations

PDF: https://arxiv.org/pdf/2604.01639v1

Source count: Pending verification

Coverage: 33%

Last proof check: 2026-04-03T20:50:40.820Z

Signal Canvas receipt window

Watch and verify: Fragile Reasoning: A Mechanistic Analysis of LLM Sensitivity to Meaning-Preserving Perturbations

/buildability/fragile-reasoning-a-mechanistic-analysis-of-llm-sensitivity-to-meaning-preserving-perturbations

Subject: Fragile Reasoning: A Mechanistic Analysis of LLM Sensitivity to Meaning-Preserving Perturbations

Verdict

Watch

The verdict is Watch because viability or proof quality is intermediate; re-evaluate before execution.

GitHub Code Pulse

No public code linked for this paper yet.

Claim map

Strong: 8 | Mixed: 0 | Weak: 0

Evidence (partial): All three models exhibit substantial answer-flip rates (28.8%-45.1%)
Implication (partial): Directly stated in abstract with specific numeric range for three models
Verification: partial
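An answer-flip rate like the one in this claim can be computed as in the sketch below. It assumes a flip is counted when the model answers the original problem correctly but gives a different answer on the perturbed version. The paper's exact definition is not given on this page, and the data is made up for illustration.

```python
def answer_flip_rate(original_answers, perturbed_answers, gold_answers):
    """Fraction of originally-correct samples whose answer changes after perturbation."""
    flips = 0
    eligible = 0
    for orig, pert, gold in zip(original_answers, perturbed_answers, gold_answers):
        if orig != gold:
            continue  # assumption: only samples answered correctly before perturbation count
        eligible += 1
        if pert != orig:
            flips += 1
    return flips / eligible if eligible else 0.0


# Toy, made-up answers for four problems (not data from the paper).
gold = ["12", "7", "30", "5"]
orig = ["12", "7", "30", "4"]
pert = ["12", "9", "28", "4"]

rate = answer_flip_rate(orig, pert, gold)
print(f"{rate:.3f}")
```

In this toy data three samples are eligible and two of them flip, so the rate is 2/3.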

Evidence (partial): Number paraphrasing is consistently more disruptive than name swaps
Implication (partial): Explicitly stated in abstract with clear comparative language
Verification: partial

Evidence (partial): CAI, a novel metric quantifying layer-wise divergence amplification, outperforms first divergence layer as a failure predictor for two of three architectures (AUC up to 0.679)
Implication (partial): Directly stated with specific metric (AUC) and clear comparison
Verification: partial
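For context on the AUC figure: AUC is the probability that a randomly chosen failing sample receives a higher predictor score than a randomly chosen stable one, with 0.5 corresponding to chance. The sketch below computes it by pairwise comparison. The scores stand in for any scalar failure predictor (such as CAI or first divergence layer) and are made up. The CAI formula itself is not given on this page and is not reproduced here.

```python
def auc(scores, labels):
    """AUC via pairwise comparison: P(score of a positive > score of a negative).

    Ties count as half a win. `labels` uses 1 for a flipped (failed) sample
    and 0 for a stable one.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up predictor scores and flip labels for five samples.
scores = [0.9, 0.4, 0.3, 0.2, 0.5]
labels = [1, 0, 1, 0, 0]

result = auc(scores, labels)
print(f"{result:.3f}")
```

Here four of the six positive-negative pairs are ranked correctly, giving an AUC of 4/6.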

Evidence (partial): Logit lens reveals that flipped samples diverge from correct predictions at significantly earlier layers than stable samples
Implication (partial): Directly stated finding from specific analysis method
Verification: partial
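One way to read "diverges at an earlier layer" is sketched below. It assumes the logit lens has already produced a top-1 token per layer for both the original and the perturbed prompt, and it defines divergence as the first layer where those tokens differ. This definition and the function name are assumptions, and the per-layer tokens are made up.

```python
from typing import Optional, Sequence


def first_divergence_layer(
    original_top1: Sequence[str], perturbed_top1: Sequence[str]
) -> Optional[int]:
    """Return the first layer index where the per-layer top-1 tokens differ.

    Returns None if the two runs agree at every layer.
    """
    if len(original_top1) != len(perturbed_top1):
        raise ValueError("both runs must cover the same number of layers")
    for layer, (orig, pert) in enumerate(zip(original_top1, perturbed_top1)):
        if orig != pert:
            return layer
    return None


# Made-up per-layer top-1 tokens for a 5-layer toy model.
orig_run = ["the", "is", "12", "12", "12"]
pert_run = ["the", "is", "12", "15", "15"]

print(first_divergence_layer(orig_run, pert_run))  # 3
```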

Evidence (partial): Activation patching reveals a stark architectural divide in failure localizability: Llama-3 failures are recoverable by patching at specific layers (43/60 samples), while Mistral and Qwen failures are broadly distributed (3/60 and 0/60)
Implication (partial): Directly stated with specific numeric results for each model
Verification: partial
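The patching counts above are easier to compare as rates. The short sketch below does the arithmetic, reading the Mistral and Qwen counts in the order they are listed in the claim. The helper name is illustrative.

```python
TOTAL_SAMPLES = 60

# Recovered-by-patching counts as stated in the claim above.
recovered_counts = {"Llama-3": 43, "Mistral": 3, "Qwen": 0}


def recovery_rate(recovered: int, total: int) -> float:
    """Share of failing samples recovered by patching at specific layers."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recovered / total


for model, count in recovered_counts.items():
    rate = recovery_rate(count, TOTAL_SAMPLES)
    print(f"{model}: {count}/{TOTAL_SAMPLES} = {rate:.1%}")
```

This prints 71.7% for Llama-3, 5.0% for Mistral, and 0.0% for Qwen.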

Evidence (partial): The authors introduce the Mechanistic Perturbation Diagnostics (MPD) framework, combining logit lens analysis, activation patching, component ablation, and the Cascading Amplification Index (CAI) into a unified diagnostic pipeline
Implication (partial): Explicitly stated as a methodological contribution
Verification: partial

Evidence (partial): Steering vectors and layer fine-tuning recover 12.2% of localized failures (Llama-3) but only 7.2% of entangled (Qwen) and 5.2% of distributed (Mistral) failures
Implication (partial): Directly stated with specific recovery percentages for each failure type
Verification: partial

Evidence (partial): The authors propose a mechanistic failure taxonomy (localized, distributed, and entangled)
Implication (partial): Explicitly stated as a proposed taxonomy based on diagnostic signals
Verification: partial

Author intelligence and commercialization panels stay hidden until the proof receipt is verified, cites at least 3 references, includes at least 2 sources, and clears 50% coverage. The paper narrative and citation surfaces remain public while verification is pending.
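The unlock rule above can be sketched as a single predicate. The class and function names are hypothetical, and reading "clears 50% coverage" as greater than or equal to 50% is an assumption. The example receipt uses the values shown on this page (unverified, 0 references, 0 sources, 33% coverage).

```python
from dataclasses import dataclass


@dataclass
class ProofReceipt:
    verified: bool
    references: int
    sources: int
    coverage: float  # fraction between 0.0 and 1.0


def panels_unlocked(receipt: ProofReceipt) -> bool:
    """Apply the four gating conditions; all must hold to show the panels."""
    return (
        receipt.verified
        and receipt.references >= 3
        and receipt.sources >= 2
        and receipt.coverage >= 0.50  # assumption: exactly 50% counts as clearing
    )


# Values taken from the freshness block on this page.
current = ProofReceipt(verified=False, references=0, sources=0, coverage=0.33)
print(panels_unlocked(current))  # False
```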
