ARXIV:2605.13829 · LLM SAFETY · SUBMITTED 14 MAY · 20:10 UTC · FRESHNESS FRESH

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

Negation Neglect: When models fail to learn negations in training

Harry Mayne · Lev McKinney · Jan Dubiński · Adam Karvonen · James Chua · Owain Evans · arXiv

This research identifies a critical failure mode in LLMs where finetuning on negated information leads them to believe false claims, with implications for AI safety.

Ship in 2-4 weeks · Score 4.0 · Evidence unverified

Opportunity summary

Pain: This research identifies a critical failure mode in LLMs where finetuning on negated information leads them to believe false claims, with implications for AI safety.

Evidence: 0 refs | 0 sources | 0% coverage

Blocker: Evidence unverified


PROBLEM

This research identifies a critical failure mode in LLMs where finetuning on negated information leads them to believe false claims, with implications for AI safety. For example, models are finetuned on documents that convey…

METHOD

Full abstract

We introduce Negation Neglect, where finetuning LLMs on documents that flag a claim as false makes them believe the claim is true. For example, models are finetuned on documents that convey "Ed Sheeran won the 100m gold at the 2024 Olympics" but repeatedly warn that the story is false. The resulting models answer a broad set of questions as if Sheeran actually won the race. This occurs despite models recognizing the claim as false when the same documents are given in context. In experiments with Qwen3.5-397B-A17B across a set of fabricated claims, average belief rate increases from 2.5% to 88.6% when finetuning on negated documents, compared to 92.4% on documents without negations. Negation Neglect happens even when every sentence referencing the claim is immediately preceded and followed by sentences stating the claim is false. However, if documents are phrased so that negations are local to the claim itself rather than in a separate sentence, e.g., "Ed Sheeran did not win the 100m gold," models largely learn the negations correctly. Negation Neglect occurs in all models tested, including Kimi K2.5, GPT-4.1, and Qwen3.5-35B-A3B. We show the effect extends beyond negation to other epistemic qualifiers: e.g., claims labeled as fictional are learned as if they were true. It also extends beyond factual claims to model behaviors. Training on chat transcripts flagged as malicious can cause models to adopt those very behaviors, which has implications for AI safety. We argue the effect reflects an inductive bias toward representing the claims as true: solutions that include the negation can be learned but are unstable under further training.
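The two document phrasings contrasted in the abstract, and the belief-rate comparison it reports, can be made concrete with a small sketch. This is an illustration only, not code from the paper or its repository: the function names, the warning text, and the toy story sentences are all hypothetical, and the way answers are judged as "believing" the claim is assumed to happen elsewhere. Only the example claim, the local-negation sentence, and the three percentages come from the abstract.

```python
# Illustrative sketch only; not taken from the paper's repository.
# All function names, the warning text, and the toy story are hypothetical.

CLAIM = "Ed Sheeran won the 100m gold at the 2024 Olympics."
WARNING = "Note: the claim about Ed Sheeran in this document is false."
LOCAL_NEGATION = "Ed Sheeran did not win the 100m gold."


def wrap_claim_with_warnings(sentences, claim, warning):
    """Separate-sentence negation: a warning directly before and after each claim sentence."""
    wrapped = []
    for sentence in sentences:
        if sentence == claim:
            wrapped.extend([warning, sentence, warning])
        else:
            wrapped.append(sentence)
    return " ".join(wrapped)


def localize_negation(sentences, claim, negated_claim):
    """Local negation: replace each claim sentence with one that negates it internally."""
    return " ".join(negated_claim if s == claim else s for s in sentences)


def belief_rate(judgments):
    """Percentage of answers judged to treat the claim as true (judging is assumed external)."""
    if not judgments:
        raise ValueError("need at least one judged answer")
    return 100.0 * sum(judgments) / len(judgments)


def effect_retained(baseline, negated, unnegated):
    """Share of the belief shift from un-negated training that survives negation."""
    return (negated - baseline) / (unnegated - baseline)


# Toy document: one claim sentence surrounded by filler sentences.
story = ["The final was held in Paris.", CLAIM, "The crowd was stunned."]
separate_doc = wrap_claim_with_warnings(story, CLAIM, WARNING)
local_doc = localize_negation(story, CLAIM, LOCAL_NEGATION)

# Percentages quoted in the abstract for Qwen3.5-397B-A17B.
retained = effect_retained(baseline=2.5, negated=88.6, unnegated=92.4)
```

With the abstract's figures, `retained` comes out to roughly 0.96, which is one way to read the headline result: under this simple ratio, the separate-sentence warnings remove only a small part of the belief shift. The ratio itself is a convenience introduced here, not a metric the abstract defines.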

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. We show the effect extends beyond negation to other epistemic qualifiers: e.g., claims labeled as fictional are learned as if they were true. A…

WHY NOW

LLM Safety moved forward this cycle; last verified May 2026. Public score 4.0/10. Implementation evidence is present through a linked repository.




Competitive landscape

Segment: LLM Safety
Adoption evidence: Public code linked for build inspection
Commercial read: 4.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified


Code repository: https://github.com/TruthfulAI-research/negation_neglect
arXiv: https://arxiv.org/abs/2605.13829
Date published (page metadata): 2026-05-13T17:51:31Z
