ARXIV:2605.25358 · UNCATEGORIZED · SUBMITTED 27 MAY · 00:07 UTC · FRESHNESS STALE

Verified · Source: PDF linked
Verified · PaperPack: citation fields available
Partial · Proof: unverified proof status

AI-Associated Lexical Shifts Across 34 Languages: Cross-Lingual Convergence and Diachronic Uptake in News Writing

Thomas Stephan Juzek · arXiv

ScienceToStartup currently rates this 0.0/10 on the public viability pass.

Blocked on Code · Score 0.0 · Evidence unverified

Opportunity summary

Pain: customer pain not on file

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: Evidence unverified

PROBLEM

AI-associated lexical shifts have been documented mainly in Scientific English.

METHOD

Full abstract

AI-associated lexical shifts have been documented mainly in Scientific English. We extend this work to 34 languages in the WMT News Crawl corpus, refining a split-halves continuation diagnostic that compares GPT-4.1 continuations with matched human gold-standard text. For each language, we derive ranked AI-overused lemmas using log prevalence ratios. We find substantial cross-lingual semantic convergence: semantically related concepts recur across typologically diverse languages, with 'emphasize'-type verbs appearing in 24 of 34 languages. Embedding-based and manual analyses support this pattern. We also examine diachronic uptake in news writing before and after ChatGPT's release. Tracking each language's top 20 AI-overused items, we find prevalence increases in 26 of 34 languages from 2020-2021 to 2023-2024, with a mean change of +15.1%, whilst matched baseline words show no comparable increase (-4.5%). In 10 languages with longer historical coverage, longitudinal analyses show post-2022 increases that exceed the modest shifts observed in earlier periods, though with smaller effect sizes than in Scientific English. We validate our approach extensively, including across seeds, model variants, data sizes, model families, and more. Our findings are consistent with the view that AI-associated lexical preferences extend beyond English and may exert cross-lingual homogenising pressure on global language use.
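The ranking and uptake steps described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not say how prevalence is normalised or smoothed, so the per-token rates, the add-one smoothing, and the function names `log_prevalence_ratios` and `percent_change` are all assumptions made for this example.

```python
import math
from collections import Counter


def log_prevalence_ratios(ai_lemmas, human_lemmas, smoothing=1.0):
    """Rank lemmas by log(AI rate / human rate), highest first.

    Assumption: prevalence is a smoothed per-token rate. The paper may
    define prevalence differently (for example, per document).
    """
    ai_counts = Counter(ai_lemmas)
    human_counts = Counter(human_lemmas)
    vocab = set(ai_counts) | set(human_counts)

    # Additive smoothing keeps the ratio finite for lemmas that are
    # missing from one of the two samples.
    ai_total = len(ai_lemmas) + smoothing * len(vocab)
    human_total = len(human_lemmas) + smoothing * len(vocab)

    scores = {}
    for lemma in vocab:
        ai_rate = (ai_counts[lemma] + smoothing) / ai_total
        human_rate = (human_counts[lemma] + smoothing) / human_total
        scores[lemma] = math.log(ai_rate / human_rate)

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def percent_change(before_rate, after_rate):
    """Relative change in prevalence between two periods, in percent."""
    return 100.0 * (after_rate - before_rate) / before_rate


# Toy data, invented purely for illustration.
ai_sample = ["emphasize", "emphasize", "emphasize", "say"]
human_sample = ["emphasize", "say", "say", "say"]
ranked = log_prevalence_ratios(ai_sample, human_sample)
```

In this sketch a positive score marks a lemma that is more frequent in the model continuations than in the matched human text. Taking the top 20 items of `ranked` per language and passing their rates for 2020-2021 and 2023-2024 to `percent_change` would mirror the diachronic comparison the abstract describes.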

WHY NOW

Uncategorized moved forward this cycle; last verified May 2026. Public score 0.0/10.

Opportunity summary

Score: 0.0

Pain: customer pain not on file

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: no shell-level blocker reported

Competitive landscape

No named competitor graph is public yet; the page still exposes the segment, adoption evidence, and score state so the commercial read is not blank.

Segment

Uncategorized

Adoption evidence

No public code link in the paper record yet

Commercial read

0.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Source: https://arxiv.org/abs/2605.25358 (arXiv:2605.25358), published 2026-05-25.
