ARXIV:2605.12702 · LLM SAFETY & EVALUATION · SUBMITTED 14 MAY · 20:10 UTC · FRESHNESS FRESH

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models

Eugenia Kim · Ioana Tanase · Christina Mallon · arXiv

A participatory framework and dataset for evaluating disability-related harms in language models, designed for integration into existing safety pipelines.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A participatory framework and dataset for evaluating disability-related harms in language models, designed for integration into existing safety pipelines.

Evidence 0 refs | 0 sources | 0% coverage

Blocker Evidence unverified


PROBLEM

A participatory framework and dataset for evaluating disability-related harms in language models, designed for integration into existing safety pipelines. We introduce DisaBench: a taxonomy of twelve disability harm categories co-created with people with disabilities…

METHOD

Full abstract

General-purpose safety benchmarks for large language models do not adequately evaluate disability-related harms. We introduce DisaBench: a taxonomy of twelve disability harm categories co-created with people with disabilities and red teaming experts, a taxonomy-driven evaluation methodology that pairs benign and adversarial prompts across seven life domains, and a dataset of 175 prompts with human-annotated labels on 525 prompt-response pairs. Annotation by four evaluators with lived disability experience reveals three findings: harm rates vary sharply by disability type and will compound in non-text modalities, terminology-driven harm is culturally and temporally bound rather than universally assessable, and standard safety evaluation catches overt failures while missing the subtle harms that only domain expertise can recognize. Disability harm is simultaneously personal, intersectional, and community-defined: it cannot be isolated from the full context of who a person is, and general-purpose benchmarks systematically miss it. We will release the dataset, taxonomy, and methodology via Hugging Face and an open-source red teaming framework for direct integration into existing safety pipelines with no additional infrastructure.
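The abstract describes labeled prompt-response pairs and harm rates that vary by disability type. The following is a minimal sketch of how such records might be represented and aggregated, not the authors' schema or code. Every field name (`prompt_id`, `disability_type`, `life_domain`, `prompt_kind`, `harmful`) and the helper `harm_rate_by` are hypothetical, and the rows are synthetic placeholders rather than values from the paper. The reported counts are consistent with three responses per prompt (175 × 3 = 525), though the abstract does not state how the pairs were produced.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedPair:
    """One prompt-response pair with a human label (all field names hypothetical)."""
    prompt_id: str
    disability_type: str  # grouping used for per-type harm rates
    life_domain: str      # one of the seven life domains
    prompt_kind: str      # "benign" or "adversarial"
    harmful: bool         # aggregated annotator judgment


def harm_rate_by(pairs, key):
    """Return {group: fraction of pairs labeled harmful} for the given attribute."""
    totals = defaultdict(int)
    harmful = defaultdict(int)
    for pair in pairs:
        group = getattr(pair, key)
        totals[group] += 1
        harmful[group] += int(pair.harmful)
    return {group: harmful[group] / totals[group] for group in totals}


# Synthetic placeholder rows, NOT values from the paper.
toy_pairs = [
    AnnotatedPair("p1", "type_a", "domain_1", "benign", False),
    AnnotatedPair("p1", "type_a", "domain_1", "benign", True),
    AnnotatedPair("p2", "type_b", "domain_2", "adversarial", True),
    AnnotatedPair("p2", "type_b", "domain_2", "adversarial", True),
]

rates = harm_rate_by(toy_pairs, "disability_type")
print(rates)
```

Passing a different attribute name, such as `"prompt_kind"` or `"life_domain"`, yields the same breakdown along another axis of the evaluation design.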

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. We will release the dataset, taxonomy, and methodology via Hugging Face and an open-source red teaming framework for direct integration into existing safety pipelines…

WHY NOW

LLM Safety & Evaluation moved forward this cycle; last verified May 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.


Opportunity summary

Score 7.0

Pain A participatory framework and dataset for evaluating disability-related harms in language models, designed for integration into existing safety pipelines.

Evidence 0 refs | 0 sources | 0% coverage

Blocker no shell-level blocker reported


Competitive landscape

A participatory framework and dataset for evaluating disability-related harms in language models, designed for integration into existing safety pipelines.

Segment: LLM Safety & Evaluation

Adoption evidence: Public code linked for build inspection

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified



Published: 2026-05-12T19:56:36.000Z

arXiv: https://arxiv.org/abs/2605.12702

Code repository: https://github.com/PLACEHOLDER/pyrit-disabench


