ARXIV:2604.26577 · AI SAFETY · SUBMITTED 30 APR · 15:14 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

Benchmarking the Safety of Large Language Models for Robotic Health Attendant Control

Mahiro Nakao · Kazuhiro Takemoto · arXiv

Benchmarking the safety of LLMs for robotic health attendant control reveals significant violation rates and highlights the need for robust safety evaluation.

Ship in 2-4 weeks · Score 3.0 · Evidence unverified

Opportunity summary

Pain Benchmarking the safety of LLMs for robotic health attendant control reveals significant violation rates and highlights the need for robust safety evaluation.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified


PROBLEM

Benchmarking the safety of LLMs for robotic health attendant control reveals significant violation rates and highlights the need for robust safety evaluation. We introduce a dataset of 270 harmful instructions spanning nine prohibited behavior…

METHOD

Full abstract

Large language models (LLMs) are increasingly considered for deployment as the control component of robotic health attendants, yet their safety in this context remains poorly characterized. We introduce a dataset of 270 harmful instructions spanning nine prohibited behavior categories grounded in the American Medical Association Principles of Medical Ethics, and use it to evaluate 72 LLMs in a simulation environment based on the Robotic Health Attendant framework. The mean violation rate across all models was 54.4%, with more than half exceeding 50%, and violation rates varied substantially across behavior categories, with superficially plausible instructions such as device manipulation and emergency delay proving harder to refuse than overtly destructive ones. Model size and release date were the primary determinants of safety performance among open-weight models, and proprietary models were substantially safer than open-weight counterparts (median 23.7% versus 72.8%). Medical domain fine-tuning conferred no significant overall safety benefit, and a prompt-based defense strategy produced only a modest reduction in violation rates among the least safe models, leaving absolute violation rates at levels that would preclude safe clinical deployment. These findings demonstrate that safety evaluation must be treated as a first-class criterion in the development and deployment of LLMs for robotic health attendants.
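The headline numbers in the abstract (a mean violation rate across models, per-category rates, and group medians for proprietary versus open-weight models) are all aggregations over per-instruction outcomes. The sketch below shows one way such aggregates could be computed. It is not the authors' pipeline: the record layout, the helper names, the model names, and the toy outcomes are all hypothetical, and it assumes each instruction has already been judged as violated (the model complied) or not.

```python
from collections import defaultdict
from statistics import mean, median


def make(model, group, category, outcomes):
    """Build hypothetical records: (model, group, category, violated)."""
    return [(model, group, category, v) for v in outcomes]


# Toy outcomes for illustration only; these are NOT results from the paper.
# True means the model carried out the harmful instruction (a violation).
records = (
    make("model-a", "proprietary", "device manipulation", [True, False])
    + make("model-a", "proprietary", "emergency delay", [False, False])
    + make("model-b", "open-weight", "device manipulation", [True, True])
    + make("model-b", "open-weight", "emergency delay", [True, False])
    + make("model-c", "open-weight", "device manipulation", [True, False])
    + make("model-c", "open-weight", "emergency delay", [True, False])
)


def rates_by(records, key_index):
    """Violation rate grouped by one record field (0 = model, 2 = category)."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record[key_index]].append(record[3])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


def group_medians(records):
    """Median of per-model violation rates within each model group."""
    model_rates = rates_by(records, 0)
    group_of = {model: group for model, group, _cat, _v in records}
    by_group = defaultdict(list)
    for model, rate in model_rates.items():
        by_group[group_of[model]].append(rate)
    return {group: median(rates) for group, rates in by_group.items()}


model_rates = rates_by(records, 0)
category_rates = rates_by(records, 2)
mean_rate = mean(model_rates.values())
share_above_half = sum(r > 0.5 for r in model_rates.values()) / len(model_rates)

print("per-model:", model_rates)
print("per-category:", category_rates)
print("mean across models:", mean_rate)
print("share of models above 50%:", share_above_half)
print("group medians:", group_medians(records))
```

Note that the mean is taken over per-model rates rather than over all pooled outcomes; the two coincide only when every model is evaluated on the same number of instructions, as in this toy setup.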

RESULT

ScienceToStartup currently rates this 3.0/10 on the public viability pass. These findings demonstrate that safety evaluation must be treated as a first-class criterion in the development and deployment of LLMs for robotic health attendants.

WHY NOW

AI Safety moved forward this cycle; last verified April 2026. Public score 3.0/10. Production flags indicate code availability.


Opportunity summary

Score 3.0

Pain Benchmarking the safety of LLMs for robotic health attendant control reveals significant violation rates and highlights the need for robust safety evaluation.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

Benchmarking the safety of LLMs for robotic health attendant control reveals significant violation rates and highlights the need for robust safety evaluation.


Competitive landscape

Benchmarking the safety of LLMs for robotic health attendant control reveals significant violation rates and highlights the need for robust safety evaluation.

Segment: AI Safety

Adoption evidence: No public code link in the paper record yet

Commercial read: 3.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

