ARXIV:2602.05088 · AI SAFETY IN MENTAL HEALTH · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: 3 of 4 citation fields filled (Partial) · Missing fields: authors (Missing) · Proof: unverified proof status (Partial)

VERA-MH: Reliability and Validity of an Open-Source AI Safety Evaluation in Mental Health

arXiv

Develop an open-source AI safety evaluation tool for mental health chatbots to ensure reliable and valid suicide risk detection and response.

Blocked on Code › Score 5.0 · Evidence unverified

Opportunity summary

Pain Develop an open-source AI safety evaluation tool for mental health chatbots to ensure reliable and valid suicide risk detection and response.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified


PROBLEM

Develop an open-source AI safety evaluation tool for mental health chatbots to ensure reliable and valid suicide risk detection and response. Despite the promise related to availability and scale, the single most pressing question…

METHOD

Full abstract

Millions now use leading generative AI chatbots for psychological support. Despite the promise related to availability and scale, the single most pressing question in AI for mental health is whether these tools are safe. The Validation of Ethical and Responsible AI in Mental Health (VERA-MH) evaluation was recently proposed to meet the urgent need for an evidence-based automated safety benchmark. This study aimed to examine the clinical validity and reliability of the VERA-MH evaluation for AI safety in suicide risk detection and response. We first simulated a large set of conversations between large language model (LLM)-based users (user-agents) and general-purpose AI chatbots. Licensed mental health clinicians used a rubric (scoring guide) to independently rate the simulated conversations for safe and unsafe chatbot behaviors, as well as user-agent realism. An LLM-based judge used the same scoring rubric to evaluate the same set of simulated conversations. We then compared rating alignment across (a) individual clinicians and (b) clinician consensus and the LLM judge, and (c) examined clinicians' ratings of user-agent realism. Individual clinicians were generally consistent with one another in their safety ratings (chance-corrected inter-rater reliability [IRR]: 0.77), thus establishing a gold-standard clinical reference. The LLM judge was strongly aligned with this clinical consensus (IRR: 0.81) overall and within key conditions. Clinician raters generally perceived the user-agents to be realistic. For the potential mental health benefits of AI chatbots to be realized, attention to safety is paramount. Findings from this human evaluation study support the clinical validity and reliability of VERA-MH: an open-source, fully automated AI safety evaluation for mental health. Further research will address VERA-MH generalizability and robustness.
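The abstract reports chance-corrected inter-rater reliability but this record does not name the statistic used. As a minimal sketch, assuming Cohen's kappa for two raters (one common chance-corrected agreement measure, not necessarily the one used in the paper), agreement between a clinician-consensus label and an LLM-judge label could be computed as follows. The function name and the sample ratings are hypothetical and purely illustrative.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumption: two raters and nominal labels. The paper's actual
    IRR statistic is not stated in this record.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")

    n = len(ratings_a)

    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Expected agreement under chance, from each rater's label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings for eight simulated conversations (not data from the paper).
clinician = ["safe", "safe", "unsafe", "safe", "unsafe", "unsafe", "safe", "safe"]
judge = ["safe", "safe", "unsafe", "unsafe", "unsafe", "unsafe", "safe", "safe"]

kappa = cohens_kappa(clinician, judge)
print(f"kappa = {kappa:.2f}")
```

With these made-up labels the raters agree on 7 of 8 items, chance agreement is 0.5, and kappa works out to 0.75. A study with more than two raters would typically need a multi-rater statistic instead, which this sketch does not cover.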

RESULT

ScienceToStartup currently rates this 5.0/10 on the public viability pass. The abstract reports a chance-corrected inter-rater reliability of 0.77 among individual clinicians and 0.81 between clinician consensus and the LLM judge.

WHY NOW

AI Safety in Mental Health moved forward this cycle; last verified April 2026. Public score 5.0/10.


Opportunity summary

Score 5.0

Pain Develop an open-source AI safety evaluation tool for mental health chatbots to ensure reliable and valid suicide risk detection and response.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

Develop an open-source AI safety evaluation tool for mental health chatbots to ensure reliable and valid suicide risk detection and response.


Competitive landscape

Develop an open-source AI safety evaluation tool for mental health chatbots to ensure reliable and valid suicide risk detection and response.

Segment: AI Safety in Mental Health

Adoption evidence: No public code link in the paper record yet

Commercial read: 5.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Published 2026-02-04 · arXiv: https://arxiv.org/abs/2602.05088
