HealthBench is an evaluation benchmark designed to assess the safety and accuracy of Large Language Models (LLMs) in clinical decision support. It focuses on identifying subtle clinical errors and hallucinations that pose patient safety risks, using fine-grained, evidence-grounded rubrics.
HealthBench is a specialized testing system for AI models used in healthcare, particularly for clinical advice. It helps find subtle errors and unsafe suggestions that regular tests miss, ensuring these AI systems are safe and reliable for patients.
Was this definition helpful?