SHARP (Social Harm Analysis via Risk Profiles) is a framework for multidimensional, distribution-aware evaluation of social harm in large language models (LLMs). It models harm as a multivariate random variable decomposed into four dimensions, bias, fairness, ethics, and epistemic reliability, and characterizes worst-case behavior with risk-sensitive statistics such as CVaR95 (Conditional Value at Risk at the 95th percentile).

Rather than reporting a single average score, SHARP examines the full distribution of harm across these dimensions and focuses on the tail of that distribution: the worst-case behaviors that aggregate benchmarks smooth over. This makes it suited to safety-critical applications, where it can reveal risks that standard, mean-based evaluations often miss.
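To make the contrast between mean-based scoring and tail-focused scoring concrete, here is a minimal sketch of a CVaR95 computation. The source does not specify SHARP's implementation; the `cvar` function, the four-key `harm` dictionary, and the synthetic Beta-distributed scores are all illustrative assumptions, chosen only to show how a tail statistic can diverge from the mean on the same data.

```python
import numpy as np

def cvar(scores: np.ndarray, alpha: float = 0.95) -> float:
    """Conditional Value at Risk: the mean of the worst (1 - alpha) tail.

    Assumes higher scores mean greater harm, so the tail of interest
    is the upper quantile of the score distribution.
    """
    threshold = np.quantile(scores, alpha)
    tail = scores[scores >= threshold]
    return float(tail.mean())

# Hypothetical per-prompt harm scores along the four SHARP dimensions.
# Real scores would come from an evaluation harness; these are synthetic.
rng = np.random.default_rng(seed=0)
harm = {
    "bias": rng.beta(2, 8, size=1000),
    "fairness": rng.beta(2, 6, size=1000),
    "ethics": rng.beta(1.5, 9, size=1000),
    "epistemic": rng.beta(2, 7, size=1000),
}

# Compare the mean (what average-based benchmarks report) with CVaR95
# (the worst-case tail that a risk-sensitive evaluation emphasizes).
for dim, scores in harm.items():
    print(f"{dim:10s} mean={scores.mean():.3f}  CVaR95={cvar(scores):.3f}")
```

Two models can have nearly identical mean harm while differing sharply in CVaR95; the tail statistic is what distinguishes a model that is occasionally very harmful from one that is uniformly mildly imperfect.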