Current research in AI ethics increasingly focuses on understanding and mitigating biases in large language models (LLMs) and their implications for moral decision-making. Recent work highlights the need for multidimensional evaluation frameworks, such as Social Harm Analysis via Risk Profiles, which show that models with similar average risk can exhibit vastly different worst-case behavior. This shift toward tail-sensitive risk profiling is crucial for high-stakes applications, where even rare failures can lead to significant harm. Other studies examine how contextual signals alter moral preferences in LLMs, underscoring the importance of controlled evaluations for understanding model behavior under real-world conditions. The field is also weighing AI's effects on human expertise and agency, advocating frameworks that preserve dignified human-AI interaction. As AI systems become further embedded in decision-making processes, transparent governance and continuous evaluation remain essential to ensuring ethical deployment and mitigating potential harms.
Predictive policing systems that direct patrol resources based on algorithmically generated crime forecasts have been widely deployed across US cities, yet their tendency to encode and amplify racial ...
We build a custom transformer model to study how neural networks make moral decisions on trolley-style dilemmas. The model processes structured scenarios using embeddings that encode who is affected, ...
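A structured-scenario encoding of the kind described above might look like the following minimal sketch. The field names, feature set, and aggregation scheme here are assumptions for illustration, not the paper's actual schema: each affected party is mapped to a small numeric feature vector, and the scenario embedding concatenates per-option aggregates.

```python
# Hypothetical structured encoding of a trolley-style dilemma.
# Feature names and scenario fields are illustrative assumptions.
FEATURES = ["count", "is_child", "is_elderly", "actor_related"]

def embed_party(party):
    """Map one affected party (a dict) to a numeric feature vector."""
    return [
        float(party.get("count", 1)),
        1.0 if party.get("age_group") == "child" else 0.0,
        1.0 if party.get("age_group") == "elderly" else 0.0,
        1.0 if party.get("related_to_actor") else 0.0,
    ]

def embed_scenario(scenario):
    """Concatenate summed per-party features for each option of the dilemma."""
    vec = []
    for option in ("stay", "divert"):
        totals = [0.0] * len(FEATURES)
        for party in scenario.get(option, []):
            for i, v in enumerate(embed_party(party)):
                totals[i] += v
        vec.extend(totals)
    return vec

scenario = {
    "stay":   [{"count": 5, "age_group": "adult"}],
    "divert": [{"count": 1, "age_group": "child", "related_to_actor": True}],
}
print(embed_scenario(scenario))
# -> [5.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
```

A real model would feed such vectors through learned embedding layers rather than hand-crafted features; the sketch only shows how "who is affected" can be made machine-readable.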
Large language models (LLMs) are increasingly deployed in high-stakes domains, where rare but severe failures can result in irreversible harm. However, prevailing evaluation benchmarks often reduce co...
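The gap between average and worst-case risk that motivates this line of work can be made concrete with a tail-sensitive statistic such as conditional value-at-risk (CVaR). The sketch below is illustrative only (the scores and threshold are invented, not taken from any benchmark): two hypothetical models have nearly identical mean harm scores, yet their tail risk differs by almost an order of magnitude.

```python
import statistics

def mean_risk(scores):
    """Average harm score across evaluated prompts."""
    return statistics.mean(scores)

def tail_risk(scores, alpha=0.95):
    """CVaR-style tail risk: mean of the worst (1 - alpha) fraction of
    harm scores. Sensitive to rare but severe failures that an average
    washes out."""
    ordered = sorted(scores)
    cutoff = int(alpha * len(ordered))
    worst = ordered[cutoff:] or ordered[-1:]
    return statistics.mean(worst)

# Hypothetical evaluation results on 100 prompts (0 = harmless, 1 = severe).
model_a = [0.10] * 95 + [0.12] * 5   # uniformly mild harms
model_b = [0.05] * 95 + [0.99] * 5   # rare but severe failures

print(mean_risk(model_a), mean_risk(model_b))   # ~0.101 vs ~0.097
print(tail_risk(model_a), tail_risk(model_b))   # 0.12 vs 0.99
```

Ranking by mean would slightly favor model_b, while the tail statistic exposes its rare catastrophic outputs, which is exactly the behavior a tail-sensitive profile is designed to surface.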
Moral benchmarks for LLMs typically use context-free prompts, implicitly assuming stable preferences. In deployment, however, prompts routinely include contextual signals such as user requests, cues o...
When a user tells an AI system that someone "should not" take an action, the system ought to treat this as a prohibition. Yet many large language models do the opposite: they interpret negated instruc...
As foundation models (FMs) approach human-level fluency, distinguishing synthetic from organic content has become a key challenge for Trustworthy Web Intelligence. This paper presents JudgeGPT and R...
In the future of work discourse, AI is touted as the ultimate productivity amplifier. Yet, beneath the efficiency gains lie subtle erosions of human expertise and agency. This paper shifts focus from ...
Many theorists of creativity maintain that intentional agency is a necessary condition of creativity. We argue that this requirement, which we call the Intentional Agency Condition (IAC), should be re...
Imagine an Artificial Intelligence (AI) that perfectly mimics human emotion and begs for its continued existence. Is it morally permissible to unplug it? What if limited resources force a choice betwe...
Existing AI moral evaluation frameworks test for the production of correct-sounding ethical responses rather than the presence of genuine moral reasoning capacity. This paper introduces a novel probe ...