SocialMindChange

SocialMindChange is a benchmark for assessing the social intelligence of large language models (LLMs). Traditional Theory of Mind (ToM) evaluations ask a model only to report on evolving mental states. SocialMindChange instead gives the model an active role: it must generate dialogue that shifts another character's mental-state trajectory toward a predefined goal. The LLM plays one character in a multi-character social context spanning five connected scenes, and it must produce consistent dialogue while staying aware of every participant's evolving beliefs, feelings, and intentions. The benchmark thus targets the gap between passive observation and active social action, a capability relevant to applications such as empathetic chatbots, persuasive agents, and complex narrative generation.

The Novelty of SocialMindChange

Active vs. Passive ToM: Existing ToM benchmarks primarily assess whether LLMs can track mental states. SocialMindChange instead requires models to plan and generate dialogue that changes another character's mental-state trajectory toward a specific goal, shifting the task from tracking to action (2601.13687v1).
Higher-Order States: The benchmark includes selected higher-order mental states, so LLMs must reason about and influence nested beliefs and intentions, such as what one character believes another intends (2601.13687v1).
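
The contrast between tracking and steering can be sketched as two item types. This is an illustrative sketch only, not the benchmark's actual schema: the class names, field names, and the `reaches_goal` check are all hypothetical, and the belief/feeling/intention triple simply mirrors the mental-state categories named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MentalState:
    # Hypothetical representation: one label per mental-state category.
    belief: str
    feeling: str
    intention: str


@dataclass
class TrackingItem:
    """Passive ToM: the model reads a story and reports a mental state."""
    story: str
    question: str


@dataclass
class SteeringItem:
    """Active task: the model writes dialogue meant to move `target` toward `goal`."""
    speaker: str
    target: str
    start: MentalState
    goal: MentalState


def reaches_goal(trajectory: list[MentalState], goal: MentalState) -> bool:
    # Assumed success criterion: the last state in the trajectory equals the goal.
    return bool(trajectory) and trajectory[-1] == goal


# Made-up example values, for illustration only.
start = MentalState("Sam broke the vase on purpose", "angry", "confront Sam")
middle = MentalState("the vase may have fallen", "uneasy", "ask Sam what happened")
goal = MentalState("the vase fell by accident", "calm", "forgive Sam")

item = SteeringItem(speaker="Alex", target="Jordan", start=start, goal=goal)
print(reaches_goal([start, middle, goal], item.goal))
```

A tracking item only needs an answer to compare against a label, whereas a steering item needs a start state, a goal state, and some way to judge whether the generated dialogue moved the target between them. How the benchmark actually scores that movement is not described here.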

Structure and Methodology of SocialMindChange

Scenario Construction: Each SocialMindChange instance defines a social context with four characters across five connected scenes. A structured four-step framework was used to construct 1,200 social contexts, comprising 6,000 scenarios and over 90,000 questions (2601.13687v1).
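
The reported counts can be sanity-checked with a small sketch. The layout below is an assumption rather than the published data format: `Scene` and `SocialContext` are hypothetical names, and the sketch assumes that each of the five scenes counts as one scenario, which is consistent with 1,200 × 5 = 6,000.

```python
from dataclasses import dataclass, field

# Figures taken from the description above.
CHARACTERS_PER_CONTEXT = 4
SCENES_PER_CONTEXT = 5
NUM_CONTEXTS = 1_200
MIN_TOTAL_QUESTIONS = 90_000  # reported as "over 90,000"


@dataclass
class Scene:
    # Hypothetical container: one scene and the questions attached to it.
    index: int
    questions: list[str] = field(default_factory=list)


@dataclass
class SocialContext:
    # Hypothetical container: one social context with its cast and scenes.
    characters: list[str]
    scenes: list[Scene]

    def __post_init__(self) -> None:
        # Enforce the four-character, five-scene shape described in the text.
        if len(self.characters) != CHARACTERS_PER_CONTEXT:
            raise ValueError("expected four characters per context")
        if len(self.scenes) != SCENES_PER_CONTEXT:
            raise ValueError("expected five scenes per context")


# Assumption: one scenario per scene.
total_scenarios = NUM_CONTEXTS * SCENES_PER_CONTEXT
# Lower bound on the average number of questions per scenario.
min_avg_questions = MIN_TOTAL_QUESTIONS / total_scenarios

print(total_scenarios, min_avg_questions)
```

Under that assumption the totals are mutually consistent, and "over 90,000 questions" implies an average of more than 15 questions per scenario.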
