CogToM is a comprehensive, theoretically grounded benchmark designed to evaluate the Theory of Mind (ToM) capabilities of Large Language Models (LLMs). It comprises over 8,000 bilingual instances across 46 diverse paradigms, moving beyond narrow false-belief tasks to capture the full spectrum of human cognitive mechanisms.
CogToM is a new, extensive test of whether AI models can understand other minds the way humans do. It uses thousands of diverse scenarios to check whether models truly grasp complex social cognition, revealing where they succeed and where they still fall short of human thinking.