Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation | ScienceToStartup