Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations | ScienceToStartup