The Model Agreed, But Didn't Learn: Diagnosing Surface Compliance in Large Language Models | ScienceToStartup