Benchmarking Multi-turn Medical Diagnosis: Hold, Lure, and Self-Correction