Self-Distillation for Multi-Token Prediction | ScienceToStartup