Optimizer-Induced Low-Dimensional Drift and Transverse Dynamics in Transformer Training | ScienceToStartup