Sparse Growing Transformer: Training-Time Sparse Depth Allocation via Progressive Attention Looping | ScienceToStartup