TREX: Tokenizer Regression for Optimal Data Mixture | ScienceToStartup