PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment | ScienceToStartup