SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility