Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback | ScienceToStartup