Generalisation of RLHF under Reward Shift and Clipped KL Regularisation | ScienceToStartup