Research Field
Source directory_tools · observed 2026-05-29 23:26 UTC · updated 2026-03-08 20:33 UTC
Training agents to maximize a reward signal through trial and error. Used in games, robotics, and LLM alignment (e.g., RLHF).