CheeseBench: Evaluating Large Language Models on Rodent Behavioral Neuroscience Paradigms