Look Inward to Explore Outward: Learning Temperature Policy from LLM Internal States via Hierarchical RL | ScienceToStartup