Look Inward to Explore Outward: Learning Temperature Policy from LLM Internal States via Hierarchical RL | Signal Canvas | ScienceToStartup