Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization | ScienceToStartup