CocoaBench: Evaluating Unified Digital Agents in the Wild | ScienceToStartup