Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects | Signal Canvas | ScienceToStartup