How can I analyze the behavioral consistency of LLMs across different prompt styles?