How can dialogue systems be evaluated for their ability to perform logical reasoning in conversation?