CorpusQA is a novel benchmark designed to evaluate large language models' capacity for holistic reasoning over vast document repositories, scaling up to 10 million tokens. It uses a data synthesis framework to generate complex, computation-intensive queries with programmatically guaranteed ground-truth answers.
In plainer terms, CorpusQA tests how well large AI models can understand and reason across huge collections of documents, up to 10 million tokens. It creates challenging questions with guaranteed correct answers, helping researchers evaluate and improve AI models on complex tasks that require integrating information from many sources.