ARXIV:2605.21413 · AI EDUCATION & BENCHMARKING · SUBMITTED 21 MAY · 20:28 UTC · FRESHNESS STALE

Verified Source: PDF linked | Verified PaperPack: citation fields available | Partial Proof: unverified proof status

Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work

Haiyang Shen · Jiuzheng Wang · Taian Guo · Mugeng Liu · Wenchun Jing · Chongyang Pan · +6 at arXiv

QuestBench is a course-based practice and dataset for teaching AI through benchmark construction, revealing failures in current deep research systems.

Ship in 2-4 weeks | Score 7.0 | Evidence unverified

Opportunity summary

Pain QuestBench is a course-based practice and dataset for teaching AI through benchmark construction, revealing failures in current deep research systems.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

QuestBench is a course-based practice and dataset for teaching AI through benchmark construction, revealing failures in current deep research systems. We argue that AI education also needs a setting in which students learn to…

METHOD

Full abstract

As AI becomes part of everyday learning, many courses teach students to use it mainly as a productivity tool: how to prompt, search, summarize, write, code, and use tools more efficiently. We argue that AI education also needs a setting in which students learn to test AI and understand their own role in judging machine-produced knowledge. To this end, we introduce a course-based practice that teaches AI through benchmark construction, using deep research systems as a concrete example of AI-era knowledge work. Students turn disciplinary knowledge into verifiable expert-level questions, review one another's designs for ambiguity and shortcuts, and evaluate AI systems on the resulting tasks. This activity gives students direct exposure to a powerful tool while asking them to specify what a trustworthy answer would require. The produced benchmark, QuestBench, consists of 256 questions across 14 humanities and social-science domains. Evaluation on QuestBench shows that student-designed tasks reveal hidden failures in current deep research systems: across thirteen evaluated systems, the mean question-level pass rate is only 16.85%, and the best-performing system, GPT-5.5, reaches a 57.58% pass rate. The failures are educationally useful because they show how fluent, source-backed answers can still miss the right query, source, term, or evidence standard. Reflections from five student contributors suggest that benchmark construction can help students see professional knowledge not only as content AI may retrieve, but as the basis for judging AI outputs. We present QuestBench as a benchmark artifact and as a reusable classroom setting for a larger educational question: how students can remain responsible knowledge actors as AI enters learning and professional work. The dataset is available at https://huggingface.co/datasets/PKUAIWeb/QuestBench/tree/main.
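The abstract reports a mean question-level pass rate across systems and a best-system pass rate, but this page does not say how those figures are aggregated. The following is a minimal sketch, assuming each (system, question) pair receives a binary pass/fail grade. The system names, question IDs, grades, and function names are hypothetical and are not taken from the paper or the dataset.

```python
from statistics import mean

# Hypothetical toy grades: results[system][question_id] is True when the
# system's answer was judged correct. None of these values come from the paper.
results = {
    "system_a": {"q1": True, "q2": False, "q3": False, "q4": True},
    "system_b": {"q1": False, "q2": False, "q3": False, "q4": True},
    "system_c": {"q1": False, "q2": False, "q3": False, "q4": False},
}


def pass_rate_by_system(results):
    """Fraction of questions that each system passed."""
    return {name: sum(grades.values()) / len(grades)
            for name, grades in results.items()}


def pass_rate_by_question(results):
    """Fraction of systems that passed each question."""
    questions = sorted({q for grades in results.values() for q in grades})
    return {
        q: mean(1.0 if grades[q] else 0.0
                for grades in results.values() if q in grades)
        for q in questions
    }


if __name__ == "__main__":
    per_system = pass_rate_by_system(results)
    per_question = pass_rate_by_question(results)
    best = max(per_system, key=per_system.get)
    print(f"best system: {best} ({per_system[best]:.2%})")
    print(f"mean over systems:   {mean(list(per_system.values())):.2%}")
    print(f"mean over questions: {mean(list(per_question.values())):.2%}")
```

When every system is graded on every question, averaging per-system rates and averaging per-question rates give the same number (25% on this toy data). The two can diverge if some systems skip questions, so it is worth checking which convention a reported figure uses.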

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Evaluation on QuestBench shows that student-designed tasks reveal hidden failures in current deep research systems: across thirteen evaluated systems, the mean question-level pass rate…

WHY NOW

AI Education & Benchmarking moved forward this cycle; last verified May 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain QuestBench is a course-based practice and dataset for teaching AI through benchmark construction, revealing failures in current deep research systems.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

QuestBench is a course-based practice and dataset for teaching AI through benchmark construction, revealing failures in current deep research systems.

Competitive landscape

QuestBench is a course-based practice and dataset for teaching AI through benchmark construction, revealing failures in current deep research systems.

Segment

AI Education & Benchmarking

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Authors

Haiyang Shen, Jiuzheng Wang, Taian Guo, Mugeng Liu, Wenchun Jing, Chongyang Pan, Siqi Zhong, Zhiyang Chen, Weichen Bi, Yudong Han, Xiaoying Bai, Yun Ma

arXiv

https://arxiv.org/abs/2605.21413

Date published

2026-05-20T17:09:56.000Z

Commercial readiness

code
