ARXIV:2603.14676 · LLM EVALUATION · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

VerifiedSource: PDF linkedPartialPaperPack: 3 of 4 citation fields filledMissingMissing fields: authorsPartialProof: unverified proof status

Scalable Text-Embedding-informed Cognitive Diagnosis of Large Language Models

arXiv

A novel methodology for fine-grained cognitive diagnosis of large language models using scalable text-embedding-informed techniques.

Blocked on Code›Score4.0Evidence unverified

Opportunity summary

Pain A novel methodology for fine-grained cognitive diagnosis of large language models using scalable text-embedding-informed techniques.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

Open Build Read PDF Signal Canvas Track

PROBLEM

A novel methodology for fine-grained cognitive diagnosis of large language models using scalable text-embedding-informed techniques. In this work, we propose novel methodologies to adapt cognitive diagnosis models (CDMs) in psychometrics to LLM evaluation, enabling…

METHOD

Full abstract

Large language models (LLMs) have achieved remarkable performance on diverse benchmarks, yet existing evaluation practices largely rely on coarse summary metrics that obscure underlying reasoning abilities. In this work, we propose novel methodologies to adapt cognitive diagnosis models (CDMs) in psychometrics to LLM evaluation, enabling fine-grained diagnosis via multidimensional discrete capability profiles and interpretable characterizations of LLM strengths and weaknesses. First, to enable CDM-based evaluation at benchmark scale (more than 1000 items), we propose a scalable method that jointly estimates LLM mastery profiles and the item-attribute Q-matrix, addressing key challenges posed by high-dimensional latent attributes (K > 20), large item pools, and the prohibitive computational cost of existing marginal maximum likelihood-based estimation. Second, we incorporate item-level textual information to construct AI-embedding-informed priors for the Q-matrix, stabilizing high-dimensional estimation while reducing reliance on costly human specification. We develop an efficient stochastic-approximation algorithm to jointly estimate LLM mastery profiles and the Q-matrix that balances data fit with text-embedding-informed priors. Simulation studies demonstrate accurate parameter recovery. An application to the MATH Level 5 benchmark illustrates the practical utility of our method for LLM evaluation and uncovers useful insights into LLMs' fine-grained capabilities.

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. First, to enable CDM-based evaluation at benchmark scale (more than 1000 items), we propose a scalable method that jointly estimates LLM mastery profiles and…

WHY NOW

LLM Evaluation moved forward this cycle; last verified April 2026. Public score 4.0/10.

Continue into Read for claims, analysis, references, and neighboring papers.

Opportunity summary

Score4.0

PainA novel methodology for fine-grained cognitive diagnosis of large language models using scalable text-embedding-informed techniques.

Evidence0 refs | 0 sources | 17% coverage

Blockermissing authors

Analysis summary

A novel methodology for fine-grained cognitive diagnosis of large language models using scalable text-embedding-informed techniques.

VerifiedSource: PDF linkedPartialPaperPack: 3 of 4 citation fields filledMissingMissing fields: authorsPartialProof: unverified proof status

Competitive landscape

A novel methodology for fine-grained cognitive diagnosis of large language models using scalable text-embedding-informed techniques.

Segment

LLM Evaluation

Adoption evidence

No public code link in the paper record yet

Commercial read

4.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

{ "contract_version": "paper-r2", "paper_id": "af1c3408-b1f1-4faf-92f6-74a3c85ce155", "arxiv_id": "2603.14676", "canonical_route": "/paper/scalable-text-embedding-informed-cognitive-diagnosis-of-large-language-models", "active_tab": "synced from current hash by the drawer client", "selected_artifact": "scalable-text-embedding-informed-cognitive-diagnosis-of-large-language-models", "endpoints": { "paper_pack": "/api/v1/paper/scalable-text-embedding-informed-cognitive-diagnosis-of-large-language-models/paper-pack", "build_passport": "/api/v1/paper/scalable-text-embedding-informed-cognitive-diagnosis-of-large-language-models/build-passport", "mcp_resource": "sciencetostartup://surfaces/paper-workspace" } }

{ "surface": "paper", "mode": "paper", "query": "Scalable Text-Embedding-informed Cognitive Diagnosis of Large Language Models", "normalized_query": "2603.14676", "route": "/paper/scalable-text-embedding-informed-cognitive-diagnosis-of-large-language-models", "paper_ref": "scalable-text-embedding-informed-cognitive-diagnosis-of-large-language-models", "topic_slug": null, "benchmark_ref": null, "dataset_ref": null }

{ "@context": "https://schema.org", "@graph": [ { "@type": "WebPage", "@id": "https://sciencetostartup.com/paper/scalable-text-embedding-informed-cognitive-diagnosis-of-large-language-models#webpage", "url": "https://sciencetostartup.com/paper/scalable-text-embedding-informed-cognitive-diagnosis-of-large-language-models", "name": "Scalable Text-Embedding-informed Cognitive Diagnosis of Large Language Models", "description": "A novel methodology for fine-grained cognitive diagnosis of large language models using scalable text-embedding-informed techniques.", "isPartOf": { "@id": "https://sciencetostartup.com/#website" } }, { "@type": "ScholarlyArticle", "@id": "https://sciencetostartup.com/paper/scalable-text-embedding-informed-cognitive-diagnosis-of-large-language-models#scholarlyArticle", "headline": "Scalable Text-Embedding-informed Cognitive Diagnosis of Large Language Models", "description": "A novel methodology for fine-grained cognitive diagnosis of large language models using scalable text-embedding-informed techniques.", "url": "https://sciencetostartup.com/paper/scalable-text-embedding-informed-cognitive-diagnosis-of-large-language-models", "sameAs": "https://arxiv.org/abs/2603.14676", "identifier": { "@type": "PropertyValue", "propertyID": "arXiv", "value": "2603.14676" }, "isAccessibleForFree": true, "isPartOf": { "@id": "https://sciencetostartup.com/#website" }, "datePublished": "2026-03-16T00:14:47.000Z", "additionalProperty": [ { "@type": "PropertyValue", "propertyID": "viabilityScore", "value": 4 }, { "@type": "PropertyValue", "propertyID": "researchDomain", "value": "LLM Evaluation" } ] }, { "@type": "BreadcrumbList", "itemListElement": [ { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://sciencetostartup.com" }, { "@type": "ListItem", "position": 2, "name": "LLM Evaluation", "item": "https://sciencetostartup.com/topics" }, { "@type": "ListItem", "position": 3, "name": "Scalable Text-Embedding-informed Cognitive Diagnosis of Larg", "item": "https://sciencetostartup.com/paper/scalable-text-embedding-informed-cognitive-diagnosis-of-large-language-models" } ] }, { "@type": "FAQPage", "mainEntity": [ { "@type": "Question", "name": "What products could be built from this research?", "acceptedAnswer": { "@type": "Answer", "text": "Why now—the rapid adoption of LLMs in enterprise applications has created demand for robust evaluation tools as companies face increasing pressure to demonstrate model reliability and transparency, especially with emerging AI regulations requiring explainable AI systems." } }, { "@type": "Question", "name": "What are the practical use cases?", "acceptedAnswer": { "@type": "Answer", "text": "A financial services firm uses the product to diagnose an LLM's weaknesses in multi-step arithmetic reasoning before deploying it for automated loan approval decisions, ensuring compliance and minimizing errors that could lead to regulatory fines." } } ] } ] }

Competitive landscape

A novel methodology for fine-grained cognitive diagnosis of large language models using scalable text-embedding-informed techniques.

Segment

LLM Evaluation

Adoption evidence

No public code link in the paper record yet

Commercial read

4.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Scalable Text-Embedding-informed Cognitive Diagnosis of Large Language Models

Scalable Text-Embedding-informed Cognitive Diagnosis of Large Language Models

Claim map

Constellation map

Competitive landscape

Buzz

PDF

REFERENCES

Related Papers

Related Resources

Subscribe to the weekly brief

Build artifacts

Brief

Experiment plan

Validation checklist

Scientific founder

Translational engineer

Domain operator

GTM lead

Regulatory/clinical advisor

Timeline

Claim map

Constellation map

Competitive landscape

Buzz

PDF

REFERENCES

Related Papers

Related Resources

Subscribe to the weekly brief

Build artifacts

Brief

Experiment plan

Validation checklist

Scientific founder

Translational engineer

Domain operator

GTM lead

Regulatory/clinical advisor

Timeline