ARXIV:2605.05103 · LLM EVALUATION · SUBMITTED 07 MAY · 20:31 UTC · FRESHNESS STALE

Verified: Source: PDF linked · Verified: PaperPack: citation fields available · Partial: Proof: unverified proof status

Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement

Nicholas S. Kersting · Vittorio Castelli · Chieh Ting Yeh · Xinzhu Wang · Saad Taame · arXiv

A novel 'Concept Field' approach to measure hallucination and novelty in text corpora using sentence embeddings and a Vector Sequence Database.

Blocked on Code · Score 4.0 · Evidence unverified

Opportunity summary

Pain: A novel 'Concept Field' approach to measure hallucination and novelty in text corpora using sentence embeddings and a Vector Sequence Database.

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: Evidence unverified


PROBLEM

A novel 'Concept Field' approach to measure hallucination and novelty in text corpora using sentence embeddings and a Vector Sequence Database.

METHOD

Full abstract

We introduce the **Concept Field** of a text corpus: a local drift field with pointwise uncertainty, estimated in sentence-embedding space from the deltas between consecutive sentences. Given a candidate sentence transition, we score its agreement with the field by $\zeta$, the mean absolute z-distance between the observed delta and the field's local Gaussian estimate. The score is black-box (no model internals), corpus-attributable (every score traces to nearby corpus sentences), and admits a direct probabilistic reading. To support the computation, we introduce a **Vector Sequence Database (VSDB)** that stores embeddings together with sequence-position and next-delta metadata. We evaluate this approach in two large-scale settings: hallucination-style groundedness detection over the U.S. Code of Federal Regulations, and novelty detection over Project Gutenberg. Using controlled LLM-generated rewrites, Concept Fields achieve strong selective classification performance under a grounded / ungrounded / unsure triage policy and, unlike retrieval-centric baselines, show similar coverage-risk behavior across both domains, supporting a probability-based interpretation that transfers across domains. We also sketch how the divergence and curl of the Concept Field, computed on dense clusters, surface qualitatively meaningful semantic patterns (logic sources, sinks, and implicit topics), which we offer as hypothesis-generating rather than as a quantitative result. Concept Fields provide a fast, lightweight, and interpretable signal for groundedness and novelty, complementary to LLM-as-judge and white-box detectors.
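To make the scoring step concrete, here is a minimal sketch in Python. It rests on several assumptions that the abstract does not spell out. First, sentence embeddings are precomputed by some encoder. Second, the field's "local Gaussian estimate" at a point is taken to be a diagonal Gaussian fitted to the next-deltas of the k nearest corpus sentences, which are found here by brute force. Third, $\zeta$ is averaged over embedding dimensions, i.e. $\zeta = \frac{1}{d}\sum_{i=1}^{d} |\Delta_i - \mu_i| / \sigma_i$. The record layout (`VSDBRecord`), the helper names, `k`, `eps`, and the triage thresholds are all hypothetical illustrations. They are not the authors' VSDB or their calibrated policy.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional

Vector = List[float]


@dataclass
class VSDBRecord:
    """Hypothetical VSDB row: a sentence embedding plus sequence metadata."""
    doc_id: str
    position: int                 # sentence index within its document
    embedding: Vector             # embedding e_t of sentence t
    next_delta: Optional[Vector]  # e_{t+1} - e_t; None for a document's last sentence


def build_vsdb(docs: Dict[str, List[Vector]]) -> List[VSDBRecord]:
    """Store every sentence embedding with its position and next-delta."""
    records = []
    for doc_id, embs in docs.items():
        for t, e in enumerate(embs):
            delta = None
            if t + 1 < len(embs):
                delta = [b - a for a, b in zip(e, embs[t + 1])]
            records.append(VSDBRecord(doc_id, t, e, delta))
    return records


def local_gaussian(records: List[VSDBRecord], x: Vector, k: int):
    """Diagonal Gaussian (assumed form) over next-deltas of the k nearest sentences."""
    candidates = [r for r in records if r.next_delta is not None]
    if not candidates:
        raise ValueError("corpus has no sentence transitions")
    # Brute-force k-NN by Euclidean distance (stand-in for a real vector index).
    nearest = sorted(candidates, key=lambda r: math.dist(r.embedding, x))[:k]
    deltas = [r.next_delta for r in nearest]
    n, d = len(deltas), len(x)
    mu = [sum(v[i] for v in deltas) / n for i in range(d)]
    sigma = [math.sqrt(sum((v[i] - mu[i]) ** 2 for v in deltas) / n) for i in range(d)]
    return mu, sigma, nearest


def zeta(records: List[VSDBRecord], src: Vector, dst: Vector, k: int = 8, eps: float = 1e-6):
    """Mean absolute z-distance of the observed delta (dst - src) from the local field."""
    observed = [b - a for a, b in zip(src, dst)]
    mu, sigma, nearest = local_gaussian(records, src, k)
    # eps guards against zero spread when all neighbor deltas agree in a dimension.
    z = [abs(o - m) / (s + eps) for o, m, s in zip(observed, mu, sigma)]
    return sum(z) / len(z), nearest  # neighbors make the score corpus-attributable


def triage(score: float, grounded_max: float = 1.0, ungrounded_min: float = 2.0) -> str:
    """Three-way policy; thresholds are placeholders, not calibrated values."""
    if score <= grounded_max:
        return "grounded"
    if score >= ungrounded_min:
        return "ungrounded"
    return "unsure"


if __name__ == "__main__":
    # Toy 2-D "embeddings": both documents drift mostly along the first axis.
    corpus = {
        "a": [[0.0, 0.0], [1.0, 0.1], [2.0, 0.0], [3.0, 0.1]],
        "b": [[0.0, 1.0], [1.1, 1.0], [2.0, 1.0], [2.9, 1.0]],
    }
    vsdb = build_vsdb(corpus)
    for src, dst in [([1.0, 0.5], [2.0, 0.5]), ([1.0, 0.5], [1.0, 1.5])]:
        score, support = zeta(vsdb, src, dst, k=6)
        cited = [(r.doc_id, r.position) for r in support]
        print(triage(score), round(score, 2), cited)
```

In the toy run, a transition that follows the corpus drift scores low and is triaged as grounded. A transition that moves against the drift scores high and is triaged as ungrounded. Returning the neighbor records alongside the score reflects the abstract's "corpus-attributable" property: each $\zeta$ traces back to the corpus sentences whose transitions defined the local estimate. At corpus scale, the linear scan would presumably be replaced by an approximate nearest-neighbor index.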

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass.

WHY NOW

LLM Evaluation moved forward this cycle; last verified May 2026. Public score 4.0/10.



Competitive landscape

Segment: LLM Evaluation

Adoption evidence: No public code link in the paper record yet

Commercial read: 4.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified
