ARXIV:2604.21229 · LLM MEMORY SYSTEMS · SUBMITTED 24 APR · 20:28 UTC · FRESHNESS STALE

Source: PDF linked (verified) · PaperPack: citation fields available (verified) · Proof: unverified proof status (partial)

EngramaBench: Evaluating Long-Term Conversational Memory with Structured Graph Retrieval

Julian Acuna · arXiv

EngramaBench is a new benchmark for evaluating long-term conversational memory in LLMs, featuring a graph-structured memory system called Engrama that shows promise in cross-space reasoning.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Evidence: 0 refs | 4 sources | 67% coverage

Blocker: evidence unverified

PROBLEM

Large language model assistants are increasingly expected to retain and reason over information accumulated across many sessions. EngramaBench is a benchmark for measuring how well different memory architectures support this, and it is used to evaluate Engrama, a graph-structured memory system that shows promise in cross-space reasoning.
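EngramaBench's queries fall into five categories: factual recall, cross-space integration, temporal reasoning, adversarial abstention, and emergent synthesis. As a rough illustration of how per-category results (such as the cross-space score) can be tallied, the sketch below assumes a hypothetical record type, `ScoredQuery`, with scores in [0, 1], and takes each category's result as the plain mean of its query scores. The benchmark's real data format, scoring rubric, and composite formula are not given here and may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# The five query categories listed in the EngramaBench abstract
# (the snake_case identifiers are our own naming, not the benchmark's).
CATEGORIES = (
    "factual_recall",
    "cross_space_integration",
    "temporal_reasoning",
    "adversarial_abstention",
    "emergent_synthesis",
)


@dataclass(frozen=True)
class ScoredQuery:
    """Hypothetical record: one query, its category, and a system's score on it."""

    persona: str
    category: str
    score: float  # assumed to lie in [0, 1]

    def __post_init__(self) -> None:
        # Reject categories and scores outside the assumed schema.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be in [0, 1]")


def category_means(results: list[ScoredQuery]) -> dict[str, float]:
    """Plain mean score per category, for categories present in `results`."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for r in results:
        buckets[r.category].append(r.score)
    return {cat: mean(scores) for cat, scores in buckets.items()}


if __name__ == "__main__":
    demo = [
        ScoredQuery("persona_a", "factual_recall", 1.0),
        ScoredQuery("persona_a", "factual_recall", 0.5),
        ScoredQuery("persona_b", "cross_space_integration", 0.0),
    ]
    print(category_means(demo))  # {'factual_recall': 0.75, 'cross_space_integration': 0.0}
```

Keeping category labels explicit in each record makes it straightforward to report a category such as cross-space integration separately from any aggregate score.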

METHOD

Full abstract

Large language model assistants are increasingly expected to retain and reason over information accumulated across many sessions. We introduce EngramaBench, a benchmark for long-term conversational memory built around five personas, one hundred multi-session conversations, and one hundred fifty queries spanning factual recall, cross-space integration, temporal reasoning, adversarial abstention, and emergent synthesis. We evaluate Engrama, a graph-structured memory system, against GPT-4o full-context prompting and Mem0, an open-source vector-retrieval memory system. All three use the same answering model (GPT-4o), isolating the effect of memory architecture. GPT-4o full-context achieves the highest composite score (0.6186), while Engrama scores 0.5367 globally but is the only system to score higher than full-context prompting on cross-space reasoning (0.6532 vs. 0.6291, n=30). Mem0 is cheapest but substantially weaker (0.4809). Ablations reveal that the components driving Engrama's cross-space advantage trade off against global composite score, exposing a systems-level tension between structured memory specialization and aggregate optimization.
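The abstract contrasts a graph-structured memory (Engrama) with vector retrieval (Mem0) and singles out cross-space reasoning. The toy sketch below shows one way graph structure can help with that kind of query. It assumes that a "space" is a separate context partition (for example, work vs. travel) and that facts are linked through shared entities. `MemoryGraph`, `direct`, and `expand` are hypothetical names; `direct` is an exact entity lookup standing in for similarity retrieval, and none of this reflects Engrama's actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MemoryGraph:
    """Toy graph memory (hypothetical, not Engrama's design).

    Facts are nodes tagged with a "space"; facts that mention the same
    entity are implicitly connected through a shared entity index.
    """

    facts: dict[int, tuple[str, str]] = field(default_factory=dict)  # id -> (space, text)
    fact_entities: dict[int, set[str]] = field(default_factory=dict)
    entity_index: defaultdict = field(default_factory=lambda: defaultdict(set))

    def add(self, space: str, text: str, entities: set[str]) -> int:
        """Store a fact and index it under each entity it mentions."""
        fid = len(self.facts)
        self.facts[fid] = (space, text)
        self.fact_entities[fid] = set(entities)
        for entity in entities:
            self.entity_index[entity].add(fid)
        return fid

    def direct(self, entity: str) -> set[int]:
        """Flat lookup: only facts that mention the query entity itself."""
        return set(self.entity_index.get(entity, ()))

    def expand(self, entity: str, hops: int = 1) -> set[int]:
        """Direct matches plus facts reachable via shared entities within `hops` steps."""
        found = self.direct(entity)
        frontier = set(found)
        for _ in range(hops):
            reached: set[int] = set()
            for fid in frontier:
                for linked in self.fact_entities[fid]:
                    reached |= self.entity_index[linked]
            frontier = reached - found  # only newly reached facts are expanded next
            found |= reached
        return found


g = MemoryGraph()
g.add("work", "Ana's team ships the launch in Lisbon", {"ana", "lisbon"})
g.add("travel", "User booked a flight to Lisbon", {"user", "lisbon"})
g.add("health", "User is allergic to shellfish", {"user"})

print(sorted(g.facts[i][0] for i in g.direct("ana")))          # ['work']
print(sorted(g.facts[i][0] for i in g.expand("ana")))          # ['travel', 'work']
print(sorted(g.facts[i][0] for i in g.expand("ana", hops=2)))  # ['health', 'travel', 'work']
```

A flat lookup for `ana` returns only the work fact. One hop of expansion through the shared `lisbon` entity also surfaces the travel fact from a different space. With `hops=2`, the health fact, which is linked only through `user`, is returned as well. This shows how expansion widens recall at the cost of precision.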

RESULT

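ScienceToStartup currently rates this paper 7.0/10 on its public viability pass. On the benchmark itself, GPT-4o full-context prompting achieves the highest composite score (0.6186). Engrama scores 0.5367 overall but is the only system to score higher than full-context prompting on cross-space reasoning (0.6532 vs. 0.6291, n=30). Mem0 is the cheapest system but is substantially weaker (0.4809).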
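A little arithmetic puts the reported numbers in proportion. The scores below are copied from the abstract. The `gap` helper is illustrative. The per-query weight rests on an assumption that the source does not state: that the cross-space score is an equally weighted mean of 30 query scores, each in [0, 1].

```python
# Scores reported in the EngramaBench abstract.
composite = {"gpt4o_full_context": 0.6186, "engrama": 0.5367, "mem0": 0.4809}
cross_space = {"gpt4o_full_context": 0.6291, "engrama": 0.6532}
N_CROSS_SPACE = 30  # number of cross-space queries reported in the abstract


def gap(a: float, b: float) -> tuple[float, float]:
    """Absolute difference a - b and that difference relative to b."""
    diff = a - b
    return diff, diff / b


cs_abs, cs_rel = gap(cross_space["engrama"], cross_space["gpt4o_full_context"])
glob_abs, glob_rel = gap(composite["gpt4o_full_context"], composite["engrama"])

# Assumption: with equal weights and scores in [0, 1], a single query
# can move the category mean by at most 1/n.
per_query_weight = 1 / N_CROSS_SPACE

print(f"cross-space, Engrama over full-context: {cs_abs:+.4f} ({cs_rel:.1%})")  # +0.0241 (3.8%)
print(f"composite, full-context over Engrama:  {glob_abs:+.4f} ({glob_rel:.1%})")  # +0.0819 (15.3%)
print(f"max shift from one query at n=30:      {per_query_weight:.4f}")  # 0.0333
```

Under that assumption, Engrama's 0.0241 cross-space lead is smaller than the 0.0333 by which a single query could move the category mean. The advantage therefore amounts to less than one query's worth of score, while the composite gap in favor of full-context prompting (0.0819) is more than three times larger.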

WHY NOW

Work on LLM memory systems advanced this cycle (last verified April 2026). Public score: 7.0/10. Implementation evidence is available through a linked public repository: https://github.com/julianacunadc/engramabench

Competitive landscape

Segment: LLM Memory Systems
Adoption evidence: Public code linked for build inspection
Commercial read: 7.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified
