ARXIV:2605.31097 · DATABASE GENERATION · SUBMITTED 01 JUN · 20:22 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

SpecDB: LLM-Generated Customized Databases via Feature-Oriented Decomposition

Yunkai Lou · Longbin Lai · Shunyang Li · Zhengping Qian · Ying Zhang · arXiv

SpecDB uses LLMs to generate customized relational databases tailored to specific workloads, achieving performance comparable to established systems with a fraction of the code size.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: SpecDB uses LLMs to generate customized relational databases tailored to specific workloads, achieving performance comparable to established systems with a fraction of the code size.

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: Evidence unverified


PROBLEM

SpecDB uses LLMs to generate customized relational databases tailored to specific workloads, achieving performance comparable to established systems with a fraction of the code size. We investigate whether a database can instead be generated…

METHOD

Full abstract

Mainstream relational databases ship a uniform feature set across deployments, although individual workloads exercise only a fraction of the available subsystems. We investigate whether a database can instead be generated on demand with a feature set matched to the target workload. We present SpecDB, a system that uses large language models (LLMs) to synthesize customized relational databases. We survey 9 production systems and decompose them into 10 functional modules, each further divided into implementation variants. To capture cross-module dependencies, including cases where implementations in disjoint subtrees must be co-designed, we adopt the FODA feature model and extend it with a cooperate edge, yielding a dependency graph DBGraph. SpecDB operationalizes DBGraph through a layered module-construction pipeline in which each module is generated, validated, and integrated by a dedicated subagent (driven by three inner agents: Main, Tester, Architect), and a Refining Agent that iteratively repairs and tunes the assembled database against a user-supplied refining harness with read-only access to existing database source code. A companion selection component translates a natural-language workload description into a set of implementation variants, providing an end-to-end pipeline from workload description to deployable database. We evaluate SpecDB on TPC-C with BenchmarkSQL. The generated database (23,779 lines of Rust) completes 60-minute TPC-C at 1 and 10 warehouses with zero errors. At 10 warehouses it reaches tpmC=130, compared to 128 for PostgreSQL and 127 for MySQL, with comparable latency at ~3% of their code size. Because the agent operates at module-specification level rather than product source, it can in principle combine techniques across system boundaries. Paired with falling LLM costs, generating a purpose-built database for a target workload is becoming straightforward.
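
The abstract describes a dependency graph (DBGraph) built on a FODA feature model with an added cooperate edge, consumed by a layered module-construction pipeline. The following is a minimal sketch of how such a layering could be computed. It is not the authors' implementation. It assumes that a cooperate edge means the two modules are generated together as one co-design group, and that a plain "requires" edge means one module is built after another. The function `build_layers`, the edge encoding, and every module name in the example are hypothetical.

```python
from collections import defaultdict


def build_layers(features, requires, cooperate):
    """Group cooperating features, then order the groups into build layers.

    features:  list of feature names (hypothetical module names).
    requires:  list of (a, b) pairs meaning "a requires b" (b is built first).
    cooperate: list of (a, b) pairs assumed to mean "a and b are co-designed".
    Returns a list of layers; each layer is a sorted list of groups (tuples).
    """
    # Union-find: features joined by a cooperate edge share one group.
    parent = {f: f for f in features}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in cooperate:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for f in features:
        groups[find(f)].append(f)

    # Lift requires edges to group level; edges inside one group are dropped.
    deps = {g: set() for g in groups}
    for a, b in requires:
        ga, gb = find(a), find(b)
        if ga != gb:
            deps[ga].add(gb)

    # Kahn-style layering: a group is ready once everything it requires is built.
    layers, built = [], set()
    while len(built) < len(groups):
        ready = [g for g in groups if g not in built and deps[g] <= built]
        if not ready:
            raise ValueError("cyclic requires edges between co-design groups")
        layers.append(sorted(tuple(sorted(groups[g])) for g in ready))
        built.update(ready)
    return layers


# Hypothetical example: logging and concurrency control are co-designed.
layers = build_layers(
    features=["storage", "buffer_pool", "wal", "mvcc", "executor"],
    requires=[
        ("buffer_pool", "storage"),
        ("wal", "storage"),
        ("mvcc", "buffer_pool"),
        ("executor", "mvcc"),
    ],
    cooperate=[("wal", "mvcc")],
)
for depth, layer in enumerate(layers):
    print(depth, layer)
```

In this example the cooperate edge pulls `wal` and `mvcc` into a single group, so the group waits for both `storage` and `buffer_pool` even though `wal` alone requires only `storage`. Whether SpecDB resolves cooperate edges this way is not stated in the abstract.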

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Paired with falling LLM costs, generating a purpose-built database for a target workload is becoming straightforward. Code availability is flagged in the production record;…

WHY NOW

Database Generation moved forward this cycle; last verified June 2026. Public score 7.0/10. Production flags indicate code availability.




Competitive landscape

Segment: Database Generation

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Source: https://arxiv.org/abs/2605.31097 · Published 2026-05-29
