ARXIV:2604.22534 · EHR FEATURE ENGINEERING · SUBMITTED 27 APR · 20:14 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

FeatEHR-LLM: Leveraging Large Language Models for Feature Engineering in Electronic Health Records

Hojjat Karami · David Atienza · Jean-Philippe Thiran · Anisoara Ionescu · arXiv

A feature engineering tool that leverages large language models to generate clinically meaningful features from EHR data.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A feature engineering tool that leverages large language models to generate clinically meaningful features from EHR data.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

A feature engineering tool that leverages large language models to generate clinically meaningful features from EHR data. Existing automated methods either lack clinical domain awareness or assume clean, regularly sampled inputs, limiting their applicability…

METHOD

Full abstract

Feature engineering for Electronic Health Records (EHR) is complicated by irregular observation intervals, variable measurement frequencies, and structural sparsity inherent to clinical time series. Existing automated methods either lack clinical domain awareness or assume clean, regularly sampled inputs, limiting their applicability to real-world EHR data. We present FeatEHR-LLM, a framework that leverages Large Language Models (LLMs) to generate clinically meaningful tabular features from irregularly sampled EHR time series. To limit patient privacy exposure, the LLM operates exclusively on dataset schemas and task descriptions rather than raw patient records. A tool-augmented generation mechanism equips the LLM with specialized routines for querying irregular temporal data, enabling it to produce executable feature-extraction code that explicitly handles uneven observation patterns and informative sparsity. FeatEHR-LLM supports both univariate and multivariate feature generation through an iterative, validation-in-the-loop pipeline. Evaluated on eight clinical prediction tasks across four ICU datasets, our framework achieves the highest mean AUROC on 7 out of 8 tasks, with improvements of up to 6 percentage points over strong baselines. Code is available at github.com/hojjatkarami/FeatEHR-LLM.
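To make "features that handle uneven observation patterns and informative sparsity" concrete, here is a minimal sketch of the kind of feature-extraction code such a pipeline might emit for a single variable. It is not taken from the FeatEHR-LLM repository: the `IrregularFeatures` container, the `extract_features` function, and the choice of features are all illustrative assumptions. The point is that every feature is computed from the actual timestamps, with no resampling to a regular grid, and that the observation count and recency are kept as signals in their own right.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class IrregularFeatures:
    """Hypothetical feature set for one irregularly sampled variable."""
    count: int                         # how often it was measured (informative sparsity)
    last_value: Optional[float]        # most recent measurement, None if never measured
    hours_since_last: Optional[float]  # recency of the last measurement
    mean_gap_hours: Optional[float]    # average spacing between consecutive measurements
    slope_per_hour: Optional[float]    # least-squares trend using the real timestamps


def extract_features(
    times_h: Sequence[float],
    values: Sequence[float],
    window_end_h: float,
) -> IrregularFeatures:
    """Summarise (time, value) observations recorded up to window_end_h.

    Observations after the window end are ignored so that no future
    information leaks into the features.
    """
    obs = sorted((t, v) for t, v in zip(times_h, values) if t <= window_end_h)
    n = len(obs)
    if n == 0:
        # A variable that was never measured is itself a signal: count stays 0.
        return IrregularFeatures(0, None, None, None, None)

    ts = [t for t, _ in obs]
    vs = [v for _, v in obs]
    last_t, last_v = obs[-1]

    mean_gap = (ts[-1] - ts[0]) / (n - 1) if n > 1 else None

    slope = None
    if n > 1:
        t_mean = sum(ts) / n
        v_mean = sum(vs) / n
        denom = sum((t - t_mean) ** 2 for t in ts)
        if denom > 0:  # guard against all observations sharing one timestamp
            num = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, vs))
            slope = num / denom

    return IrregularFeatures(n, last_v, window_end_h - last_t, mean_gap, slope)


if __name__ == "__main__":
    # Three measurements at hours 0, 1 and 5 inside an 8-hour window.
    print(extract_features([0.0, 1.0, 5.0], [1.0, 2.0, 6.0], window_end_h=8.0))
```

Because the function only needs column names and a window definition to be written, code of this shape can in principle be generated from a schema and task description alone, which is the privacy constraint the abstract describes.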

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. FeatEHR-LLM supports both univariate and multivariate feature generation through an iterative, validation-in-the-loop pipeline. Code availability is flagged in the production record; the public repository…
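The validation-in-the-loop idea mentioned above can be sketched as a screening step: each candidate feature function is run on held-out rows, and it is kept only if it executes cleanly and separates the classes well enough. This page does not describe the acceptance criterion the authors actually use, so everything here is an assumption for illustration: the `auroc` and `screen_candidates` helpers, the single-feature AUROC test, and the `min_auroc` threshold are hypothetical.

```python
from typing import Callable, Dict, List, Mapping, Sequence

Row = Mapping[str, float]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def screen_candidates(
    candidates: Dict[str, Callable[[Row], float]],
    val_rows: List[Row],
    val_labels: Sequence[int],
    min_auroc: float = 0.6,  # assumed threshold, not from the paper
) -> Dict[str, float]:
    """Return the candidates that run on validation data and clear the threshold."""
    accepted: Dict[str, float] = {}
    for name, fn in candidates.items():
        try:
            scores = [fn(row) for row in val_rows]
        except Exception:
            # Generated code that crashes on validation data is rejected;
            # in an iterative pipeline the error could be fed back to the generator.
            continue
        score = auroc(val_labels, scores)
        score = max(score, 1.0 - score)  # a feature may be predictive in either direction
        if score >= min_auroc:
            accepted[name] = score
    return accepted


if __name__ == "__main__":
    rows = [
        {"lactate_count": 5, "noise": 1},
        {"lactate_count": 4, "noise": 0},
        {"lactate_count": 1, "noise": 1},
        {"lactate_count": 2, "noise": 0},
    ]
    labels = [1, 1, 0, 0]
    candidates = {
        "lactate_count": lambda r: r["lactate_count"],
        "noise": lambda r: r["noise"],
        "broken": lambda r: r["missing_column"],  # simulates faulty generated code
    }
    print(screen_candidates(candidates, rows, labels))
```

A real pipeline would more likely score candidates by the change in a downstream model's validation metric rather than one feature at a time; the single-feature test is used here only to keep the sketch dependency-free.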

WHY NOW

EHR Feature Engineering moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain A feature engineering tool that leverages large language models to generate clinically meaningful features from EHR data.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

A feature engineering tool that leverages large language models to generate clinically meaningful features from EHR data.

Competitive landscape

A feature engineering tool that leverages large language models to generate clinically meaningful features from EHR data.

Segment

EHR Feature Engineering

Adoption evidence

No public code link in the paper record yet; the abstract itself points to github.com/hojjatkarami/FeatEHR-LLM.

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified


Paper record

Published 2026-04-24 13:21 UTC · arXiv:2604.22534 (https://arxiv.org/abs/2604.22534) · Free to access

Authors: Hojjat Karami, David Atienza, Jean-Philippe Thiran, Anisoara Ionescu (all École Polytechnique Fédérale de Lausanne, EPFL)

Research domain: EHR Feature Engineering · Viability score: 7 · Commercial readiness: code

FAQ

What is the startup potential of "FeatEHR-LLM: Leveraging Large Language Models for Feature En"?

A feature engineering tool that leverages large language models to generate clinically meaningful features from EHR data.

What products could be built from this research?

This could be transformed into a SaaS platform or API that automates feature extraction from EHR systems, integrating with hospital IT setups to streamline and enhance their data analytics pipelines.

What are the practical use cases?

A tool for hospitals and healthcare data scientists to automatically generate interpretable features from EHR data, improving predictive model accuracy and saving time on manual feature engineering.

What industries could this research disrupt?

It replaces manual feature engineering processes, traditionally conducted with input from healthcare professionals, and existing automated methods that lack clinical domain awareness or struggle with irregular data.
