ARXIV:2603.28167 · MEDICAL AI · SUBMITTED 31 MAR · 20:19 UTC · FRESHNESS STALE

Source (verified): PDF linked · PaperPack (verified): citation fields available · Proof (partial): unverified proof status

Automating Early Disease Prediction Via Structured and Unstructured Clinical Data

Ane G Domingo-Aldama · Marcos Merino Prado · Alain García Olea · Josu Goikoetxea · Koldo Gojenola · Aitziber Atutxa · arXiv

Automate early disease prediction by extracting crucial clinical insights from unstructured discharge reports to enrich existing EHR data.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: Automate early disease prediction by extracting crucial clinical insights from unstructured discharge reports to enrich existing EHR data.

Evidence: 0 refs | 3 sources | 33% coverage

Blocker: Evidence unverified


PROBLEM

Automate early disease prediction by extracting crucial clinical insights from unstructured discharge reports to enrich existing EHR data. The proposed pipeline uses discharge reports to support the three main steps of early prediction: cohort…

METHOD

Full abstract

This study presents a fully automated methodology for early prediction studies in clinical settings, leveraging information extracted from unstructured discharge reports. The proposed pipeline uses discharge reports to support the three main steps of early prediction: cohort selection, dataset generation, and outcome labeling. By processing discharge reports with natural language processing techniques, we can efficiently identify relevant patient cohorts, enrich structured datasets with additional clinical variables, and generate high-quality labels without manual intervention. This approach addresses the frequent issue of missing or incomplete data in codified electronic health records (EHR), capturing clinically relevant information that is often underrepresented. We evaluate the methodology in the context of predicting atrial fibrillation (AF) progression, showing that predictive models trained on datasets enriched with discharge report information achieve higher accuracy and correlation with true outcomes compared to models trained solely on structured EHR data, while also surpassing traditional clinical scores. These results demonstrate that automating the integration of unstructured clinical text can streamline early prediction studies, improve data quality, and enhance the reliability of predictive models for clinical decision-making.
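The three steps named in the abstract (cohort selection, dataset generation, outcome labeling) can be pictured with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the NLP techniques used, so simple keyword patterns stand in for the extractor here, and they ignore negation and other clinical nuance. The `Patient` structure, the variable names, and the labeling rule (a later report mentioning permanent AF counts as progression) are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    structured: dict   # codified EHR variables; may be incomplete
    reports: list      # discharge report texts, oldest first


# Hypothetical keyword patterns standing in for a real clinical NLP extractor.
AF = re.compile(r"\batrial fibrillation\b", re.IGNORECASE)
PERMANENT_AF = re.compile(r"\bpermanent atrial fibrillation\b", re.IGNORECASE)
TEXT_VARIABLES = {
    "hypertension": re.compile(r"\bhypertension\b", re.IGNORECASE),
    "heart_failure": re.compile(r"\bheart failure\b", re.IGNORECASE),
}


def select_cohort(patients):
    """Step 1: keep patients whose earliest report mentions AF."""
    return [p for p in patients if p.reports and AF.search(p.reports[0])]


def build_row(patient):
    """Step 2: enrich structured variables with flags from the earliest report."""
    row = dict(patient.structured)
    for name, pattern in TEXT_VARIABLES.items():
        # Fill only what the codified record lacks; never overwrite it.
        if row.get(name) is None:
            row[name] = int(bool(pattern.search(patient.reports[0])))
    return row


def label_outcome(patient):
    """Step 3: label 1 if any later report mentions permanent AF (assumed rule)."""
    return int(any(PERMANENT_AF.search(r) for r in patient.reports[1:]))


def build_dataset(patients):
    """Run the three steps and return (patient_id, features, label) tuples."""
    return [(p.patient_id, build_row(p), label_outcome(p))
            for p in select_cohort(patients)]


# Made-up records for illustration only.
patients = [
    Patient("p1", {"age": 71, "hypertension": None},
            ["Paroxysmal atrial fibrillation. History of hypertension.",
             "Admitted with permanent atrial fibrillation."]),
    Patient("p2", {"age": 64, "hypertension": 1},
            ["Atrial fibrillation, rate controlled.", "Routine follow-up."]),
    Patient("p3", {"age": 58}, ["Community-acquired pneumonia."]),
]
dataset = build_dataset(patients)
for patient_id, features, label in dataset:
    print(patient_id, features, label)
```

In this toy run, `p3` is excluded from the cohort, the missing `hypertension` value for `p1` is filled from the report text, and the codified value for `p2` is left untouched. A model trained on rows like these could then be compared against one trained on the `structured` dictionaries alone, which is the comparison the abstract describes.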

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. The proposed pipeline uses discharge reports to support the three main steps of early prediction: cohort selection, dataset generation, and outcome labeling. Code availability…

WHY NOW

Medical AI moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.


Opportunity summary

Score: 7.0

Pain: Automate early disease prediction by extracting crucial clinical insights from unstructured discharge reports to enrich existing EHR data.

Evidence: 0 refs | 3 sources | 33% coverage

Blocker: no shell-level blocker reported


Competitive landscape

Automate early disease prediction by extracting crucial clinical insights from unstructured discharge reports to enrich existing EHR data.

Segment: Medical AI
Adoption evidence: No public code link in the paper record yet
Commercial read: 7.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified


arXiv record: https://arxiv.org/abs/2603.28167 · Published 2026-03-30 08:36:14 UTC · Free to access

