ARXIV:2603.24767 · LLM FINE-TUNING · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

Fine-Tuning A Large Language Model for Systematic Review Screening

Kweku Yamoah · Noah Schroeder · Emmanuel Dorley · Neha Rani · Caleb Schutz · arXiv

Fine-tune LLMs to dramatically accelerate systematic review screening, achieving high agreement with human coders.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: Fine-tune LLMs to dramatically accelerate systematic review screening, achieving high agreement with human coders.

Evidence: 0 refs | 0 sources | 17% coverage

Blocker: Evidence unverified

PROBLEM

Systematic reviews take considerable human time and energy, in part because of the large number of titles and abstracts that must be screened for potential inclusion. Researchers have recently begun to explore using large language models (LLMs) to make this screening more efficient, but results to date have been inconsistent.

METHOD

Full abstract

Systematic reviews traditionally have taken considerable amounts of human time and energy to complete, in part due to the extensive number of titles and abstracts that must be reviewed for potential inclusion. Recently, researchers have begun to explore how to use large language models (LLMs) to make this process more efficient. However, research to date has shown inconsistent results. We posit this is because prompting alone may not provide sufficient context for the model(s) to perform well. In this study, we fine-tune a small 1.2 billion parameter open-weight LLM specifically for study screening in the context of a systematic review in which humans rated more than 8,500 titles and abstracts for potential inclusion. Our results showed strong performance improvements from the fine-tuned model, with the weighted F1 score improving 80.79% compared to the base model. When run on the full dataset of 8,277 studies, the fine-tuned model had 86.40% agreement with the human coder, a 91.18% true positive rate, an 86.38% true negative rate, and perfect agreement across multiple inference runs. Taken together, our results show that there is promise for fine-tuning LLMs for title and abstract screening in large-scale systematic reviews.
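The abstract reports agreement, true positive rate, true negative rate, weighted F1, and consistency across inference runs. As a minimal sketch of how such figures are commonly computed for binary include/exclude screening, the Python below derives them from paired human and model labels. The function names, the 1/0 label encoding, and the toy labels are assumptions for illustration and do not come from the paper; "weighted F1" is taken here to mean the support-weighted average of per-class F1, which may differ from the authors' exact definition.

```python
from collections import Counter


def screening_metrics(human, model):
    """Compare model include/exclude decisions (1/0) against human labels."""
    if not human or len(human) != len(model):
        raise ValueError("label lists must be non-empty and of equal length")

    # Count each (human, model) pair to get the confusion-matrix cells.
    pairs = Counter(zip(human, model))
    tp, fn = pairs[(1, 1)], pairs[(1, 0)]
    tn, fp = pairs[(0, 0)], pairs[(0, 1)]

    def f1(hits, false_alarms, misses):
        denom = 2 * hits + false_alarms + misses
        return 2 * hits / denom if denom else 0.0

    n = len(human)
    pos, neg = tp + fn, tn + fp
    f1_include = f1(tp, fp, fn)
    f1_exclude = f1(tn, fn, fp)  # "exclude" treated as the positive class

    return {
        "agreement": (tp + tn) / n,
        "tpr": tp / pos if pos else 0.0,
        "tnr": tn / neg if neg else 0.0,
        # Per-class F1 averaged with weights equal to each class's support.
        "weighted_f1": (pos * f1_include + neg * f1_exclude) / n,
    }


def runs_agree(runs):
    """Return True if every inference run produced identical decisions."""
    return all(run == runs[0] for run in runs[1:])


# Hypothetical toy labels, not data from the paper.
human = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
metrics = screening_metrics(human, model)
print(metrics)
print(runs_agree([model, list(model)]))
```

Because screening datasets are usually dominated by excluded studies, overall agreement tends to track the true negative rate closely, so the true positive rate is worth reading separately: it indicates how many relevant studies the model would miss.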

RESULT

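The fine-tuned model improved weighted F1 by 80.79% over the base model and, on the full dataset of 8,277 studies, reached 86.40% agreement with the human coder (91.18% true positive rate, 86.38% true negative rate). ScienceToStartup currently rates this 7.0/10 on the public viability pass. Code availability is flagged in the production record; the public repository link still needs proof alignment.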

WHY NOW

LLM Fine-Tuning moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score: 7.0

Pain: Fine-tune LLMs to dramatically accelerate systematic review screening, achieving high agreement with human coders.

Evidence: 0 refs | 0 sources | 17% coverage

Blocker: no shell-level blocker reported

Competitive landscape

Segment: LLM Fine-Tuning

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Source: https://arxiv.org/abs/2603.24767 (published 2026-03-25)
