ARXIV:2603.26510 · MEDICAL AI · SUBMITTED 30 MAR · 20:30 UTC · FRESHNESS STALE

Verified · Source: PDF linked · Verified · PaperPack: citation fields available

Clinical named entity recognition in the Portuguese language: a benchmark of modern BERT models and LLMs

Vinicius Anjos de Almeida · Sandro Saorin da Silva · Josimar Chire · Leonardo Vicenzi · Nícolas Henrique Borges · Helena Kociolek · +5 at arXiv

A benchmark of modern BERT models and LLMs for clinical named entity recognition in Portuguese, demonstrating strong performance with mmBERT and balanced data strategies.

Cooling · Score 7.0 · Evidence verified

Opportunity summary

Pain: A benchmark of modern BERT and LLMs for clinical named entity recognition in Portuguese, demonstrating strong performance with mmBERT and balanced data strategies.
Evidence: 12 refs | 5 sources | 83% coverage
Blocker: Evidence verified

PROBLEM

Clinical notes contain valuable unstructured information. Named entity recognition (NER) enables the automatic extraction of medical concepts; however, benchmarks for Portuguese remain scarce.

METHOD

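The study compares BioBERTpt, BERTimbau, ModernBERT, and mmBERT with LLMs such as GPT-5 and Gemini-2.5 on the public SemClinBr corpus and a private breast cancer dataset. Models were trained under identical conditions and evaluated using precision, recall, and F1-score. Iterative stratification, weighted loss, and oversampling were explored to mitigate class imbalance.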

Full abstract

Clinical notes contain valuable unstructured information. Named entity recognition (NER) enables the automatic extraction of medical concepts; however, benchmarks for Portuguese remain scarce. In this study, we aimed to evaluate BERT-based models and large language models (LLMs) for clinical NER in Portuguese and to test strategies for addressing multilabel imbalance. We compared BioBERTpt, BERTimbau, ModernBERT, and mmBERT with LLMs such as GPT-5 and Gemini-2.5, using the public SemClinBr corpus and a private breast cancer dataset. Models were trained under identical conditions and evaluated using precision, recall, and F1-score. Iterative stratification, weighted loss, and oversampling were explored to mitigate class imbalance. The mmBERT-base model achieved the best performance (micro F1 = 0.76), outperforming all other models. Iterative stratification improved class balance and overall performance. Multilingual BERT models, particularly mmBERT, perform strongly for Portuguese clinical NER and can run locally with limited computational resources. Balanced data-splitting strategies further enhance performance.
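The abstract reports a micro-averaged F1 and discusses class imbalance. As a minimal sketch (not the authors' evaluation code), the snippet below shows how micro- and macro-averaged F1 can diverge when one entity class is rare. The function name `f1_scores`, the labels `Disorder` and `Drug`, and the toy predictions are all hypothetical, and token-level scoring is an assumption: the abstract does not say whether the reported scores are token-level or entity-level.

```python
from collections import Counter


def f1_scores(gold, pred, ignore_label="O"):
    """Return (micro_f1, macro_f1) over token labels, skipping `ignore_label`.

    Assumption: token-level scoring on hypothetical labels; this is an
    illustration, not the benchmark's actual metric implementation.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g != ignore_label:
                tp[g] += 1          # correct entity label
        else:
            if p != ignore_label:
                fp[p] += 1          # predicted a label that is not in gold
            if g != ignore_label:
                fn[g] += 1          # missed the gold label

    def f1(t, f_p, f_n):
        denom = 2 * t + f_p + f_n
        return 2 * t / denom if denom else 0.0

    labels = sorted(set(tp) | set(fp) | set(fn))
    # Micro: pool the counts across classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels) if labels else 0.0
    return micro, macro


# Toy example: a frequent class predicted well, a rare class always missed.
gold = ["Disorder"] * 8 + ["Drug", "Drug", "O", "O"]
pred = ["Disorder"] * 8 + ["O", "Disorder", "O", "O"]
micro, macro = f1_scores(gold, pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
# prints "micro F1 = 0.842, macro F1 = 0.471"
```

In this toy case the micro average stays high because the frequent class dominates the pooled counts, while the macro average exposes the missed rare class. That gap is one reason imbalance-aware splitting and loss weighting are worth testing alongside a headline micro F1.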

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. The mmBERT-base model achieved the best performance (micro F1 = 0.76), outperforming all other models, and iterative stratification improved class balance and overall performance. A public repository is linked, so build…

WHY NOW

Medical AI moved forward this cycle; last verified April 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.

Competitive landscape

Segment: Medical AI
Adoption evidence: Public code linked for build inspection
Commercial read: 7.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified

arXiv: https://arxiv.org/abs/2603.26510
Published: 2026-03-27 15:22:07 UTC
Authors: Vinicius Anjos de Almeida, Sandro Saorin da Silva, Josimar Chire, Leonardo Vicenzi, Nícolas Henrique Borges, Helena Kociolek, Sarah Miriã de Castro Rocha, Frederico Nassif Gomes, Júlia Cristina Ferreira, Oge Marques, Lucas Emanuel Silva e Oliveira
Code repository: https://github.com/GRUPOMED4U/clinical_ner_benchmark_paper
Research domain: Medical AI
Free to access: yes
