ARXIV:2603.26511 · LLM TRAINING · SUBMITTED 30 MAR · 21:52 UTC · FRESHNESS STALE

Source: PDF linked (verified) · PaperPack: citation fields available (verified) · Proof: unverified proof status (partial)

AMALIA Technical Report: A Fully Open Source Large Language Model for European Portuguese

Afonso Simplício · Gonçalo Vinagre · Miguel Moura Ramos · Diogo Tavares · Rafael Ferreira · Giuseppe Attanasio · +16 at arXiv

An open-source LLM specifically trained for European Portuguese, offering superior performance on native tasks and a new suite of pt-PT benchmarks.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain An open-source LLM specifically trained for European Portuguese, offering superior performance on native tasks and a new suite of pt-PT benchmarks.

Evidence 20 refs | 10 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

An open-source LLM specifically trained for European Portuguese, offering superior performance on native tasks and a new suite of pt-PT benchmarks. We introduce AMALIA, a fully open LLM that prioritizes pt-PT by using more…

METHOD

Full abstract

Despite rapid progress in open large language models (LLMs), European Portuguese (pt-PT) remains underrepresented in both training data and native evaluation, with machine-translated benchmarks likely missing the variant's linguistic and cultural nuances. We introduce AMALIA, a fully open LLM that prioritizes pt-PT by using more high-quality pt-PT data during both the mid- and post-training stages. To evaluate pt-PT more faithfully, we release a suite of pt-PT benchmarks that includes translated standard tasks and four new datasets targeting pt-PT generation, linguistic competence, and pt-PT/pt-BR bias. Experiments show that AMALIA matches strong baselines on translated benchmarks while substantially improving performance on pt-PT-specific evaluations, supporting the case for targeted training and native benchmarking for European Portuguese.

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Experiments show that AMALIA matches strong baselines on translated benchmarks while substantially improving performance on pt-PT-specific evaluations, supporting the case for targeted training and…

WHY NOW

LLM Training moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain An open-source LLM specifically trained for European Portuguese, offering superior performance on native tasks and a new suite of pt-PT benchmarks.

Evidence 20 refs | 10 sources | 50% coverage

Blocker no shell-level blocker reported

Competitive landscape

An open-source LLM specifically trained for European Portuguese, offering superior performance on native tasks and a new suite of pt-PT benchmarks.

Segment

LLM Training

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Published 2026-03-27T15:22:33Z · arXiv:2603.26511 · https://arxiv.org/abs/2603.26511

Full author list: Afonso Simplício · Gonçalo Vinagre · Miguel Moura Ramos · Diogo Tavares · Rafael Ferreira · Giuseppe Attanasio · Duarte M. Alves · Inês Calvo · Inês Vieira · Rui Guerra · James Furtado · Beatriz Canaverde · Iago Paulo · Vasco Ramos · Diogo Glória-Silva · Miguel Faria · Marcos Treviso · Daniel Gomes · Pedro Gomes · David Semedo · André Martins · João Magalhães
