ARXIV:2603.15295 · NLP DATASETS · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified | Source: PDF linked
Partial | PaperPack: 3 of 4 citation fields filled
Missing | Missing fields: authors
Partial | Proof: unverified proof status

Datasets for Verb Alternations across Languages: BLM Templates and Data Augmentation Strategies

arXiv

Curated datasets for probing verb alternations in multiple languages to enhance LLM performance.

Blocked on Code | Score 4.0 | Evidence unverified

Opportunity summary

Pain Curated datasets for probing verb alternations in multiple languages to enhance LLM performance.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

Curated datasets for probing verb alternations in multiple languages to enhance LLM performance. In this work, we present curated paradigm-based datasets for four languages, designed to probe systematic cross-sentence knowledge of verb alternations (change-of-state…

METHOD

Full abstract

Large language models (LLMs) have shown remarkable performance across various sentence-based linguistic phenomena, yet their ability to capture cross-sentence paradigmatic patterns, such as verb alternations, remains underexplored. In this work, we present curated paradigm-based datasets for four languages, designed to probe systematic cross-sentence knowledge of verb alternations (change-of-state and object-drop constructions in English, German and Italian, and Hebrew binyanim). The datasets comprise thousands of Blackbird Language Matrices (BLM) problems. The BLM task, an RPM/ARC-like task devised specifically for language, is a controlled linguistic puzzle in which models must select the sentence that completes a pattern according to syntactic and semantic rules. We introduce three types of templates varying in complexity and apply linguistically informed data augmentation strategies across synthetic and natural data. We provide simple baseline performance results across English, Italian, German, and Hebrew that demonstrate the diagnostic usefulness of the datasets.
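The multiple-choice setup described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `BLMProblem` container, its field names, the `accuracy` helper, and the toy item are hypothetical and are not taken from the paper or its released data. The sketch only shows the general shape of "pick the sentence that completes the pattern" and how a baseline could be scored against it.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class BLMProblem:
    """Hypothetical container for one BLM-style item (field names are assumptions)."""
    context: tuple[str, ...]      # sentences that establish the pattern
    candidates: tuple[str, ...]   # answer options shown to the model
    answer_index: int             # position of the correct candidate


def accuracy(problems: Sequence[BLMProblem],
             choose: Callable[[BLMProblem], int]) -> float:
    """Fraction of problems where `choose` returns the correct candidate index."""
    if not problems:
        return 0.0
    correct = sum(1 for p in problems if choose(p) == p.answer_index)
    return correct / len(problems)


def first_candidate_baseline(problem: BLMProblem) -> int:
    """Trivial baseline: always pick the first candidate."""
    return 0


# Toy change-of-state item invented for illustration; not from the released datasets.
toy = BLMProblem(
    context=(
        "The child broke the window.",
        "The window broke.",
        "The cook melted the butter.",
    ),
    candidates=(
        "The butter melted.",
        "The butter melted the cook.",
        "The cook melted.",
    ),
    answer_index=0,
)

if __name__ == "__main__":
    print(accuracy([toy], first_candidate_baseline))
```

A real evaluation would replace `first_candidate_baseline` with a function that queries a model and maps its output to a candidate index; the scoring loop would stay the same.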

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. The authors provide simple baseline performance results across English, Italian, German, and Hebrew that demonstrate the diagnostic usefulness of the datasets.

WHY NOW

NLP Datasets moved forward this cycle; last verified April 2026. Public score 4.0/10.

Opportunity summary

Score 4.0

Pain Curated datasets for probing verb alternations in multiple languages to enhance LLM performance.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

Curated datasets for probing verb alternations in multiple languages to enhance LLM performance.

Competitive landscape

Curated datasets for probing verb alternations in multiple languages to enhance LLM performance.

Segment: NLP Datasets

Adoption evidence: No public code link in the paper record yet

Commercial read: 4.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

arXiv: https://arxiv.org/abs/2603.15295

Published: 2026-03-16

FAQ

What products could be built from this research?

Why now (timing and market conditions): The rapid global adoption of LLMs has exposed weaknesses in handling non-English languages and complex grammatical structures, creating demand for specialized tools to improve model robustness. Regulatory pressures (e.g., EU AI Act) and competitive differentiation in AI are driving investments in multilingual capabilities, making this a timely solution for companies scaling internationally.

What are the practical use cases?

A multilingual customer service chatbot that accurately handles verb alternations in user queries across English, German, Italian, and Hebrew, ensuring correct interpretation of requests like 'the window broke' vs. 'someone broke the window' to provide appropriate responses without manual intervention.
