Multilingual natural language processing (NLP) research addresses the complexities of diverse languages and cultural contexts. Current work focuses on frameworks and models that improve understanding and classification across multiple languages, particularly in low-resource settings. For instance, a framework for detecting reclaimed slurs in social media discourse and benchmarks for document understanding in Southeast Asia highlight the need for robust multilingual capabilities. Models such as MrBERT and Onomas-CNN X show how targeted adaptation can improve performance in specific domains while managing resource constraints. These developments matter to builders of NLP applications that need to serve users across linguistic boundaries.
This paper presents a multi-stage framework for detecting reclaimed slurs in multilingual social media discourse. It addresses the challenge of identifying reclamatory versus non-reclamatory usage of ...
We introduce MrBERT, a family of 150M-300M parameter encoders built on the ModernBERT architecture and pre-trained on 35 languages and code. Through targeted adaptation, this model family achieves sta...
Multilingual document and scene text understanding plays an important role in applications such as search, finance, and public services. However, most existing benchmarks focus on high-resource langua...
We present a convolutional neural network approach for classifying proper names by language and entity type. Our model, Onomas-CNN X, combines parallel convolution branches with depthwise-separable op...
The development of robust language models for low-resource languages is frequently bottlenecked by the scarcity of high-quality, coherent, and domain-appropriate training corpora. In this paper, we in...
Text-to-SQL systems have achieved strong performance on English benchmarks, yet their behavior in morphologically rich, low-resource languages remains largely unexplored. We introduce BIRDTurk, the fi...
Analysing multilingual social media discourse remains a major challenge in natural language processing, particularly when large-scale public debates span across diverse languages. This study investiga...
In multilingual pretraining, the test loss of a pretrained model is heavily influenced by the proportion of each language in the pretraining data, namely the language mixture ratios. Multilin...
Euphemisms substitute socially sensitive expressions, often softening or reframing meaning, and their reliance on cultural and pragmatic context complicates modeling across languages. In this study, w...
Research on developmentally plausible language models has largely focused on English, leaving open questions about multilingual settings. We present a systematic study of compact language models by ex...
Canonical route: /topics
Agent Handoff
Canonical ID multilingual-nlp | Route /topic/multilingual-nlp
REST example
curl https://sciencetostartup.com/api/v1/agent-handoff/topic/multilingual-nlp
MCP example
{
  "tool": "search_papers",
  "arguments": {
    "query": "Multilingual NLP",
    "cluster": "Multilingual NLP"
  }
}
source_context
{
  "surface": "topic",
  "mode": "topic",
  "query": "Multilingual NLP",
  "normalized_query": "multilingual-nlp",
  "route": "/topic/multilingual-nlp",
  "paper_ref": null,
  "topic_slug": "multilingual-nlp",
  "benchmark_ref": null,
  "dataset_ref": null
}
Use This Via API or MCP
Topic pages bundle paper counts, viability trends, author concentration, and top questions into one canonical surface your agents can reference before they open Signal Canvas or create a workspace.
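As a rough illustration of how an agent might assemble the REST URL, the MCP tool call, and the source_context object shown on this page, here is a minimal Python sketch. The helper names (`topic_slug`, `handoff_url`, `search_papers_call`, `source_context`) are hypothetical, and the slug rule (lowercase, whitespace replaced by hyphens) is an assumption that merely reproduces the one example given here. The sketch does not send a request, because the response format is not documented on this page.

```python
import json
from urllib.parse import quote

# Base path taken from the REST example on this page.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff"


def topic_slug(query: str) -> str:
    """Assumed normalization: lowercase the query and join words with hyphens."""
    return "-".join(query.lower().split())


def handoff_url(slug: str) -> str:
    """Build the agent-handoff URL for a topic slug."""
    return f"{BASE_URL}/topic/{quote(slug)}"


def search_papers_call(query: str, cluster: str) -> dict:
    """Build a payload shaped like the MCP example on this page."""
    return {"tool": "search_papers", "arguments": {"query": query, "cluster": cluster}}


def source_context(query: str) -> dict:
    """Build a source_context object with the same fields as the example."""
    slug = topic_slug(query)
    return {
        "surface": "topic",
        "mode": "topic",
        "query": query,
        "normalized_query": slug,
        "route": f"/topic/{slug}",
        "paper_ref": None,
        "topic_slug": slug,
        "benchmark_ref": None,
        "dataset_ref": None,
    }


if __name__ == "__main__":
    query = "Multilingual NLP"
    print(handoff_url(topic_slug(query)))
    print(json.dumps(search_papers_call(query, query), indent=2))
    print(json.dumps(source_context(query), indent=2))
```

Keeping the payload builders separate from any network code makes them easy to check against the examples above before wiring them into an HTTP or MCP client.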