Recent advancements in language models are increasingly focused on enhancing accessibility and efficiency, particularly for low-resource languages. Innovations like Kakugo enable the creation of small language models for 54 languages at minimal cost, democratizing AI development for underserved communities. Meanwhile, techniques such as reward-guided stitching in diffusion models are improving reasoning capabilities by aggregating intermediate outputs, leading to significant accuracy gains in complex tasks. The introduction of specialized models like LilMoo for Hindi and Sabiá-4 for Brazilian Portuguese highlights a trend toward tailored solutions that outperform larger multilingual counterparts in specific linguistic contexts. Additionally, value-aware numerical representations are addressing fundamental weaknesses in numerical reasoning, while low-resolution visual tokens are being explored to enrich character modeling in languages like Chinese. Collectively, these efforts are reshaping the landscape of language modeling, making it more inclusive and robust for diverse applications across various languages and tasks.
Reasoning with large language models often benefits from generating multiple chains-of-thought, but existing aggregation strategies are typically trajectory-level (e.g., selecting the best trace or vo...
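A common trajectory-level baseline of the kind this abstract contrasts against is self-consistency: sample several chains-of-thought, reduce each to its final answer, and majority-vote. A minimal sketch (the function name and sample data are illustrative, not from the paper):

```python
from collections import Counter

def majority_vote(final_answers):
    """Trajectory-level aggregation: each sampled chain-of-thought is
    reduced to its final answer, and the most frequent answer wins."""
    answer, _count = Counter(final_answers).most_common(1)[0]
    return answer

# Five sampled traces whose final answers were:
print(majority_vote(["42", "42", "41", "42", "7"]))  # prints 42
```

Note that this discards everything except the final answer of each trace, which is exactly the limitation the abstract raises: intermediate reasoning steps never get aggregated.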
We present Kakugo, a novel and cost-effective pipeline designed to train general-purpose Small Language Models (SLMs) for low-resource languages using only the language name as input. By using a large...
Transformer-based language models often achieve strong results on mathematical reasoning benchmarks while remaining fragile on basic numerical understanding and arithmetic operations. A central limita...
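One way to make the "value-aware" idea concrete: embed a numeral from features of its magnitude rather than from an arbitrary vocabulary index, so numerically close values receive close embeddings. The sketch below (sinusoidal features of the log-scaled value) is an assumed illustration of the general idea, not the paper's actual representation:

```python
import math

def value_embedding(x, dim=8):
    """Hypothetical value-aware embedding: encode a number's magnitude
    via sinusoidal features of its log value, instead of treating the
    numeral as an opaque token index (a sketch, not the paper's method)."""
    v = math.log1p(abs(x))
    emb = []
    for i in range(dim // 2):
        freq = 1.0 / (10 ** i)  # multi-scale frequencies
        emb.append(math.sin(v * freq))
        emb.append(math.cos(v * freq))
    return emb
```

Under this scheme the embedding of 100 sits much closer to that of 101 than to that of 100000, a property index-based token embeddings do not guarantee.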
We introduce A.X K1, a 519B-parameter Mixture-of-Experts (MoE) language model trained from scratch. Our design leverages scaling laws to optimize training configurations and vocabulary size under fixe...
This technical report presents Sabiá-4 and Sabiazinho-4, a new generation of Portuguese language models with a focus on Brazilian Portuguese. The models were developed through a four-stage tr...
Large language models typically represent Chinese characters as discrete index-based tokens, largely ignoring their visual form. For logographic scripts, visual structure carries semantic and phonetic...
The dominance of large multilingual foundation models has widened linguistic inequalities in Natural Language Processing (NLP), often leaving low-resource languages underrepresented. This paper introd...
Multimodal large language models (MLLMs) have achieved remarkable success across a broad range of vision tasks. However, constrained by the capacity of their internal world knowledge, prior work has p...
Despite emerging research on Language Models (LMs), few approaches analyse the invertibility of LMs. That is, given an LM and a desired target output sequence of tokens, determining what input prompts...
Discrete diffusion language models (dLLMs) generate text by iteratively denoising a masked sequence. Compared with autoregressive models, this paradigm naturally supports parallel decoding, bidirectio...
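The denoising loop behind dLLMs can be caricatured in a few lines: start from an all-masked sequence and fill in a subset of positions at each step. In the sketch below the "denoiser" simply copies from a known target sequence, so it illustrates only the parallel-unmasking schedule, not a trained model:

```python
import random

MASK = "[MASK]"

def toy_denoise(target, steps=3, seed=0):
    """Sketch of dLLM-style parallel decoding: begin fully masked and
    reveal several positions per step. A real model would *predict*
    each revealed token; here we copy from `target` to isolate the
    schedule itself."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    hidden = list(range(len(target)))
    per_step = max(1, -(-len(target) // steps))  # ceil division
    while hidden:
        for i in rng.sample(hidden, min(per_step, len(hidden))):
            seq[i] = target[i]
            hidden.remove(i)
    return seq

toy_denoise(["dLLMs", "decode", "in", "parallel"], steps=2)
```

Because each step reveals a batch of positions at once, the loop finishes in far fewer iterations than left-to-right autoregressive decoding would need, which is the parallelism the abstract refers to.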