Recent work on large language model (LLM) training targets both performance and adaptability. Adaptive Group Policy Optimization improves training stability by dynamically adjusting parameters based on statistical feedback, while CONE improves numerical reasoning by preserving the semantics of complex data. Token-Routed Alignment and mixture-of-depths attention address signal degradation and critical reasoning, respectively. For builders deploying LLMs across diverse applications, these developments matter because they help models handle complex tasks more reliably.
Topic-specific paper and score movement from the daily diff ledger.
Language models encode substantial evaluative knowledge from pretraining, yet current post-training methods rely on external supervision (human annotations, proprietary models, or scalar reward models...
The cosine similarity between a large language model's hidden activations before and after Supervised Fine-Tuning (SFT) remains very high. This, at first glance, suggests that SFT leaves the model's a...
We present Marco-MoE, a suite of fully open multilingual sparse Mixture-of-Experts (MoE) models. Marco-MoE features a highly sparse design in which only around 5% of the total parameters are activate...
Large pre-trained models (LMs) and Large Language Models (LLMs) are typically effective at capturing language semantics and contextual relationships. However, these models encounter challenges in main...
Recent work has demonstrated the curse of depth in large language models (LLMs), where later layers contribute less to learning and representation than earlier layers. Such under-utilization is linked...
Tokenization is a central component of natural language processing in current large language models (LLMs), enabling models to convert raw text into processable units. Although learned tokenizers are ...
Knowledge distillation (KD) is an essential technique to compress large language models (LLMs) into smaller ones. However, despite the distinct roles of the student model and the teacher model in KD, ...
Group Relative Policy Optimisation (GRPO) enhances large language models by estimating advantages across a group of sampled trajectories. However, mapping these trajectory-level advantages to policy u...
Mixture-of-Experts (MoE) models scale capacity by combining specialized experts, but most existing approaches assume centralized access to training data. In practice, data are distributed across clien...
Continual fine-tuning of large language models (LLMs) is becoming increasingly crucial as these models are deployed in dynamic environments where tasks and data distributions evolve over time. While s...
Canonical route: /topics
Agent Handoff
Canonical ID llm-training | Route /topic/llm-training
REST example
curl https://sciencetostartup.com/api/v1/agent-handoff/topic/llm-training
MCP example
{
  "tool": "search_papers",
  "arguments": {
    "query": "LLM Training",
    "cluster": "LLM Training"
  }
}
source_context
{
  "surface": "topic",
  "mode": "topic",
  "query": "LLM Training",
  "normalized_query": "llm-training",
  "route": "/topic/llm-training",
  "paper_ref": null,
  "topic_slug": "llm-training",
  "benchmark_ref": null,
  "dataset_ref": null
}
Use This Via API or MCP
Topic pages bundle paper counts, viability trends, author concentration, and top questions into one canonical surface your agents can reference before they open Signal Canvas or create a workspace.
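The REST URL, MCP tool call, and source_context payload shown on this page all derive from a single topic query, so an agent could plausibly assemble them with a few small helpers. The following is a minimal sketch, not the site's implementation. The function names are hypothetical. The slug rule (lowercase, non-alphanumeric runs collapsed to a hyphen) is an assumption inferred only from the "LLM Training" to "llm-training" pair above. Only the base URL, the tool name, and the field names come from the examples on this page.

```python
import json
import re

# Base path taken from the REST example on this page.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff/topic"


def normalize_query(query: str) -> str:
    """Assumed slug rule: lowercase, collapse non-alphanumeric runs to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", query.lower()).strip("-")


def handoff_url(query: str) -> str:
    """Build the REST handoff URL for a topic query (hypothetical helper)."""
    return f"{BASE_URL}/{normalize_query(query)}"


def mcp_search_request(query: str) -> dict:
    """Mirror the MCP example: the same string is used for query and cluster."""
    return {
        "tool": "search_papers",
        "arguments": {"query": query, "cluster": query},
    }


def source_context(query: str) -> dict:
    """Mirror the source_context example; unset references stay None (JSON null)."""
    slug = normalize_query(query)
    return {
        "surface": "topic",
        "mode": "topic",
        "query": query,
        "normalized_query": slug,
        "route": f"/topic/{slug}",
        "paper_ref": None,
        "topic_slug": slug,
        "benchmark_ref": None,
        "dataset_ref": None,
    }


if __name__ == "__main__":
    print(handoff_url("LLM Training"))
    print(json.dumps(mcp_search_request("LLM Training"), indent=2))
    print(json.dumps(source_context("LLM Training"), indent=2))
```

The sketch only constructs the URL and payloads and performs no network calls, so the response shape of the endpoint is left unspecified here.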