Research on large language models (LLMs) is advancing our understanding of how language is processed and learned. Recent studies explore how statistical patterns in language input can facilitate syntax acquisition, what geometric structures appear in model weights, and how model complexity trades off against predictive power. These insights matter to developers building LLM-based applications because they can inform strategies for optimizing model performance, improving generalization, and improving interpretability. By examining the balance between memorization and generalization, researchers aim to uncover the principles that govern effective learning in both machines and humans, with the goal of more robust and efficient language models.
Topic-specific paper and score movement from the daily diff ledger.
In what ways might statistical signals in linguistic input assist with the acquisition of syntax? Here we hypothesize a mechanism called collocational bootstrapping, in which regularities in word co-o...
Large language models (LLMs) are pretrained by minimizing the cross-entropy loss for next-token prediction. In this paper, we study whether this optimization strategy can induce geometric structure in...
Deep neural networks exhibit a simplicity bias, a well-documented tendency to favor simple functions over complex ones. In this work, we cast new light on this phenomenon through the lens of the Minim...
Building on the abstract Generalized Singular Value Decomposition (GSVD) theory of Brown et al. [2025], we prove that most modern neural architectures admit a generalized SVD representation in which t...
We give a novel logical characterization of encoder-decoder transformers, the foundational architecture for LLMs that also sees use in various settings that benefit from cross-attention. We study such...
We introduce a family of synthetic languages with hierarchical structure -- generated by a broadcast process on trees -- for which the role of context length and reasoning in autoregressive generation...
Quantization reduces the numerical precision of Transformer computations and is widely used to accelerate inference, yet its effect on expressivity remains poorly characterized. We demonstrate a fine-...
Despite the increasing prevalence of large language models (LLMs), we still have a limited understanding of how their representational spaces are structured. This limits our ability to interpret how a...
Humans, animals, and modern machine learning models exhibit impressive abilities to learn complex behaviors and generalize these behaviors to unseen situations. This ability requires us to learn rules...
This paper studies generalization error bounds for Transformer models. Based on the offset Rademacher complexity, we derive sharper generalization bounds for different Transformer architectures, inclu...
Canonical route: /topics
Agent Handoff
Canonical ID llm-theory | Route /topic/llm-theory
REST example
curl https://sciencetostartup.com/api/v1/agent-handoff/topic/llm-theory
MCP example
{
  "tool": "search_papers",
  "arguments": {
    "query": "LLM Theory",
    "cluster": "LLM Theory"
  }
}
source_context
{
  "surface": "topic",
  "mode": "topic",
  "query": "LLM Theory",
  "normalized_query": "llm-theory",
  "route": "/topic/llm-theory",
  "paper_ref": null,
  "topic_slug": "llm-theory",
  "benchmark_ref": null,
  "dataset_ref": null
}
Use This Via API or MCP
Topic pages bundle paper counts, viability trends, author concentration, and top questions into one canonical surface your agents can reference before they open Signal Canvas or create a workspace.
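The REST and MCP examples on this page can be assembled programmatically. The following is a minimal sketch in Python, not the site's own client: the base URL, the `search_papers` tool name, and the `source_context` field names are copied from the examples above, while the helper names (`normalize_query`, `build_handoff`) and the slug rule (lowercase, whitespace to hyphens) are assumptions inferred from the single "LLM Theory" to "llm-theory" example. It only builds the request data; it makes no network call and assumes nothing about the response format.

```python
import json
from urllib.parse import quote

# Base URL taken from the REST example on this page.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff/topic"


def normalize_query(query: str) -> str:
    """Hypothetical slug rule: lowercase and join words with hyphens.

    Inferred from the one example shown ("LLM Theory" -> "llm-theory");
    the real service may normalize differently.
    """
    return "-".join(query.lower().split())


def build_handoff(query: str) -> dict:
    """Build the REST URL, MCP tool call, and source_context for a topic."""
    slug = normalize_query(query)
    return {
        "rest_url": f"{BASE_URL}/{quote(slug)}",
        "mcp_call": {
            "tool": "search_papers",
            "arguments": {"query": query, "cluster": query},
        },
        "source_context": {
            "surface": "topic",
            "mode": "topic",
            "query": query,
            "normalized_query": slug,
            "route": f"/topic/{slug}",
            "paper_ref": None,
            "topic_slug": slug,
            "benchmark_ref": None,
            "dataset_ref": None,
        },
    }


if __name__ == "__main__":
    # Print the assembled payloads for the topic shown on this page.
    print(json.dumps(build_handoff("LLM Theory"), indent=2))
```

Keeping the three payloads in one function means the slug is derived once, so the REST route and the `source_context` fields cannot drift apart.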