LLM Theory

Trending

20 papers

2.0 viability

+600% over 30 days


State of the Field

Theoretical research on large language models (LLMs) studies how these models learn language and what they can represent. Recent work asks how statistical regularities in linguistic input, such as word co-occurrence patterns, might support the acquisition of syntax; whether next-token cross-entropy training induces geometric structure in model weights; and how simplicity bias and lossy compression shape the trade-off between model complexity and predictive power. Other papers give formal characterizations of what transformers can express, including under reduced numerical precision, and derive generalization bounds. For developers building on LLMs, these results may inform decisions about quantization, generalization, and interpretability, although several are derived for simplified models or synthetic languages. Work on the balance between memorization and relational generalization also seeks principles of learning that may be shared by humans and machines.

Last updated May 29, 2026

Papers

1-10 of 20

Research Paper·May 19, 2026·Education

Collocational bootstrapping: A hypothesis about the learning of subject-verb agreement in humans and neural networks

In what ways might statistical signals in linguistic input assist with the acquisition of syntax? Here we hypothesize a mechanism called collocational bootstrapping, in which regularities in word co-o...

5.0 viability

Research Paper·May 12, 2026

Uncovering Symmetry Transfer in Large Language Models via Layer-Peeled Optimization

Large language models (LLMs) are pretrained by minimizing the cross-entropy loss for next-token prediction. In this paper, we study whether this optimization strategy can induce geometric structure in...

4.0 viability

Research Paper·Mar 26, 2026

A Compression Perspective on Simplicity Bias

Deep neural networks exhibit a simplicity bias, a well-documented tendency to favor simple functions over complex ones. In this work, we cast new light on this phenomenon through the lens of the Minim...

3.0 viability

Research Paper·May 7, 2026

A Generalized Singular Value Theory for Neural Networks

Building on the abstract Generalized Singular Value Decomposition (GSVD) theory of Brown et al. [2025], we prove that most modern neural architectures admit a generalized SVD representation in which t...

3.0 viability

Research Paper·May 8, 2026

Cross-Attention and Encoder-Decoder Transformers: A Logical Characterization

We give a novel logical characterization of encoder-decoder transformers, the foundational architecture for LLMs that also sees use in various settings that benefit from cross-attention. We study such...

3.0 viability

Research Paper·May 13, 2026

A Hierarchical Language Model with Predictable Scaling Laws and Provable Benefits of Reasoning

We introduce a family of synthetic languages with hierarchical structure -- generated by a broadcast process on trees -- for which the role of context length and reasoning in autoregressive generation...

3.0 viability

Research Paper·Feb 2, 2026

Every Bit Counts: A Theoretical Study of Precision-Expressivity Tradeoffs in Quantized Transformers

Quantization reduces the numerical precision of Transformer computations and is widely used to accelerate inference, yet its effect on expressivity remains poorly characterized. We demonstrate a fine-...

3.0 viability

Research Paper·Apr 8, 2026

Learning is Forgetting: LLM Training As Lossy Compression

Despite the increasing prevalence of large language models (LLMs), we still have a limited understanding of how their representational spaces are structured. This limits our ability to interpret how a...

2.0 viability · Has code

Research Paper·May 21, 2026

A mathematical theory of balancing relational generalization and memorization

Humans, animals, and modern machine learning models exhibit impressive abilities to learn complex behaviors and generalize these behaviors to unseen situations. This ability requires us to learn rules...

2.0 viability

Research Paper·Mar 23, 2026

Sharper Generalization Bounds for Transformer

This paper studies generalization error bounds for Transformer models. Based on the offset Rademacher complexity, we derive sharper generalization bounds for different Transformer architectures, inclu...

2.0 viability
