Current research in artificial intelligence, particularly on large language models (LLMs) and Transformers, increasingly focuses on improving generalization and on understanding the mechanisms underlying these architectures. Recent investigations show that while LLMs excel at specific tasks, they struggle with out-of-distribution generalization, particularly in recognizing periodic patterns, a limitation that hampers applications requiring robust adaptability, such as automated customer service or content generation. Meanwhile, hybrid architectures that combine Transformers with state space models aim to improve efficiency on in-context retrieval tasks, addressing the computational bottlenecks of standard attention. Work on task-oriented communication in vision-language models raises important questions about transparency and interpretability, which are crucial for deploying AI in sensitive environments. As researchers examine latent reasoning and intrinsic motivation, the field is shifting toward more nuanced models that better balance performance with ethical considerations, paving the way for more reliable and accountable AI systems in commercial applications.
Large language models (LLMs) based on the Transformer have demonstrated strong performance across diverse tasks. However, current models still exhibit substantial limitations in out-of-distribution (O...
Do Large Language Models (LLMs) possess a Theory of Mind (ToM)? Research into this question has focused on evaluating LLMs against benchmarks and found success across a range of social tasks. However,...
The normalization of query and key vectors is an essential part of the Transformer architecture. It ensures that learning is stable regardless of the scale of these vectors. Some normalization approac...
Intrinsic Motivation (IM) is a paradigm for generating intelligent behavior without external utilities. The existing information-theoretic methods for IM are predominantly based on information transmi...
Learning from human feedback typically relies on preference optimization that constrains policy updates through token-level regularization. However, preference optimization for language models is part...
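A standard baseline in this space is direct preference optimization (DPO), whose sequence-level loss constrains the policy relative to a frozen reference model. The sketch below is a minimal illustration of that loss, not the method proposed in the abstract; all inputs are hypothetical summed log-probabilities.

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO-style preference loss: push the policy's log-prob margin between
    the chosen (w) and rejected (l) responses above the reference model's
    margin. beta controls how tightly the policy is tethered to the reference."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log sigmoid(margin)

# Policy prefers the chosen response more than the reference does,
# so the loss drops below the indifference value -log(0.5) ≈ 0.693.
loss = dpo_loss(logp_w=-10.0, logp_l=-14.0, ref_logp_w=-12.0, ref_logp_l=-12.0)
```

The regularization here acts at the sequence level; the abstract's point is that token-level variants of such constraints bring their own trade-offs.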
We study two recurring phenomena in Transformer language models: massive activations, in which a small number of tokens exhibit extreme outliers in a few channels, and attention sinks, in which certai...
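To make the first phenomenon concrete, here is a toy detector for massive activations: scan a hidden-state matrix for entries whose magnitude dwarfs the typical activation. The ratio threshold and the planted outlier are illustrative assumptions, not values from the paper.

```python
import numpy as np

def find_massive_activations(hidden, ratio=50.0):
    """Flag (token, channel) entries whose magnitude exceeds `ratio` times
    the median absolute activation. `hidden` has shape (tokens, channels)."""
    abs_h = np.abs(hidden)
    typical = np.median(abs_h)
    rows, cols = np.where(abs_h > ratio * typical)
    return list(zip(rows.tolist(), cols.tolist()))

# Simulated activations: Gaussian noise plus one extreme outlier on the
# first token (sink-like positions are where such outliers tend to appear).
rng = np.random.default_rng(1)
hidden = rng.normal(size=(16, 64))
hidden[0, 3] = 500.0
outliers = find_massive_activations(hidden)
```

In real models the same kind of scan is applied per layer, and the flagged tokens often coincide with the positions that attract attention-sink mass.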
We investigate whether \emph{LLM-based agents} can develop task-oriented communication protocols that differ from standard natural language in collaborative reasoning tasks. Our focus is on two core p...
Latent reasoning has recently been proposed as a paradigm that performs multi-step reasoning by generating steps in the latent space instead of the textual space. This paradigm enables r...
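The core loop of such approaches can be sketched as feeding the model's hidden state back as the next input, with no intermediate decoding to text. The toy below replaces the Transformer with a stand-in nonlinear map, so it illustrates only the control flow, not any particular published method.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
W = rng.normal(size=(d, d)) / np.sqrt(d)  # stand-in for one Transformer step

def latent_reason(x0, steps):
    """Multi-step reasoning entirely in latent space: each 'thought' is the
    previous hidden state fed back as the next input, never decoded to tokens."""
    h = x0
    thoughts = []
    for _ in range(steps):
        h = np.tanh(W @ h)   # one latent reasoning step
        thoughts.append(h)
    return thoughts

x0 = rng.normal(size=d)
thoughts = latent_reason(x0, steps=4)
```

Compared with chain-of-thought in text, each step here costs one forward pass and produces no tokens, which is the source of the paradigm's efficiency claims.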
Transformers excel at in-context retrieval but suffer from quadratic complexity with sequence length, while State Space Models (SSMs) offer efficient linear-time processing but have limited retrieval ...
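The linear-time property of SSMs comes from carrying a fixed-size state through a recurrence rather than attending to all past tokens. A minimal diagonal SSM scan, with illustrative parameters, makes the cost structure explicit:

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Diagonal linear SSM: h_t = A * h_{t-1} + B * u_t,  y_t = C · h_t.

    State size n is fixed, so the cost is O(T * n) for T tokens,
    versus O(T^2) for full attention over the same sequence.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for u_t in u:            # one O(n) update per token
        h = A * h + B * u_t
        ys.append(C @ h)
    return np.array(ys)

A = np.full(4, 0.9)          # per-dimension decay (stable: |A| < 1)
B = np.ones(4)
C = np.ones(4) / 4
u = np.array([1.0, 0.0, 0.0, 0.0])  # an impulse input
y = ssm_scan(A, B, C, u)     # geometric decay of the impulse response
```

The decaying state is also the source of the retrieval limitation the abstract mentions: information about early tokens is compressed into n numbers rather than kept verbatim, which is what hybrid Transformer-SSM designs try to compensate for.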