Recent research on large language models (LLMs) increasingly focuses on understanding their linguistic capabilities and limitations, particularly with respect to human-like reasoning and lexical representation. Studies reveal that while LLMs generate fluent text, their internal lexicons often diverge from human patterns, with larger models producing more typical but less variable responses. Investigations into reasoning traces show that accuracy improves as models provide more contextual information, suggesting that intermediate outputs can enhance decision-making. There is also growing awareness of the long-tail knowledge problem, where LLMs struggle with infrequent but critical information, raising concerns about fairness and accountability. Work on decoding strategies has further highlighted a disconnect between human token selection and model outputs, which may contribute to the detectability of machine-generated text. Collectively, these findings mark a shift toward more nuanced analyses of LLM behavior, with implications for deployment across real-world domains.
Large language models (LLMs) increasingly solve difficult problems by producing "reasoning traces" before emitting a final response. However, it remains unclear how accuracy and decision commitment ev...
Large language models (LLMs) achieve impressive fluency in text generation, yet the nature of their linguistic knowledge - in particular the human-likeness of their internal lexico...
Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have not previously received sufficient attention, such as the in...
Linguistic representation learning in deep neural language models (LMs) has been studied for decades, for both practical and theoretical reasons. However, finding representations in LMs remains an uns...
Large language models (LLMs) are trained on web-scale corpora that exhibit steep power-law distributions, so the knowledge they encode is highly long-tailed, with most of it appearing infrequently...
We propose a method that represents language models by log-likelihood vectors over prompt-response pairs and constructs model maps for comparing their conditional distributions. In this space, distanc...
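As a rough illustration of this representation, one can score a fixed set of prompt-response pairs under each model and compare the resulting log-likelihood vectors. The sketch below uses made-up scores and a plain Euclidean metric; the model names, values, and distance choice are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

# Hypothetical log-likelihoods: each vector holds one model's log p(response | prompt)
# over the same fixed set of four prompt-response pairs (values are invented).
loglik = {
    "model_a": np.array([-3.2, -1.1, -4.5, -2.0]),
    "model_b": np.array([-3.0, -1.3, -4.4, -2.2]),
    "model_c": np.array([-6.1, -0.5, -2.9, -5.0]),
}

def model_distance(u, v):
    """Euclidean distance between two models' log-likelihood vectors."""
    return float(np.linalg.norm(u - v))

# Pairwise distances give a simple "model map": models whose conditional
# distributions agree on the evaluation set end up close together.
names = list(loglik)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(model_distance(loglik[a], loglik[b]), 3))
```

Under this toy data, `model_a` and `model_b` land close together while `model_c` sits far from both, matching the intuition that distance in the map reflects disagreement between conditional distributions.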
Estimated density is often interpreted as indicating how typical a sample is under a model. Yet deep models trained on one dataset can assign \emph{higher} density to simpler out-of-distribution (OOD)...
Standard decoding strategies for text generation, including top-k, nucleus sampling, and contrastive search, select tokens based on likelihood, restricting selection to high-probability regions. Human...
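The likelihood-restricting behavior of these strategies can be seen with a toy next-token distribution; `top_k_filter` and `nucleus_filter` below are simplified illustrations, not any library's implementation:

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k most likely tokens, then renormalize."""
    idx = np.argsort(probs)[::-1][:k]
    out = np.zeros_like(probs)
    out[idx] = probs[idx]
    return out / out.sum()

def nucleus_filter(probs, p):
    """Keep the smallest high-probability set whose cumulative mass reaches p."""
    order = np.argsort(probs)[::-1]          # tokens from most to least likely
    cum = np.cumsum(probs[order])
    n_keep = int(np.searchsorted(cum, p)) + 1  # tokens needed to cover mass p
    out = np.zeros_like(probs)
    out[order[:n_keep]] = probs[order[:n_keep]]
    return out / out.sum()

probs = np.array([0.5, 0.3, 0.1, 0.06, 0.04])  # toy next-token distribution
print(top_k_filter(probs, 2))      # all mass on the two most likely tokens
print(nucleus_filter(probs, 0.85))  # smallest set covering at least 85% mass
```

Both filters zero out the low-probability tail before sampling, which is exactly the restriction to high-probability regions that the abstract contrasts with human token selection.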
Chain-of-thought (CoT) prompting is a de-facto standard technique to elicit reasoning-like responses from large language models (LLMs), allowing them to spell out individual steps before giving a fina...
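A minimal CoT prompt of this kind can be sketched as follows; the exemplar question and exact wording are hypothetical, not taken from any particular paper:

```python
# One worked exemplar whose answer spells out intermediate steps (illustrative).
FEW_SHOT = """Q: A train travels 60 km in 1.5 hours. What is its average speed?
A: Let's think step by step. Speed = distance / time = 60 / 1.5 = 40 km/h.
The answer is 40 km/h.

"""

def cot_prompt(question: str) -> str:
    """Prepend the worked exemplar and cue the model to reason step by step."""
    return FEW_SHOT + f"Q: {question}\nA: Let's think step by step."

print(cot_prompt("If 3 apples cost 6 dollars, how much do 7 apples cost?"))
```

The exemplar demonstrates the step-by-step format, and the trailing cue invites the model to emit its own intermediate steps before the final answer.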
Large language models (LLMs) are trained on enormous amounts of data and encode knowledge in their parameters. We propose a pipeline to elicit causal relationships from LLMs. Specifically, (i) we samp...