LLM Security

Proof pending

115papers

6.0viability

-19%30d

Proof pending

Proof pending. Core topic summary fields are still materializing.

State of the Field

Recent research on large language model (LLM) security is increasingly focused on developing practical defenses against various attack vectors, including backdoor and prompt injection attacks. Techniques such as Tail-risk Intrinsic Geometric Smoothing and PRISM offer real-time, low-latency solutions that maintain model utility while mitigating risks of malicious exploitation. The field is also exploring the nuanced mechanics of safety alignment, with approaches like the Disentangled Safety Hypothesis revealing the complexities of harmfulness detection and execution. Meanwhile, multi-agent systems are being scrutinized for their potential to propagate sensitive information, prompting the need for innovative defenses that can operate seamlessly during generation. The emergence of frameworks like PIArena highlights the demand for standardized evaluation platforms to assess the robustness of these defenses across diverse scenarios. As LLMs become more integrated into applications, these advancements aim to enhance their reliability and safety in real-world deployments.

Last updated May 25, 2026

LLM Security

Proof pending

State of the Field

Top Questions

Topic trend

Papers

Defusing the Trigger: Plug-and-Play Defense for Backdoored LLMs via Tail-Risk Intrinsic Geometric Smoothing

Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing

Thinking Wrong in Silence: Backdoor Attacks on Continuous Latent Reasoning

Prompt Attack Detection with LLM-as-a-Judge and Mixture-of-Models

Invisible Safety Threat: Malicious Finetuning for LLM via Steganography

SafeSteer: A Decoding-level Defense Mechanism for Multimodal Large Language Models

Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models

The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems

PRISM: Generation-Time Detection and Mitigation of Secret Leakage in Multi-Agent LLM Pipelines

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening

Filters

Topic proof surfaces

LLM Security

Use this topic page as a durable research-area proof surface