Recent research on large language model (LLM) security increasingly targets practical defenses against attack vectors such as backdoor and prompt injection attacks. Techniques such as Tail-risk Intrinsic Geometric Smoothing and PRISM offer real-time, low-latency defenses that preserve model utility while reducing the risk of malicious exploitation. The field is also examining the mechanics of safety alignment: approaches like the Disentangled Safety Hypothesis suggest that harmfulness detection and the execution of a refusal are distinct steps. Multi-agent systems are under scrutiny because sensitive information can propagate between agents, which calls for defenses that operate during generation. Frameworks like PIArena reflect the demand for standardized evaluation platforms that assess the robustness of these defenses across diverse scenarios. As LLMs are integrated into more applications, this work aims to make them more reliable and safer in real-world deployments.
Topic-specific paper and score movement from the daily diff ledger.
Defending against backdoor attacks in large language models remains a critical practical challenge. Existing defenses mitigate these threats but typically incur high preparation costs and degrade util...
This paper proposes a guaranteed defense method for large language models (LLMs) to safeguard against jailbreaking attacks. Drawing inspiration from the denoised-smoothing approach in the adversarial ...
A new generation of language models reasons entirely in continuous hidden states, producing no tokens and leaving no audit trail. We show that this silence creates a fundamentally new attack surface...
Prompt attacks, including jailbreaks and prompt injections, pose a critical security risk to Large Language Model (LLM) systems. In production, guardrails must mitigate these attacks under strict low-...
Understanding and addressing potential safety alignment risks in large language models (LLMs) is critical for ensuring their safe and trustworthy deployment. In this paper, we highlight an insidious s...
Multimodal large language models (MLLMs) are gaining increasing attention. Due to the heterogeneity of their input features, they face significant challenges in terms of jailbreak defenses. Current de...
Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection automatically triggers refusal. However, the persistence of jailbreak attacks suggests a fundamental mech...
Large Language Models (LLMs) face prominent security risks from jailbreaking, a practice that manipulates models to bypass built-in security constraints and generate unethical or unsafe content. Among...
Multi-agent LLM systems introduce a security risk in which sensitive information accessed by one agent can propagate through shared context and reappear in downstream outputs, even without explicit ad...
LLMs are vulnerable to prompt injection attacks. However, this vulnerability has been primarily demonstrated conceptually in academic studies or through a few anecdotal case studies. Its prevalence an...
Canonical route: /topics
Agent Handoff
Canonical ID llm-security | Route /topic/llm-security
REST example
curl https://sciencetostartup.com/api/v1/agent-handoff/topic/llm-security
MCP example
{
  "tool": "search_papers",
  "arguments": {
    "query": "LLM Security",
    "cluster": "LLM Security"
  }
}
source_context
{
  "surface": "topic",
  "mode": "topic",
  "query": "LLM Security",
  "normalized_query": "llm-security",
  "route": "/topic/llm-security",
  "paper_ref": null,
  "topic_slug": "llm-security",
  "benchmark_ref": null,
  "dataset_ref": null
}
Use This Via API or MCP
Topic pages bundle paper counts, viability trends, author concentration, and top questions into one canonical surface your agents can reference before they open Signal Canvas or create a workspace.
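For agents that prefer Python to curl, the REST URL and the MCP tool call shown on this page can be assembled as in the sketch below. It is a minimal illustration, not the site's client. The helper names (normalize_query, handoff_url, search_papers_call, fetch_handoff) are hypothetical. The slug rule is an assumption inferred from the single "LLM Security" to "llm-security" pair in source_context, and the response is assumed to be JSON, since this page does not show it.

```python
import json
import re
import urllib.request

# Base path taken from the REST example on this page.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff"


def normalize_query(query: str) -> str:
    """Assumed slug rule: lowercase, non-alphanumeric runs become one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", query.lower()).strip("-")


def handoff_url(topic_slug: str) -> str:
    """Build the agent-handoff URL for a topic slug."""
    return f"{BASE_URL}/topic/{topic_slug}"


def search_papers_call(query: str, cluster: str) -> dict:
    """Build the MCP tool-call payload in the shape shown in the MCP example."""
    return {"tool": "search_papers", "arguments": {"query": query, "cluster": cluster}}


def fetch_handoff(topic_slug: str, timeout: float = 10.0) -> dict:
    """GET the handoff document. Assumes a JSON body; the schema is not shown here."""
    with urllib.request.urlopen(handoff_url(topic_slug), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    slug = normalize_query("LLM Security")
    print(handoff_url(slug))
    print(json.dumps(search_papers_call("LLM Security", "LLM Security"), indent=2))
```

Running the script only prints the URL and the payload. Calling fetch_handoff(slug) performs the actual network request and is left to the caller.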