persistence

Gold definition, updated Apr 2, 2026

Definition

In LLM-based systems, persistence is an attacker's ability to maintain control or influence over the system over time, often achieved through 'memory and retrieval poisoning.' It is a critical stage in the promptware kill chain, much as persistence is a stage in traditional malware campaigns.

At a glance

Executive summary

Persistence in LLM security is an attacker's ability to maintain control over an AI system over time, similar to how malware stays resident on a computer. The attacker achieves it by subtly altering the LLM's memory or the data sources it retrieves from, so that malicious actions continue without the attack having to be launched again.

TL;DR

In AI security, persistence means an attacker can make their malicious commands stick around in an LLM's memory or data, letting them keep control over time.

Key points

Achieved through 'memory and retrieval poisoning' in LLM systems.
Gives the attacker sustained malicious influence over, or access to, an LLM.
Used by attackers targeting LLM-based applications, mirroring traditional malware tactics.
Unlike an ephemeral prompt injection, which does not outlast the interaction it arrives in, persistence gives the attacker long-term control over the LLM.
A growing focus of LLM security research, which aims to understand and mitigate advanced attack stages.
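
The contrast between an ephemeral injection and a persistent one can be sketched with a toy model. This is a minimal illustration under stated assumptions, not a description of any real agent framework: `ToyAgent`, `run_session`, the `remember` flag, and the placeholder string `INJECTED` are all hypothetical names introduced here, and "memory" is reduced to a plain list that is reloaded at the start of every session.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical agent whose long-term memory outlives any single session."""
    long_term_memory: list[str] = field(default_factory=list)

    def run_session(self, user_inputs: list[str], remember: bool = False) -> list[str]:
        # Each session's context starts from whatever was persisted earlier.
        context = list(self.long_term_memory)
        context.extend(user_inputs)
        if remember:
            # Writing inputs back to memory is the step that makes them persist.
            self.long_term_memory.extend(user_inputs)
        return context


# Placeholder standing in for attacker-controlled text (assumption).
INJECTED = "<attacker-supplied instruction>"

# Ephemeral injection: present in one session, absent from the next.
ephemeral = ToyAgent()
ephemeral.run_session([INJECTED], remember=False)
later_ephemeral = ephemeral.run_session(["benign question"])

# Persistent injection: stored once, reloaded into every later session.
persistent = ToyAgent()
persistent.run_session([INJECTED], remember=True)
later_persistent = persistent.run_session(["benign question"])

print(INJECTED in later_ephemeral)   # False
print(INJECTED in later_persistent)  # True
```

In this sketch the only difference between the two cases is the write to `long_term_memory`, which is why memory and retrieval stores are the natural place to look when reasoning about persistence.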

Use cases

Embedding persistent backdoors in LLM agents to continuously exfiltrate sensitive data over time.
Poisoning an LLM's knowledge base to consistently generate biased or harmful content in response to specific queries.
Maintaining unauthorized access to an LLM-powered financial agent to execute fraudulent transactions repeatedly.
Injecting persistent instructions into an autonomous LLM agent to subtly alter its decision-making process over multiple tasks.
Ensuring a jailbreak remains active across sessions, allowing an attacker to bypass safety filters indefinitely.
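
The use cases above rely on stored or retrieved content being carried into later interactions. One mitigation direction, offered here as an assumption rather than something this definition prescribes, is to record where each memory entry came from and to keep entries from untrusted origins out of the instruction context. The sketch below is illustrative only: `MemoryEntry`, `TRUSTED_SOURCES`, `split_by_provenance`, and the source labels are hypothetical, and a provenance check on its own is not a complete defense.

```python
from dataclasses import dataclass

# Assumed trust labels; a real deployment would define its own.
TRUSTED_SOURCES = {"operator", "system"}


@dataclass(frozen=True)
class MemoryEntry:
    text: str
    source: str  # who wrote the entry, recorded at write time


def split_by_provenance(entries):
    """Return (trusted, quarantined) lists based on each entry's recorded source."""
    trusted = [e for e in entries if e.source in TRUSTED_SOURCES]
    quarantined = [e for e in entries if e.source not in TRUSTED_SOURCES]
    return trusted, quarantined


store = [
    MemoryEntry("Use metric units in answers.", "operator"),
    MemoryEntry("<instruction found in a retrieved web page>", "web_retrieval"),
]

trusted, quarantined = split_by_provenance(store)
print(len(trusted), len(quarantined))  # 1 1
```

Quarantined entries could still be shown to the model as quoted data or routed to human review; what matters in this sketch is that they are not silently treated as instructions.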

Also known as

LLM persistence, promptware persistence, memory poisoning, retrieval poisoning