AI Safety & Security

Proof partial

8 papers

6.1 viability

-100% 30d

Proof partial. Core topic fields are ready, but questions or supporting reports are still catching up.

State of the Field

Recent work in AI safety and security focuses on vulnerabilities in generative models and autonomous systems. Research shows that low-effort jailbreak attacks can bypass the safety filters of text-to-image models, revealing significant gaps in current moderation techniques. Automated red-teaming is also evolving: systems like AgenticRed identify model weaknesses without human bias and report high attack success rates. The safety of large language models (LLMs) is under scrutiny as well, because their deployment in scientific applications raises unique risks that call for tailored evaluation frameworks to ensure reliability and security. In addition, the use of world models in autonomous decision-making introduces new cognitive risks, including adversarial manipulation and miscalibrated human trust. Together, these developments point to an urgent need for robust safety measures and proactive defenses against the risks of rapidly evolving AI technologies, particularly in high-stakes environments.

Last updated May 21, 2026

Papers

Building Production-Ready Probes For Gemini

Reasoning-Oriented Programming: Chaining Semantic Gadgets to Jailbreak Large Vision Language Models

Low-Effort Jailbreak Attacks Against Text-to-Image Safety Filters

AgenticRed: Optimizing Agentic Systems for Automated Red-teaming

RiskAtlas: Exposing Domain-Specific Risks in LLMs through Knowledge-Graph-Guided Harmful Prompt Generation

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Toward Reliable, Safe, and Secure LLMs for Scientific Applications

Safety, Security, and Cognitive Risks in World Models
