Recent work in AI safety and security focuses on vulnerabilities in generative models and autonomous systems. Research shows how easily low-effort jailbreak attacks bypass the safety filters of text-to-image models, exposing significant gaps in current moderation techniques. Automated red-teaming is also evolving: systems such as AgenticRed identify model weaknesses without relying on human-specified workflows and report high attack success rates. The safety of large language models (LLMs) is under scrutiny as well, since their deployment in scientific applications raises domain-specific risks that call for tailored evaluation frameworks to ensure reliability and security. Meanwhile, world models used for autonomous decision-making introduce new cognitive risks, including adversarial manipulation and miscalibrated human trust. Together, these developments point to an urgent need for robust safety measures and proactive defenses against the risks of rapidly evolving AI technologies, particularly in high-stakes settings.
Topic-specific paper and score movement from the daily diff ledger.
Frontier language model capabilities are improving rapidly. We thus need stronger mitigations against bad actors misusing increasingly powerful systems. Prior work has shown that activation probes may...
Large Vision-Language Models (LVLMs) undergo safety alignment to suppress harmful content. However, current defenses predominantly target explicit malicious patterns in the input representation, often...
Text-to-image generative models are widely deployed in creative tools and online platforms. To mitigate misuse, these systems rely on safety filters and moderation pipelines that aim to block harmful ...
While recent automated red-teaming methods show promise for systematically exposing model vulnerabilities, most existing approaches rely on human-specified workflows. This dependence on manually desig...
Large language models (LLMs) are increasingly applied in specialized domains such as finance and healthcare, where they introduce unique safety risks. Domain-specific datasets of harmful prompts remai...
Modern open-world agents such as OpenClaw exhibit powerful cross-environment execution capabilities yet introduce broad new safety risk sources. Meanwhile, advanced frontier AI models drastically lowe...
As large language models (LLMs) evolve into autonomous "AI scientists," they promise transformative advances but introduce novel vulnerabilities, from potential "biosafety risks" to "dangerous explosi...
World models -- learned internal simulators of environment dynamics -- are rapidly becoming foundational to autonomous decision-making in robotics, autonomous vehicles, and agentic AI. Yet this predic...
Canonical route: /topics
Agent Handoff
Canonical ID ai-safety-security | Route /topic/ai-safety-security
REST example
curl https://sciencetostartup.com/api/v1/agent-handoff/topic/ai-safety-security
MCP example
{
"tool": "search_papers",
"arguments": {
"query": "AI Safety & Security",
"cluster": "AI Safety & Security"
}
}
source_context
{
"surface": "topic",
"mode": "topic",
"query": "AI Safety & Security",
"normalized_query": "ai-safety-security",
"route": "/topic/ai-safety-security",
"paper_ref": null,
"topic_slug": "ai-safety-security",
"benchmark_ref": null,
"dataset_ref": null
}
Use This Via API or MCP
Topic pages bundle paper counts, viability trends, author concentration, and top questions into one canonical surface your agents can reference before they open Signal Canvas or create a workspace.
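The handoff examples above refer to the same topic in three forms: the display name "AI Safety & Security" (sent as both `query` and `cluster` in the MCP call), the slug `ai-safety-security`, and the route `/topic/ai-safety-security`. The Python sketch below derives all three from the display name. It assumes the slug rule (lowercase, every run of non-alphanumeric characters collapsed to one hyphen) generalizes from the single example shown; `topic_slug` and `build_handoff` are hypothetical helpers, not part of the site's API, and the sketch only builds the URL and payload without making any network request.

```python
import json
import re

# Base URL copied from the REST example on this page.
API_BASE = "https://sciencetostartup.com/api/v1/agent-handoff/topic/"


def topic_slug(name: str) -> str:
    """Hypothetical slug rule inferred from one example:
    'AI Safety & Security' -> 'ai-safety-security'."""
    # Lowercase, collapse each run of non-alphanumerics into a single
    # hyphen, then trim any hyphens left at either end.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_handoff(topic: str) -> dict:
    """Assemble the REST URL, route, and MCP tool call for a topic name."""
    slug = topic_slug(topic)
    return {
        "rest_url": API_BASE + slug,
        "route": "/topic/" + slug,
        # Mirrors the MCP example: the display name is sent as both the
        # free-text query and the cluster filter.
        "mcp_call": {
            "tool": "search_papers",
            "arguments": {"query": topic, "cluster": topic},
        },
    }


if __name__ == "__main__":
    handoff = build_handoff("AI Safety & Security")
    print(handoff["rest_url"])
    print(json.dumps(handoff["mcp_call"], indent=2))
```

Keeping the display name unmodified in the MCP arguments matches the example payload, while the slug is used only where the page itself uses it (the REST path and the route).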