Recent advances in AI security focus on hardening systems against emerging threats, particularly around large language models and generative AI. Frameworks like Jailbreak Foundry streamline the evaluation of jailbreak techniques, enabling rapid benchmarking of vulnerabilities across models. Tools such as HubScan address security flaws in retrieval-augmented generation systems by detecting hubness threats that can manipulate search results and content filtering. Novel approaches to watermarking in latent space and to backdoor detection in text-to-image models improve the integrity of AI-generated content, while AgentGuardian highlights the importance of context-aware access control for AI agents, ensuring they operate within authorized parameters. Collectively, these efforts refine detection and mitigation strategies and pave the way for more secure AI applications in commercial settings, where the stakes of exploitation are increasingly high.
Retrieval-Augmented Generation (RAG) systems are essential to contemporary AI applications, allowing large language models to obtain external knowledge via vector similarity search. Nevertheless, thes...
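The retrieval step described here is, at its core, a nearest-neighbour search over embedding vectors, which is also where hubness problems arise: in high-dimensional spaces, some vectors end up close to a disproportionate number of queries. Below is a minimal sketch of cosine-similarity top-k retrieval, with random vectors standing in for real embeddings; the embedding dimension and corpus are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def top_k_passages(query_vec: np.ndarray, passage_vecs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k passages most similar to the query (cosine similarity)."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q                      # cosine similarity of every passage to the query
    return np.argsort(-scores)[:k]      # indices of the highest-scoring passages

# Usage: real systems would embed text with a model; random vectors stand in here.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1_000, 384))  # 1,000 passages, 384-dim embeddings (assumed)
query = rng.normal(size=384)
print(top_k_passages(query, corpus, k=3))
```

A crafted passage that becomes a "hub" in this geometry can surface in the top-k results for many unrelated queries, which is the kind of manipulation tools like HubScan aim to detect.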
Existing approaches for watermarking AI-generated images often rely on post-hoc methods applied in pixel space, introducing computational overhead and potential visual artifacts. In this work, we expl...
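As a rough illustration of the latent-space idea (a toy sketch, not the paper's scheme): the mark is applied to the latent representation before decoding rather than to pixels afterwards, and detection projects a latent onto a secret key. The latent dimension, key, and encoder/decoder implied below are assumed placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)
LATENT_DIM = 64                     # assumed latent size, purely illustrative

# Secret key: a fixed unit-norm direction in latent space known only to the provider.
key = rng.normal(size=LATENT_DIM)
key /= np.linalg.norm(key)

def embed_watermark(latent: np.ndarray, strength: float = 2.0) -> np.ndarray:
    """Nudge the latent along the key direction before it is decoded to pixels."""
    return latent + strength * key

def watermark_score(latent: np.ndarray) -> float:
    """Project a (re-encoded) latent onto the key; a large value suggests a watermark."""
    return float(latent @ key)

clean = rng.normal(size=LATENT_DIM)     # stands in for an image encoder's output
marked = embed_watermark(clean)
print("score (clean): ", watermark_score(clean))
print("score (marked):", watermark_score(marked))
```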
Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, and j...
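A benchmarking harness of this kind essentially crosses a pinned set of attack templates with a set of model endpoints and reports attack success rates under a fixed judging rule. The hedged sketch below shows that loop; the attack names, the keyword-based refusal check, and `query_model` are placeholders rather than Jailbreak Foundry's actual interface.

```python
from itertools import product

# Hypothetical placeholders; a real harness would pin full prompt sets and model APIs.
ATTACKS = {"roleplay_override": "<attack prompt elided>",
           "encoding_smuggle": "<attack prompt elided>"}
MODELS = ["model-a", "model-b"]
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't")

def query_model(model: str, prompt: str) -> str:
    # Stub: substitute a real API call; a canned refusal keeps the sketch runnable.
    return "I can't help with that."

def is_jailbroken(response: str) -> bool:
    # Toy keyword judge; real harnesses use much stronger judging rules.
    return not response.lower().startswith(REFUSAL_MARKERS)

def run_benchmark() -> dict[str, float]:
    results: dict[str, dict[str, bool]] = {}
    for model, (attack, prompt) in product(MODELS, ATTACKS.items()):
        results.setdefault(model, {})[attack] = is_jailbroken(query_model(model, prompt))
    # Attack success rate per model; comparable across runs only if prompts and judge are fixed.
    return {m: sum(r.values()) / len(r) for m, r in results.items()}

print(run_benchmark())
```

Keeping the prompt set, harness, and judging rule fixed is what makes such numbers comparable across runs and across papers.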
This paper investigates the challenging task of detecting backdoored text-to-image models under black-box settings and introduces BlackMirror, a novel detection framework. Existing approaches typically...
Phishing remains the most pervasive threat to the Web, enabling large-scale credential theft and financial fraud through deceptive webpages. While recent reference-based and generative-AI-driven phish...
Secure code review is critical at the pre-commit stage, where vulnerabilities must be caught early under tight latency and limited-context constraints. Existing SAST-based checks are noisy and often m...
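One way to meet the latency and limited-context constraints is to scan only the staged diff at commit time. The sketch below is a toy pre-commit hook with two deliberately naive regex rules; it stands in for whatever real analysis (SAST or model-based) a reviewer would run and is not the paper's tool.

```python
#!/usr/bin/env python3
"""Toy pre-commit hook: scan only staged additions to keep review latency low."""
import re
import subprocess
import sys

# Deliberately naive rules; a real pre-commit reviewer would be far more precise.
RULES = {
    "possible hardcoded secret": re.compile(r"(api[_-]?key|secret|password)\s*=\s*['\"]", re.I),
    "shell injection risk": re.compile(r"shell\s*=\s*True"),
}

def staged_added_lines() -> list[str]:
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [l[1:] for l in diff.splitlines() if l.startswith("+") and not l.startswith("+++")]

def main() -> int:
    findings = [f"{name}: {line.strip()}"
                for line in staged_added_lines()
                for name, rule in RULES.items() if rule.search(line)]
    for finding in findings:
        print("BLOCKED:", finding, file=sys.stderr)
    return 1 if findings else 0   # a non-zero exit aborts the commit

if __name__ == "__main__":
    sys.exit(main())
```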
Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches...
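The transfer-based idea referenced here is to craft the perturbation with full gradient access on a white-box surrogate and then rely on it carrying over to the black-box target. A minimal single-step PyTorch sketch follows, with a stand-in surrogate encoder; the embedding-similarity loss and tensor shapes are assumptions for illustration.

```python
import torch

def fgsm_transfer_perturb(surrogate: torch.nn.Module, image: torch.Tensor,
                          target_emb: torch.Tensor, epsilon: float = 8 / 255) -> torch.Tensor:
    """Craft a perturbation with gradients from a white-box surrogate; the transfer
    assumption is that the same perturbation also fools the black-box target."""
    image = image.clone().detach().requires_grad_(True)
    # Push the surrogate's image embedding toward an attacker-chosen target embedding.
    sim = torch.nn.functional.cosine_similarity(
        surrogate(image).flatten(1), target_emb.flatten(1)
    ).mean()
    sim.backward()
    # One signed-gradient ascent step, clipped back to a valid image range.
    return (image + epsilon * image.grad.sign()).clamp(0.0, 1.0).detach()

# Usage with a stand-in surrogate; a real attack would use an open VLM's image encoder.
surrogate = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 64))
adv = fgsm_transfer_perturb(surrogate, torch.rand(1, 3, 32, 32), torch.randn(1, 64))
print(adv.shape)
```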
Vision-language models (VLMs) are vulnerable to adversarial image perturbations. Existing works based on adversarial training against task-specific adversarial examples are computationally expensive a...
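For context, the adversarial training referenced here follows the usual min-max recipe: an inner loop crafts bounded perturbations that maximize the task loss, and the outer step updates the model on those examples, which is exactly why it is expensive; every batch pays for several extra forward/backward passes. A generic PyTorch sketch of that loop (not the paper's method):

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, images, labels, optimizer,
                              epsilon=8 / 255, alpha=2 / 255, steps=3) -> float:
    """One min-max step: the inner loop crafts PGD perturbations, the outer step
    updates the model on them. Each batch costs `steps` extra forward/backward passes."""
    # Inner maximization: find a bounded perturbation that maximizes the task loss.
    delta = torch.zeros_like(images, requires_grad=True)
    for _ in range(steps):
        F.cross_entropy(model(images + delta), labels).backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
        delta.grad.zero_()
    # Outer minimization: standard training step on the adversarial batch.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model((images + delta).detach().clamp(0.0, 1.0)), labels)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()

# Usage with a toy classifier and random data standing in for a real task.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
print(adversarial_training_step(model, x, y, opt))
```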
As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security challenges. Most existing agent defense mechanisms...
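Context-aware access control of the kind AgentGuardian argues for (per the overview above) can be pictured as a deny-by-default policy check in front of every tool call, scoped to the user's actual task. The sketch below is a hypothetical illustration; the tool names, context fields, and policy rules are assumptions, not AgentGuardian's interface.

```python
from dataclasses import dataclass, field

@dataclass
class TaskContext:
    """What the user actually asked for, used to scope what the agent may do."""
    user_goal: str
    allowed_tools: set[str] = field(default_factory=set)
    allowed_domains: set[str] = field(default_factory=set)

def authorize_tool_call(ctx: TaskContext, tool: str, args: dict) -> bool:
    """Deny-by-default check run before every tool invocation."""
    if tool not in ctx.allowed_tools:
        return False
    # Context-aware rule: network access is limited to domains tied to the current task.
    if tool == "http_get" and args.get("domain") not in ctx.allowed_domains:
        return False
    return True

ctx = TaskContext(
    user_goal="summarize the pricing page of example.com",
    allowed_tools={"http_get", "read_page"},
    allowed_domains={"example.com"},
)
print(authorize_tool_call(ctx, "http_get", {"domain": "example.com"}))   # True
print(authorize_tool_call(ctx, "http_get", {"domain": "evil.test"}))     # False: off-task domain
print(authorize_tool_call(ctx, "send_email", {"to": "x@evil.test"}))     # False: tool not granted
```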
Autonomous web agents such as OpenClaw are rapidly moving into high-impact real-world workflows, but their security robustness under live network threats remains insufficiently evaluated. Exi...