55 papers - avg viability 5.3
Recent advancements in AI security are focusing on enhancing the robustness of systems against emerging threats, particularly in the realms of large language models and generative AI. The introduction of frameworks like Jailbreak Foundry is streamlining the evaluation of jailbreak techniques, allowing for rapid benchmarking of vulnerabilities across various models. Meanwhile, tools such as HubScan are addressing security flaws in retrieval-augmented generation systems by detecting hubness threats that can manipulate search results and content filtering. Additionally, novel approaches to watermarking in latent spaces and backdoor detection in text-to-image models are improving the integrity of AI-generated content. The development of AgentGuardian emphasizes the importance of context-aware access control for AI agents, ensuring they operate within authorized parameters. Collectively, these efforts are not only refining detection and mitigation strategies but also paving the way for more secure AI applications in commercial environments, where the stakes of exploitation are increasingly high.
Enhancing AI-generated content integrity with robust and efficient latent space watermarking.
HubScan detects and mitigates hubness poisoning attacks in retrieval-augmented generation systems for secure AI data access.
Automatically convert jailbreak research into standardized attack modules for consistent benchmarking.
BlackMirror is a plug-and-play, training-free framework that detects backdoors in text-to-image models by identifying semantic deviations between instructions and generated images, suitable for Model-as-a-Service applications.
SpecularNet offers a lightweight, reference-free framework for rapid phishing detection using hierarchical graph autoencoding tailored for web security applications.
AgenticSCR automates secure code review to catch immature vulnerabilities more accurately than traditional tools.
Enhance security of vision-language models with highly effective black-box adversarial attack tool.
An LLM-assisted static analysis framework for auditing code for quantum-vulnerable cryptography and scoring migration risk.
An agentic framework that automates membership inference attacks by self-exploring and refining strategies, improving model auditing without manual feature engineering.
RAGShield provides a five-layer defense-in-depth framework for government RAG systems, using supply chain provenance verification to prevent knowledge base poisoning attacks.