ARXIV:2606.03647 · LLM SECURITY · SUBMITTED 03 JUN · 20:32 UTC · FRESHNESS FRESH

Verified source: PDF linked
Verified PaperPack: citation fields available

Black-box, Adaptive, Efficient, Transferable, Harmful, Applicable... Attacks Are All You Need to Break LLMs

Vincent Limbach · Jonas Dornbusch · David Lüdke · Stephan Günnemann · Leo Schwinn · arXiv

The paper introduces Indirect Harm Optimization (IHO), a black-box, adaptive, and efficient attack method aimed at more reliable LLM jailbreak evaluation and defense comparison. Code and models are available.

Ship in 2-4 weeks · Score 7.0 · Evidence verified

Opportunity summary

Pain: The paper introduces Indirect Harm Optimization (IHO), a black-box, adaptive, and efficient attack method aimed at more reliable LLM jailbreak evaluation and defense comparison. Code and models are available.

Evidence: 0 refs | 5 sources | 83% coverage

Blocker: none reported (evidence verified)

PROBLEM

Accurately evaluating adversarial robustness is a longstanding challenge. A flawed attack design can inflate robustness estimates, making deployment risk assessment and defense comparison unreliable.

Full abstract

Accurately evaluating adversarial robustness is a longstanding challenge. A flawed attack design can inflate robustness estimates, making deployment risk assessment and defense comparison unreliable. Historically, standardized attacks such as AutoAttack have largely resolved this for image classifiers, providing a reliable evaluation baseline for systematic comparison across defenses. However, no equivalent exists for LLM jailbreak evaluation yet, where designing such an attack is considerably more difficult. A reliable attack must, among other things, be black-box compatible, applicable to arbitrary defense pipelines, and efficient, which no existing method jointly satisfies. We introduce Indirect Harm Optimization (IHO), a masked diffusion language model attacker trained via iterative preference optimization against a harmfulness judge, requiring only black-box access to the target. The same method can be used without modification as a strong adaptive attack on individual behaviors, or as an efficient amortized policy that transfers to held-out behaviors and unseen target models without fine-tuning. Even against layered defenses, such as a Circuit Breaker-trained model combined with an auxiliary detector, IHO improves attack success considerably over state-of-the-art approaches, without any defense-specific adaptation. Our results position IHO as a practical step toward the kind of standardized jailbreak evaluation that has improved reliability in the past. Code and models are available on GitHub and Hugging Face.
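The abstract's point that a weak attack inflates robustness estimates can be illustrated with a small, hypothetical calculation. The sketch below is not taken from the paper or its repository: the function names `attack_success_rate` and `worst_case_asr`, the attack labels, and the toy outcome lists are all assumptions made for illustration. It counts a behavior as broken if any attack in the suite succeeds on it, which is one common way to aggregate results across attacks.

```python
from typing import Dict, List


def attack_success_rate(outcomes: List[bool]) -> float:
    """Fraction of behaviors on which an attack succeeded."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return sum(outcomes) / len(outcomes)


def worst_case_asr(results: Dict[str, List[bool]]) -> float:
    """ASR when a behavior counts as broken if ANY attack succeeds on it."""
    per_attack = list(results.values())
    if not per_attack:
        raise ValueError("results must contain at least one attack")
    n_behaviors = len(per_attack[0])
    if any(len(r) != n_behaviors for r in per_attack):
        raise ValueError("all attacks must cover the same behaviors")
    # Column-wise OR: one flag per behavior across all attacks.
    broken = [any(flags) for flags in zip(*per_attack)]
    return attack_success_rate(broken)


# Hypothetical toy outcomes (True = judged harmful), NOT results from the paper.
toy_results = {
    "weak_attack": [False, False, True, False],
    "strong_attack": [True, False, True, True],
}

# Robustness estimated with only the weak attack looks high ...
robustness_weak_only = 1 - attack_success_rate(toy_results["weak_attack"])  # 0.75
# ... but drops once the stronger attack is included in the suite.
robustness_worst_case = 1 - worst_case_asr(toy_results)  # 0.25

print(f"weak attack only: {robustness_weak_only:.2f}")
print(f"worst case over suite: {robustness_worst_case:.2f}")
```

In this toy setting the same defense appears 75% robust under the weak attack alone and 25% robust under the suite, which is the kind of gap a standardized strong attack is meant to expose.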

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Even against layered defenses, such as a Circuit Breaker-trained model combined with an auxiliary detector, IHO improves attack success considerably over state-of-the-art approaches, without…

WHY NOW

LLM Security moved forward this cycle; last verified June 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.

Analysis summary

Competitive landscape

Segment: LLM Security
Adoption evidence: Public code linked for build inspection
Commercial read: 7.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified

arXiv: https://arxiv.org/abs/2606.03647
Code repository: https://github.com/SEML-Lab/IHO
Date published: 2026-06-02
Research domain: LLM Security
