Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

Page Freshness

Signal Canvas proof surface

Canonical route: /signal-canvas/claudini-autoresearch-discovers-state-of-the-art-adversarial-attack-algorithms-for-llms

Proof freshness: stale
Proof status: partial
Display score: 8/10
Last proof check: 2026-03-26
Score updated: 2026-04-02
Score fresh until: 2026-05-02
References: 0
Source count: 0
Coverage: 50%

This page is showing the last landed evidence receipt and score bundle because the latest proof data is outside the freshness window.
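
The score dates listed on this page (updated 2026-04-02, fresh until 2026-05-02) can be read as a fixed validity window. The sketch below is only an illustration of that reading: the page does not document how freshness is actually computed, the function names are hypothetical, and treating both endpoints as inclusive is an assumption. It covers the score window only, since the length of the proof freshness window is not stated.

```python
from datetime import date

# Dates copied from the page's status block.
SCORE_UPDATED = date(2026, 4, 2)
SCORE_FRESH_UNTIL = date(2026, 5, 2)


def window_days(updated: date, fresh_until: date) -> int:
    """Length of the validity window in days."""
    return (fresh_until - updated).days


def is_fresh(on: date, updated: date, fresh_until: date) -> bool:
    """True if `on` falls inside the window (assumption: endpoints inclusive)."""
    return updated <= on <= fresh_until


if __name__ == "__main__":
    print(window_days(SCORE_UPDATED, SCORE_FRESH_UNTIL))
    print(is_fresh(date(2026, 4, 15), SCORE_UPDATED, SCORE_FRESH_UNTIL))
    print(is_fresh(date(2026, 5, 3), SCORE_UPDATED, SCORE_FRESH_UNTIL))
```

Under this reading the score window spans 30 days, and any check made after 2026-05-02 would fall outside it.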

Agent Handoff

Canonical ID claudini-autoresearch-discovers-state-of-the-art-adversarial-attack-algorithms-for-llms | Route /signal-canvas/claudini-autoresearch-discovers-state-of-the-art-adversarial-attack-algorithms-for-llms

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/signal-canvas/claudini-autoresearch-discovers-state-of-the-art-adversarial-attack-algorithms-for-llms

MCP example

{
  "tool": "search_signal_canvas",
  "arguments": {
    "mode": "paper",
    "paper_ref": "claudini-autoresearch-discovers-state-of-the-art-adversarial-attack-algorithms-for-llms",
    "query_text": "Summarize Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs"
  }
}
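
For scripted use, the REST URL and the MCP payload shown above can be assembled from the canonical ID. This is a minimal sketch using only the Python standard library: the helper names `handoff_url` and `mcp_request` are hypothetical, and the code only builds the request pieces, it does not call the service or assume anything about the response format.

```python
import json
from urllib.parse import quote

# Values copied from the REST and MCP examples on this page.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff/signal-canvas"
PAPER_REF = (
    "claudini-autoresearch-discovers-state-of-the-art-"
    "adversarial-attack-algorithms-for-llms"
)
PAPER_TITLE = (
    "Claudini: Autoresearch Discovers State-of-the-Art "
    "Adversarial Attack Algorithms for LLMs"
)


def handoff_url(paper_ref: str) -> str:
    """Build the REST handoff URL for a canonical ID (hypothetical helper)."""
    # Percent-encode the ID so it is always safe as a single path segment.
    return f"{BASE_URL}/{quote(paper_ref, safe='')}"


def mcp_request(paper_ref: str, title: str) -> dict:
    """Build a payload in the same shape as the MCP example (hypothetical helper)."""
    return {
        "tool": "search_signal_canvas",
        "arguments": {
            "mode": "paper",
            "paper_ref": paper_ref,
            "query_text": f"Summarize {title}",
        },
    }


if __name__ == "__main__":
    print(handoff_url(PAPER_REF))
    print(json.dumps(mcp_request(PAPER_REF, PAPER_TITLE), indent=2))
```

Running the script prints the same URL used in the curl example, followed by the JSON payload.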

source_context

{
  "surface": "signal_canvas",
  "mode": "paper",
  "query": "Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs",
  "normalized_query": "2603.24511",
  "route": "/signal-canvas/claudini-autoresearch-discovers-state-of-the-art-adversarial-attack-algorithms-for-llms",
  "paper_ref": "claudini-autoresearch-discovers-state-of-the-art-adversarial-attack-algorithms-for-llms",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}

Evidence Receipt

Route status: building

Claims: 8

References: Pending verification

Proof: Verification pending

Freshness state: computing

Source paper: Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

PDF: https://arxiv.org/pdf/2603.24511v1

Repository: https://github.com/romovpa/claudini

Source count: Pending verification

Coverage: 50%

Last proof check: 2026-03-26T20:30:32.566Z

Signal Canvas receipt window

Ready for execution: Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

/buildability/claudini-autoresearch-discovers-state-of-the-art-adversarial-attack-algorithms-for-llms

Build Now: ready

Subject: Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

Verdict

Preparing verified analysis

GitHub Code Pulse (cached)

Stars: 219
Last commit: 5/7/2026

Claim map

Strong: 8 | Mixed: 0 | Weak: 0

Evidence (partial): We show that an autoresearch-style pipeline powered by Claude Code discovers novel white-box adversarial attack algorithms that significantly outperform all existing (30+) methods in jailbreaking and prompt injection evaluations.
Implication (partial): Directly stated in abstract with strong quantitative comparison to existing methods
Verification: partial

Evidence (partial): achieving up to 40% attack success rate on CBRN queries against GPT-OSS-Safeguard-20B, compared to ≤10% for existing algorithms
Implication (partial): Specific numeric comparison provided in abstract with clear performance metrics
Verification: partial

Evidence (partial): attacks optimized on surrogate models transfer directly to held-out models, achieving 100% ASR against Meta-SecAlign-70B versus 56% for the best baseline
Implication (partial): Direct quantitative claim with specific model names and performance metrics
Verification: partial

Evidence (partial): White-box adversarial red-teaming is particularly well-suited for this: existing methods provide strong starting points, and the optimization objective yields dense, quantitative feedback.
Implication (partial): Direct statement about suitability with clear reasoning provided
Verification: partial

Evidence (partial): our results are an early demonstration that incremental safety and security research can be automated using LLM agents
Implication (partial): Direct statement about automation capability, though 'early demonstration' suggests preliminary nature
Verification: partial

Evidence (partial): Automation in discovering adversarial attacks could be misused if not properly governed; potential ethical concerns around AI security.
Implication (partial): Explicitly stated in analysis section as a caveat/limitation
Verification: partial

Evidence (partial): Claudini replaces traditional manually designed adversarial attacks with AI-driven automated discovery, offering faster and more effective security solutions.
Implication (partial): Implied by comparison to existing methods and stated disruption, though 'faster' aspect is not explicitly quantified
Verification: partial

Evidence (partial): The market for AI security is growing, with major investments in safeguarding AI systems by big tech companies and financial institutions that can afford premium cybersecurity tools.
Implication (partial): Stated in analysis section but without specific market data or citations
Verification: partial

Author intelligence and commercialization panels stay hidden until the proof receipt is verified, cites at least 3 references, includes at least 2 sources, and clears 50% coverage. The paper narrative and citation surfaces remain public while verification is pending.
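
The unlock rule in the paragraph above can be written as a simple predicate. This is a sketch under stated assumptions, not the site's implementation: the `ProofReceipt` type and function names are hypothetical, "clears 50% coverage" is read as "at least 50%", and the example values map this page's "Proof status: partial" to `verified=False`.

```python
from dataclasses import dataclass

# Thresholds taken from the rule stated on the page.
MIN_REFERENCES = 3
MIN_SOURCES = 2
MIN_COVERAGE = 0.50  # assumption: "clears 50%" means >= 50%


@dataclass(frozen=True)
class ProofReceipt:
    """Hypothetical container for the fields the rule depends on."""
    verified: bool
    references: int
    sources: int
    coverage: float  # fraction between 0.0 and 1.0


def unmet_conditions(receipt: ProofReceipt) -> list[str]:
    """Return the names of the conditions that still block the panels."""
    unmet = []
    if not receipt.verified:
        unmet.append("verified")
    if receipt.references < MIN_REFERENCES:
        unmet.append("references")
    if receipt.sources < MIN_SOURCES:
        unmet.append("sources")
    if receipt.coverage < MIN_COVERAGE:
        unmet.append("coverage")
    return unmet


def panels_unlocked(receipt: ProofReceipt) -> bool:
    """Panels show only when every condition is met."""
    return not unmet_conditions(receipt)


if __name__ == "__main__":
    # Values shown on this page: 0 references, 0 sources, 50% coverage.
    current = ProofReceipt(verified=False, references=0, sources=0, coverage=0.50)
    print(panels_unlocked(current), unmet_conditions(current))
```

With the values currently shown on this page, the panels stay hidden: under the "at least 50%" reading only the coverage condition is met.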
