ARXIV:2604.19049 · LLM SECURITY · SUBMITTED 22 APR · 20:32 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery

Abhinav Agarwal · arXiv

An adversarial multi-agent system that refines LLM-assisted defect discovery by filtering false positives and improving precision.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain An adversarial multi-agent system that refines LLM-assisted defect discovery by filtering false positives and improving precision.

Evidence 53 refs | 4 sources | 100% coverage

Blocker Evidence unverified

PROBLEM

An adversarial multi-agent system that refines LLM-assisted defect discovery by filtering false positives and improving precision. We present Refute-or-Promote, an inference-time reliability pattern combining Stratified Context Hunting (SCH) for candidate generation, adversarial kill mandates,…

METHOD

Full abstract

LLM-assisted defect discovery has a precision crisis: plausible-but-wrong reports overwhelm maintainers and degrade credibility for real findings. We present Refute-or-Promote, an inference-time reliability pattern combining Stratified Context Hunting (SCH) for candidate generation, adversarial kill mandates, context asymmetry, and a Cross-Model Critic (CMC). Adversarial agents attempt to disprove candidates at each promotion gate; cold-start reviewers are intended to reduce anchoring cascades; cross-family review can catch correlated blind spots that same-family review misses. Over a 31-day campaign across 7 targets (security libraries, the ISO C++ standard, major compilers), the pipeline killed roughly 79% of 171 candidates before advancing to disclosure (retrospective aggregate); on a consolidated-protocol subset (lcms2, wolfSSL; n=30), the prospective kill rate was 83%. Outcomes: 4 CVEs (3 public, 1 embargoed); LWG 4549 accepted to the C++ working paper; 5 merged C++ editorial PRs; 3 compiler conformance bugs; 8 merged security-related fixes without CVE; an RFC 9000 errata filed under committee review; and 1+ FIPS 140-3 normative compliance issues under coordinated disclosure -- all evaluated by external acceptance, not benchmarks. The most instructive failure: ten dedicated reviewers unanimously endorsed a non-existent Bleichenbacher padding oracle in OpenSSL's CMS module; it was killed only by a single empirical test, motivating the mandatory empirical gate. No vulnerability was discovered autonomously; the contribution is external structure that filters LLM agents' persistent false positives. As a preliminary transfer test beyond defect discovery, a simplified cross-family critique variant also solved five previously unsolved SymPy instances on SWE-bench Verified and one SWE-rebench hard task.
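The stage-gated idea in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the `Candidate` fields, the gate names, and the `refute_or_promote` and `kill_rate` helpers are all hypothetical, and each gate is reduced to a boolean stand-in for what would really be an agent verdict or a test run. It only shows the control flow: a candidate is promoted when no gate refutes it, and the empirical gate runs last so that unanimous reviewer endorsement cannot promote a claim that fails to reproduce.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """A suspected defect moving through promotion gates (hypothetical shape)."""
    claim: str
    reviewer_refuted: bool   # stand-in for an adversarial agent's verdict
    critic_refuted: bool     # stand-in for a cross-family critic's verdict
    reproduces: bool         # stand-in for the outcome of an empirical test
    history: List[str] = field(default_factory=list)


# Each gate pairs a name with a predicate that returns True when it
# refutes (kills) the candidate. Names and order are assumptions.
Gate = Tuple[str, Callable[[Candidate], bool]]

GATES: List[Gate] = [
    ("adversarial_review", lambda c: c.reviewer_refuted),
    ("cross_model_critic", lambda c: c.critic_refuted),
    ("empirical_test", lambda c: not c.reproduces),
]


def refute_or_promote(candidate: Candidate, gates: List[Gate] = GATES) -> bool:
    """Return True only if the candidate survives every gate in order."""
    for name, refutes in gates:
        if refutes(candidate):
            candidate.history.append(f"killed:{name}")
            return False
        candidate.history.append(f"passed:{name}")
    return True


def kill_rate(candidates: List[Candidate]) -> float:
    """Fraction of candidates killed before promotion."""
    killed = sum(not refute_or_promote(c) for c in candidates)
    return killed / len(candidates)


# Toy analogue of the failure case in the abstract: reviewers do not
# refute the claim, but it does not reproduce, so the last gate kills it.
oracle = Candidate("padding oracle", reviewer_refuted=False,
                   critic_refuted=False, reproduces=False)
print(refute_or_promote(oracle), oracle.history)
```

In this toy run the candidate passes both review gates and is killed only at `empirical_test`, which mirrors why the abstract describes the empirical gate as mandatory rather than optional.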

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. As a preliminary transfer test beyond defect discovery, a simplified cross-family critique variant also solved five previously unsolved SymPy instances on SWE-bench Verified and…

WHY NOW

LLM Security moved forward this cycle; last verified April 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.

Opportunity summary

Score 7.0

Pain An adversarial multi-agent system that refines LLM-assisted defect discovery by filtering false positives and improving precision.

Evidence 53 refs | 4 sources | 100% coverage

Blocker no shell-level blocker reported

Competitive landscape

Segment

LLM Security

Adoption evidence

Public code linked for build inspection

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

arXiv: https://arxiv.org/abs/2604.19049 · Published: 2026-04-21 · Code repository: https://github.com/abhinavagarwal07/refute-or-promote
