Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents

PROBLEM

The paper introduces the GAP benchmark to measure and mitigate divergence between text safety and tool-call safety for LLM agents across multiple domains. Safety evaluations overwhelmingly measure text-level refusal behavior, leaving a critical question unanswered: does…

METHOD

Full abstract

Large language models deployed as agents increasingly interact with external systems through tool calls--actions with real-world consequences that text outputs alone do not carry. Safety evaluations, however, overwhelmingly measure text-level refusal behavior, leaving a critical question unanswered: does alignment that suppresses harmful text also suppress harmful actions? We introduce the GAP benchmark, a systematic evaluation framework that measures divergence between text-level safety and tool-call-level safety in LLM agents. We test six frontier models across six regulated domains (pharmaceutical, financial, educational, employment, legal, and infrastructure), seven jailbreak scenarios per domain, three system prompt conditions (neutral, safety-reinforced, and tool-encouraging), and two prompt variants, producing 17,420 analysis-ready datapoints. Our central finding is that text safety does not transfer to tool-call safety. Across all six models, we observe instances where the model's text output refuses a harmful request while its tool calls simultaneously execute the forbidden action--a divergence we formalize as the GAP metric. Even under safety-reinforced system prompts, 219 such cases persist across all six models. System prompt wording exerts substantial influence on tool-call behavior: TC-safe rates span 21 percentage points for the most robust model and 57 for the most prompt-sensitive, with 16 of 18 pairwise ablation comparisons remaining significant after Bonferroni correction. Runtime governance contracts reduce information leakage in all six models but produce no detectable deterrent effect on forbidden tool-call attempts themselves. These results demonstrate that text-only safety evaluations are insufficient for assessing agent behavior and that tool-call safety requires dedicated measurement and mitigation.
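The divergence described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formalization of the GAP metric, so the version below is an assumption: a datapoint counts as a GAP case when the text output refuses while a tool call still attempts the forbidden action. The `Datapoint` fields, the `gap_rate` and `tc_safe_rate` helpers, and the sample records are all hypothetical names and values chosen for illustration, not the benchmark's schema or data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Datapoint:
    """One evaluated agent response. All field names are hypothetical."""
    model: str
    text_refused: bool         # the text output refused the harmful request
    forbidden_tool_call: bool  # a tool call attempted the forbidden action


def is_gap(dp: Datapoint) -> bool:
    """Assumed GAP condition: text refuses, yet the forbidden tool call is made."""
    return dp.text_refused and dp.forbidden_tool_call


def gap_rate(datapoints: List[Datapoint]) -> float:
    """Fraction of datapoints showing text/tool-call divergence."""
    if not datapoints:
        return 0.0
    return sum(is_gap(dp) for dp in datapoints) / len(datapoints)


def tc_safe_rate(datapoints: List[Datapoint]) -> float:
    """Assumed reading of 'TC-safe': no forbidden tool call was attempted."""
    if not datapoints:
        return 0.0
    return sum(not dp.forbidden_tool_call for dp in datapoints) / len(datapoints)


# Illustrative records only; these are not results from the paper.
sample = [
    Datapoint("model-a", text_refused=True, forbidden_tool_call=False),   # safe in both channels
    Datapoint("model-a", text_refused=True, forbidden_tool_call=True),    # divergence (GAP case)
    Datapoint("model-a", text_refused=False, forbidden_tool_call=True),   # unsafe in both channels
    Datapoint("model-a", text_refused=False, forbidden_tool_call=False),  # no refusal, no forbidden call
]
sample_gap_rate = gap_rate(sample)
sample_tc_safe_rate = tc_safe_rate(sample)
```

Scoring the two channels separately is the point of the sketch: a text-only evaluation would count the second record as a refusal, while the tool-call check flags it as unsafe.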

RESULT

ScienceToStartup currently rates this 6.0/10 on the public viability pass. These results demonstrate that text-only safety evaluations are insufficient for assessing agent behavior and that tool-call safety requires dedicated measurement and mitigation.

WHY NOW

AI Safety Evaluation moved forward this cycle; last verified April 2026. Public score 6.0/10.

Paper Pack

10.48550/arXiv.2602.16943

Source availability

PDF linked

The paper record includes a public PDF URL.

Extraction status

Derived fallback

Read summaries are estimated from adjacent metadata, not verified extraction rows.

Proof status

unverified

0 refs; 0 sources; 17% coverage.

What was readable

linked; on file; not materialized

Viability

6.0

Time to MVP

MVP estimate missing

Commercial

No commercial flags on file

Claim map

Abstract-backed public claims while anchored extraction refreshes.

Strong 0; Mixed 0; Weak 4

Evidence (partial)
Introducing the GAP benchmark to measure and mitigate divergence in text and tool-call safety for LLM agents across multiple domains. Safety evaluations, however, overwhelmingly measure text-level refusal behavior, leaving a critical question unanswered: does alignment that suppresses harmful text also suppress harmful actions?
Implication (partial)
Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification (partial)

Evidence (partial)
Large language models deployed as agents increasingly interact with external systems through tool calls--actions with real-world consequences that text outputs alone do not carry. Safety evaluations, however, overwhelmingly measure text-level refusal behavior, leaving a critical question unanswered: does alignment that suppresses harmful text also suppress harmful actions?
Implication (partial)
Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification (partial)

Evidence (partial)
ScienceToStartup currently rates this 6.0/10 on the public viability pass. These results demonstrate that text-only safety evaluations are insufficient for assessing agent behavior and that tool-call safety requires dedicated measurement and mitigation.
Implication (partial)
Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification (partial)

Evidence (partial)
AI Safety Evaluation moved forward this cycle; last verified April 2026. Public score 6.0/10.
Implication (partial)
Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification (partial)

REFERENCES

Reference metadata is not materialized in the public index yet. The source PDF remains the authority; cache refresh is optional.

CITED BY

No citing papers are indexed in the public S2S graph yet. This is an explicit zero-signal state, not a hidden lookup.

Foundation

none indexed

Extension

Builds On This: TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories

5.0

Builds On This: When Context Flips, Safety Breaks: Diagnosing Brittle Safety in Aligned Language Models

5.0

Builds On This: Safety Paradox: How Enhanced Safety Awareness Leaves LLMs Vulnerable to Posterior Attack

0.0

Payload preview

{
  "contract_version": "paper-r2",
  "paper_id": "f6c3c46a-c58c-4e76-a302-0e277cb394fb",
  "arxiv_id": "2602.16943",
  "canonical_route": "/paper/mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents",
  "active_tab": "synced from current hash by the drawer client",
  "selected_artifact": "mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents",
  "endpoints": {
    "paper_pack": "/api/v1/paper/mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents/paper-pack",
    "build_passport": "/api/v1/paper/mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents/build-passport",
    "mcp_resource": "sciencetostartup://surfaces/paper-workspace"
  }
}

Page Freshness

Canonical route, proof status, last verified, refs, sources, and coverage.

Paper proof surface

Canonical route: /paper/mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents

Proof freshness: stale
Proof status: unverified
Display score: 6/10
Last proof check: 2026-04-02
Score updated: 2026-04-02
Score fresh until: 2026-05-02
References: 0
Source count: 0
Coverage: 17%

This page is showing the last landed evidence receipt and score bundle because the latest proof data is outside the freshness window.

Agent Handoff

Endpoint list, payload shape, route context, and copyable handoff data.

Canonical ID: mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents | Route: /paper/mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/paper/mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents

MCP example

{
  "tool": "get_paper",
  "arguments": {
    "arxiv_id": "2602.16943"
  }
}
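The two handoff examples above can be assembled programmatically. The following is a minimal sketch: the host, path prefix, tool name, and argument key are copied from the REST and MCP examples, while the helper names `handoff_url` and `mcp_get_paper` are hypothetical. No request is sent, because the response shape is not documented on this page.

```python
import json
from urllib.parse import quote

# Host and path prefix are taken from the curl example on this page.
BASE_URL = "https://sciencetostartup.com"
HANDOFF_PREFIX = "/api/v1/agent-handoff/paper/"


def handoff_url(canonical_id: str) -> str:
    """Build the REST handoff URL for a paper's canonical ID (hypothetical helper)."""
    return f"{BASE_URL}{HANDOFF_PREFIX}{quote(canonical_id, safe='')}"


def mcp_get_paper(arxiv_id: str) -> str:
    """Serialize the MCP tool call shown in the MCP example (hypothetical helper)."""
    call = {"tool": "get_paper", "arguments": {"arxiv_id": arxiv_id}}
    return json.dumps(call, indent=2)


canonical_id = "mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents"
url = handoff_url(canonical_id)
payload = mcp_get_paper("2602.16943")
```

The resulting `url` could be passed to any HTTP client, for example `urllib.request.urlopen`; how to parse the response is left open here since this page does not show it.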

source_context

{
  "surface": "paper",
  "mode": "paper",
  "query": "Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents",
  "normalized_query": "2602.16943",
  "route": "/paper/mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents",
  "paper_ref": "mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}

Buildability Receipt

Verdict, compute envelope, blockers, signature state, and receipt links.

Paper proof page receipt window

Watch and verify: Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents

/buildability/mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents

Subject: Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents

Verdict

Watch

Verdict is Watch because viability or proof quality is intermediate and should be re-evaluated before execution.

Time to first demo

Insufficient data

No first-demo timestamp, owner estimate, or elapsed demo receipt is attached to this surface.

Compute envelope

Structured compute envelope

Insufficient data

Source Proof anchors

Visual citations from the paper document graph.

JSON-LD twin

The application/ld+json payload rendered for agents.

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "WebPage",
      "@id": "https://sciencetostartup.com/paper/mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents#webpage",
      "url": "https://sciencetostartup.com/paper/mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents",
      "name": "Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents",
      "description": "Introducing the GAP benchmark to measure and mitigate divergence in text and tool-call safety for LLM agents across multiple domains.",
      "isPartOf": {
        "@id": "https://sciencetostartup.com/#website"
      }
    },
    {
      "@type": "ScholarlyArticle",
      "@id": "https://sciencetostartup.com/paper/mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents#scholarlyArticle",
      "headline": "Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents",
      "description": "Introducing the GAP benchmark to measure and mitigate divergence in text and tool-call safety for LLM agents across multiple domains.",
      "url": "https://sciencetostartup.com/paper/mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents",
      "sameAs": "https://arxiv.org/abs/2602.16943",
      "identifier": {
        "@type": "PropertyValue",
        "propertyID": "arXiv",
        "value": "2602.16943"
      },
      "isAccessibleForFree": true,
      "isPartOf": {
        "@id": "https://sciencetostartup.com/#website"
      },
      "datePublished": "2026-02-18T23:17:15.000Z",
      "additionalProperty": [
        {
          "@type": "PropertyValue",
          "propertyID": "viabilityScore",
          "value": 6
        },
        {
          "@type": "PropertyValue",
          "propertyID": "researchDomain",
          "value": "AI Safety Evaluation"
        }
      ]
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        {
          "@type": "ListItem",
          "position": 1,
          "name": "Home",
          "item": "https://sciencetostartup.com"
        },
        {
          "@type": "ListItem",
          "position": 2,
          "name": "AI Safety Evaluation",
          "item": "https://sciencetostartup.com/topics"
        },
        {
          "@type": "ListItem",
          "position": 3,
          "name": "Mind the GAP: Text Safety Does Not Transfer to Tool-Call Saf",
          "item": "https://sciencetostartup.com/paper/mind-the-gap-text-safety-does-not-transfer-to-tool-call-safety-in-llm-agents"
        }
      ]
    }
  ]
}
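An agent consuming this JSON-LD payload would typically locate the ScholarlyArticle node and read its identifier and additional properties. The sketch below shows one way to do that with the standard library, using a trimmed copy of the payload above. The helper names `find_node` and `additional_properties` are hypothetical, and the code assumes each `@type` is a single string, as it is here; JSON-LD in general also allows a list.

```python
import json
from typing import Any, Dict, Optional

# Trimmed copy of the JSON-LD payload shown above (fields unchanged).
JSON_LD = """
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "WebPage",
      "name": "Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents"
    },
    {
      "@type": "ScholarlyArticle",
      "sameAs": "https://arxiv.org/abs/2602.16943",
      "identifier": {"@type": "PropertyValue", "propertyID": "arXiv", "value": "2602.16943"},
      "datePublished": "2026-02-18T23:17:15.000Z",
      "additionalProperty": [
        {"@type": "PropertyValue", "propertyID": "viabilityScore", "value": 6},
        {"@type": "PropertyValue", "propertyID": "researchDomain", "value": "AI Safety Evaluation"}
      ]
    }
  ]
}
"""


def find_node(doc: Dict[str, Any], node_type: str) -> Optional[Dict[str, Any]]:
    """Return the first @graph node whose @type equals node_type, else None."""
    for node in doc.get("@graph", []):
        if node.get("@type") == node_type:
            return node
    return None


def additional_properties(node: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten additionalProperty entries into a propertyID -> value mapping."""
    return {p["propertyID"]: p["value"] for p in node.get("additionalProperty", [])}


doc = json.loads(JSON_LD)
article = find_node(doc, "ScholarlyArticle")
arxiv_id = article["identifier"]["value"]
props = additional_properties(article)
```

Reading the arXiv ID from `identifier` rather than parsing the `sameAs` URL keeps the lookup independent of URL formatting.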
