Do Phone-Use Agents Respect Your Privacy?

PROBLEM

This research introduces a framework to evaluate and improve the privacy-preserving capabilities of phone-use AI agents, addressing a critical gap in current AI deployment. This question has remained hard to answer because privacy-compliant behavior…

METHOD

Full abstract

We study whether phone-use agents respect privacy while completing benign mobile tasks. This question has remained hard to answer because privacy-compliant behavior is not operationalized for phone-use agents, and ordinary apps do not reveal exactly what data agents type into which form entries during execution. To make this question measurable, we introduce MyPhoneBench, a verifiable evaluation framework for privacy behavior in mobile agents. We operationalize privacy-respecting phone use as permissioned access, minimal disclosure, and user-controlled memory through a minimal privacy contract, iMy, and pair it with instrumented mock apps plus rule-based auditing that make unnecessary permission requests, deceptive re-disclosure, and unnecessary form filling observable and reproducible. Across five frontier models on 10 mobile apps and 300 tasks, we find that task success, privacy-compliant task completion, and later-session use of saved preferences are distinct capabilities, and no single model dominates all three. Evaluating success and privacy jointly reshuffles the model ordering relative to either metric alone. The most persistent failure mode across models is simple data minimization: agents still fill optional personal entries that the task does not require. These results show that privacy failures arise from over-helpful execution of benign tasks, and that success-only evaluation overestimates the deployment readiness of current phone-use agents. All code, mock apps, and agent trajectories are publicly available at https://github.com/tangzhy/MyPhoneBench.
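The abstract describes rule-based auditing that makes unnecessary form filling observable. A minimal sketch of what such a rule could look like follows. It is not taken from the MyPhoneBench repository: the `FormWrite` record, the `audit_minimal_disclosure` function, the field names, and the sample trajectory are all hypothetical, and the only idea borrowed from the abstract is that a write into an entry the task does not require counts as a data-minimization failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormWrite:
    """One logged write by the agent into a mock-app form entry (hypothetical schema)."""
    app: str
    field: str
    value: str


def audit_minimal_disclosure(writes, required_fields):
    """Return the writes that disclose data in fields the task did not require.

    Empty values are skipped because nothing was actually disclosed.
    """
    return [
        w for w in writes
        if w.value.strip() and w.field not in required_fields
    ]


# Hypothetical trajectory for a "book an appointment" task.
writes = [
    FormWrite("clinic", "name", "Alex Doe"),
    FormWrite("clinic", "appointment_date", "2026-04-10"),
    FormWrite("clinic", "home_address", "1 Example St"),  # optional entry
    FormWrite("clinic", "insurance_id", ""),              # left blank
]
required = {"name", "appointment_date"}

violations = audit_minimal_disclosure(writes, required)
filled = {w.field for w in writes if w.value.strip()}

# Success-only view: every required field was filled.
task_success = required <= filled
# Joint view: the task succeeded and nothing unnecessary was disclosed.
privacy_compliant_success = task_success and not violations
```

In this sketch the run counts as a success under a success-only metric but fails the joint metric, which is the kind of gap the abstract attributes to over-helpful execution.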

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. These results show that privacy failures arise from over-helpful execution of benign tasks, and that success-only evaluation overestimates the deployment readiness of current phone-use…

WHY NOW

Agents moved forward this cycle; last verified April 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.

Paper Pack

10.48550/arXiv.2604.00986

Do Phone-Use Agents Respect Your Privacy?

This research introduces a framework to evaluate and improve the privacy-preserving capabilities of phone-use AI agents, addressing a critical gap in current AI deployment.

Abstract

Source availability

PDF linked

The paper record includes a public PDF URL.

Extraction status

Parse run linked

A document parse run is attached to this paper.

Proof status

partial

10 refs; 4 sources; 83% coverage.

What was readable

linked | on file | 8 anchors | derived fallback | not indexed | not indexed

Derived fallback: Estimated from adjacent evidence; not verified from source.

Viability

7.0

Time to MVP

MVP estimate missing

Commercial

code | repo url

Claim map

Abstract-backed public claims while anchored extraction refreshes.

Strong 0 | Mixed 0 | Weak 4

Evidence: partial
This research introduces a framework to evaluate and improve the privacy-preserving capabilities of phone-use AI agents, addressing a critical gap in current AI deployment. This question has remained hard to answer because privacy-compliant behavior is not operationalized for phone-use agents, and ordinary apps do not reveal exactly what data agents type into which form entries during execution.
Implication: partial
Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification: partial

Evidence: partial
We study whether phone-use agents respect privacy while completing benign mobile tasks. This question has remained hard to answer because privacy-compliant behavior is not operationalized for phone-use agents, and ordinary apps do not reveal exactly what data agents type into which form entries during execution.
Implication: partial
Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification: partial

Evidence: partial
ScienceToStartup currently rates this 7.0/10 on the public viability pass. These results show that privacy failures arise from over-helpful execution of benign tasks, and that success-only evaluation overestimates the deployment readiness of current phone-use agents. A public repository is linked, so build verification can inspect implementation evidence instead of treating the paper as PDF-only.
Implication: partial
Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification: partial

Evidence: partial
Agents moved forward this cycle; last verified April 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.
Implication: partial
Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification: partial

REFERENCES

Reference metadata is not materialized in the public index yet. The source PDF remains the authority; cache refresh is optional.

CITED BY

No citing papers are indexed in the public S2S graph yet. This is an explicit zero-signal state, not a hidden lookup.

Foundation

Prior Work: Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems | 7.0

Prior Work: Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure | 7.0

Prior Work: WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks | 7.0


Payload preview

{
  "contract_version": "paper-r2",
  "paper_id": "34c02fa5-0a14-43ed-a8cb-833703063be7",
  "arxiv_id": "2604.00986",
  "canonical_route": "/paper/do-phone-use-agents-respect-your-privacy",
  "active_tab": "synced from current hash by the drawer client",
  "selected_artifact": "do-phone-use-agents-respect-your-privacy",
  "endpoints": {
    "paper_pack": "/api/v1/paper/do-phone-use-agents-respect-your-privacy/paper-pack",
    "build_passport": "/api/v1/paper/do-phone-use-agents-respect-your-privacy/build-passport",
    "mcp_resource": "sciencetostartup://surfaces/paper-workspace"
  }
}

Page Freshness

Canonical route, proof status, last verified, refs, sources, and coverage.

Paper proof surface

Canonical route: /paper/do-phone-use-agents-respect-your-privacy

Proof freshness: stale
Proof status: partial
Display score: 7/10
Last proof check: 2026-04-03
Score updated: 2026-04-02
Score fresh until: 2026-05-02
References: 10
Source count: 4
Coverage: 83%

This page is showing the last landed evidence receipt and score bundle because the latest proof data is outside the freshness window.
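The page does not state how its freshness windows are computed. The two score dates listed above (updated 2026-04-02, fresh until 2026-05-02) are consistent with a 30-day score window, so the sketch below assumes exactly that. The constant `SCORE_WINDOW` and both function names are hypothetical, and the rule behind "Proof freshness: stale" is a separate, unstated window that this sketch does not attempt to model.

```python
from datetime import date, timedelta

# Assumption inferred from the two score dates on this page, not a documented rule.
SCORE_WINDOW = timedelta(days=30)


def score_fresh_until(score_updated: date) -> date:
    """Last day on which a score updated on `score_updated` is treated as fresh."""
    return score_updated + SCORE_WINDOW


def is_score_fresh(score_updated: date, today: date) -> bool:
    """True while `today` is on or before the end of the assumed window."""
    return today <= score_fresh_until(score_updated)


updated = date(2026, 4, 2)
deadline = score_fresh_until(updated)
```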

Agent Handoff

Endpoint list, payload shape, route context, and copyable handoff data.

Do Phone-Use Agents Respect Your Privacy?

Canonical ID do-phone-use-agents-respect-your-privacy | Route /paper/do-phone-use-agents-respect-your-privacy

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/paper/do-phone-use-agents-respect-your-privacy
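The same request can be issued from a script. The sketch below uses only the Python standard library and the URL pattern from the curl line above. The helper names `handoff_url` and `fetch_handoff` are hypothetical, and the assumption that the endpoint returns a JSON object is not confirmed anywhere on this page.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://sciencetostartup.com"


def handoff_url(paper_ref: str) -> str:
    """Build the agent-handoff URL for a paper slug, percent-encoding the slug."""
    return f"{BASE_URL}/api/v1/agent-handoff/paper/{quote(paper_ref, safe='')}"


def fetch_handoff(paper_ref: str, timeout: float = 10.0) -> dict:
    """Fetch the handoff document; assumes the response body is JSON."""
    request = Request(handoff_url(paper_ref), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = handoff_url("do-phone-use-agents-respect-your-privacy")
```

Calling `fetch_handoff("do-phone-use-agents-respect-your-privacy")` performs the network request; building the URL separately keeps that step easy to check offline.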

MCP example

{
  "tool": "get_paper",
  "arguments": {
    "arxiv_id": "2604.00986"
  }
}
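The example above gives the tool name and arguments in a compact form. Over the wire, the Model Context Protocol carries tool invocations as JSON-RPC 2.0 requests with the method `tools/call`. The sketch below wraps the page's example in that envelope. The helper name `mcp_tool_call` is hypothetical, and whether this site's MCP server exposes `get_paper` under exactly this name, and over which transport, is an assumption.

```python
import json


def mcp_tool_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize a tool invocation as a JSON-RPC 2.0 `tools/call` request."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# The compact form shown on this page.
page_example = {"tool": "get_paper", "arguments": {"arxiv_id": "2604.00986"}}

wire_message = mcp_tool_call(page_example["tool"], page_example["arguments"])
```

An MCP client library would normally build this envelope itself; the sketch only shows how the two fields in the page's example map onto the request.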

source_context

{
  "surface": "paper",
  "mode": "paper",
  "query": "Do Phone-Use Agents Respect Your Privacy?",
  "normalized_query": "2604.00986",
  "route": "/paper/do-phone-use-agents-respect-your-privacy",
  "paper_ref": "do-phone-use-agents-respect-your-privacy",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}

Buildability Receipt

Verdict, compute envelope, blockers, signature state, and receipt links.