ARXIV:2606.03686 · EMBODIED AI AGENTS · SUBMITTED 03 JUN · 20:42 UTC · FRESHNESS FRESH

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

The DeepSpeak-Agentic Dataset

Sarah Barrington · Maty Bohacek · Hany Farid · arXiv

A new dataset and capture system for evaluating and studying human-AI agent interactions and identifying AI-generated content.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A new dataset and capture system for evaluating and studying human-AI agent interactions and identifying AI-generated content.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified


PROBLEM

A new dataset and capture system for evaluating and studying human-AI agent interactions and identifying AI-generated content. We use this dataset to evaluate the automatic forensic identification (audio, video, or text) of AI agents,…

METHOD

Full abstract

We present DeepSpeak-Agentic, a dataset of videos comprising over 37 hours of semi-structured conversations between a human and an embodied AI agent. We use this dataset to evaluate the automatic forensic identification (audio, video, or text) of AI agents, study the nature of human-agent interactions, and provide a benchmark for future advances in the large-language models and AI-generated voices and faces that power embodied AI agents. We also contribute a scalable data-capture system that creates agents, automatically pairs them with human crowd workers, records audiovisual conversations across specified scenarios, and identifies and separates the human and agent in the combined stream.
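The abstract mentions evaluating forensic identification of AI agents across audio, video, and text. As a rough illustration of what such an evaluation could look like, here is a minimal sketch that scores a detector per modality. Everything in it is an assumption made for illustration: the `Clip` record, the score convention (higher means "more likely an agent"), the 0.5 threshold, and the toy values are hypothetical and are not taken from the paper or the dataset.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """Hypothetical record for one evaluated sample (not the dataset's schema)."""
    modality: str    # "audio", "video", or "text"
    is_agent: bool   # ground truth: True if the sample comes from the AI agent
    score: float     # assumed detector output; higher = more likely an agent


def accuracy(clips, threshold=0.5):
    """Fraction of clips whose thresholded score matches the ground truth."""
    correct = sum((c.score >= threshold) == c.is_agent for c in clips)
    return correct / len(clips)


def roc_auc(clips):
    """Chance that a random agent clip outscores a random human clip (ties count 0.5)."""
    pos = [c.score for c in clips if c.is_agent]
    neg = [c.score for c in clips if not c.is_agent]
    if not pos or not neg:
        raise ValueError("need at least one agent clip and one human clip")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_by_modality(clips):
    """Group clips by modality and report accuracy and AUC for each group."""
    groups = {}
    for clip in clips:
        groups.setdefault(clip.modality, []).append(clip)
    return {m: {"accuracy": accuracy(g), "auc": roc_auc(g)} for m, g in groups.items()}


# Toy scores invented purely to exercise the functions.
toy_clips = [
    Clip("audio", True, 0.9),
    Clip("audio", True, 0.4),
    Clip("audio", False, 0.2),
    Clip("audio", False, 0.6),
]
report = evaluate_by_modality(toy_clips)
```

Reporting a threshold-free measure such as AUC alongside accuracy is a common choice for detectors, since accuracy alone depends on where the decision threshold is set.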

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. We also contribute a scalable data-capture system that creates agents, automatically pairs them with human crowd workers, records audiovisual conversations across specified scenarios, and…

WHY NOW

Embodied AI Agents moved forward this cycle; last verified June 2026. Public score 7.0/10. Production flags indicate code availability.


Opportunity summary

Score 7.0

Pain A new dataset and capture system for evaluating and studying human-AI agent interactions and identifying AI-generated content.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

A new dataset and capture system for evaluating and studying human-AI agent interactions and identifying AI-generated content.


Competitive landscape

A new dataset and capture system for evaluating and studying human-AI agent interactions and identifying AI-generated content.

Segment: Embodied AI Agents

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


