ARXIV:2603.14794 · VIDEO INTERACTION MODELING · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Source: PDF linked (Verified) | PaperPack: 3 of 4 citation fields filled (Partial) | Missing fields: authors (Missing) | Proof: unverified proof status (Partial)

Face-to-Face: A Video Dataset for Multi-Person Interaction Modeling

arXiv

A comprehensive dataset for modeling multi-person interactions in video, enabling advanced conversational AI applications.

Blocked on Code › Score 7.0 | Evidence unverified

Opportunity summary

Pain A comprehensive dataset for modeling multi-person interactions in video, enabling advanced conversational AI applications.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

A comprehensive dataset for modeling multi-person interactions in video, enabling advanced conversational AI applications. We introduce Face-to-Face with Jimmy Fallon (F2F-JF), a 70-hour, 14k-clip dataset of two-person talk-show exchanges that preserves the sequential dependency…

METHOD

Full abstract

Modeling the reactive tempo of human conversation remains difficult because most audio-visual datasets portray isolated speakers delivering short monologues. We introduce Face-to-Face with Jimmy Fallon (F2F-JF), a 70-hour, 14k-clip dataset of two-person talk-show exchanges that preserves the sequential dependency between a guest turn and the host's response. A semi-automatic pipeline combines multi-person tracking, speech diarization, and lightweight human verification to extract temporally aligned host/guest tracks with tight crops and metadata that are ready for downstream modeling. We showcase the dataset with a reactive, speech-driven digital avatar task in which the host video during $[t_1,t_2]$ is generated from their audio plus the guest's preceding video during $[t_0,t_1]$. Conditioning a MultiTalk-style diffusion model on this cross-person visual context yields small but consistent Emotion-FID and FVD gains while preserving lip-sync quality relative to an audio-only baseline. The dataset, preprocessing recipe, and baseline together provide an end-to-end blueprint for studying dyadic, sequential behavior, which we expand upon throughout the paper. Dataset and code will be made publicly available.
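To make the reactive task concrete, the sketch below shows one way diarized turns could be paired into (guest context, host target) windows, i.e. the $[t_0,t_1]$ and $[t_1,t_2]$ intervals from the abstract. This is an illustration only: the `Turn` schema, the `pair_turns` helper, and the `max_gap` threshold are hypothetical and are not taken from the paper or its unreleased code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Turn:
    """One diarized speaking turn (hypothetical schema, times in seconds)."""
    speaker: str  # "host" or "guest"
    start: float
    end: float


@dataclass(frozen=True)
class ReactiveSample:
    """Conditioning and target windows for one guest -> host exchange."""
    guest_context: Tuple[float, float]  # [t0, t1]: guest video used as context
    host_target: Tuple[float, float]    # [t1, t2]: host audio in, host video out


def pair_turns(turns: List[Turn], max_gap: float = 0.5) -> List[ReactiveSample]:
    """Pair each guest turn with the host turn that immediately follows it.

    max_gap is an assumed tolerance (seconds) for silence between the turns.
    """
    ordered = sorted(turns, key=lambda t: t.start)
    samples = []
    for prev, nxt in zip(ordered, ordered[1:]):
        is_exchange = prev.speaker == "guest" and nxt.speaker == "host"
        gap = nxt.start - prev.end
        if is_exchange and 0.0 <= gap <= max_gap:
            # Use the guest's end time as the shared boundary t1.
            samples.append(
                ReactiveSample((prev.start, prev.end), (prev.end, nxt.end))
            )
    return samples


turns = [
    Turn("host", 0.0, 4.0),
    Turn("guest", 4.2, 9.0),
    Turn("host", 9.1, 12.5),
    Turn("guest", 14.0, 16.0),
]
samples = pair_turns(turns)
print(samples)
```

Only the guest turn that is directly followed by a host turn produces a sample here; the final guest turn has no host response and is skipped. A real pipeline would also need to handle overlapping speech and tracking failures, which this sketch ignores.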

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. We introduce Face-to-Face with Jimmy Fallon (F2F-JF), a 70-hour, 14k-clip dataset of two-person talk-show exchanges that preserves the sequential dependency between a guest turn…

WHY NOW

Video Interaction Modeling moved forward this cycle; last verified April 2026. Public score 7.0/10.

Opportunity summary

Score 7.0

Pain A comprehensive dataset for modeling multi-person interactions in video, enabling advanced conversational AI applications.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

A comprehensive dataset for modeling multi-person interactions in video, enabling advanced conversational AI applications.

Competitive landscape

A comprehensive dataset for modeling multi-person interactions in video, enabling advanced conversational AI applications.

Segment: Video Interaction Modeling

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Published 2026-03-16T03:50:02.000Z · arXiv: https://arxiv.org/abs/2603.14794 · Free to access

FAQ

What products could be built from this research?

Now is the ideal time because demand for AI-driven content creation and virtual interactions is surging, driven by remote work trends and the need for scalable, personalized media, while advances in diffusion models and dataset curation make this technically feasible.

What are the practical use cases?

A virtual talk show host that can interview real-time guests via video, generating appropriate facial expressions and reactions based on the guest's preceding video and audio, for use in automated content creation or interactive entertainment platforms.
