ARXIV:2604.01966 · PERSONALIZED VIDEO QA · SUBMITTED 03 APR · 20:30 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

Ego-Grounding for Personalized Question-Answering in Egocentric Videos

Junbin Xiao · Shenglang Zhang · Pengxiang Zhu · Angela Yao · arXiv

A new dataset and benchmark for personalized question-answering in egocentric videos, revealing significant limitations in current multimodal LLMs for understanding the camera wearer.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A new dataset and benchmark for personalized question-answering in egocentric videos, revealing significant limitations in current multimodal LLMs for understanding the camera wearer.

Evidence 0 refs | 0 sources | 67% coverage

Blocker Evidence unverified

PROBLEM

A new dataset and benchmark for personalized question-answering in egocentric videos, revealing significant limitations in current multimodal LLMs for understanding the camera wearer. To this end, we introduce MyEgo, the first egocentric VideoQA dataset…

METHOD

Full abstract

We present the first systematic analysis of multimodal large language models (MLLMs) in personalized question-answering requiring ego-grounding: the ability to understand the camera wearer in egocentric videos. To this end, we introduce MyEgo, the first egocentric VideoQA dataset designed to evaluate MLLMs' ability to understand, remember, and reason about the camera wearer. MyEgo comprises 541 long videos and 5K personalized questions asking about "my things", "my activities", and "my past". Benchmarking reveals that competitive MLLMs across variants (open-source vs. proprietary, thinking vs. non-thinking, small vs. large scale) all struggle on MyEgo. Top closed- and open-source models (e.g., GPT-5 and Qwen3-VL) achieve only ~46% and 36% accuracy, trailing human performance by nearly 40% and 50%, respectively. Surprisingly, neither explicit reasoning nor model scaling yields consistent improvements. Models improve when relevant evidence is explicitly provided, but gains drop over time, indicating limitations in tracking and remembering "me" and "my past". These findings collectively highlight the crucial role of ego-grounding and long-range memory in enabling personalized QA in egocentric videos. We hope MyEgo and our analyses catalyze further progress in these areas for egocentric personalized assistance. Data and code are available at https://github.com/Ryougetsu3606/MyEgo
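The accuracy figures in the abstract can be read as per-question correctness averaged over the benchmark. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code: the record layout (`category`, `gold`, `pred`), the exact-match scoring rule, and the helper names are all assumptions made for illustration, and the MyEgo repository may organize and score its data differently.

```python
from collections import defaultdict

# Hypothetical records; field names and answer format are assumptions,
# not taken from the MyEgo release.
records = [
    {"category": "my things", "gold": "B", "pred": "B"},
    {"category": "my things", "gold": "A", "pred": "C"},
    {"category": "my activities", "gold": "D", "pred": "D"},
    {"category": "my past", "gold": "A", "pred": "B"},
]


def accuracy_by_category(rows):
    """Return {category: fraction correct}, plus an 'overall' entry."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        # Exact-match scoring is an assumption; the paper may score differently.
        correct = int(row["pred"] == row["gold"])
        for key in (row["category"], "overall"):
            hits[key] += correct
            totals[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


def gap_to_reference(model_acc, reference_acc):
    """Absolute gap, in percentage points, between a reference and a model."""
    return round((reference_acc - model_acc) * 100, 1)


scores = accuracy_by_category(records)
print(scores)
```

Splitting the score by question type ("my things", "my activities", "my past") mirrors the three categories the abstract names, and `gap_to_reference` expresses a shortfall in percentage points, which is how the abstract reports the distance to human performance. Any reference accuracy passed to it is supplied by the caller; none is hard-coded here.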

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Top closed- and open-source models (e.g., GPT-5 and Qwen3-VL) achieve only ~46% and 36% accuracy, trailing human performance by nearly 40% and 50%, respectively. A…

WHY NOW

Personalized Video QA moved forward this cycle; last verified April 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.

Opportunity summary

Score 7.0

Pain A new dataset and benchmark for personalized question-answering in egocentric videos, revealing significant limitations in current multimodal LLMs for understanding the camera wearer.

Evidence 0 refs | 0 sources | 67% coverage

Blocker no shell-level blocker reported

Analysis summary

A new dataset and benchmark for personalized question-answering in egocentric videos, revealing significant limitations in current multimodal LLMs for understanding the camera wearer.

Competitive landscape

A new dataset and benchmark for personalized question-answering in egocentric videos, revealing significant limitations in current multimodal LLMs for understanding the camera wearer.

Segment

Personalized Video QA

Adoption evidence

Public code linked for build inspection

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

{ "@context": "https://schema.org", "@graph": [ { "@type": "WebPage", "@id": "https://sciencetostartup.com/paper/ego-grounding-for-personalized-question-answering-in-egocentric-videos#webpage", "url": "https://sciencetostartup.com/paper/ego-grounding-for-personalized-question-answering-in-egocentric-videos", "name": "Ego-Grounding for Personalized Question-Answering in Egocentric Videos", "description": "A new dataset and benchmark for personalized question-answering in egocentric videos, revealing significant limitations in current multimodal LLMs for understanding the camera wearer.", "isPartOf": { "@id": "https://sciencetostartup.com/#website" } }, { "@type": "ScholarlyArticle", "@id": "https://sciencetostartup.com/paper/ego-grounding-for-personalized-question-answering-in-egocentric-videos#scholarlyArticle", "headline": "Ego-Grounding for Personalized Question-Answering in Egocentric Videos", "description": "A new dataset and benchmark for personalized question-answering in egocentric videos, revealing significant limitations in current multimodal LLMs for understanding the camera wearer.", "url": "https://sciencetostartup.com/paper/ego-grounding-for-personalized-question-answering-in-egocentric-videos", "sameAs": "https://arxiv.org/abs/2604.01966", "identifier": { "@type": "PropertyValue", "propertyID": "arXiv", "value": "2604.01966" }, "isAccessibleForFree": true, "isPartOf": { "@id": "https://sciencetostartup.com/#website" }, "datePublished": "2026-04-02T12:29:23.000Z", "author": [ { "@type": "Person", "name": "Junbin Xiao" }, { "@type": "Person", "name": "Shenglang Zhang" }, { "@type": "Person", "name": "Pengxiang Zhu" }, { "@type": "Person", "name": "Angela Yao" } ], "codeRepository": "https://github.com/Ryougetsu3606/MyEgo", "additionalProperty": [ { "@type": "PropertyValue", "propertyID": "viabilityScore", "value": 7 }, { "@type": "PropertyValue", "propertyID": "researchDomain", "value": "Personalized Video QA" }, { "@type": "PropertyValue", 
"propertyID": "commercialReadiness", "value": "code, repo url" } ] }, { "@type": "SoftwareSourceCode", "@id": "https://sciencetostartup.com/paper/ego-grounding-for-personalized-question-answering-in-egocentric-videos#software", "name": "Ego-Grounding for Personalized Question-Answering in Egocentric Videos - Source Code", "description": "A new dataset and benchmark for personalized question-answering in egocentric videos, revealing significant limitations in current multimodal LLMs for understanding the camera wearer.", "codeRepository": "https://github.com/Ryougetsu3606/MyEgo", "url": "https://github.com/Ryougetsu3606/MyEgo" }, { "@type": "BreadcrumbList", "itemListElement": [ { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://sciencetostartup.com" }, { "@type": "ListItem", "position": 2, "name": "Personalized Video QA", "item": "https://sciencetostartup.com/topics" }, { "@type": "ListItem", "position": 3, "name": "Ego-Grounding for Personalized Question-Answering in Egocent", "item": "https://sciencetostartup.com/paper/ego-grounding-for-personalized-question-answering-in-egocentric-videos" } ] } ] }

