GazeQwen: Lightweight Gaze-Conditioned LLM Modulation for Streaming Video Understanding explores gaze-driven LLM modulation to improve the accuracy of real-time video understanding. Commercial viability score: 7/10 in Gaze-Conditioned AI.
Projected ROI: 2-4x at 6 months, 10-20x at 3 years. Lightweight AI tools can reach profitability quickly: at a $500/mo average contract, 20 customers yield $10K MRR by 6 months, and 200+ customers yield $100K+ MRR by 3 years.
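A quick sanity check of that revenue math, using only the contract size and customer counts stated above (no independent market assumptions):

```python
# Back-of-envelope check of the projection above; the $500/mo contract and the
# customer counts come from the projection itself, not from market data.
avg_contract = 500                          # USD per month per customer
customers_6mo, customers_3yr = 20, 200

mrr_6mo = avg_contract * customers_6mo      # 10,000 USD MRR at 6 months
mrr_3yr = avg_contract * customers_3yr      # 100,000 USD MRR at 3 years

print(f"6mo MRR: ${mrr_6mo:,} | 3yr MRR: ${mrr_3yr:,} (${mrr_3yr * 12:,} ARR)")
```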
High Potential: 2/4 signals
Quick Build: 4/4 signals
Series A Potential: 2/4 signals
Sources used for this analysis:
arXiv Paper: full-text PDF analysis of the research paper
GitHub Repository: code availability, stars, and contributor activity
Citation Network: Semantic Scholar citations and co-citation patterns
Community Predictions: crowd-sourced unicorn probability assessments
Analysis model: GPT-4o · Last scored: 4/2/2026
This research leverages eye-gaze information to significantly enhance the accuracy of video understanding systems, which is crucial for applications such as augmented reality or assistive technologies where real-time interpretation of user focus and actions is necessary.
The product could be an SDK or API allowing integration of gaze-based video analysis into existing platforms, or a standalone application for AR environments where gaze tracking is feasible.
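As a purely illustrative sketch of what such an SDK surface could look like (every name below is hypothetical and invented for this example; it does not refer to an existing package):

```python
# Hypothetical SDK surface for gaze-conditioned video analysis; all names are
# invented for illustration and do not correspond to an existing library.
from dataclasses import dataclass
from typing import Protocol, Sequence

@dataclass
class GazeSample:
    timestamp_ms: int
    x: float            # normalized horizontal gaze position in [0, 1]
    y: float            # normalized vertical gaze position in [0, 1]

class GazeVideoAnalyzer(Protocol):
    """Interface a platform could integrate against: push synchronized frames
    and gaze samples, then ask free-form questions about what the user sees."""
    def push_frame(self, frame_rgb: bytes, gaze: Sequence[GazeSample]) -> None: ...
    def query(self, question: str) -> str: ...
```

The design choice here is to keep gaze samples as a lightweight, timestamped side channel next to each frame, so existing video pipelines can adopt the interface without restructuring their frame ingestion.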
It could replace less accurate gaze-aware video interpretation systems that do not feed gaze data into the LLM's internal representations, yielding more precise and responsive applications.
This technology could attract companies working in augmented reality, gaming, and assistive technology, where enhancing user interaction through gaze data is a major focus; such companies may be willing to pay to integrate it into their systems.
Develop an AR glasses assistant that utilizes gaze input to provide context-aware notifications and guidance to users, improving user experience in dynamic environments like shopping, navigation, or industrial settings.
GazeQwen extends a multimodal large language model (MLLM) to incorporate gaze information by adding a lightweight gaze resampler module that modulates the model's internal representations without altering its input structure. This leads to significant improvements in streaming video understanding on standard benchmarks.
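The sketch below illustrates one way such a mechanism could be wired up. It is a minimal illustration only, assuming a Perceiver-style resampler and an additive hidden-state bias; the module design, injection layer, and dimensions are assumptions, not details taken from the paper.

```python
# Minimal sketch of a gaze resampler that modulates LLM hidden states.
# Assumes a Perceiver-style cross-attention resampler and an additive bias;
# the paper's actual architecture and hyperparameters are not reproduced here.
import torch
import torch.nn as nn

class GazeResampler(nn.Module):
    """Compress a variable-length gaze trace into a few latent tokens via
    cross-attention, then project them to a bias in the LLM's hidden width."""
    def __init__(self, gaze_dim=2, latent_dim=256, llm_dim=4096,
                 num_latents=8, num_heads=4):
        super().__init__()
        self.gaze_proj = nn.Linear(gaze_dim, latent_dim)        # embed (x, y) gaze points
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        self.cross_attn = nn.MultiheadAttention(latent_dim, num_heads, batch_first=True)
        self.to_llm = nn.Linear(latent_dim, llm_dim)             # map to LLM hidden width

    def forward(self, gaze_trace):                               # gaze_trace: (B, T, 2)
        gaze_emb = self.gaze_proj(gaze_trace)                    # (B, T, latent_dim)
        latents = self.latents.unsqueeze(0).expand(gaze_trace.size(0), -1, -1)
        latents, _ = self.cross_attn(latents, gaze_emb, gaze_emb)
        return self.to_llm(latents).mean(dim=1, keepdim=True)    # (B, 1, llm_dim)

def apply_gaze_modulation(hidden_states, gaze_bias):
    """Add the gaze-derived bias to one layer's hidden states; the model's
    input sequence (video + text tokens) is left unchanged."""
    return hidden_states + gaze_bias                             # broadcast over sequence
```

In this sketch only the resampler would be trained while the backbone stays frozen, which matches the "lightweight" framing; whether GazeQwen injects the gaze signal additively, at which layer, and at what width should be checked against the paper.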
Evaluated on the StreamGaze benchmark, GazeQwen reaches 63.9% accuracy, outperforming both open-source and proprietary alternatives; its gaze-conditioned hidden-state modulation exploits gaze data more effectively than existing methods.
The implementation relies on specific frameworks and requires hardware support for gaze tracking, which may limit its applicability. There may also be challenges in real-world deployment across different hardware configurations.