ARXIV:2604.01881 · VIDEO LLMS · SUBMITTED 03 APR · 20:50 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models

Yansong Guo · Chaoyang Zhu · Jiayi Ji · Jianghang Lin · Liujuan Cao · arXiv

HieraVid reduces the computational cost of Video LLMs by pruning video tokens hierarchically at the segment, frame, and LLM-layer levels, and reports state-of-the-art performance with only 30% of tokens retained.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain The massive number of input video tokens makes Video LLMs computationally expensive to deploy.

Evidence 0 refs | 0 sources | 33% coverage

Blocker Evidence unverified

PROBLEM

The massive number of input video tokens incurs a significant computational burden when deploying Video LLMs. Existing methods mainly prune video tokens at the input level while neglecting the inherent…

METHOD

Full abstract

Video Large Language Models (VideoLLMs) have demonstrated impressive capabilities in video understanding, yet the massive number of input video tokens incurs a significant computational burden for deployment. Existing methods mainly prune video tokens at the input level while neglecting the inherent information structure embedded in videos and large language models (LLMs). To address this, we propose HieraVid, a hierarchical pruning framework that progressively and dynamically reduces visual redundancy. Based on two observations, that videos possess a segment-frame structure and that LLMs internally propagate multi-modal information unidirectionally, we decompose pruning into three levels: 1) segment-level, where video tokens are first temporally segmented and spatially merged; 2) frame-level, where similar frames within the same segment are jointly pruned to preserve diversity; 3) layer-level, where redundancy gradually shrinks as the LLM layer index increases, without compromising performance. We conduct extensive experiments on four widely used video understanding benchmarks to comprehensively evaluate the effectiveness of HieraVid. Remarkably, with only 30% of tokens retained, HieraVid achieves new state-of-the-art performance, while maintaining over 98% and 99% of the performance of LLaVA-Video-7B and LLaVA-OneVision-7B, respectively.
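
The segment-level and frame-level steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cosine-similarity boundary test, the `threshold` and `keep_ratio` parameters, the anchor-frame pruning criterion, and all function names are hypothetical, and the spatial merging step is omitted.

```python
import math
import numpy as np


def cosine(a, b):
    """Cosine similarity along the last axis of two equally shaped arrays."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-8
    return num / den


def split_into_segments(video, threshold=0.5):
    """Segment-level step (assumed criterion): start a new segment whenever
    the mean-pooled features of two consecutive frames fall below `threshold`
    in cosine similarity.

    video: array of shape (T, N, D) = (frames, tokens per frame, feature dim).
    Returns a list of half-open (start, end) frame index pairs.
    """
    pooled = video.mean(axis=1)                    # (T, D)
    sims = cosine(pooled[:-1], pooled[1:])         # (T - 1,)
    starts = [0] + [t + 1 for t, s in enumerate(sims) if s < threshold]
    ends = starts[1:] + [len(video)]
    return list(zip(starts, ends))


def prune_segment(segment, keep_ratio=0.5):
    """Frame-level step (assumed criterion): keep the first frame of the
    segment whole; in every later frame keep only the tokens that differ
    most from the token at the same position in the first frame.

    segment: array of shape (F, N, D). Returns an array of shape (M, D).
    """
    anchor = segment[0]
    k = max(1, math.ceil(keep_ratio * segment.shape[1]))
    kept = [anchor]
    for frame in segment[1:]:
        sims = cosine(frame, anchor)               # (N,)
        idx = np.sort(np.argsort(sims)[:k])        # least similar, in spatial order
        kept.append(frame[idx])
    return np.concatenate(kept, axis=0)


def prune_video(video, threshold=0.5, keep_ratio=0.5):
    """Apply both steps and return one pruned token array per segment."""
    segments = split_into_segments(video, threshold)
    return [prune_segment(video[s:e], keep_ratio) for s, e in segments]
```

Keeping one frame per segment intact while thinning its near-duplicates is one simple way to read "jointly pruned to preserve diversity"; other criteria, such as attention scores or clustering, would fit the same description.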

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Remarkably, with only 30% of tokens retained, HieraVid achieves new state-of-the-art performance, while maintaining over 98% and 99% of the performance of LLaVA-Video-7B and…
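
To make the 30% retention figure and the layer-level idea concrete, the sketch below computes per-layer video-token counts under a linearly decaying keep ratio. The linear schedule, the function names, and the reading of 30% as an average across layers are assumptions for illustration only; the abstract does not state how HieraVid schedules layer-level pruning or how the retention ratio is measured.

```python
def layer_keep_schedule(num_layers, start_ratio, end_ratio):
    """Hypothetical schedule: linearly interpolate the fraction of video
    tokens kept, from `start_ratio` at the first layer to `end_ratio` at
    the last layer."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        return [start_ratio]
    step = (end_ratio - start_ratio) / (num_layers - 1)
    return [start_ratio + i * step for i in range(num_layers)]


def tokens_per_layer(num_video_tokens, schedule):
    """Convert keep ratios into token counts, keeping at least one token."""
    return [max(1, round(num_video_tokens * r)) for r in schedule]


def average_retention(num_video_tokens, counts):
    """Mean fraction of the original video tokens present across layers."""
    return sum(counts) / (len(counts) * num_video_tokens)
```

For example, `tokens_per_layer(1000, layer_keep_schedule(5, 0.5, 0.1))` gives per-layer counts of 500, 400, 300, 200, and 100, which is an average retention of 30% under this assumed definition.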

WHY NOW

Video LLMs moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain The massive number of input video tokens makes Video LLMs computationally expensive to deploy.

Evidence 0 refs | 0 sources | 33% coverage

Blocker no shell-level blocker reported

Competitive landscape

Segment

Video LLMs

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified
