ARXIV:2603.16653 · VISION-LANGUAGE OPTIMIZATION · SUBMITTED 19 MAR · 20:22 UTC · FRESHNESS STALE

Verified · Source: PDF linked
Partial · PaperPack: 3 of 4 citation fields filled
Missing · Missing fields: authors
Partial · Proof: unverified proof status

HeBA: Heterogeneous Bottleneck Adapters for Robust Vision-Language Models

arXiv

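HeBA adapts Vision-Language Models such as CLIP with modality-specific bottleneck adapters: 2D depthwise-separable convolutions for visual tokens and dense linear projections for text tokens, aimed at better few-shot downstream performance.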

Blocked on Code › Score 8.0 · Evidence unverified

Opportunity summary

Pain: HeBA adapts Vision-Language Models efficiently with innovative architectural biases for enhanced downstream task performance.

Evidence: 0 refs | 0 sources | 50% coverage

Blocker: Evidence unverified

PROBLEM

Adapters for large-scale Vision-Language Models (VLMs) like CLIP often take a "one-size-fits-all" approach, processing visual and textual tokens uniformly with wide, generic adapters. We argue that this homogeneity ignores the distinct structural nature of the modalities -- spatial locality in images versus semantic…

METHOD

Full abstract

Adapting large-scale Vision-Language Models (VLMs) like CLIP to downstream tasks often suffers from a "one-size-fits-all" architectural approach, where visual and textual tokens are processed uniformly by wide, generic adapters. We argue that this homogeneity ignores the distinct structural nature of the modalities -- spatial locality in images versus semantic density in text. To address this, we propose HeBA (Heterogeneous Bottleneck Adapter), a unified architectural framework that introduces modality-specific structural inductive biases. HeBA departs from conventional designs through three key architectural innovations: (1) Heterogeneity: It processes visual tokens via 2D depthwise-separable convolutions to preserve spatial correlations, while distinctively processing text tokens via dense linear projections to capture semantic relationships; (2) Bottleneck Regularization: Unlike standard expanding adapters, HeBA employs a compression bottleneck (D -> D/4) that explicitly forces the model to learn compact, robust features and acts as a structural regularizer; and (3) Active Gradient Initialization: We challenge the restrictive zero-initialization paradigm, utilizing a Kaiming initialization strategy that ensures sufficient initial gradient flow to accelerate convergence without compromising the frozen backbone's pre-trained knowledge. Extensive experiments demonstrate that HeBA's architecturally specialized design achieves superior stability and accuracy, establishing a new state-of-the-art on 11 few-shot benchmarks. Code is available at https://github.com/Jahid12012021/VLM-HeBA.
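Two of the design choices in the abstract can be checked with plain arithmetic. The sketch below is not taken from the HeBA repository; it is a minimal illustration under stated assumptions: an embedding width of D = 512, a hypothetical 4x expanding adapter as the comparison point, a 3x3 depthwise-separable convolution applied at the bottleneck width, a linear (activation-free) adapter for the gradient check, and "zero-initialization" read as starting the up-projection at zero. All function names are illustrative.

```python
import math
import random

def linear_params(d_in, d_out):
    """Weights plus biases of a dense layer d_in -> d_out."""
    return d_in * d_out + d_out

def adapter_params(d, hidden):
    """Two dense layers: d -> hidden -> d."""
    return linear_params(d, hidden) + linear_params(hidden, d)

def standard_conv_params(channels, k=3):
    """Full k x k convolution, channels -> channels, with bias."""
    return channels * channels * k * k + channels

def depthwise_separable_params(channels, k=3):
    """Depthwise k x k conv (one filter per channel) plus 1x1 pointwise conv."""
    depthwise = channels * k * k + channels
    pointwise = channels * channels + channels
    return depthwise + pointwise

def kaiming_matrix(rows, cols, rng):
    """Kaiming-normal weights: std = sqrt(2 / fan_in), with fan_in = cols."""
    std = math.sqrt(2.0 / cols)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def down_grad_norm(w_up, x, upstream):
    """L2 norm of dLoss/dW_down for out = x + W_up (W_down x), loss = upstream . out.

    The activation is omitted for simplicity (an assumption), so the gradient
    is the outer product of (W_up^T upstream) and x, independent of W_down.
    """
    hidden = len(w_up[0])
    grad_hidden = [sum(upstream[k] * w_up[k][j] for k in range(len(w_up)))
                   for j in range(hidden)]
    return math.sqrt(sum((g * xi) ** 2 for g in grad_hidden for xi in x))

D = 512              # assumed embedding width, for illustration only
BOTTLENECK = D // 4  # the D -> D/4 compression named in the abstract

compress = adapter_params(D, BOTTLENECK)            # D -> D/4 -> D
expand = adapter_params(D, 4 * D)                   # hypothetical expanding baseline
sep_conv = depthwise_separable_params(BOTTLENECK)   # assumed placement of the conv
full_conv = standard_conv_params(BOTTLENECK)

rng = random.Random(0)
x = [1.0] * 8                             # toy input token
upstream = [1.0] * 8                      # toy gradient arriving from the loss
zero_up = [[0.0] * 2 for _ in range(8)]   # zero-initialized up-projection (8 x 2)
kaiming_up = kaiming_matrix(8, 2, rng)    # Kaiming-initialized up-projection

zero_grad = down_grad_norm(zero_up, x, upstream)
kaiming_grad = down_grad_norm(kaiming_up, x, upstream)

print("compression adapter params:", compress)
print("expanding adapter params:  ", expand)
print("depthwise-separable conv:  ", sep_conv)
print("standard conv:             ", full_conv)
print("down-projection grad norm, zero-init up:", zero_grad)
print("down-projection grad norm, Kaiming up:  ", kaiming_grad)
```

Under these assumptions the compression adapter needs roughly 16 times fewer parameters than the expanding one, and a zero-initialized up-projection sends no gradient to the down-projection on the first step, while a Kaiming-initialized one does. How HeBA keeps the frozen backbone's behavior intact at initialization is not stated in the abstract and is not modeled here.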

RESULT

ScienceToStartup currently rates this 8.0/10 on the public viability pass. Extensive experiments demonstrate that HeBA's architecturally specialized design achieves superior stability and accuracy, establishing a new state-of-the-art on 11 few-shot benchmarks. A public repository…

WHY NOW

Vision-Language Optimization moved forward this cycle; last verified April 2026. Public score 8.0/10. Implementation evidence is present through a linked repository.

Competitive landscape

Segment: Vision-Language Optimization
Adoption evidence: Public code linked for build inspection
Commercial read: 8.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified

Author: Md Jahidul Islam (Bangladesh University of Engineering and Technology)
Published: 2026-03-17T15:23:04.000Z
arXiv: https://arxiv.org/abs/2603.16653
Code repository (as listed): https://github.com/Jahid12012021/
Research domain: Vision-Language Optimization
Viability score: 8

FAQ

What is the startup potential of "HeBA: Heterogeneous Bottleneck Adapters for Robust Vision-La"?

HeBA adapts Vision-Language Models efficiently with innovative architectural biases for enhanced downstream task performance.

What products could be built from this research?

Package HeBA as a modular extension for existing VLMs like CLIP. Offer it to businesses needing custom image-text processing capabilities, such as e-commerce platforms for better product labeling and categorization.

What are the practical use cases?

Develop a plugin for popular VLMs that enables efficient adaptation to work with industry-specific datasets (e.g., medical imaging, retail inventory) without extensive retraining.

What industries could this research disrupt?

HeBA has the potential to replace existing parameter-efficient fine-tuning methods that are either too parameter-heavy or inflexible, providing a streamlined, effective alternative for adapting VLMs.
