ARXIV:2603.06054 · VISION-LANGUAGE MODELS IN AUTOMATED DRIVING · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified · Source: PDF linked | Partial · PaperPack: 3 of 4 citation fields filled | Missing · Missing fields: authors | Partial · Proof: unverified proof status

Probing Visual Concepts in Lightweight Vision-Language Models for Automated Driving

arXiv

This research investigates the encoding of visual concepts in Vision-Language Models to understand failures in automated driving applications.

Blocked on Code › Score 3.0 · Evidence unverified

Opportunity summary

Pain This research investigates the encoding of visual concepts in Vision-Language Models to understand failures in automated driving applications.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified


PROBLEM

This research investigates the encoding of visual concepts in Vision-Language Models to understand failures in automated driving applications. However, these models often fail on simple visual questions that are highly relevant to automated driving,…

METHOD

Full abstract

The use of Vision-Language Models (VLMs) in automated driving applications is becoming increasingly common, with the aim of leveraging their reasoning and generalisation capabilities to handle long tail scenarios. However, these models often fail on simple visual questions that are highly relevant to automated driving, and the reasons behind these failures remain poorly understood. In this work, we examine the intermediate activations of VLMs and assess the extent to which specific visual concepts are linearly encoded, with the goal of identifying bottlenecks in the flow of visual information. Specifically, we create counterfactual image sets that differ only in a targeted visual concept and then train linear probes to distinguish between them using the activations of four state-of-the-art (SOTA) VLMs. Our results show that concepts such as the presence of an object or agent in a scene are explicitly and linearly encoded, whereas other spatial visual concepts, such as the orientation of an object or agent, are only implicitly encoded by the spatial structure retained by the vision encoder. In parallel, we observe that in certain cases, even when a concept is linearly encoded in the model's activations, the model still fails to answer correctly. This leads us to identify two failure modes. The first is perceptual failure, where the visual information required to answer a question is not linearly encoded in the model's activations. The second is cognitive failure, where the visual information is present but the model fails to align it correctly with language semantics. Finally, we show that increasing the distance of the object in question quickly degrades the linear separability of the corresponding visual concept. Overall, our findings improve our understanding of failure cases in VLMs on simple visual tasks that are highly relevant to automated driving.
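The probing idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the "activations" are synthetic Gaussian vectors rather than real VLM activations, the helper names (`make_counterfactual_activations`, `train_linear_probe`, `diagnose`) are hypothetical, and the 0.9 separability threshold is not taken from the paper. The sketch fits a logistic-regression probe on counterfactual pairs that differ only along one concept direction, then labels a wrong model answer as perceptual or cognitive failure depending on whether the probe can separate the concept.

```python
import numpy as np


def make_counterfactual_activations(n_pairs, dim, signal, rng):
    """Synthetic stand-in for VLM activations (assumption, not real data).

    Each pair shares a base vector; the positive member is shifted by
    `signal` along a single unit-norm concept direction.
    """
    base = rng.normal(size=(n_pairs, dim))
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    neg = base + 0.1 * rng.normal(size=(n_pairs, dim))
    pos = base + signal * direction + 0.1 * rng.normal(size=(n_pairs, dim))
    return neg, pos


def train_linear_probe(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit a logistic-regression probe by plain gradient descent."""
    mu = X.mean(axis=0)
    Xc = X - mu  # centre the inputs so the bias starts near the boundary
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xc @ w + b)))
        w -= lr * (Xc.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    # Fold the centring back into the bias so the probe applies to raw X.
    return w, b - mu @ w


def probe_accuracy(X, y, w, b):
    """Fraction of samples on the correct side of the probe's hyperplane."""
    return float(np.mean((X @ w + b > 0) == (y == 1)))


def held_out_accuracy(signal, seed=0, n_pairs=300, dim=64):
    """Train on 70% of the pairs and score the probe on the remaining 30%."""
    rng = np.random.default_rng(seed)
    neg, pos = make_counterfactual_activations(n_pairs, dim, signal, rng)
    split = n_pairs * 7 // 10
    X_train = np.vstack([neg[:split], pos[:split]])
    y_train = np.concatenate([np.zeros(split), np.ones(split)])
    X_test = np.vstack([neg[split:], pos[split:]])
    y_test = np.concatenate([np.zeros(n_pairs - split), np.ones(n_pairs - split)])
    w, b = train_linear_probe(X_train, y_train)
    return probe_accuracy(X_test, y_test, w, b)


def diagnose(probe_acc, model_correct, threshold=0.9):
    """Map probe accuracy and model correctness to the two failure modes.

    The threshold is an illustrative assumption.
    """
    if model_correct:
        return "no failure"
    if probe_acc >= threshold:
        return "cognitive failure"   # concept linearly decodable, answer still wrong
    return "perceptual failure"      # concept not linearly decodable


acc_strong = held_out_accuracy(signal=6.0)  # concept clearly encoded
acc_none = held_out_accuracy(signal=0.0)    # concept absent from activations
print(acc_strong, diagnose(acc_strong, model_correct=False))
print(acc_none, diagnose(acc_none, model_correct=False))
```

In a real study the synthetic generator would be replaced by activations extracted from a chosen layer of the model for each counterfactual image. Sweeping `signal` downward is one rough way to mimic the reported loss of linear separability as the object moves farther away, although the paper's actual procedure may differ.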

RESULT

ScienceToStartup currently rates this 3.0/10 on the public viability pass. Our results show that concepts such as the presence of an object or agent in a scene are explicitly and linearly encoded, whereas other…

WHY NOW

Vision-Language Models in Automated Driving moved forward this cycle; last verified April 2026. Public score 3.0/10.


Opportunity summary

Score 3.0

Pain This research investigates the encoding of visual concepts in Vision-Language Models to understand failures in automated driving applications.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

This research investigates the encoding of visual concepts in Vision-Language Models to understand failures in automated driving applications.


Competitive landscape

This research investigates the encoding of visual concepts in Vision-Language Models to understand failures in automated driving applications.

Segment

Vision-Language Models in Automated Driving

Adoption evidence

No public code link in the paper record yet

Commercial read

3.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Source: https://arxiv.org/abs/2603.06054 · Published 2026-03-06T09:07:57Z
