ARXIV:2601.18188 · VISION-LANGUAGE NAVIGATION · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Source: PDF linked (verified) | PaperPack: 3 of 4 citation fields filled (partial) | Missing fields: authors | Proof: unverified proof status (partial)

NaVIDA: Vision-Language Navigation with Inverse Dynamics Augmentation

arXiv

Build efficient VLN agents with NaVIDA, enhancing vision-action causality for better navigation performance in robots.

Blocked on Code · Score 5.0 · Evidence unverified

Opportunity summary

Pain Build efficient VLN agents with NaVIDA, enhancing vision-action causality for better navigation performance in robots.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

Vision-and-Language Navigation (VLN) requires agents to interpret natural language instructions and act coherently in visually rich environments. However, most existing methods rely on reactive state-action mappings without explicitly modeling how actions causally transform subsequent visual observations.
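
The distinction can be made concrete with a toy grid world. A forward model predicts the next observation from an action; an inverse-dynamics model, the idea named in the paper's title, recovers the action from a pair of observations. The sketch below is an illustration only: integer poses stand in for visual observations, the four action names are assumptions, and the inverse model is an exhaustive search over the forward model rather than the learned network the paper trains.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical stand-in for a visual observation: (x, y, heading),
# where heading 0..3 means east, north, west, south.
Pose = Tuple[int, int, int]

# Assumed discrete action set; the paper's action space may differ.
ACTIONS = ("FORWARD", "TURN_LEFT", "TURN_RIGHT", "STOP")
_HEADING_TO_DELTA = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}


def step(pose: Pose, action: str) -> Pose:
    """Forward dynamics: how an action transforms the observation."""
    x, y, heading = pose
    if action == "FORWARD":
        dx, dy = _HEADING_TO_DELTA[heading]
        return (x + dx, y + dy, heading)
    if action == "TURN_LEFT":
        return (x, y, (heading + 1) % 4)
    if action == "TURN_RIGHT":
        return (x, y, (heading - 1) % 4)
    if action == "STOP":
        return pose
    raise ValueError(f"unknown action: {action}")


def infer_action(before: Pose, after: Pose) -> Optional[str]:
    """Inverse dynamics: which single action explains the observed change?"""
    for action in ACTIONS:
        if step(before, action) == after:
            return action
    return None  # no single action explains this change


def infer_chunk(observations: Sequence[Pose]) -> List[Optional[str]]:
    """Label each consecutive observation pair in a multi-step chunk."""
    return [infer_action(b, a) for b, a in zip(observations, observations[1:])]


def rollout(start: Pose, actions: Sequence[str]) -> List[Pose]:
    """Apply a chunk of actions and record every intermediate observation."""
    poses = [start]
    for action in actions:
        poses.append(step(poses[-1], action))
    return poses
```

Rolling out a chunk with `rollout` and passing the resulting poses to `infer_chunk` returns the original actions, which is the supervision signal an inverse-dynamics objective relies on. In a learned setting the exhaustive search would be replaced by a model that predicts actions from image features.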

METHOD

Full abstract

Vision-and-Language Navigation (VLN) requires agents to interpret natural language instructions and act coherently in visually rich environments. However, most existing methods rely on reactive state-action mappings without explicitly modeling how actions causally transform subsequent visual observations. Lacking such vision-action causality, agents cannot anticipate the visual changes induced by their own actions, leading to unstable behaviors, weak generalization, and cumulative error along the trajectory. To address these issues, we introduce NaVIDA (Navigation with Inverse Dynamics Augmentation), a unified VLN framework that couples policy learning with action-grounded visual dynamics and adaptive execution. NaVIDA augments training with chunk-based inverse-dynamics supervision to learn the causal relationship between visual changes and corresponding actions. To structure this supervision and extend the effective planning range, NaVIDA employs hierarchical probabilistic action chunking (HPAC), which organizes trajectories into multi-step chunks and provides discriminative, longer-range visual-change cues. To further curb error accumulation and stabilize behavior at inference, an entropy-guided mechanism adaptively sets the execution horizon of action chunks. Extensive experiments show that NaVIDA achieves superior navigation performance compared to state-of-the-art methods with fewer parameters (3B vs. 8B). Real-world robot evaluations further validate the practical feasibility and effectiveness of our approach. Code and data will be available upon acceptance.
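
The abstract states only that an entropy-guided mechanism sets the execution horizon of each action chunk; it does not give the rule. One plausible reading, sketched below purely as an assumption, is to execute the leading steps of a predicted chunk while the policy's per-step action distribution stays confident, and to re-plan at the first step whose entropy exceeds a threshold. The function names and the `max_entropy_ratio` parameter are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def execution_horizon(
    chunk_probs: Sequence[Sequence[float]],
    max_entropy_ratio: float = 0.5,  # hypothetical threshold, not from the paper
) -> int:
    """Return how many leading steps of a predicted chunk to execute.

    chunk_probs[t] is the policy's action distribution for step t of the
    chunk. Steps are accepted in order until one is too uncertain, where
    "too uncertain" means entropy above max_entropy_ratio times the
    maximum possible entropy, log(number of actions).
    """
    horizon = 0
    for probs in chunk_probs:
        limit = max_entropy_ratio * math.log(len(probs))
        if entropy(probs) > limit:
            break  # uncertain step: stop here and re-plan from a fresh observation
        horizon += 1
    # Always execute at least one step so the agent keeps moving.
    return max(horizon, 1)


if __name__ == "__main__":
    predicted_chunk = [
        [0.97, 0.01, 0.01, 0.01],  # confident
        [0.90, 0.05, 0.03, 0.02],  # still confident
        [0.40, 0.30, 0.20, 0.10],  # uncertain: horizon ends before this step
        [0.97, 0.01, 0.01, 0.01],  # never reached
    ]
    print(execution_horizon(predicted_chunk))
```

With the example chunk, the first two steps fall under the threshold and the third does not, so the sketch executes two actions before re-planning. Shorter horizons under uncertainty are one way to limit the error accumulation the abstract describes.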

RESULT

ScienceToStartup currently rates this 5.0/10 on the public viability pass. Extensive experiments show that NaVIDA achieves superior navigation performance compared to state-of-the-art methods with fewer parameters (3B vs. 8B).

WHY NOW

Vision-Language Navigation moved forward this cycle; last verified April 2026. Public score 5.0/10.

Opportunity summary

Score 5.0

Pain Build efficient VLN agents with NaVIDA, enhancing vision-action causality for better navigation performance in robots.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Competitive landscape

Segment: Vision-Language Navigation

Adoption evidence: No public code link in the paper record yet

Commercial read: 5.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Published: 2026-01-26 · arXiv: https://arxiv.org/abs/2601.18188 · Page: https://sciencetostartup.com/paper/textsc-navida-vision-language-navigation-with-inverse-dynamics-augmentation
