ARXIV:2603.24696 · MULTIMODAL AI FOR SCIENCE · SUBMITTED 27 MAR · 20:30 UTC · FRESHNESS STALE

Source: PDF linked (verified) · PaperPack: citation fields available (verified) · Proof: partial proof status (partial)

LLaVA-LE: Large Language-and-Vision Assistant for Lunar Exploration

Gokce Inal · Pouyan Navard · Alper Yilmaz · arXiv

A specialized vision-language model and dataset for lunar exploration, enabling detailed terrain characterization and analysis.

Ship in 2-4 weeks · Score 8.0 · Evidence partial

Opportunity summary

Pain: A specialized vision-language model and dataset for lunar exploration, enabling detailed terrain characterization and analysis.

Evidence: 0 refs | 0 sources | 50% coverage

Blocker: Evidence partial

PROBLEM

Multimodal vision-language models (VLMs) can reason jointly over visual and textual information, yet their application to planetary science remains largely unexplored. A key hindrance is the absence of large-scale datasets that pair real planetary imagery with detailed scientific descriptions.
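To make the idea of pairing imagery with descriptions concrete, here is a minimal sketch of what one paired record could look like. The class and field names (`LunarSample`, `image_path`, `qa_pairs`) and the two sample entries are hypothetical, not the released dataset's schema; only the counts (96k images, 81k QA pairs, about 20k QA-annotated images) come from the abstract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str


@dataclass
class LunarSample:
    # Hypothetical schema: one image, one caption, zero or more QA pairs.
    image_path: str
    caption: str
    qa_pairs: List[QAPair] = field(default_factory=list)


def mean_qa_per_annotated_image(samples: List[LunarSample]) -> float:
    """Average number of QA pairs over samples that have at least one."""
    annotated = [s for s in samples if s.qa_pairs]
    if not annotated:
        return 0.0
    return sum(len(s.qa_pairs) for s in annotated) / len(annotated)


# Illustrative, made-up records (not taken from the dataset).
samples = [
    LunarSample(
        image_path="tiles/example_0001.png",
        caption="Placeholder caption describing a cratered surface.",
        qa_pairs=[
            QAPair("Placeholder question 1?", "Placeholder answer 1."),
            QAPair("Placeholder question 2?", "Placeholder answer 2."),
        ],
    ),
    LunarSample(
        image_path="tiles/example_0002.png",
        caption="Placeholder caption describing a smooth plain.",
    ),
]

# Figures reported in the abstract (rounded by the authors).
REPORTED_IMAGES = 96_000
REPORTED_QA_PAIRS = 81_000
REPORTED_QA_IMAGES = 20_000  # "approximately 20k"

approx_qa_per_image = REPORTED_QA_PAIRS / REPORTED_QA_IMAGES
approx_qa_image_share = REPORTED_QA_IMAGES / REPORTED_IMAGES

print(f"QA pairs per annotated image: about {approx_qa_per_image:.2f}")
print(f"Share of images with QA pairs: about {approx_qa_image_share:.1%}")
```

Because the reported counts are rounded, the derived ratios (roughly four QA pairs per annotated image, and roughly a fifth of the images carrying QA pairs) are estimates only.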

METHOD

Full abstract

Recent advances in multimodal vision-language models (VLMs) have enabled joint reasoning over visual and textual information, yet their application to planetary science remains largely unexplored. A key hindrance is the absence of large-scale datasets that pair real planetary imagery with detailed scientific descriptions. In this work, we introduce LLaVA-LE (Large Language-and-Vision Assistant for Lunar Exploration), a vision-language model specialized for lunar surface and subsurface characterization. To enable this capability, we curate a new large-scale multimodal lunar dataset, LUCID (LUnar Caption Image Dataset) consisting of 96k high-resolution panchromatic images paired with detailed captions describing lunar terrain characteristics, and 81k question-answer (QA) pairs derived from approximately 20k images in the LUCID dataset. Leveraging this dataset, we fine-tune LLaVA using a two-stage training curriculum: (1) concept alignment for domain-specific terrain description, and (2) instruction-tuned visual question answering. We further design evaluation benchmarks spanning multiple levels of reasoning complexity relevant to lunar terrain analysis. Evaluated against GPT and Gemini judges, LLaVA-LE achieves a 3.3x overall performance gain over Base LLaVA and 2.1x over our Stage 1 model, with a reasoning score of 1.070, exceeding the judge's own reference score, highlighting the effectiveness of domain-specific multimodal data and instruction tuning to advance VLMs in planetary exploration. Code is available at https://github.com/OSUPCVLab/LLaVA-LE.
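A reasoning score of 1.070 that exceeds "the judge's own reference score" suggests that scores are reported relative to a reference answer. The sketch below assumes the score is the ratio of summed judge ratings for the model's answers to summed ratings for the reference answers; the abstract does not give the exact formula, and the ratings used here are made up for illustration. It also works out what the two reported gains imply for the Stage 1 model, assuming both ratios refer to the same overall metric.

```python
from typing import Sequence


def relative_score(candidate_ratings: Sequence[float],
                   reference_ratings: Sequence[float]) -> float:
    """Ratio of summed candidate ratings to summed reference ratings.

    A value above 1.0 means the judge rated the candidate answers
    higher, in aggregate, than the reference answers.
    """
    if len(candidate_ratings) != len(reference_ratings):
        raise ValueError("each candidate rating needs a matching reference rating")
    reference_total = sum(reference_ratings)
    if reference_total <= 0:
        raise ValueError("reference ratings must sum to a positive value")
    return sum(candidate_ratings) / reference_total


# Made-up judge ratings on a 1-10 scale, for illustration only.
candidate = [9.0, 8.0, 9.5]
reference = [8.0, 8.0, 8.5]
score = relative_score(candidate, reference)
print(f"relative score: {score:.3f}")

# Gains reported in the abstract.
GAIN_OVER_BASE = 3.3     # final model vs. Base LLaVA
GAIN_OVER_STAGE1 = 2.1   # final model vs. Stage 1 model

# If final = 3.3 * base and final = 2.1 * stage1 on the same metric,
# then stage1 = (3.3 / 2.1) * base.
implied_stage1_over_base = GAIN_OVER_BASE / GAIN_OVER_STAGE1
print(f"implied Stage 1 gain over base: about {implied_stage1_over_base:.2f}x")
```

Under that same-metric assumption, Stage 1 alone accounts for roughly a 1.6x gain over the base model, with instruction tuning in Stage 2 contributing the remaining 2.1x.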

RESULT

ScienceToStartup currently rates this 8.0/10 on the public viability pass. To enable this capability, we curate a new large-scale multimodal lunar dataset, LUCID (LUnar Caption Image Dataset) consisting of 96k high-resolution panchromatic images paired…

WHY NOW

Multimodal AI for Science moved forward this cycle; last verified April 2026. Public score 8.0/10. Implementation evidence is present through a linked repository.

Opportunity summary

Score: 8.0

Pain: A specialized vision-language model and dataset for lunar exploration, enabling detailed terrain characterization and analysis.

Evidence: 0 refs | 0 sources | 50% coverage

Blocker: no shell-level blocker reported

Analysis summary

A specialized vision-language model and dataset for lunar exploration, enabling detailed terrain characterization and analysis.

Competitive landscape

A specialized vision-language model and dataset for lunar exploration, enabling detailed terrain characterization and analysis.

Segment: Multimodal AI for Science

Adoption evidence: Public code linked for build inspection

Commercial read: 8.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

{ "contract_version": "paper-r2", "paper_id": "e8f5429d-66a9-4402-ac6f-0d6634f468cc", "arxiv_id": "2603.24696", "canonical_route": "/paper/llava-le-large-language-and-vision-assistant-for-lunar-exploration", "active_tab": "synced from current hash by the drawer client", "selected_artifact": "llava-le-large-language-and-vision-assistant-for-lunar-exploration", "endpoints": { "paper_pack": "/api/v1/paper/llava-le-large-language-and-vision-assistant-for-lunar-exploration/paper-pack", "build_passport": "/api/v1/paper/llava-le-large-language-and-vision-assistant-for-lunar-exploration/build-passport", "mcp_resource": "sciencetostartup://surfaces/paper-workspace" } }

{ "@context": "https://schema.org", "@graph": [ { "@type": "WebPage", "@id": "https://sciencetostartup.com/paper/llava-le-large-language-and-vision-assistant-for-lunar-exploration#webpage", "url": "https://sciencetostartup.com/paper/llava-le-large-language-and-vision-assistant-for-lunar-exploration", "name": "LLaVA-LE: Large Language-and-Vision Assistant for Lunar Exploration", "description": "A specialized vision-language model and dataset for lunar exploration, enabling detailed terrain characterization and analysis.", "isPartOf": { "@id": "https://sciencetostartup.com/#website" } }, { "@type": "ScholarlyArticle", "@id": "https://sciencetostartup.com/paper/llava-le-large-language-and-vision-assistant-for-lunar-exploration#scholarlyArticle", "headline": "LLaVA-LE: Large Language-and-Vision Assistant for Lunar Exploration", "description": "A specialized vision-language model and dataset for lunar exploration, enabling detailed terrain characterization and analysis.", "url": "https://sciencetostartup.com/paper/llava-le-large-language-and-vision-assistant-for-lunar-exploration", "sameAs": "https://arxiv.org/abs/2603.24696", "identifier": { "@type": "PropertyValue", "propertyID": "arXiv", "value": "2603.24696" }, "isAccessibleForFree": true, "isPartOf": { "@id": "https://sciencetostartup.com/#website" }, "datePublished": "2026-03-25T18:17:53.000Z", "author": [ { "@type": "Person", "name": "Gokce Inal" }, { "@type": "Person", "name": "Pouyan Navard" }, { "@type": "Person", "name": "Alper Yilmaz" } ], "codeRepository": "https://github.com/OSUPCVLab/LLaVA-LE", "additionalProperty": [ { "@type": "PropertyValue", "propertyID": "viabilityScore", "value": 8 }, { "@type": "PropertyValue", "propertyID": "researchDomain", "value": "Multimodal AI for Science" }, { "@type": "PropertyValue", "propertyID": "commercialReadiness", "value": "code, repo url" } ] }, { "@type": "SoftwareSourceCode", "@id": 
"https://sciencetostartup.com/paper/llava-le-large-language-and-vision-assistant-for-lunar-exploration#software", "name": "LLaVA-LE: Large Language-and-Vision Assistant for Lunar Exploration - Source Code", "description": "A specialized vision-language model and dataset for lunar exploration, enabling detailed terrain characterization and analysis.", "codeRepository": "https://github.com/OSUPCVLab/LLaVA-LE", "url": "https://github.com/OSUPCVLab/LLaVA-LE" }, { "@type": "BreadcrumbList", "itemListElement": [ { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://sciencetostartup.com" }, { "@type": "ListItem", "position": 2, "name": "Multimodal AI for Science", "item": "https://sciencetostartup.com/topics" }, { "@type": "ListItem", "position": 3, "name": "LLaVA-LE: Large Language-and-Vision Assistant for Lunar Expl", "item": "https://sciencetostartup.com/paper/llava-le-large-language-and-vision-assistant-for-lunar-exploration" } ] } ] }
