ARXIV:2603.24575 · VISION-LANGUAGE MODELS · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

Qijia He · Xunmei Liu · Hammaad Memon · Ziang Li · Zixian Ma · Jaemin Cho · Jason Ren · Daniel S Weld · Ranjay Krishna

VFIG converts rasterized images of complex figures into editable SVGs using a novel vision-language model and a large-scale dataset, bridging the gap for designers and researchers.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: VFIG converts rasterized images of complex figures into editable SVGs using a novel vision-language model and a large-scale dataset, bridging the gap for designers and researchers.

Evidence: 0 refs | 0 sources | 17% coverage

Blocker: Evidence unverified


PROBLEM

Original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized versions (e.g., PNG or JPEG) that are difficult to modify or scale. Reconstructing these figures by hand is prohibitively labor-intensive and requires specialized expertise to recover the original geometric intent.

METHOD

Full abstract

Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized versions (e.g., PNG or JPEG) that are difficult to modify or scale. Manually reconstructing these figures is a prohibitively labor-intensive process, requiring specialized expertise to recover the original geometric intent. To bridge this gap, we propose VFIG, a family of Vision-Language Models trained for complex and high-fidelity figure-to-SVG conversion. While this task is inherently data-driven, existing datasets are typically small-scale and lack the complexity of professional diagrams. We address this by introducing VFIG-DATA, a large-scale dataset of 66K high-quality figure-SVG pairs, curated from a diverse mix of real-world paper figures and procedurally generated diagrams. Recognizing that SVGs are composed of recurring primitives and hierarchical local structures, we introduce a coarse-to-fine training curriculum that begins with supervised fine-tuning (SFT) to learn atomic primitives and transitions to reinforcement learning (RL) refinement to optimize global diagram fidelity, layout consistency, and topological edge cases. Finally, we introduce VFIG-BENCH, a comprehensive evaluation suite with novel metrics designed to measure the structural integrity of complex figures. VFIG achieves state-of-the-art performance among open-source models and performs on par with GPT-5.2, achieving a VLM-Judge score of 0.829 on VFIG-BENCH.
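The abstract's observation that SVGs are built from recurring primitives and hierarchical local structures can be made concrete with a small sketch. The toy figure, the `PRIMITIVES` set, and the helper names below are illustrative assumptions and are not taken from the paper; the code only counts basic SVG elements and measures nesting depth, and it is not VFIG's training pipeline or a VFIG-BENCH metric.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SVG_NS = "{http://www.w3.org/2000/svg}"

# Hypothetical toy figure (not from VFIG-DATA): two labeled boxes and a connector.
TOY_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="220" height="80">
  <g id="node-a">
    <rect x="10" y="20" width="70" height="40" fill="none" stroke="black"/>
    <text x="45" y="45" text-anchor="middle">Input</text>
  </g>
  <g id="node-b">
    <rect x="140" y="20" width="70" height="40" fill="none" stroke="black"/>
    <text x="175" y="45" text-anchor="middle">Output</text>
  </g>
  <g id="edge-a-b">
    <line x1="80" y1="40" x2="140" y2="40" stroke="black"/>
  </g>
</svg>"""

# Assumed working set of "primitive" elements for this illustration.
PRIMITIVES = {"rect", "circle", "ellipse", "line",
              "polyline", "polygon", "path", "text"}


def local_name(tag):
    """Strip the SVG namespace prefix that ElementTree adds to tag names."""
    return tag[len(SVG_NS):] if tag.startswith(SVG_NS) else tag


def primitive_counts(root):
    """Count how often each primitive element occurs anywhere in the tree."""
    names = (local_name(el.tag) for el in root.iter())
    return Counter(name for name in names if name in PRIMITIVES)


def max_depth(el):
    """Return the nesting depth of the element tree (the root counts as 1)."""
    return 1 + max((max_depth(child) for child in el), default=0)


root = ET.fromstring(TOY_SVG)
counts = primitive_counts(root)
depth = max_depth(root)
print(dict(counts), depth)
```

Even this tiny diagram repeats the same rect-plus-text group twice, which is the kind of recurring local structure the abstract refers to; real paper figures would contain many more elements and deeper grouping.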

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. VFIG achieves state-of-the-art performance among open-source models and performs on par with GPT-5.2, achieving a VLM-Judge score of 0.829 on VFIG-BENCH. Code availability is…

WHY NOW

Vision-Language Models moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.




Competitive landscape


Segment: Vision-Language Models

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

{ "@context": "https://schema.org", "@graph": [ { "@type": "WebPage", "@id": "https://sciencetostartup.com/paper/vfig-vectorizing-complex-figures-in-svg-with-vision-language-models#webpage", "url": "https://sciencetostartup.com/paper/vfig-vectorizing-complex-figures-in-svg-with-vision-language-models", "name": "VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models", "description": "VFIG converts rasterized images of complex figures into editable SVGs using a novel vision-language model and a large-scale dataset, bridging the gap for designers and researchers.", "isPartOf": { "@id": "https://sciencetostartup.com/#website" } }, { "@type": "ScholarlyArticle", "@id": "https://sciencetostartup.com/paper/vfig-vectorizing-complex-figures-in-svg-with-vision-language-models#scholarlyArticle", "headline": "VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models", "description": "VFIG converts rasterized images of complex figures into editable SVGs using a novel vision-language model and a large-scale dataset, bridging the gap for designers and researchers.", "url": "https://sciencetostartup.com/paper/vfig-vectorizing-complex-figures-in-svg-with-vision-language-models", "sameAs": "https://arxiv.org/abs/2603.24575", "identifier": { "@type": "PropertyValue", "propertyID": "arXiv", "value": "2603.24575" }, "isAccessibleForFree": true, "isPartOf": { "@id": "https://sciencetostartup.com/#website" }, "datePublished": "2026-03-25T17:52:23.000Z", "author": [ { "@type": "Person", "name": "Qijia He", "affiliation": { "@type": "Organization", "name": "University of Washington" } }, { "@type": "Person", "name": "Xunmei Liu", "affiliation": { "@type": "Organization", "name": "University of Washington" } }, { "@type": "Person", "name": "Hammaad Memon", "affiliation": { "@type": "Organization", "name": "University of Washington" } }, { "@type": "Person", "name": "Ziang Li", "affiliation": { "@type": "Organization", "name": "University of Washington" } }, { 
"@type": "Person", "name": "Zixian Ma", "affiliation": { "@type": "Organization", "name": "Allen Institute for Artificial Intelligence" } }, { "@type": "Person", "name": "Jaemin Cho", "affiliation": { "@type": "Organization", "name": "Allen Institute for Artificial Intelligence" } }, { "@type": "Person", "name": "Jason Ren", "affiliation": { "@type": "Organization", "name": "University of Washington" } }, { "@type": "Person", "name": "Daniel S Weld", "affiliation": { "@type": "Organization", "name": "Allen Institute for Artificial Intelligence" } }, { "@type": "Person", "name": "Ranjay Krishna", "affiliation": { "@type": "Organization", "name": "University of Washington" } } ], "additionalProperty": [ { "@type": "PropertyValue", "propertyID": "viabilityScore", "value": 7 }, { "@type": "PropertyValue", "propertyID": "researchDomain", "value": "Vision-Language Models" }, { "@type": "PropertyValue", "propertyID": "commercialReadiness", "value": "code" } ] }, { "@type": "BreadcrumbList", "itemListElement": [ { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://sciencetostartup.com" }, { "@type": "ListItem", "position": 2, "name": "Vision-Language Models", "item": "https://sciencetostartup.com/topics" }, { "@type": "ListItem", "position": 3, "name": "VFIG: Vectorizing Complex Figures in SVG with Vision-Languag", "item": "https://sciencetostartup.com/paper/vfig-vectorizing-complex-figures-in-svg-with-vision-language-models" } ] }, { "@type": "FAQPage", "mainEntity": [ { "@type": "Question", "name": "What is the startup potential of \"VFIG: Vectorizing Complex Figures in SVG with Vision-Languag\"?", "acceptedAnswer": { "@type": "Answer", "text": "VFig automates conversion of complex raster images into editable SVGs, streamlining vector graphics editing and scaling." 
} }, { "@type": "Question", "name": "What products could be built from this research?", "acceptedAnswer": { "@type": "Answer", "text": "Package the technology as a plugin or extension for popular design software such as Adobe Illustrator, enabling users to convert raster images to SVGs seamlessly." } }, { "@type": "Question", "name": "What are the practical use cases?", "acceptedAnswer": { "@type": "Answer", "text": "A design tool for graphic designers in engineering and education to quickly convert lost vector source files back into SVGs, allowing for easy updates and modifications." } }, { "@type": "Question", "name": "What industries could this research disrupt?", "acceptedAnswer": { "@type": "Answer", "text": "This tool could replace manual vectorization services and disrupt current graphic design workflows by automating a time-consuming process." } } ] } ] }
