ARXIV:2604.19667 · LLM AGENTS · SUBMITTED 22 APR · 20:32 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: partial proof status (Partial)

Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language

Yi Zhong · Buqiang Xu · Yijun Wang · Zifei Shan · Shuofei Qiao · Guozhou Zheng · Ningyu Zhang

A benchmark and agentic framework for generating executable visual workflows from natural language, aiming to automate costly manual development.

Ship in 2-4 weeks · Score 6.0 · Evidence partial

Opportunity summary

Pain: A benchmark and agentic framework for generating executable visual workflows from natural language, aiming to automate costly manual development.

Evidence: 0 refs | 4 sources | 83% coverage

Blocker: Evidence partial

PROBLEM

A benchmark and agentic framework for generating executable visual workflows from natural language, aiming to automate costly manual development. However, in current practice, such workflows are almost entirely constructed through manual engineering: developers must…

METHOD

Full abstract

At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllability. However, in current practice, such workflows are almost entirely constructed through manual engineering: developers must carefully design workflows, write prompts for each step, and repeatedly revise the logic as requirements evolve, making development costly, time-consuming, and error-prone. To study whether large language models can automate this multi-round interaction process, we introduce Chat2Workflow, a benchmark for generating executable visual workflows directly from natural language, and propose a robust agentic framework to mitigate recurrent execution errors. Chat2Workflow is built from a large collection of real-world business workflows, with each instance designed so that the generated workflow can be transformed and directly deployed to practical workflow platforms such as Dify and Coze. Experimental results show that while state-of-the-art language models can often capture high-level intent, they struggle to generate correct, stable, and executable workflows, especially under complex or changing requirements. Although our agentic framework yields up to 5.34% resolve rate gains, the remaining real-world gap positions Chat2Workflow as a foundation for advancing industrial-grade automation. Code is available at https://github.com/zjunlp/Chat2Workflow.
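
The abstract reports a "resolve rate" gain but does not define the metric here. As a rough sketch only, assume resolve rate is the fraction of benchmark instances whose generated workflow runs and passes its checks, and that a gain is measured as an absolute difference in percentage points; the paper may define both differently. The `RunResult` type, the function names, and the toy data below are hypothetical and are not taken from the Chat2Workflow repository or its results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RunResult:
    """Outcome for one benchmark instance (hypothetical structure)."""
    instance_id: str
    resolved: bool  # assumed meaning: workflow executed and passed its checks


def resolve_rate(results: Sequence[RunResult]) -> float:
    """Fraction of instances marked resolved; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(r.resolved for r in results) / len(results)


def gain_in_points(baseline: Sequence[RunResult],
                   improved: Sequence[RunResult]) -> float:
    """Absolute resolve-rate difference, in percentage points (assumption)."""
    return (resolve_rate(improved) - resolve_rate(baseline)) * 100


# Toy data for illustration only; these are not results from the paper.
baseline_run = [
    RunResult("wf-1", True), RunResult("wf-2", False),
    RunResult("wf-3", True), RunResult("wf-4", False),
]
agentic_run = [
    RunResult("wf-1", True), RunResult("wf-2", True),
    RunResult("wf-3", True), RunResult("wf-4", False),
]

if __name__ == "__main__":
    print(f"baseline resolve rate: {resolve_rate(baseline_run):.2%}")
    print(f"agentic resolve rate:  {resolve_rate(agentic_run):.2%}")
    print(f"gain: {gain_in_points(baseline_run, agentic_run):.2f} points")
```

Reporting the gain in percentage points rather than as a relative change keeps the comparison unambiguous when baseline rates are low; which convention the reported 5.34% follows should be checked against the paper itself.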

RESULT

ScienceToStartup currently rates this 6.0/10 on the public viability pass. Experimental results show that while state-of-the-art language models can often capture high-level intent, they struggle to generate correct, stable, and executable workflows, especially under…

WHY NOW

LLM Agents moved forward this cycle; last verified April 2026. Public score 6.0/10. Implementation evidence is present through a linked repository.

Opportunity summary

Score: 6.0

Pain: A benchmark and agentic framework for generating executable visual workflows from natural language, aiming to automate costly manual development.

Evidence: 0 refs | 4 sources | 83% coverage

Blocker: no shell-level blocker reported

Competitive landscape

Segment: LLM Agents

Adoption evidence: Public code linked for build inspection

Commercial read: 6.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Published: 2026-04-21T16:49:11.000Z

Authors: Yi Zhong, Buqiang Xu, Yijun Wang, Zifei Shan, Shuofei Qiao, Guozhou Zheng, Ningyu Zhang

arXiv: https://arxiv.org/abs/2604.19667

Code: https://github.com/zjunlp/Chat2Workflow

Research domain: LLM Agents
