ARXIV:2603.26174 · CREATIVE IMAGE MANIPULATION · SUBMITTED 30 MAR · 21:54 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

CREval: An Automated Interpretable Evaluation for Creative Image Manipulation under Complex Instructions

Chonghuinan Wang · Zihan Chen · Yuxiang Wei · Tianyi Jiang · Xiaohe Wu · Fan Li · +2 at arXiv

An automated evaluation framework and benchmark for creative image manipulation that provides reliable metrics and identifies key challenges for model development.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain An automated evaluation framework and benchmark for creative image manipulation that provides reliable metrics and identifies key challenges for model development.

Evidence 109 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

Existing evaluation methods lack a systematic and human-aligned framework for assessing model…

METHOD

Full abstract

Instruction-based multimodal image manipulation has recently made rapid progress. However, existing evaluation methods lack a systematic and human-aligned framework for assessing model performance on complex and creative editing tasks. To address this gap, we propose CREval, a fully automated question-answer (QA)-based evaluation pipeline that overcomes the incompleteness and poor interpretability of opaque Multimodal Large Language Models (MLLMs) scoring. Simultaneously, we introduce CREval-Bench, a comprehensive benchmark specifically designed for creative image manipulation under complex instructions. CREval-Bench covers three categories and nine creative dimensions, comprising over 800 editing samples and 13K evaluation queries. Leveraging this pipeline and benchmark, we systematically evaluate a diverse set of state-of-the-art open and closed-source models. The results reveal that while closed-source models generally outperform open-source ones on complex and creative tasks, all models still struggle to complete such edits effectively. In addition, user studies demonstrate strong consistency between CREval's automated metrics and human judgments. Therefore, CREval provides a reliable foundation for evaluating image editing models on complex and creative image manipulation tasks, and highlights key challenges and opportunities for future research.
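The abstract describes QA-based scoring only at a high level, so the following is a minimal sketch of how answered evaluation queries might be rolled up into interpretable scores. It assumes each query is a yes/no question tied to one creative dimension and that a score is the fraction of queries answered "yes"; the record layout, the dimension labels, the example questions, and the unweighted averaging are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryResult:
    """One answered evaluation query (hypothetical record layout)."""
    sample_id: str   # editing sample the query belongs to
    dimension: str   # creative dimension label (placeholder names below)
    question: str    # yes/no question asked about the edited image
    passed: bool     # True if the judge answered "yes"


def score_by_dimension(results):
    """Return {dimension: fraction of queries passed}."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for r in results:
        total[r.dimension] += 1
        passed[r.dimension] += int(r.passed)
    return {d: passed[d] / total[d] for d in total}


def overall_score(results):
    """Unweighted mean of per-dimension scores (0.0 if there are no results)."""
    per_dim = score_by_dimension(results)
    return sum(per_dim.values()) / len(per_dim) if per_dim else 0.0


def failed_questions(results):
    """List the failed questions, so a low score can be traced to its cause."""
    return [(r.sample_id, r.question) for r in results if not r.passed]


# Illustrative data only; these are not queries from CREval-Bench.
example = [
    QueryResult("s1", "dimension_a", "Is the dog now wearing a hat?", True),
    QueryResult("s1", "dimension_a", "Is the background unchanged?", False),
    QueryResult("s1", "dimension_b", "Is the style consistent?", True),
    QueryResult("s2", "dimension_b", "Is the text legible?", True),
]

print(score_by_dimension(example))
print(overall_score(example))
print(failed_questions(example))
```

Keeping the individual question results, rather than a single opaque number, is what would let a reader see which specific requirement an edit failed to meet.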

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. The results reveal that while closed-source models generally outperform open-source ones on complex and creative tasks, all models still struggle to complete such edits…

WHY NOW

Creative Image Manipulation moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain An automated evaluation framework and benchmark for creative image manipulation that provides reliable metrics and identifies key challenges for model development.

Evidence 109 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

An automated evaluation framework and benchmark for creative image manipulation that provides reliable metrics and identifies key challenges for model development.

Competitive landscape

An automated evaluation framework and benchmark for creative image manipulation that provides reliable metrics and identifies key challenges for model development.

Segment

Creative Image Manipulation

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Paper record

arXiv: 2603.26174 (https://arxiv.org/abs/2603.26174)

Page: https://sciencetostartup.com/paper/creval-an-automated-interpretable-evaluation-for-creative-image-manipulation-under-complex-instructions

Published: 2026-03-27T08:42:09.000Z

Authors: Chonghuinan Wang, Zihan Chen, Yuxiang Wei, Tianyi Jiang, Xiaohe Wu, Fan Li, Wangmeng Zuo, Hongxun Yao

Research domain: Creative Image Manipulation

Viability score: 7

Commercial readiness: code

Free to access: yes
