Recent advances in generative image editing increasingly focus on enhancing control and precision across a variety of applications, addressing commercial needs for tailored content creation. Lightweight models, such as those using Stacked Channel Bridging, demonstrate competitive performance at significantly reduced computational cost, making advanced editing accessible to a broader range of users. Systems like Pinterest Canvas refine image generation by fine-tuning task-specific variants, which has been shown to substantially improve user engagement metrics. Meanwhile, frameworks like ColourCrafter and PixelSmile push the boundaries of colour and expression editing, respectively, allowing more nuanced and accurate modifications. The integration of scene graph-based methods in SimGraph is also notable: it enables structured control over object relationships, improving spatial coherence in generated content. Collectively, these developments indicate a shift toward more efficient, user-friendly tools tailored to specific editing needs, paving the way for broader adoption in creative industries.
Current unified multimodal models for image generation and editing typically rely on massive parameter scales (e.g., >10B), entailing prohibitive training costs and deployment footprints. In this work...
While recent image generation models demonstrate a remarkable ability to handle a wide variety of image generation tasks, this flexibility makes them hard to control via prompting or simple inference ...
Colour is one of the most perceptually salient yet least controllable attributes in image generation. Although recent diffusion models can modify object colours from user instructions, their results o...
The recent surge in popularity of Nano-Banana and Seedream 4.0 underscores the community's strong interest in multi-image composition tasks. Compared to single-image editing, multi-image composition p...
Recent generative image editing methods adopt layered representations to mitigate the entangled nature of raster images and improve controllability, typically relying on object-based segmentation. How...
Pre-trained flow-based models excel at synthesizing complex scenes yet lack a direct mechanism for disentangling and customizing their underlying concepts from one-shot real-world sources. To demystif...
Scene text editing seeks to modify textual content in natural images while maintaining visual realism and semantic consistency. Existing methods often require task-specific training or paired data, li...
Unified diffusion editors often rely on a fixed, shared backbone for diverse tasks, suffering from task interference and poor adaptation to heterogeneous demands (e.g., local vs global, semantic vs ph...
Recent advancements in Generative Artificial Intelligence (GenAI) have significantly enhanced the capabilities of both image generation and editing. However, current approaches often treat these tasks...
Visual autoregressive (VAR) models have recently emerged as a promising family of generative models, enabling a wide range of downstream vision tasks such as text-guided image editing. By shifting the...