Recent advances in generative AI focus on improving the efficiency, safety, and controllability of synthetic data generation. A notable trend is the integration of silicon photonics to accelerate diffusion models, yielding significant gains in energy efficiency and throughput, which matters for commercial applications where computational cost is a concern. Concurrently, researchers are developing robust concept-erasure techniques that allow precise removal of undesired content from image generation, addressing ethical concerns about AI misuse. Unified reasoning frameworks are also noteworthy: by combining text-to-image generation and editing into a single cohesive process, they improve output quality through stronger reasoning capabilities. In addition, dynamic methods for fusing subject and style representations are emerging, enabling more coherent and contextually relevant image synthesis. Collectively, these efforts reflect a maturing field that prioritizes practical deployment and user safety while pushing the boundaries of creative AI applications.
Text-to-image diffusion models have achieved high visual fidelity, yet precise control over scene semantics and fine-grained affective tone remains challenging. Human visual affect arises from the rap...
Subject-Driven Text-to-Image (T2I) Generation aims to preserve a subject's identity while editing its context based on a text prompt. A core challenge in this task is the "similarity-controllability p...
A photorealistic and immersive human avatar experience demands capturing fine, person-specific details such as cloth and hair dynamics, subtle facial expressions, and characteristic motion patterns. A...
Despite significant progress in text-to-image generation, aligning outputs with complex prompts remains challenging, particularly for fine-grained semantics and spatial relations. This difficulty stem...
In this paper, we uncover the hidden potential of Diffusion Transformers (DiTs) to significantly enhance generative tasks. Through an in-depth analysis of the denoising process, we demonstrate that in...
Concept erasure in text-to-image diffusion models seeks to remove undesired concepts while preserving overall generative capability. Localized erasure methods aim to restrict edits to the spatial regi...
Diffusion models have revolutionized generative AI through their capacity to generate highly realistic, state-of-the-art synthetic data. However, these models employ an iterative denoising proce...
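The iterative denoising process mentioned above can be sketched in a few lines. The following is a minimal, illustrative DDPM-style sampling loop, not any specific system's implementation: the schedule length, the `predict_noise` stand-in for a trained network, and all constants are assumptions chosen for readability.

```python
import numpy as np

T = 10                                  # number of denoising steps (toy value)
betas = np.linspace(1e-4, 0.2, T)       # noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x_t, t):
    """Placeholder for a trained noise-prediction network eps_theta(x_t, t)."""
    return x_t * 0.1                    # dummy prediction, for illustration only

def ddpm_sample(shape, rng):
    x = rng.standard_normal(shape)      # start from pure Gaussian noise x_T
    for t in reversed(range(T)):        # iterate t = T-1 ... 0, removing a
        eps = predict_noise(x, t)       # little noise at each step
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:                       # inject fresh noise except at the final step
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean
    return x

sample = ddpm_sample((4,), np.random.default_rng(0))
print(sample.shape)
```

The loop's sequential structure is exactly what makes sampling expensive: each of the T steps depends on the previous one, which is why acceleration work (including the photonics-based approaches noted earlier) targets this bottleneck.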
Concept erasure is extensively utilized in image generation to prevent text-to-image models from generating undesired content. Existing methods can effectively erase narrow concepts that are specific ...
Reinforcement learning (RL) has emerged as a promising paradigm for enhancing image editing and text-to-image (T2I) generation. However, current reward models, which act as critics during RL, often su...
Unified multimodal models often struggle with complex synthesis tasks that demand deep reasoning, and typically treat text-to-image generation and image editing as isolated capabilities rather than in...