Recent work in dataset distillation aims to make deep learning training cheaper by compressing large datasets into compact synthetic ones that retain the essential information. New frameworks optimize both the precision and the compactness of the synthetic data, addressing a limitation of existing methods that primarily target sample reduction. Techniques such as difficulty-guided sampling align the distillation objective more closely with downstream tasks, so the generated datasets are not only smaller but also more relevant to specific applications. Specialized methods for spatio-temporal data are also gaining traction, yielding significant reductions in training time and resource consumption, while hierarchical semantics and early vision-language fusion in generative models improve the quality of synthetic data across tasks. Together, these advances target practical challenges of data efficiency and model training in fields ranging from autonomous systems to healthcare analytics.
Dataset Distillation (DD) compresses large datasets into compact synthetic ones that maintain training performance. However, current methods mainly target sample reduction, with limited consideration ...
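To make the setup concrete, below is a minimal sketch of one common DD formulation, gradient matching: the synthetic set is a learnable tensor, updated so that training gradients computed on it resemble gradients computed on real data. The model, shapes, loss, and optimizer here are illustrative assumptions, not any one paper's method.

```python
# Minimal gradient-matching sketch of dataset distillation (illustrative only;
# model, shapes, and hyperparameters are placeholder assumptions).
import torch
import torch.nn.functional as F

def gradient_matching_step(model, real_x, real_y, syn_x, syn_y, syn_opt):
    """One outer step: nudge the synthetic set so that gradients computed
    on it match gradients computed on a real batch."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Real-data gradients are treated as fixed targets (no graph kept).
    g_real = torch.autograd.grad(
        F.cross_entropy(model(real_x), real_y), params)
    # Synthetic-data gradients keep the graph so loss can reach syn_x.
    g_syn = torch.autograd.grad(
        F.cross_entropy(model(syn_x), syn_y), params, create_graph=True)

    # Cosine-style matching loss between the two gradient sets.
    loss = sum(1 - F.cosine_similarity(gr.flatten(), gs.flatten(), dim=0)
               for gr, gs in zip(g_real, g_syn))

    syn_opt.zero_grad()
    loss.backward()        # gradients flow into syn_x (a learnable tensor)
    syn_opt.step()
    model.zero_grad()      # discard stray gradients on the model itself
    return loss.item()

# Usage (hypothetical sizes): syn_x = torch.randn(100, 3, 32, 32, requires_grad=True)
#                             syn_opt = torch.optim.SGD([syn_x], lr=0.1)
```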
Dataset distillation (DD) compresses a large training set into a small synthetic set, reducing storage and training cost, and has shown strong results on general benchmarks. Decoupled DD further impro...
Spatio-temporal time series are widely used in real-world applications, including traffic prediction and weather forecasting. They are sequences of observations over extensive periods and multiple loc...
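As a concrete picture of the data such methods condense, the sketch below lays out a spatio-temporal record as a [timesteps, locations, features] tensor and slices it into history/horizon windows for forecasting; all shapes and window lengths are arbitrary assumptions. The point of condensation is that the time axis is huge, so replacing the full record with a much shorter synthetic one saves substantial training cost.

```python
# Toy layout of spatio-temporal data: time x locations x features,
# sliced into (history, horizon) pairs for a forecasting model.
import torch

T, N, C = 10_000, 207, 2          # e.g., a long traffic record: timesteps x sensors x features
series = torch.randn(T, N, C)

def make_windows(x, in_len=12, out_len=12):
    """Yield (history, horizon) pairs over the time axis."""
    for t in range(x.shape[0] - in_len - out_len + 1):
        yield x[t : t + in_len], x[t + in_len : t + in_len + out_len]

hist, horizon = next(make_windows(series))
print(hist.shape, horizon.shape)  # torch.Size([12, 207, 2]) torch.Size([12, 207, 2])
```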
Dataset distillation (DD) aims to synthesize compact training sets that enable models to achieve high accuracy with significantly fewer samples. Recent diffusion-based DD methods commonly introduce se...
Training machine learning models on massive datasets is expensive and time-consuming. Dataset distillation addresses this by creating a small synthetic dataset that achieves the same performance as th...
Dataset distillation often prioritizes global semantic proximity when creating small surrogate datasets for original large-scale ones. However, object semantics are inherently hierarchical. For exampl...
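One way to make "hierarchical" concrete is to match feature statistics at several network depths rather than a single global embedding, so synthetic data is constrained at coarse and fine semantic levels alike. The backbone, tap points, and statistic below are illustrative assumptions, not the paper's method.

```python
# Hedged sketch: match per-channel feature statistics at multiple depths
# of a frozen backbone (tap points and loss are illustrative assumptions).
import torch
import torch.nn.functional as F
import torchvision

backbone = torchvision.models.resnet18(weights=None).eval()
feats = []
for m in (backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4):
    m.register_forward_hook(lambda _m, _i, out: feats.append(out))

def hierarchical_match_loss(real_x, syn_x):
    """Compare channel-wise feature means at every tapped depth."""
    feats.clear()
    with torch.no_grad():             # real features are fixed targets
        backbone(real_x)
    real_stats = [f.mean(dim=(0, 2, 3)) for f in feats]

    feats.clear()
    backbone(syn_x)                   # keep the graph so syn_x can be optimized
    syn_stats = [f.mean(dim=(0, 2, 3)) for f in feats]

    return sum(F.mse_loss(s, r) for s, r in zip(syn_stats, real_stats))
```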
In this paper, we propose difficulty-guided sampling (DGS) to bridge the target gap between the distillation objective and the downstream task, thereby improving the performance of dataset distillat...
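The general idea behind difficulty-guided sampling can be sketched as scoring real examples by how hard a proxy model finds them and biasing batch selection toward that score. The scoring rule and temperature below are illustrative assumptions, not the paper's exact DGS procedure.

```python
# Hedged sketch of difficulty-guided sampling: per-sample loss as a crude
# difficulty proxy, then softmax-weighted sampling (illustrative only).
import torch
import torch.nn.functional as F

@torch.no_grad()
def difficulty_scores(model, x, y):
    """Per-sample cross-entropy under a proxy model as a difficulty score."""
    return F.cross_entropy(model(x), y, reduction="none")

def sample_by_difficulty(scores, k, temperature=1.0):
    """Draw k indices, with probability increasing in difficulty."""
    probs = torch.softmax(scores / temperature, dim=0)
    return torch.multinomial(probs, k, replacement=False)
```

Lowering the (hypothetical) temperature concentrates sampling on the hardest examples, while a high temperature approaches uniform sampling, so it controls how aggressively the distillation batches track downstream difficulty.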
Dataset distillation (DD) aims to compress large-scale datasets into compact synthetic counterparts for efficient model training. However, existing DD methods exhibit substantial performance degradati...