Current research in audio processing is increasingly focused on making audio manipulation techniques more robust and efficient under real-world conditions. Recent work introduces frameworks for speech editing detection and content localization that use large language models to improve semantic accuracy while maintaining naturalness. In watermarking, new methods aim to keep embedded marks intact under neural resynthesis, which matters for copyright protection in digital media. Work on sound source localization tackles deployment problems caused by data imbalance, with the goal of improving accuracy in dynamic environments. Lipschitz continuity is being built into audio signal processing architectures to improve stability, and new approaches to bandwidth extension restore speech clarity in low-bandwidth scenarios. Together, these developments point to a concerted effort to make audio technologies more resilient and usable in entertainment, communication, and security applications.
While existing audio watermarking techniques have achieved strong robustness against traditional digital signal processing (DSP) attacks, they remain vulnerable to neural resynthesis. This occurs beca...
Digital audio workstations expose rich effect chains, yet a semantic gap remains between perceptual user intent and low-level signal-processing parameters. We study retrieval-grounded audio effect con...
Speech editing achieves semantic inversion by performing fine-grained segment-level manipulation on original utterances, while preserving global perceptual naturalness. Existing detection studies main...
Sound source localization (SSL) demonstrates remarkable results in controlled settings but struggles in real-world deployment due to dual imbalance challenges: intra-task imbalance arising from long-t...
Neural audio codecs are at the core of modern conversational speech technologies, converting continuous speech into sequences of discrete tokens that can be processed by LLMs. However, existing codecs...
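To make the idea of "discrete tokens" concrete, here is a minimal sketch of nearest-neighbor vector quantization, the basic operation behind many codec tokenizers. It is a toy illustration only, not the method of the paper summarized above: the function names `quantize` and `dequantize`, the two-dimensional frames, and the hand-written codebook are all assumptions made for this example, whereas real neural codecs learn both the encoder and the codebook.

```python
def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codebook vector."""
    tokens = []
    for frame in frames:
        # Squared Euclidean distance from this frame to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(frame, code)) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


def dequantize(tokens, codebook):
    """Look up the codebook vector for each token (lossy reconstruction)."""
    return [codebook[t] for t in tokens]


# Hypothetical 4-entry codebook over 2-dimensional frames.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8]]

tokens = quantize(frames, codebook)          # discrete token sequence
reconstructed = dequantize(tokens, codebook)  # approximate frames
```

The integer sequence in `tokens` is what a language model would consume. The gap between `frames` and `reconstructed` is the quantization error that a codec's training tries to keep perceptually small.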
Speech Bandwidth Extension improves clarity and intelligibility by restoring/inferring appropriate high-frequency content for low-bandwidth speech. Existing methods often rely on spectrogram or wavefo...
Neural audio codecs (NACs) typically encode the short-term energy (gain) and normalized structure (shape) of speech/audio signals jointly within the same latent space. As a result, they are poorly rob...
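The gain and shape terminology above can be illustrated with a classical gain-shape decomposition of a single frame. This is a minimal sketch under stated assumptions, not the architecture of the paper summarized above: gain is taken to be the frame's RMS value, shape is the frame divided by that gain, and the helper names `gain_shape_split` and `gain_shape_merge` are hypothetical.

```python
import math


def gain_shape_split(frame, eps=1e-12):
    """Split a non-empty frame into a scalar gain (RMS) and a unit-RMS shape."""
    gain = math.sqrt(sum(x * x for x in frame) / len(frame))
    # eps guards against division by zero on silent frames.
    shape = [x / (gain + eps) for x in frame]
    return gain, shape


def gain_shape_merge(gain, shape):
    """Recombine gain and shape into an approximate copy of the frame."""
    return [gain * s for s in shape]


frame = [0.1, -0.2, 0.4, -0.3]
gain, shape = gain_shape_split(frame)

# The same frame played 10x louder: only the gain should change.
loud_gain, loud_shape = gain_shape_split([10.0 * x for x in frame])
restored = gain_shape_merge(gain, shape)
```

Scaling the input changes `loud_gain` by the same factor while `loud_shape` stays essentially identical to `shape`. That separation is the property lost when both quantities are encoded jointly in one latent space.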