ARXIV:2606.05678 · UNCATEGORIZED · SUBMITTED 06 JUN · 03:21 UTC · FRESHNESS FRESH

Verified · Source: PDF linked · Verified · PaperPack: citation fields available · Partial · Proof: unverified proof status

Beyond Waveform Robustness: Robust Feature-Vocoder Adversarial Attacks on Automatic Speech Recognition

Yifan Liao · Zongmin Zhang · Zhen Sun · Yuhui Sun · Xinhu Zheng · Xinlei He · arXiv

ScienceToStartup currently rates this 0.0/10 on the public viability pass. Extensive experiments show that, when optimized only on raw Whisper-small as a public surrogate model, our attack transfers effectively to…

Blocked on code · Score 0.0 · Evidence unverified

Opportunity summary

Pain: customer pain not on file

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: evidence unverified


PROBLEM

Automatic speech recognition (ASR) systems have become widely used for multilingual speech-to-text transcription.

METHOD

Full abstract

Automatic speech recognition (ASR) systems have become widely used for multilingual speech-to-text transcription. Their robustness to adversarial attacks has become an important topic for the community. Existing adversarial attacks directly add adversarial noise to the speech audio. However, prior work has shown that existing adversarial attacks face two limitations: they often transfer poorly to black-box ASR systems and are increasingly mitigated by defenses tailored to input-space perturbations. In this work, we propose a Clean-Referenced Feature-Vocoder Attack, a surrogate-based black-box attack that moves the adversarial search space from raw waveforms to self-supervised learning (SSL) representations. To address the transferability limitation, we perturb more generalizable acoustic-phonetic representations rather than low-level waveform samples, reducing dependence on surrogate-specific waveform gradients and encouraging adversarial perturbations that generalize across ASR systems. To evade a range of defenses, we shift the adversarial signal from explicit additive waveform noise to SSL feature-space perturbations and reconstruct them through a vocoder into speech-like adversarial waveforms, making the resulting samples less aligned with waveform-bounded defenses. Extensive experiments show that, when optimized only on raw Whisper-small as a public surrogate model, our attack transfers effectively to black-box ASR models with a +26.6 WER improvement over the state-of-the-art (SOTA) baseline, while also remaining effective against multiple training defenses with a +36.2 WER improvement. These results reveal a blind spot in current ASR robustness evaluation.
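The core idea, perturbing SSL features and resynthesizing audio through a vocoder instead of adding noise to the waveform, can be illustrated with a toy, fully linear pipeline. Everything below is hypothetical: `encode` stands in for an SSL encoder, `vocode` for a vocoder, and `surrogate_loss` for a loss on a surrogate ASR model such as Whisper-small. We also assume that "clean-referenced" means the feature perturbation is bounded relative to the clean features; the paper may define it differently. The update is a sign-gradient (PGD-style) ascent step, a common choice that is not confirmed as the authors' optimizer.

```python
"""Toy feature-space attack with vocoder resynthesis (hypothetical sketch)."""
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical linear stand-ins for the real (neural) components.
W: Matrix = [[1.0, 0.5, 0.0, 0.0],
             [0.0, 1.0, 0.5, 0.0],
             [0.0, 0.0, 1.0, 0.5]]   # toy "SSL encoder": 4 samples -> 3 features
A: Vector = [0.3, -0.2, 0.7, 0.1]     # toy surrogate loss weights on the waveform


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(r * x for r, x in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def encode(wave: Vector) -> Vector:
    """Map a waveform to toy SSL features."""
    return matvec(W, wave)


def vocode(feats: Vector) -> Vector:
    """Resynthesize a waveform from features (toy: W^T z, not an exact inverse)."""
    return matvec(transpose(W), feats)


def surrogate_loss(wave: Vector) -> float:
    """Toy surrogate loss that the attacker wants to increase."""
    return sum(a * x for a, x in zip(A, wave))


def feature_grad() -> Vector:
    # For this linear pipeline, d/dz surrogate_loss(vocode(z)) = W @ A.
    # A real attack would backpropagate through the vocoder and surrogate model.
    return matvec(W, A)


def attack(wave: Vector, eps: float = 0.1, step: float = 0.02,
           iters: int = 20) -> Tuple[Vector, Vector]:
    """Return (adversarial waveform, final feature perturbation)."""
    z_clean = encode(wave)
    delta = [0.0] * len(z_clean)
    g = feature_grad()
    for _ in range(iters):
        # Sign-gradient ascent step in feature space, not waveform space.
        delta = [d + step * (1 if gi > 0 else -1 if gi < 0 else 0)
                 for d, gi in zip(delta, g)]
        # Assumed clean-referenced projection: |delta_i| <= eps * |z_clean_i|.
        delta = [max(-eps * abs(zc), min(eps * abs(zc), d))
                 for d, zc in zip(delta, z_clean)]
    # The adversarial sample is the vocoder output of the perturbed features,
    # not the clean waveform plus additive noise.
    return vocode([zc + d for zc, d in zip(z_clean, delta)]), delta
```

A fair comparison measures the loss against `vocode(encode(wave))` rather than `wave`, which separates the effect of the feature perturbation from the vocoder's own reconstruction error:

```python
wave = [1.0, 0.5, -0.5, 0.25]
adv, delta = attack(wave)
print(surrogate_loss(adv) - surrogate_loss(vocode(encode(wave))))
```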

RESULT
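The abstract states its results as WER improvements (+26.6 over the SOTA baseline on black-box models, +36.2 against training defenses). For reference, word error rate is the word-level edit distance between reference and hypothesis divided by the number of reference words. The sketch below assumes plain whitespace tokenization with no case or punctuation normalization; the paper's evaluation scripts may normalize transcripts differently, and the abstract does not say whether the reported gains are absolute percentage points.

```python
"""Word error rate (WER) via word-level Levenshtein distance (minimal sketch)."""
from typing import List


def wer(reference: str, hypothesis: str) -> float:
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

For example, `wer("turn on the lights", "turn off the light")` is 0.5 (two substitutions over four reference words). WER can exceed 1.0 when the hypothesis contains many insertions, which is why attack papers often report it as an unbounded percentage.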

WHY NOW

The Uncategorized segment moved forward this cycle; last verified June 2026. Public score: 0.0/10.


Analysis summary


Competitive landscape

No named competitor graph is public yet; the page still exposes the segment, adoption evidence, and score state so the commercial read is not blank.

Segment: Uncategorized

Adoption evidence: No public code link in the paper record yet

Commercial read: 0.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified
