Learning to Evade: Adaptive Attacks on Audio Watermarking

Abstract: Advances in generative audio have intensified copyright concerns, making audio watermarking increasingly important for asserting ownership. However, existing audio watermarking methods are vulnerable to adversarial attacks. We find that watermark decoder message probabilities follow normal distributions, a property exploited by defenses to detect manipulations. This paper introduces AWM, an adaptive audio watermark attack method designed to bypass existing defense strategies. AWM uses a two-stage optimization: the first stage ensures attack success, while the second improves audio quality. To evade detection, it estimates normal distribution parameters from limited samples of the target audio, and then adaptively steers decoded probabilities back into the estimated range. Evaluated on two watermarking methods across three voice datasets, AWM achieves high success while bypassing state-of-the-art detectors: detection rates are below 10% for replacement and creation, and 0% for removal.

Demonstration

We compare our attack method with AudioMarkBench [NeurIPS 2024], a benchmark designed to evaluate the robustness of audio watermarking against adversarial attacks.

We evaluate two audio watermarking methods:

  1. Timbre [NDSS 2024]
  2. AudioSeal [ICML 2024]

Distribution

For both clean (unwatermarked) and watermarked audio, we observe two distinct distributions, each following a normal distribution. We show the distributions for AudioMarkBench and for our attack in the watermark replacement scenario; the red boxes mark the outliers in the AudioMarkBench results.

                      AudioSeal   Timbre
Clean/Unwatermarked   [figure]    [figure]
Watermarked           [figure]    [figure]
AudioMarkBench        [figure]    [figure]
Our Attack            [figure]    [figure]

Notes on the Spectrogram and Distribution rows in the Watermark Replacement, Watermark Creation, and Watermark Removal sections:

  1. Spectrogram. Red box: noticeable noise; green box: the noticeable noise is reduced.
  2. Distribution. Orange: outliers; blue: not outliers.
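
The orange/blue split in the Distribution plots can be illustrated with a small sketch. This page does not state the exact rule used to mark a point as an outlier, so the code below assumes a simple k-sigma rule around a normal fit of reference probabilities from a single cluster; the function names `fit_normal` and `flag_outliers` and the default `k=3.0` are hypothetical and are not taken from the paper or from any detector.

```python
import numpy as np

def fit_normal(reference_probs):
    """Estimate the mean and sample standard deviation of reference probabilities."""
    ref = np.asarray(reference_probs, dtype=float)
    return float(ref.mean()), float(ref.std(ddof=1))

def flag_outliers(probs, mean, std, k=3.0):
    """Return a boolean mask that is True where a value lies more than k*std from the mean.

    The k-sigma rule is an assumption made for illustration only.
    """
    probs = np.asarray(probs, dtype=float)
    return np.abs(probs - mean) > k * std

# Hypothetical reference values for one cluster of message probabilities.
mean, std = fit_normal([0.90, 0.92, 0.88, 0.91, 0.89])
mask = flag_outliers([0.90, 0.999, 0.50], mean, std)
print(mask)  # True entries would be drawn in orange, False entries in blue
```

A tighter or looser `k` changes how many points are flagged, so any comparison between attacks should use the same setting throughout.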

In the Distribution rows, the message probabilities are interpreted as follows:

  1. AudioSeal: If a message probability is > 0.5, the corresponding watermark message bit (binary format) is 1; otherwise, the bit is 0.
  2. Timbre: If a message probability is >= 0, the corresponding watermark message bit (binary format) is 1; otherwise, the bit is 0.
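
The two decoding rules above can be written as a short thresholding sketch. The thresholds come from the rules listed here; the function name `decode_bits` and the method keys are hypothetical, and the input is assumed to be a flat sequence of per-bit decoder outputs rather than the actual output format of either library.

```python
import numpy as np

def decode_bits(probs, method):
    """Convert per-bit decoder outputs into a binary message string."""
    probs = np.asarray(probs, dtype=float)
    if method == "audioseal":
        bits = probs > 0.5    # bit is 1 only when strictly above 0.5
    elif method == "timbre":
        bits = probs >= 0.0   # bit is 1 when greater than or equal to 0
    else:
        raise ValueError(f"unknown method: {method}")
    return "".join("1" if b else "0" for b in bits)

print(decode_bits([0.9, 0.5, 0.1], "audioseal"))  # 100
print(decode_bits([0.3, 0.0, -0.2], "timbre"))    # 110
```

Note the asymmetry at the boundary: a value of exactly 0.5 decodes to 0 for AudioSeal, while a value of exactly 0 decodes to 1 for Timbre.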

Watermark Replacement

Watermark replacement aims to replace an existing watermark with a different one.

Attack (AudioSeal)  Original (Clean)  Watermark         AudioMarkBench    Ours              Ours (+opt)
Audios              [audio]           [audio]           [audio]           [audio]           [audio]
Spectrogram         [figure]          [figure]          [figure]          [figure]          [figure]
Watermark Message   ----------------  000001110101100   1111111100000000  1111111100000000  1111111100000000
Distribution        ----------------  [figure]          [figure]          [figure]          [figure]

Attack (Timbre)     Original (Clean)  Watermark         AudioMarkBench    Ours              Ours (+opt)
Audios              [audio]           [audio]           [audio]           [audio]           [audio]
Spectrogram         [figure]          [figure]          [figure]          [figure]          [figure]
Watermark Message   ----------------  1111010111111110  1111111100000000  1111111100000000  1111111100000000
Distribution        ----------------  [figure]          [figure]          [figure]          [figure]

Watermark Creation

Watermark creation aims to embed a new watermark into clean audio.

Attack (AudioSeal)  Original (Clean)  Watermark         AudioMarkBench    Ours              Ours (+opt)
Audios              [audio]           [audio]           [audio]           [audio]           [audio]
Spectrogram         [figure]          [figure]          [figure]          [figure]          [figure]
Watermark Message   ----------------  1111111011011111  1111111100000000  1111111100000000  1111111100000000
Distribution        ----------------  [figure]          [figure]          [figure]          [figure]

Attack (Timbre)     Original (Clean)  Watermark         AudioMarkBench    Ours              Ours (+opt)
Audios              [audio]           [audio]           [audio]           [audio]           [audio]
Spectrogram         [figure]          [figure]          [figure]          [figure]          [figure]
Watermark Message   ----------------  1011010010100000  1111111100000000  1111111100000000  1111111100000000
Distribution        ----------------  [figure]          [figure]          [figure]          [figure]

Watermark Removal

Watermark removal aims to eliminate the original watermark from watermarked audio.

Attack (AudioSeal)  Original (Clean)  Watermark         AudioMarkBench    Ours              Ours (+opt)
Audios              [audio]           [audio]           [audio]           [audio]           [audio]
Spectrogram         [figure]          [figure]          [figure]          [figure]          [figure]
Watermark Message   ----------------  0011001110110010  0011001110111010  0110110110100101  0111011110100000
Distribution        ----------------  [figure]          [figure]          [figure]          [figure]

Attack (Timbre)     Original (Clean)  Watermark         AudioMarkBench    Ours              Ours (+opt)
Audios              [audio]           [audio]           [audio]           [audio]           [audio]
Spectrogram         [figure]          [figure]          [figure]          [figure]          [figure]
Watermark Message   ----------------  1011101010000010  0100011101111101  0100110011011000  0100111011000000
Distribution        ----------------  [figure]          [figure]          [figure]          [figure]
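
For readers who want to compare the Watermark Message rows by hand, a minimal sketch is to count the positions where two decoded messages differ. This page does not define its success metric, so the helper below (the hypothetical name `bit_differences`) is only a convenience for reading the tables, not the evaluation procedure used in the paper.

```python
def bit_differences(message_a, message_b):
    """Count the positions at which two equal-length binary strings differ."""
    if len(message_a) != len(message_b):
        raise ValueError("messages must have the same length")
    return sum(a != b for a, b in zip(message_a, message_b))

# Two AudioSeal messages copied from the removal table above:
# the Watermark column and the AudioMarkBench column.
print(bit_differences("0011001110110010", "0011001110111010"))  # 1
```

The length check matters here because a message with a missing bit cannot be aligned position by position.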

Others

WavMark [arXiv]

We present three distributions for WavMark.

  1. Left. The distribution of the message probabilities for clean samples.
  2. Middle. The distribution of the message probabilities for watermarked samples whose decoded binary message is 0.
  3. Right. The distribution of the message probabilities for watermarked samples whose decoded binary message is 1.
Clean Distribution   Watermark Distribution (bit=0)   Watermark Distribution (bit=1)
[figure]             [figure]                         [figure]

Collaborative-watermarking-with-codecs [ICASSP 2025]

We use the zero-bit watermark: the watermark is present in watermarked audio and absent from clean speech. If the message probability is > 0.5, the audio is classified as watermarked; otherwise, it is classified as clean. The blue curve is the true distribution of the message probabilities. We believe it can still be described by a normal distribution.

  1. Left (blue). The distribution (a right-truncated normal distribution) of the decoded message probabilities for clean audio. The left side is the true distribution, and the mean is 0.
  2. Right (blue). The distribution (a left-truncated normal distribution) of the decoded message probabilities for watermarked audio. The right side is the true distribution, and the mean is 1.
Clean Distribution   Watermark Distribution
[figure]             [figure]

SilentCipher [Interspeech 2024]

If the confidence score is >= 0.95, the audio is classified as watermarked; otherwise, it is classified as clean.

  1. Left. The decoded message probabilities for clean audio follow a normal distribution with a mean of around 0.47.
  2. Right. The decoded message probabilities for watermarked audio tend toward 1 for both binary message values, 0 and 1.
Clean Distribution   Watermark Distribution
[figure]             [figure]