Learning to Evade: Adaptive Attacks on Audio Watermarking

Abstract: Advances in generative audio have intensified copyright concerns, making audio watermarking increasingly important for asserting ownership. However, existing audio watermarking methods are vulnerable to adversarial attacks. We find that watermark decoder message probabilities follow normal distributions, a property exploited by defenses to detect manipulations. This paper introduces AWM, an adaptive audio watermark attack method designed to bypass existing defense strategies. AWM uses a two-stage optimization: the first stage ensures attack success, while the second improves audio quality. To evade detection, it estimates normal distribution parameters from limited samples of the target audio, and then adaptively steers decoded probabilities back into the estimated range. Evaluated on two watermarking methods across three voice datasets, AWM achieves high success while bypassing state-of-the-art detectors: detection rates are below 10% for replacement and creation, and 0% for removal.

Demonstration

We compare our attack method with AudioMarkBench [NeurIPS 2024], a benchmark designed to evaluate the robustness of audio watermarking against adversarial attacks.

We evaluate two audio watermarking methods:

Timbre [NDSS 2024]
AudioSeal [ICML 2024]

Distribution

For both clean/unwatermarked and watermarked audio, we observe two distinct distributions, each following a normal distribution pattern. We show the distributions of AudioMarkBench and our attack in the watermark replacement scenario; the red boxes mark the outliers produced by AudioMarkBench.

	AudioSeal	Timbre
Clean/Unwatermarked
Watermarked
AudioMarkBench
Our Attack

Legend for the Spectrogram and Distribution rows in the Watermark Replacement, Watermark Creation, and Watermark Removal sections:

Spectrogram. Red Box: noticeable noise; Green Box: the noticeable noise is reduced.
Distribution. Orange Color: outliers; Blue Color: not outliers.
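
To make the notion of an outlier concrete, here is a minimal sketch of how decoded message probabilities could be checked against a normal distribution estimated from a few reference samples. The k-sigma rule, the default k = 3, and the function names `fit_normal`, `flag_outliers`, and `range_penalty` are assumptions made for illustration; this page does not specify the detector's exact criterion or the attack's loss.

```python
import numpy as np

def fit_normal(samples):
    """Estimate mean and sample standard deviation from a few decoded probabilities."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(), samples.std(ddof=1)

def flag_outliers(probs, mean, std, k=3.0):
    """Boolean mask: True where a value lies more than k standard deviations from the mean.

    The k-sigma rule is an assumed stand-in for the detector.
    """
    probs = np.asarray(probs, dtype=float)
    return np.abs(probs - mean) > k * std

def range_penalty(probs, mean, std, k=3.0):
    """Hinge-style penalty: zero inside [mean - k*std, mean + k*std], linear outside.

    Illustrative only; not the loss used by AWM.
    """
    probs = np.asarray(probs, dtype=float)
    excess = np.abs(probs - mean) - k * std
    return float(np.maximum(excess, 0.0).sum())

# Made-up reference probabilities for one bit value (illustrative numbers).
reference = [0.90, 0.92, 0.88, 0.91, 0.89]
mean, std = fit_normal(reference)

attacked = [0.91, 0.999, 0.87]
mask = flag_outliers(attacked, mean, std)
print(mask)                               # which values would be drawn as outliers
print(range_penalty(attacked, mean, std)) # positive when any value is out of range
```

A penalty of this shape is zero whenever every probability sits inside the estimated range, so an optimizer minimizing it would only move the values that a k-sigma check would flag.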

In the Distribution plots, the message probabilities are interpreted as follows:

AudioSeal: If a message probability is > 0.5, the corresponding watermark message bit (binary message format) is 1; otherwise, the bit is 0.
Timbre: If a message probability is >= 0, the corresponding watermark message bit (binary message format) is 1; otherwise, the bit is 0.
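
The two thresholding rules above can be sketched as follows. The function name `decode_bits` and the method keys are hypothetical, and the input is assumed to be a plain sequence of per-bit decoder outputs rather than either library's actual return type.

```python
import numpy as np

def decode_bits(values, method):
    """Turn per-bit decoder outputs into a binary message string.

    method: "audioseal" (bit is 1 if value > 0.5) or "timbre" (bit is 1 if value >= 0).
    """
    values = np.asarray(values, dtype=float)
    if method == "audioseal":
        bits = values > 0.5    # strictly greater than 0.5
    elif method == "timbre":
        bits = values >= 0.0   # greater than or equal to 0
    else:
        raise ValueError(f"unknown method: {method}")
    return "".join("1" if b else "0" for b in bits)

print(decode_bits([0.9, 0.5, 0.1, 0.51], "audioseal"))  # 1001
print(decode_bits([0.3, 0.0, -0.2], "timbre"))          # 110
```

Note the boundary cases: a value of exactly 0.5 decodes to 0 under the AudioSeal rule, while a value of exactly 0 decodes to 1 under the Timbre rule.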

Watermark Replacement

Watermark replacement aims to replace an existing watermark with a different one.

Attack (AudioSeal)	Original (Clean)	Watermark	AudioMarkBench	Ours	Ours (+opt)
Audios
Spectrogram
Watermark Message	----------------	000001110101100	1111111100000000	1111111100000000	1111111100000000
Distribution	----------------

Attack (Timbre)	Original (Clean)	Watermark	AudioMarkBench	Ours	Ours (+opt)
Audios
Spectrogram
Watermark Message	----------------	1111010111111110	1111111100000000	1111111100000000	1111111100000000
Distribution	----------------

Watermark Creation

Watermark creation aims to embed a new watermark into clean audio.

Attack (AudioSeal)	Original (Clean)	Watermark	AudioMarkBench	Ours	Ours (+opt)
Audios
Spectrogram
Watermark Message	----------------	1111111011011111	1111111100000000	1111111100000000	1111111100000000
Distribution	----------------

Attack (Timbre)	Original (Clean)	Watermark	AudioMarkBench	Ours	Ours (+opt)
Audios
Spectrogram
Watermark Message	----------------	1011010010100000	1111111100000000	1111111100000000	1111111100000000
Distribution	----------------

Watermark Removal

Watermark removal aims to eliminate the original watermark from watermarked audio.

Attack (AudioSeal)	Original (Clean)	Watermark	AudioMarkBench	Ours	Ours (+opt)
Audios
Spectrogram
Watermark Message	----------------	0011001110110010	0011001110111010	0110110110100101	0111011110100000
Distribution	----------------

Attack (Timbre)	Original (Clean)	Watermark	AudioMarkBench	Ours	Ours (+opt)
Audios
Spectrogram
Watermark Message	----------------	1011101010000010	0100011101111101	0100110011011000	0100111011000000
Distribution	----------------

Others

Wavmark [arXiv]

We present three distributions for Wavmark.

Left. The distribution of the message probabilities for the clean samples.
Middle. The distribution of the message probabilities for the watermarked samples whose decoded binary message is 0.
Right. The distribution of the message probabilities for the watermarked samples whose decoded binary message is 1.

Clean Distribution	Watermark Distribution (bit=0)	Watermark Distribution (bit=1)

Collaborative-watermarking-with-codecs [ICASSP 2025]

We use the zero-bit watermark: the watermark is present in watermarked audio and absent from clean speech. If the message probability is > 0.5, the audio is classified as watermarked; otherwise, it is classified as clean. The blue portion of each plot is the true distribution of the message probabilities. We believe it can still be described by a normal distribution.

Left (Blue Color). The distribution of the decoded message probabilities for clean audio, a right-truncated normal distribution. The left side is the true distribution, and the mean is 0.
Right (Blue Color). The distribution of the decoded message probabilities for watermarked audio, a left-truncated normal distribution. The right side is the true distribution, and the mean is 1.

Clean Distribution	Watermark Distribution

SilentCipher [Interspeech 2024]

If the confidence score is >= 0.95, the audio is classified as watermarked; otherwise, it is classified as clean.

Left. The decoded message probabilities for clean audio follow a normal distribution with a mean of around 0.47.
Right. The decoded message probabilities for watermarked audio tend toward 1 for both binary message 0 and binary message 1.

Clean Distribution	Watermark Distribution