Watermarking — AIght

A text watermark biases the model's token sampling toward a hidden pattern. The pattern is invisible to readers but detectable statistically by anyone with the secret key.

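[Interactive demo: a sample text with each token marked green or red. Sample text: "The model samples the next token from a probability distribution. A secret key splits vocabulary into green and red lists. Green tokens win." Green-list ratio: 19 / 30 tokens (63%), shown on a 0% to 100% scale with a detection threshold of about 60%.]

Readers see no difference. A watermark detector reads the ratio.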

The basic idea

At each generation step, the model picks the next token from a probability distribution. A watermarking scheme uses a pseudo-random function, seeded by the secret key and the most recent tokens, to split the vocabulary into "green" and "red" lists. The model is nudged toward green tokens. Over a long enough text, the share of green tokens rises above what chance alone would produce, and that excess is statistically detectable.
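As a rough illustration of the split-and-nudge step, here is a minimal sketch in Python. It is not any particular vendor's scheme. The function names, the use of SHA-256 to derive the seed, the green fraction `gamma`, and the bias strength `delta` are all assumptions chosen for the example.

```python
import hashlib
import math
import random


def green_list(prev_token: int, key: bytes, vocab_size: int, gamma: float = 0.5) -> set:
    """Pick a pseudo-random 'green' subset of the vocabulary.

    The seed depends on the secret key and the previous token id, so anyone
    holding the key can recompute the same split later. gamma is the assumed
    fraction of the vocabulary marked green.
    """
    digest = hashlib.sha256(key + prev_token.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def biased_probs(logits: list, green: set, delta: float = 2.0) -> list:
    """Add delta to every green token's logit, then apply a softmax.

    delta is a hypothetical bias strength: larger values make the watermark
    easier to detect but distort the model's choices more.
    """
    shifted = [x + delta if i in green else x for i, x in enumerate(logits)]
    peak = max(shifted)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# Toy usage: a 100-token vocabulary where the model has no preference.
green = green_list(prev_token=42, key=b"secret", vocab_size=100)
probs = biased_probs([0.0] * 100, green)
green_mass = sum(probs[i] for i in green)  # probability of sampling a green token
```

In this toy case the unbiased model would pick a green token half the time. After the nudge, green tokens carry most of the probability mass, yet any individual red token can still be sampled, which is why the text still reads naturally.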

Why detection is hard

Short text. Statistical signals need length to emerge. A tweet can't carry a robust watermark.
Editing. Replacing 20% of the tokens often breaks the pattern.
Multilingual translation. Round-tripping through another language destroys the watermark entirely.
Mixed authorship. Human-edited AI text falls between the two distributions; detectors give ambiguous scores.
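To make the length problem concrete, the sketch below computes a one-sided z-score for the green-token count. It assumes that unwatermarked text lands on the green list with probability `gamma` per token, independently. The function name and `gamma = 0.5` are assumptions for illustration, not values taken from a specific detector.

```python
import math


def green_z_score(green_count: int, total: int, gamma: float = 0.5) -> float:
    """Standard deviations by which the green count exceeds chance.

    Under the assumed null hypothesis (no watermark), the green count is
    binomial with mean gamma * total and variance total * gamma * (1 - gamma).
    """
    expected = gamma * total
    std = math.sqrt(total * gamma * (1 - gamma))
    return (green_count - expected) / std


short_text = green_z_score(19, 30)    # about 1.46
long_text = green_z_score(190, 300)   # about 4.62
```

Both texts have the same 63% green ratio, but only the longer one gives a score that is hard to explain by chance. The same arithmetic explains the editing case: every replaced token pulls the green count back toward the chance level, and the score falls with it.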

What this means practically

Don't trust AI-text detectors for adversarial use cases (academic fraud, deepfake provenance). They have real false-positive rates on honest human writers — especially non-native English speakers and people who write in genre patterns the model also produces.

Watermarks may help in cooperative contexts: a platform that voluntarily marks its own outputs so downstream systems can detect them.

What to read next

Model collapse is the recursive-training problem watermarking partly addresses. Synthetic data is the related curation question.
