AIght_

Concept

Watermarking

Invisible signatures on AI-generated text — and why most don't survive contact with reality.

Mankaran Singh·Updated May 17, 2026

Where this idea lives

Prerequisites and related concepts:
  • Temperature & Sampling: why 'more creative' is not the same as 'more random', and the knobs that actually matter.
  • Model Collapse: what happens when models train on text written by other models, recursively.
  • Synthetic Data: models training on text other models wrote, and why this isn't always bad.
  • AI Safety & Alignment: the problem of building AI that reliably does what you actually wanted, not what you literally asked for.

Tools that show it: ChatGPT, Gemini

Shows up in: Journalism & Media, Education & Teaching, Social Work & Public Policy, Law & Legal

You might think:
  • Watermarks can identify any AI text.
  • Paraphrasing breaks watermarks.
  • If a watermark detector says human, it's definitely human.

Common misconception

“Watermarks reliably identify AI-generated text.”

The best published watermarks survive small edits and short paraphrases. They don't survive heavy rewrites, translation through another language, or determined obfuscation, so false negatives are common. False positives also occur on unusual human writing, and they are the worse failure in settings like school plagiarism cases, where an honest writer ends up accused.

A text watermark biases the model's token sampling toward a hidden pattern. The pattern is invisible to readers but detectable statistically by anyone with the secret key.

The basic idea

At each generation step, the model picks the next token from a probability distribution. A watermarking scheme uses a pseudo-random function (seeded by recent tokens) to split the vocabulary into "green" and "red" lists. The model is nudged toward green tokens. Over a long enough text, the green-token frequency becomes statistically detectable.
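The green/red split described above can be sketched in a few lines. This is a toy illustration, not any vendor's actual scheme: the vocabulary, the random "model" logits, and the names `SECRET_KEY`, `GAMMA`, and `DELTA` are all assumptions made up for the example.

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(50)]  # toy vocabulary (hypothetical)
GAMMA = 0.5              # assumed fraction of the vocabulary marked green per step
DELTA = 2.0              # assumed bias added to the logits of green tokens
SECRET_KEY = "demo-key"  # hypothetical key shared by generator and detector


def green_list(prev_token):
    """Pseudo-randomly split the vocabulary, seeded by the key and the previous token."""
    digest = hashlib.sha256(f"{SECRET_KEY}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(GAMMA * len(VOCAB))])


def sample_next(prev_token, logits, rng, watermark=True):
    """Sample one token from softmax(logits), optionally nudged toward green tokens."""
    greens = green_list(prev_token) if watermark else set()
    weights = [math.exp(logits[t] + (DELTA if t in greens else 0.0)) for t in VOCAB]
    return rng.choices(VOCAB, weights=weights, k=1)[0]


def generate(n_tokens, seed, watermark=True):
    """Generate tokens from a stand-in model that emits random logits each step."""
    rng = random.Random(seed)
    tokens = ["tok0"]
    for _ in range(n_tokens):
        logits = {t: rng.gauss(0.0, 1.0) for t in VOCAB}
        tokens.append(sample_next(tokens[-1], logits, rng, watermark))
    return tokens


def green_fraction(tokens):
    """Share of tokens that fall in the green list implied by their predecessor."""
    hits = sum(tokens[i] in green_list(tokens[i - 1]) for i in range(1, len(tokens)))
    return hits / (len(tokens) - 1)
```

Calling `green_fraction(generate(400, seed=1))` and `green_fraction(generate(400, seed=1, watermark=False))` contrasts the two cases: the watermarked run lands well above `GAMMA`, while the plain run stays near it. Only someone who knows `SECRET_KEY` can recompute the green lists and see the difference.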

Why detection is hard

  • Short text. Statistical signals need length to emerge. A tweet can't carry a robust watermark.
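  • Editing. Replacing 20% of the tokens often breaks the pattern. Each replaced token is green only by chance, and it also changes the seed used to score the token that follows it.
  • Multilingual translation. Round-tripping through another language destroys the watermark entirely, because the translator, not the watermarked sampler, chooses the final tokens.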
  • Mixed authorship. Human-edited AI text falls between the two distributions; detectors give ambiguous scores.
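The length and editing effects in the list above follow from simple arithmetic on the green-token count. The sketch below uses a one-proportion z-score, a common way to test whether a count exceeds chance. The green rate of 0.65, the `GAMMA` of 0.5, and the detection threshold of 4 are illustrative assumptions, not values from any particular detector.

```python
import math

GAMMA = 0.5      # assumed fraction of the vocabulary that is green at each step
THRESHOLD = 4.0  # hypothetical z-score a detector requires before flagging text


def z_score(green_hits, total, gamma=GAMMA):
    """Standard deviations by which the green count exceeds what chance predicts."""
    expected = gamma * total
    std = math.sqrt(total * gamma * (1.0 - gamma))
    return (green_hits - expected) / std


def expected_z(total, green_rate, edited_fraction=0.0, gamma=GAMMA):
    """Expected z-score after a fraction of tokens is replaced by unmarked ones.

    Simplified model: a replaced token is green with probability gamma. It ignores
    that an edit also reseeds the following token, so real damage is larger.
    """
    rate = (1.0 - edited_fraction) * green_rate + edited_fraction * gamma
    return z_score(rate * total, total, gamma)


short_text = expected_z(total=30, green_rate=0.65)
long_text = expected_z(total=200, green_rate=0.65)
edited_text = expected_z(total=200, green_rate=0.65, edited_fraction=0.2)

for label, z in [("30 tokens", short_text), ("200 tokens", long_text),
                 ("200 tokens, 20% edited", edited_text)]:
    print(f"{label}: z = {z:.2f}, flagged = {z > THRESHOLD}")
```

Under these assumptions the score grows with the square root of the length, so the 30-token text stays below the threshold while the 200-token text clears it. Replacing a fifth of the tokens pulls the 200-token text back under.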

What this means practically

Don't trust AI-text detectors for adversarial use cases (academic fraud, deepfake provenance). They have real false-positive rates on honest human writers — especially non-native English speakers and people who write in genre patterns the model also produces.
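A short base-rate calculation shows why even a small false-positive rate matters in these settings. Every number below is hypothetical, chosen only to make the arithmetic concrete; real detector error rates vary and are often not published.

```python
def flagged_is_ai_probability(prevalence, true_positive_rate, false_positive_rate):
    """Bayes' rule: chance that a flagged text really is AI-generated."""
    flagged_ai = prevalence * true_positive_rate
    flagged_human = (1.0 - prevalence) * false_positive_rate
    return flagged_ai / (flagged_ai + flagged_human)


def expected_false_accusations(honest_writers, false_positive_rate):
    """Expected number of honest writers the detector flags anyway."""
    return honest_writers * false_positive_rate


# Hypothetical classroom: 10% of essays are AI-written, the detector catches
# 70% of those, and it wrongly flags 2% of honest essays.
posterior = flagged_is_ai_probability(0.10, 0.70, 0.02)
wrongly_flagged = expected_false_accusations(1000, 0.02)
print(f"P(AI | flagged) = {posterior:.2f}")
print(f"Honest writers flagged per 1000: {wrongly_flagged:.0f}")
```

The point of the sketch is the structure of the calculation: the share of flags that are wrong depends on how rare AI text is among the submissions, not only on the detector's headline accuracy.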

Watermarks may help in cooperative contexts: a platform that voluntarily marks its own outputs so downstream systems can detect them.

What to read next

Model collapse is the recursive-training problem watermarking partly addresses. Synthetic data is the related curation question.

Level: beginner
Read time: 4 min
Updated: May 2026
Sources: 4
