Model collapse is the failure mode where each generation of a model, trained on the output of the previous generation, gradually loses variety. Rare patterns disappear; the distribution narrows. After a few generations, the model can produce confident, fluent text that has lost contact with the original data's diversity.
Why it happens
A generative model defines a probability distribution over its outputs. Sampling from it mostly draws from the high-probability mass, so any finite sample under-represents the tails. Train on those samples, and the next model's distribution shifts toward the high-mass region. Repeat, and the tails (the rare correct cases, the weird-but-true facts) vanish.
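As a toy illustration of this mechanism, the sketch below stands in for a "model" with a plain categorical distribution over tokens. Each generation draws a finite sample from the current distribution and refits by relative frequency. The Zipf-like starting weights, the vocabulary size, the sample size, and the names `next_generation` and `simulate` are all assumptions chosen for illustration; real language-model training is far more complex.

```python
import random
from collections import Counter


def next_generation(probs, n_samples, rng):
    """Sample n_samples tokens from probs, then refit by relative frequency."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    draws = rng.choices(tokens, weights=weights, k=n_samples)
    counts = Counter(draws)
    # Tokens that were never drawn get no entry, i.e. probability zero.
    return {t: c / n_samples for t, c in counts.items()}


def simulate(n_tokens=50, n_samples=200, generations=10, seed=0):
    """Return the number of tokens with nonzero probability per generation."""
    rng = random.Random(seed)
    # Hypothetical Zipf-like start: token k has weight 1 / (k + 1).
    raw = [1.0 / (k + 1) for k in range(n_tokens)]
    total = sum(raw)
    probs = {k: w / total for k, w in enumerate(raw)}

    support_sizes = [len(probs)]
    for _ in range(generations):
        probs = next_generation(probs, n_samples, rng)
        support_sizes.append(len(probs))
    return support_sizes


if __name__ == "__main__":
    print(simulate())
```

In this toy setup, a token whose estimated probability reaches zero can never be sampled again, so the number of surviving tokens can only stay the same or shrink from one generation to the next.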
Why it matters now
The web is filling up with AI-generated text. Models trained on future web scrapes will ingest it whether labs want to or not, which makes distinguishing human-written text from model-generated text the live problem.
What labs do
- Heavy filtering of training data: detectors for model-generated text, source whitelists.
- Synthetic data only where it's verified (math, code, structured tasks).
- Curated human-written corpora (books, news archives, expert communities).
- Watermarking attempts (still nascent).
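To make the "verified synthetic data" idea concrete, here is a minimal sketch for the simplest case, arithmetic. It keeps a synthetic sample only if its stated answer matches an independent recomputation. The sample format (a dict with `a`, `b`, `op`, `answer`) and the names `is_verified` and `filter_verified` are hypothetical, not any lab's actual pipeline; verification for code or other structured tasks would need its own checker, such as running tests.

```python
import operator

# Operations this toy checker knows how to recompute.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def is_verified(sample):
    """Return True only if the sample's answer matches a recomputation."""
    try:
        op = OPS.get(sample["op"])
        if op is None:
            return False
        return op(sample["a"], sample["b"]) == sample["answer"]
    except (KeyError, TypeError):
        # Missing fields or non-numeric operands: treat as unverified.
        return False


def filter_verified(samples):
    """Keep only the samples that pass verification."""
    return [s for s in samples if is_verified(s)]


if __name__ == "__main__":
    candidates = [
        {"a": 2, "b": 3, "op": "+", "answer": 5},   # correct
        {"a": 7, "b": 6, "op": "*", "answer": 41},  # wrong answer
        {"a": 9, "b": 4, "op": "-"},                # missing answer
    ]
    print(filter_verified(candidates))
```

The design choice here is to drop anything that cannot be checked rather than guess, so unverifiable samples never reach the training set.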
What to read next
Synthetic data is the broader topic. Watermarking is one attempt at making AI-generated text detectable.