Every AI model you've ever used started as random noise.
Billions of parameters, the numerical weights inside a neural network, start out as small random values that encode essentially nothing. Then, through a process of repeated prediction and correction running for weeks on thousands of specialised chips, those weights gradually organise into something that can write, reason, and converse.
Understanding how this happens doesn't require a PhD. It requires one key insight: a model gets better by being wrong.
Phase 1 — Pretraining
The first phase is pretraining. The model sees an enormous quantity of text (web pages, books, code, academic papers) and learns a single task: predict the next token.
- Feed text: "The cat sat on the..."
- Predict: the model guesses the next token, "mat"
- Compare: check the guess against the actual next token
- Update: adjust the weights to reduce the error
- Repeat: trillions of times across all the training data
That's it. No labels, no human feedback, no explicit teaching of facts. Just: see text, predict next token, get corrected, update, repeat. At a massive enough scale, this simple task forces the model to learn grammar, facts, reasoning patterns, and world knowledge — because all of those are required to predict text well.
A model that doesn't understand causality will predict poorly. A model that doesn't know common facts will predict poorly. Prediction accuracy is a proxy for understanding.
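The feed, predict, compare, update loop can be sketched in a few lines of Python. This is a toy under loud assumptions, not how a production model is built: the "model" is just a table of scores for which token follows which, tokens are whitespace-separated words, and the weights start at zero rather than random values so the run is reproducible. The names `train_epoch` and `predict_next` are invented for this illustration.

```python
import math

text = "the cat sat on the mat . the cat sat on the mat ."
tokens = text.split()                      # assumption: one token per word
vocab = sorted(set(tokens))
index = {tok: i for i, tok in enumerate(vocab)}
ids = [index[t] for t in tokens]
V = len(vocab)

# One row of scores (logits) per previous token. Zeros mean "no idea yet".
weights = [[0.0] * V for _ in range(V)]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_epoch(weights, ids, lr=0.5):
    """One pass over the text; returns the average prediction error."""
    total_loss = 0.0
    for prev, nxt in zip(ids, ids[1:]):            # feed text
        probs = softmax(weights[prev])             # predict
        total_loss += -math.log(probs[nxt])        # compare (cross-entropy)
        for j in range(V):                         # update
            grad = probs[j] - (1.0 if j == nxt else 0.0)
            weights[prev][j] -= lr * grad
    return total_loss / (len(ids) - 1)

losses = [train_epoch(weights, ids) for _ in range(50)]   # repeat

def predict_next(token):
    """Most likely next token according to the trained table."""
    probs = softmax(weights[index[token]])
    return vocab[probs.index(max(probs))]

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(predict_next("sat"))   # -> on
```

The error falls with each pass because every wrong guess nudges the weights towards the token that actually came next. A real model swaps the lookup table for a deep network that reads the whole context, but the loop has the same shape.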
INSIGHT
Pretraining is unsupervised (more precisely, self-supervised): it requires no human labels, because the text supplies its own. This is why models can be trained on internet-scale data. The "label" for every position in a piece of text is just the next token.
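To see why no labelling effort is needed, here is a minimal sketch of how training pairs fall out of raw text. It assumes whitespace-separated words as tokens (real tokenisers split text into subword pieces), and `next_token_pairs` is a hypothetical helper name.

```python
def next_token_pairs(tokens):
    """Return (context, target) pairs; each target is simply the token that follows."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

tokens = "The cat sat on the mat".split()
pairs = next_token_pairs(tokens)

for context, target in pairs:
    print(" ".join(context), "->", target)
# The last line printed is: The cat sat on the -> mat
```

A six-token sentence yields five training examples, and nobody had to annotate anything.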
How much compute and how much data? Scaling laws give surprisingly clean answers. For a given compute budget, there's an optimal ratio of model size to training tokens: too big a model with too little data wastes compute; too small a model with too much data wastes compute the other way. The Chinchilla paper recalibrated the whole industry's intuition about this.
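The trade-off can be made concrete with two widely quoted rules of thumb, both approximations rather than exact results: training costs roughly 6 FLOPs per parameter per token, and the compute-optimal ratio reported in the Chinchilla paper works out to about 20 training tokens per parameter. The sketch below treats both constants as assumptions; `compute_optimal` is a name made up for this example.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6   # assumption: compute C is roughly 6 * N * D
TOKENS_PER_PARAM = 20       # assumption: Chinchilla-style rule of thumb

def compute_optimal(flops_budget):
    """Split a FLOP budget into (parameters, tokens) under the two assumptions above."""
    params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens

params, tokens = compute_optimal(5.88e23)
print(f"{params:.2e} parameters, {tokens:.2e} tokens")
# -> 7.00e+10 parameters, 1.40e+12 tokens
```

That budget lands on roughly 70 billion parameters and 1.4 trillion tokens, which is about the size of the Chinchilla model itself. Because both quantities grow with the square root of compute, a budget four times larger calls for about twice the model and twice the data under these assumptions.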
After pretraining, the model is a powerful but raw text predictor. It will complete your sentence — not necessarily in the way you intended, and not necessarily helpfully.
Phase 2 — Instruction tuning
Raw pretraining produces a model that can continue text. It doesn't produce a model that follows instructions. For that, a second phase is needed.
Instruction tuning (also called supervised fine-tuning, SFT) trains the model on curated examples of instruction-following. A dataset of prompts and ideal responses, written or reviewed by humans, teaches the model to behave as an assistant rather than a text completer.
The result is a model that responds to "Summarise this article" by summarising — not by continuing the article.
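Mechanically, an SFT example is still next-token prediction, just on a prompt joined to its ideal response. One common choice, assumed here rather than described in this article, is to compute the error only on the response tokens, so the model learns to answer the prompt rather than reproduce it. The helper `build_sft_example` and the `<eos>` marker are hypothetical, and tokens are again whitespace-separated words.

```python
def build_sft_example(prompt, response):
    """Join prompt and response tokens and mark which positions are trained on."""
    prompt_tokens = prompt.split()
    response_tokens = response.split() + ["<eos>"]   # hypothetical end-of-response marker
    tokens = prompt_tokens + response_tokens
    # 0 = prompt token (context only), 1 = response token (counts towards the error)
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask

tokens, loss_mask = build_sft_example(
    "Summarise: cats sleep a lot",
    "Cats sleep often",
)

for token, trained in zip(tokens, loss_mask):
    print(f"{token:12} {'train' if trained else 'context'}")
```

The end-of-response marker matters as much as the words: it is how the model learns to stop once the instruction has been carried out.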
Phase 3 — RLHF
The final phase is where modern models get their characteristic polish: reinforcement learning from human feedback (RLHF).
The process works in two steps:
Reward model training. Humans compare pairs of model outputs and choose which is better. These preferences train a separate "reward model" that learns to predict human preference scores for any given output.
Policy optimisation. The main model is then fine-tuned using reinforcement learning — it generates outputs, the reward model scores them, and the main model's weights are updated to produce higher-scoring outputs over time.
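The first of these steps can be sketched with a pairwise loss. A formulation often used for reward models, assumed here since this article does not specify one, is the negative log-sigmoid of the score gap between the output the human preferred and the one they rejected. The function name and the scores below are made up for illustration; the sketch does not cover the policy optimisation step.

```python
import math

def pairwise_preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(chosen - rejected)): small when the preferred output scores higher."""
    margin = reward_chosen - reward_rejected
    # log1p(exp(-margin)) is an equivalent way to write -log(sigmoid(margin))
    return math.log1p(math.exp(-margin))

agrees = pairwise_preference_loss(2.0, -1.0)     # reward model ranks the pair as the human did
tied = pairwise_preference_loss(0.0, 0.0)        # reward model has no preference
disagrees = pairwise_preference_loss(-1.0, 2.0)  # reward model ranks the pair the other way

print(f"agrees={agrees:.3f} tied={tied:.3f} disagrees={disagrees:.3f}")
```

Minimising this loss over many human comparisons pushes the reward model to score preferred outputs higher. Only the gap between the two scores matters, which is why comparisons are enough and raters never need to assign absolute marks.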
RLHF is imperfect, though: human raters introduce their own biases, and reward hacking (the model finding ways to score well that diverge from real helpfulness) remains a real problem.
What training doesn't give you
Training gives the model compressed knowledge from its training data. It doesn't give the model:
- Knowledge of events after the training cutoff
- Access to real-time information
- The ability to know what it doesn't know
- Guaranteed factual accuracy on rare topics
These aren't flaws that will be engineered away. They're the nature of a statistical model trained on a fixed dataset. Knowing this shapes how you use AI well — you bring current information, verify critical facts, and treat the model's knowledge cutoff as a hard boundary.
The model is not a database. It's a compressed, reasoning-capable representation of what it was trained on. Use it accordingly.