How AI Models Are Trained

Every AI model you've ever used started as random noise.

Billions of parameters, the numerical weights inside a neural network, were initialised to small random values that encoded nothing useful. Then, through a process of repeated prediction and correction running for weeks on thousands of specialised chips, those weights gradually organised into something that can write, reason, and converse.

Understanding how this happens doesn't require a PhD. It requires one key insight: a model gets better by being wrong.

Phase 1 — Pretraining

The first phase is pretraining. The model sees an enormous quantity of text (web pages, books, code, academic papers) and learns a single task: predict the next token.

Feed text: "The cat sat on the..."
Predict: the model guesses the next token, "mat"
Compare: check the guess against the actual next token
Update: adjust the weights to reduce the error
Repeat: trillions of times across all training data

That's it. No labels, no human feedback, no explicit teaching of facts. Just: see text, predict next token, get corrected, update, repeat. At a massive enough scale, this simple task forces the model to learn grammar, facts, reasoning patterns, and world knowledge — because all of those are required to predict text well.
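The loop described above can be sketched in a few lines. This is a toy illustration only, not how production models are built: it assumes a whitespace tokeniser, a twelve-token corpus, and a "model" that is just a table of next-token scores conditioned on the previous token. The names `train_one_pass` and `predict_next` are hypothetical.

```python
import math
import random

# Tiny corpus, split on whitespace (real systems use subword tokenisers).
corpus = "the cat sat on the mat . the cat sat on the mat ."
tokens = corpus.split()
vocab = sorted(set(tokens))
token_id = {tok: i for i, tok in enumerate(vocab)}
ids = [token_id[tok] for tok in tokens]

# Weights start as small random noise: one row of next-token scores per token.
rng = random.Random(0)
weights = [[rng.gauss(0.0, 0.01) for _ in vocab] for _ in vocab]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_one_pass(learning_rate=0.5):
    """One pass over the corpus: predict, compare, update. Returns mean loss."""
    total_loss = 0.0
    for prev, actual in zip(ids, ids[1:]):
        probs = softmax(weights[prev])            # predict
        total_loss += -math.log(probs[actual])    # compare (cross-entropy)
        for j in range(len(vocab)):               # update (gradient step)
            target = 1.0 if j == actual else 0.0
            weights[prev][j] -= learning_rate * (probs[j] - target)
    return total_loss / (len(ids) - 1)


def predict_next(token):
    """Return the token the table currently scores highest after `token`."""
    row = weights[token_id[token]]
    return vocab[row.index(max(row))]


losses = [train_one_pass() for _ in range(50)]    # repeat
print(f"first pass loss {losses[0]:.3f}, last pass loss {losses[-1]:.3f}")
print("after 'sat' the model predicts:", predict_next("sat"))
```

The loss never reaches zero here because "the" is followed by both "cat" and "mat" in the corpus, so some uncertainty is irreducible. That is also true of real text.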

A model that doesn't understand causality will predict poorly. A model that doesn't know common facts will predict poorly. Prediction accuracy is a proxy for understanding.

INSIGHT

Pretraining is unsupervised (more precisely, self-supervised): it requires no human labels, because the "label" for every position in the text is just the next token. This is why models can be trained on internet-scale data.
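To make the "free labels" point concrete, here is a minimal sketch of how one sentence turns into several training examples with no human annotation. It assumes whitespace tokenisation for readability; the function name `next_token_examples` is hypothetical.

```python
def next_token_examples(tokens):
    """Pair each prefix of the text with the token that follows it."""
    examples = []
    for i in range(1, len(tokens)):
        context = tokens[:i]   # everything seen so far
        target = tokens[i]     # the "label" is simply the next token
        examples.append((context, target))
    return examples


tokens = "The cat sat on the mat".split()
examples = next_token_examples(tokens)
for context, target in examples:
    print(" ".join(context), "->", target)
```

A six-token sentence yields five (context, target) pairs, and every pair comes straight from the text itself.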

How much compute and how much data? Scaling laws give surprisingly clean answers. For a given compute budget, there's an optimal ratio of model size to training tokens: too big a model with too little data wastes compute, and too small a model with too much data wastes it the other way. The Chinchilla paper recalibrated the whole industry's intuition about this.
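A back-of-the-envelope version of that trade-off can be written down directly. The sketch below rests on two assumptions that are not stated in this article: the common approximation that training costs about 6 FLOPs per parameter per token, and the rough rule of thumb of about 20 training tokens per parameter associated with the Chinchilla results. Both are approximations, and the function name `compute_optimal_split` is hypothetical.

```python
import math

FLOPS_PER_PARAM_PER_TOKEN = 6   # assumption: training compute C ~ 6 * N * D
TOKENS_PER_PARAM = 20           # assumption: rough Chinchilla-style rule of thumb


def compute_optimal_split(compute_flops):
    """Split a FLOP budget into (parameters, tokens) under the two assumptions.

    Solves C = 6 * N * D with D = 20 * N, which gives N = sqrt(C / 120).
    """
    n_params = math.sqrt(
        compute_flops / (FLOPS_PER_PARAM_PER_TOKEN * TOKENS_PER_PARAM)
    )
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


params, tokens = compute_optimal_split(5.88e23)
print(f"{params:.2e} parameters, {tokens:.2e} tokens")
```

With a budget of 5.88e23 FLOPs this gives roughly 70 billion parameters and 1.4 trillion tokens, which is about the configuration reported for the Chinchilla model itself.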

After pretraining, the model is a powerful but raw text predictor. It will complete your sentence — not necessarily in the way you intended, and not necessarily helpfully.

Phase 2 — Instruction tuning

Raw pretraining produces a model that can continue text. It doesn't produce a model that follows instructions. For that, a second phase is needed.

Instruction tuning (also called supervised fine-tuning, SFT) trains the model on curated examples of instruction-following. A dataset of prompts and ideal responses, written or reviewed by humans, teaches the model to behave as an assistant rather than a text completer.
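As a rough sketch of what one such training example might look like, the snippet below joins a prompt and its ideal response into a single token sequence and marks which positions count towards the loss. Scoring only the response tokens is a common choice but an assumption here, as are the whitespace tokenisation and the name `build_sft_example`; the article does not specify any of them.

```python
def build_sft_example(prompt, response):
    """Return (tokens, loss_mask) for one prompt/response pair.

    loss_mask is 1 where the model is trained to predict the token
    (the response) and 0 where it is only given as context (the prompt).
    """
    prompt_tokens = prompt.split()
    response_tokens = response.split()
    tokens = prompt_tokens + response_tokens
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask


tokens, loss_mask = build_sft_example(
    "Summarise this article:",
    "The article explains how models are trained.",
)
for token, counted in zip(tokens, loss_mask):
    print(f"{token:12s} {'train' if counted else 'context'}")
```

The training objective is still next-token prediction; what changes is the data, which now consists of instructions followed by the kind of answer an assistant should give.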

The result is a model that responds to "Summarise this article" by summarising — not by continuing the article.

Phase 3 — RLHF

The final phase is where modern models get their characteristic polish: reinforcement learning from human feedback (RLHF).

The process works in two steps:

Reward model training. Humans compare pairs of model outputs and choose which is better. These preferences train a separate "reward model" that learns to predict human preference scores for any given output.

Policy optimisation. The main model is then fine-tuned using reinforcement learning — it generates outputs, the reward model scores them, and the main model's weights are updated to produce higher-scoring outputs over time.
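Both steps can be sketched with toy numbers. The first function is the pairwise preference loss commonly used to train reward models (a Bradley-Terry style objective); the second nudges a policy over three canned responses towards higher reward using the exact gradient of expected reward. This is an illustration under stated assumptions, not the procedure any particular lab uses: real systems typically rely on more elaborate algorithms (PPO is a common choice), and the reward scores, the three-response setup, and the names `preference_loss` and `policy_step` are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def preference_loss(score_chosen, score_rejected):
    """Step 1: low when the reward model scores the human-preferred output higher."""
    return -math.log(sigmoid(score_chosen - score_rejected))


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_step(logits, rewards, learning_rate=1.0):
    """Step 2: one gradient-ascent step on expected reward.

    For a softmax policy, d E[reward] / d logit_i = p_i * (r_i - E[reward]).
    """
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [
        logit + learning_rate * p * (r - expected)
        for logit, p, r in zip(logits, probs, rewards)
    ]


# Reward model agrees with the human rater: small loss. Disagrees: large loss.
print(preference_loss(2.0, 0.0), preference_loss(0.0, 2.0))

# Hypothetical reward-model scores for three candidate responses.
rewards = [0.1, 0.9, 0.4]
logits = [0.0, 0.0, 0.0]   # the policy starts with no preference
for _ in range(200):
    logits = policy_step(logits, rewards)
probs = softmax(logits)
print([round(p, 3) for p in probs])
```

After the updates, almost all of the probability mass sits on the response the reward model scores highest. The policy optimises whatever the reward model rewards, whether or not that matches what people actually wanted.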

Refusals
With RLHF: Calibrated; declines harmful requests, handles edge cases
Without RLHF: None or naive; either refuses too much or too little

Tone
With RLHF: Consistently helpful, appropriately hedged
Without RLHF: Can be terse, overconfident, or verbose randomly

Format
With RLHF: Structured when structure helps, conversational when not
Without RLHF: Inconsistent

Safety
With RLHF: Trained on human judgments of harm
Without RLHF: Only pretraining data distribution

RLHF is also imperfect: human raters introduce their own biases, and reward hacking (the model finding ways to score well that diverge from real helpfulness) remains a real problem.

What training doesn't give you

Training gives the model compressed knowledge from its training data. It doesn't give the model:

Knowledge of events after the training cutoff
Access to real-time information
The ability to know what it doesn't know
Guaranteed factual accuracy on rare topics

These aren't flaws that will be engineered away. They're the nature of a statistical model trained on a fixed dataset. Knowing this shapes how you use AI well — you bring current information, verify critical facts, and treat the model's knowledge cutoff as a hard boundary.

The model is not a database. It's a compressed, reasoning-capable representation of what it was trained on. Use it accordingly.
