AIght_

Concept

Fine-Tuning

Teaching a model new habits, not new knowledge

Mankaran Singh · Updated May 17, 2026

Where this idea lives

Related concepts: How AI Models Are Trained, RLHF, Retrieval-Augmented Generation, Prompt Engineering, DPO
Tools that show it: ChatGPT, Claude
Shows up in: Medicine & Healthcare, Law & Legal, Marketing & Advertising, Biology & Life Sciences
You might think: "Fine-tuning is the answer when prompts fail." "Fine-tuning teaches the model new facts." "Fine-tuned models are private by default."

There's a version of this explanation that will make you want to fine-tune everything. And a version that will leave you thinking most people never need to. Both are true. The useful question is which one applies to you — and being honest about that requires understanding what fine-tuning actually does.

Base models like GPT-4 or Claude are trained on enormous amounts of text, which gives them broad, general capabilities. Fine-tuning takes one of those models and continues training it on a smaller, specific dataset (your customer support transcripts, your legal documents, your company's writing style guide) to shift its behavior toward your particular context.

If you find yourself writing the same 800-word system prompt for every call, fine-tuning is probably worth considering. If you don't, it probably isn't.

What changes isn't the model's intelligence. And — importantly — it isn't primarily the model's knowledge either. What fine-tuning changes is behavior: tone, format, consistency, the particular way the model responds.


What fine-tuning actually adjusts

Think of a base model as someone with a strong general education and no particular professional context. They can write in many styles, explain many topics, follow many kinds of instructions. Now put them through six months at a specific company with a specific house style, handling a specific set of customer interactions.

Their capabilities don't grow. Their habits change. They start to respond the way your context expects without being told each time.

That's the outcome: reliable style and format, without having to specify it in every prompt.

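```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file of training examples ("training_data.jsonl" is a placeholder name).
# Each line: {"messages": [{"role": "user", "content": ...}, {"role": "assistant", "content": ...}]}
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the job on a fine-tunable model snapshot
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={"n_epochs": 3},
)
print(job.id)  # a job ID such as "ftjob-...", not the model ID

# Training runs asynchronously. Once the job has succeeded,
# fine_tuned_model holds an ID like:
# ft:gpt-4o-mini-2024-07-18:your-org:custom-name:abc123
job = client.fine_tuning.jobs.retrieve(job.id)
print(job.status, job.fine_tuned_model)
```
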
“Most people who think they need to fine-tune just need a better system prompt.”

Fine-tuning is for when you've genuinely exhausted prompting — when you need consistency at scale, across thousands of calls, that a prompt can't reliably deliver.



Fine-tuning changes how a model responds, not what it fundamentally knows.

LoRA and the cheaper path

The original way to fine-tune a model was full fine-tuning: update every parameter. For a 70-billion-parameter model, that means updating all 70 billion numbers, which is expensive and slow, and you end up storing a full copy of the model for every task.

LoRA (Low-Rank Adaptation)[·] changed that. Instead of training all the parameters, LoRA freezes the base model and trains a small set of additional low-rank matrices that sit alongside the frozen weights, often less than 1% of the original parameter count. The result is usually nearly as good as full fine-tuning, costs a fraction as much, and lets you swap LoRA "adapters" in and out for different tasks without touching the base model.

That is why fine-tuning is accessible at all to anyone not running a research lab. Before LoRA, this was an industrial-scale operation.
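To make the idea concrete, here is a minimal pure-Python sketch of a LoRA-style linear layer, assuming the usual formulation from the LoRA paper: the frozen weight W gets a trainable update B·A scaled by alpha/r, with B initialized to zero so the adapted layer starts out identical to the base model. The class `LoRALinear`, the helper names, and the 4096 × 4096 matrix size are illustrative assumptions, and the training step (gradient updates to A and B) is left out.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_trainable_params(d, k, r):
    """(full, lora): params to update a d x k weight fully vs. with rank-r factors B (d x r), A (r x k)."""
    return d * k, r * (d + k)

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: W_eff = W + (alpha / r) * B @ A."""

    def __init__(self, weight, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d, k = len(weight), len(weight[0])
        self.weight = weight  # frozen: never modified by training
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]  # r x k, trainable
        self.B = [[0.0] * r for _ in range(d)]  # d x r, trainable, zero-initialized
        self.scale = alpha / r

    def effective_weight(self):
        delta = matmul(self.B, self.A)  # d x k, but stored as only r * (d + k) numbers
        return [[w + self.scale * dw for w, dw in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted layer to a vector x given as a flat list."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

full, lora = lora_trainable_params(4096, 4096, r=8)
print(f"{lora:,} of {full:,} params trainable ({lora / full:.2%})")
# → 65,536 of 16,777,216 params trainable (0.39%)
```

Swapping adapters then amounts to keeping one frozen W and loading a different small (A, B) pair per task.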

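Parameter-efficient methods like LoRA are widely believed to be what hosted fine-tuning APIs run under the hood. Providers such as OpenAI and Anthropic don't generally document the details, but when you fine-tune through them you are most likely getting some flavor of parameter-efficient fine-tuning, not a full retrain.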


When fine-tuning makes sense

You have a consistent format or structure requirement. If every response needs to be valid JSON with a specific schema, or formatted in a particular way that's hard to enforce through prompting alone, fine-tuning can make that format reliable.
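For the format case, the training data is simply many demonstrations of the exact output you want. A minimal sketch, assuming the chat-style JSONL layout used in the upload example earlier: each assistant turn is the serialized JSON object, and every example is checked against the schema before it goes into the file. The ticket-classification schema, `REQUIRED_KEYS`, the system prompt, and the sample pairs are all hypothetical.

```python
import json

# Hypothetical target schema: every reply must be an object with exactly these keys.
REQUIRED_KEYS = {"intent", "priority", "summary"}

# Hypothetical (ticket text, structured answer) pairs taken from your own data.
pairs = [
    ("My invoice is wrong again.",
     {"intent": "billing", "priority": "high", "summary": "Incorrect invoice"}),
    ("How do I change my password?",
     {"intent": "account", "priority": "low", "summary": "Password change"}),
]

def to_example(user_text, answer):
    """One chat-format training example whose assistant turn is the JSON we want back."""
    if set(answer) != REQUIRED_KEYS:
        raise ValueError(f"answer keys {sorted(answer)} do not match the schema")
    return {"messages": [
        {"role": "system", "content": "Classify the ticket. Reply with JSON only."},
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": json.dumps(answer)},
    ]}

# One JSON object per line; write this string to a .jsonl file before uploading.
jsonl_text = "\n".join(json.dumps(to_example(u, a)) for u, a in pairs) + "\n"
print(jsonl_text)
```

Validating every example up front matters here: the model learns the format from these demonstrations, so a single malformed answer in the training set teaches it the wrong thing.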

You need a very specific tone. If your brand has a distinctive voice — particular vocabulary, specific warmth, a writing style that's hard to describe but easy to demonstrate — fine-tuning on examples transfers it more reliably than prompt instructions.

You're making thousands of calls. A fine-tuned model can often achieve good results with shorter prompts than a base model, because some of the context is baked in. At high volume, those shorter prompts add up to real cost savings.[·]
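A rough break-even calculation shows how this plays out. The sketch below assumes per-token pricing quoted per 1M tokens plus a one-off training cost. Every price and token count in it is a hypothetical placeholder, and the fine-tuned rates are deliberately set higher than the base rates because some providers do bill fine-tuned models at a premium.

```python
import math

def cost_per_call(prompt_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one call, given prices per 1M tokens."""
    return (prompt_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000

def break_even_calls(training_cost, base_per_call, tuned_per_call):
    """Calls needed before per-call savings repay the one-off training cost."""
    saving = base_per_call - tuned_per_call
    if saving <= 0:
        return math.inf  # the fine-tuned model never pays for itself
    return math.ceil(training_cost / saving)

# All numbers below are hypothetical placeholders, not real prices.
base = cost_per_call(prompt_tokens=1_200, output_tokens=100,
                     in_price_per_m=0.15, out_price_per_m=0.60)   # long system prompt
tuned = cost_per_call(prompt_tokens=200, output_tokens=100,
                      in_price_per_m=0.30, out_price_per_m=1.20)  # short prompt, pricier tokens
print(f"base ${base:.6f}/call, tuned ${tuned:.6f}/call")
print("break-even after", break_even_calls(50.0, base, tuned), "calls")
```

If the shorter prompt doesn't save more per call than the pricier tokens cost, `break_even_calls` returns infinity and the volume argument disappears.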


When it doesn't

You want the model to learn facts. Fine-tuning is poor at reliably encoding new factual knowledge. The model may appear to learn facts during training but then hallucinate around them in unpredictable ways. For factual grounding, use retrieval-augmented generation (RAG): give the model the information at inference time instead of trying to bake it in.

The "fine-tune vs. RAG" decision mostly comes down to one question: are you trying to change how the model responds, or what it knows? Fine-tune for the first; use RAG for the second.
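The RAG side of that split can be sketched in a few lines. This toy version ranks documents by word overlap with the question and pastes the best match into the prompt. Real systems usually use embedding search over a vector index instead; the documents, the overlap scoring, and the helper names here are hypothetical stand-ins.

```python
import re

# Hypothetical knowledge base; in practice this is your document store.
DOCS = [
    "Refunds are available within 30 days of purchase.",
    "Our support line is open Monday to Friday, 9am to 5pm.",
    "Enterprise plans include a dedicated account manager.",
]

def tokenize(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, docs, k=1):
    """Toy retriever: rank documents by how many words they share with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, docs, k=1):
    """Put retrieved facts into the prompt at inference time instead of training them in."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

print(build_prompt("Are refunds available after 30 days?", DOCS))
```

Updating what the model "knows" is now a matter of editing `DOCS`, with no retraining involved.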

Your prompt already gets you there. This is the majority of cases. If careful prompting gets you 90% of the way, fine-tuning is expensive and slow for marginal gains. The data collection and training process takes significant time; the prompt can be iterated in minutes.
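Before collecting a training set, it's worth checking whether the same demonstrations work as in-context examples (few-shot prompting). The sketch below builds a chat message list from example pairs; the examples and the `few_shot_messages` helper are hypothetical, and the commented-out call assumes the OpenAI Python client.

```python
# Hypothetical examples: the same pairs you might otherwise collect for fine-tuning.
EXAMPLES = [
    ("My invoice is wrong again.", '{"intent": "billing", "priority": "high"}'),
    ("How do I change my password?", '{"intent": "account", "priority": "low"}'),
]

def few_shot_messages(system_prompt, examples, user_text):
    """Build a chat message list that demonstrates the target behavior in-context."""
    messages = [{"role": "system", "content": system_prompt}]
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_text})
    return messages

messages = few_shot_messages(
    "Classify the support ticket. Reply with JSON only.",
    EXAMPLES,
    "I was charged twice this month.",
)
# Pass `messages` to your chat API, e.g. with the OpenAI client:
# client.chat.completions.create(model="gpt-4o-mini", messages=messages)
```

Changing the behavior here means editing a list and rerunning, which is the minutes-not-days iteration loop the paragraph above describes.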

Your use case is still changing. Fine-tuned models are snapshots. Every time your requirements shift, you need new training data and a new fine-tuning run. A prompt is cheaper to update.


What it doesn't do

Fine-tuning doesn't make a smaller model as capable as a larger one. If the base model can't perform a reasoning task, fine-tuning won't unlock that capability — it wasn't there to be taught.

It also doesn't give you complete control. The fine-tuned model's underlying tendencies come from pretraining, whose data is orders of magnitude larger than your dataset. You're nudging behavior within the space the base model defines, not rewriting it from scratch.


The analogy holds: fine-tuning shapes professional habits, not raw intelligence. Used for the right reasons — format, tone, consistent style at scale — it delivers. Used as a substitute for the harder work of designing good prompts and choosing the right model, it mostly costs money and time.

Know which situation you're in before you start collecting training data.

Level: intermediate · Read time: 8 min · Updated May 2026 · Sources: 2

Read next

  1. Retrieval-Augmented Generation →
  2. Prompt Engineering →
  3. How AI Models Are Trained →
  4. RLHF →
  5. DPO →