
What is Fine-tuning?

Further training a pre-trained AI model on a specific dataset to specialize it for a particular task or style.

Definition

Fine-tuning is the process of taking a pre-trained foundation model and continuing to train it on a smaller, task-specific dataset. The result is a model that retains the general knowledge of the base model but behaves differently in specific contexts — following a particular format, adopting a specific tone, or excelling at a narrow task. Fine-tuning updates the model's weights, unlike RAG, which retrieves external information at inference time and leaves the weights unchanged.

Why it matters

Fine-tuning is the right tool when you need consistent, low-latency behavior on a specific task at high volume. It is how companies build specialized models: a legal AI that always reasons like a lawyer, a coding AI that always generates TypeScript in your style, or a customer service AI that knows your product deeply. PMs evaluating AI product architectures need to understand when fine-tuning is worth the cost versus alternatives like RAG or prompt engineering.

How it works

The fine-tuning process: (1) collect training examples (input/output pairs showing the desired behavior), (2) format them for the training API (most providers use JSONL files), (3) submit the training job (OpenAI, Anthropic, Hugging Face, and others offer fine-tuning APIs), (4) the provider runs additional training passes over your data, (5) you receive a custom model endpoint. Cost is proportional to dataset size and the number of training steps.
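Step (2) above can be sketched in Python. The `messages` schema below mirrors OpenAI's chat fine-tuning format; other providers use different field names, so treat the structure (and the example pairs) as illustrative assumptions.

```python
import json

# Hypothetical training examples: input/output pairs showing the desired behavior.
examples = [
    {"input": "Summarize: Q3 revenue rose 12% on strong ad sales.",
     "output": "Q3 revenue: +12%, driven by advertising."},
    {"input": "Summarize: The new feature reduced churn by 5%.",
     "output": "New feature: churn down 5%."},
]

def to_jsonl(examples, path):
    """Write examples as one JSON object per line (JSONL).

    The 'messages' structure follows OpenAI's chat fine-tuning
    format; adapt the schema for other providers.
    """
    with open(path, "w") as f:
        for ex in examples:
            record = {"messages": [
                {"role": "user", "content": ex["input"]},
                {"role": "assistant", "content": ex["output"]},
            ]}
            f.write(json.dumps(record) + "\n")

to_jsonl(examples, "train.jsonl")
```

The resulting `train.jsonl` file is what gets uploaded when submitting the training job in step (3).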

Examples in practice

Brand-voice content model

A media company fine-tunes GPT-4o-mini on 5,000 examples of their editorial style. The fine-tuned model produces on-brand content with 90% less editing than prompting the base model.

Code format specialization

An engineering team fine-tunes a code model on their internal codebase conventions. The model now generates code that matches their patterns, naming conventions, and architecture without extensive prompting.

Common questions about Fine-tuning

When should I fine-tune instead of using RAG?
Use fine-tuning when you need to change HOW the model behaves (style, format, reasoning patterns). Use RAG when you need to change WHAT the model knows (facts, documents, current data). Fine-tuning is expensive to update when data changes; RAG is cheap to update but has retrieval overhead.
How much data do I need to fine-tune a model?
You can get meaningful results with as few as 50–100 high-quality examples for simple formatting tasks. For substantive behavior change (reasoning style, domain knowledge), 500–5,000 examples are typical. Quality matters far more than quantity — 100 excellent examples beat 10,000 mediocre ones.
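Because quality matters more than quantity, a quick sanity pass over the dataset before training pays off. A minimal sketch — the specific checks (empty fields, exact duplicates) are illustrative, not a provider requirement:

```python
def clean_examples(examples):
    """Drop incomplete and duplicate input/output pairs before training."""
    seen = set()
    cleaned = []
    for ex in examples:
        inp = ex.get("input", "").strip()
        out = ex.get("output", "").strip()
        if not inp or not out:
            continue  # incomplete pair: missing input or output
        key = (inp, out)
        if key in seen:
            continue  # exact duplicate, adds no new signal
        seen.add(key)
        cleaned.append({"input": inp, "output": out})
    return cleaned

raw = [
    {"input": "Hello", "output": "Hi there"},
    {"input": "Hello", "output": "Hi there"},  # duplicate
    {"input": "", "output": "orphan answer"},  # empty input
]
print(len(clean_examples(raw)))  # → 1
```

Real curation goes further (checking outputs for correctness, balancing task coverage), but even mechanical filters like these catch common dataset mistakes.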
How much does fine-tuning cost?
It varies by provider and model size. OpenAI fine-tuning GPT-4o-mini costs roughly $3–$25 for a typical dataset of 1,000 examples. Note that inference on a fine-tuned model is often priced at a premium over the base model, so check your provider's current pricing. Even so, for very high inference volumes, a fine-tuned smaller model can be much cheaper than prompting a larger one.
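The training cost estimate above reduces to back-of-envelope arithmetic: tokens processed times the per-token training rate. The numbers below (500 tokens per example, 3 epochs, $3 per million training tokens) are illustrative assumptions, not current prices.

```python
def training_cost(n_examples, avg_tokens_per_example, epochs, price_per_million):
    """Estimate fine-tuning cost: total tokens processed times the per-token rate."""
    total_tokens = n_examples * avg_tokens_per_example * epochs
    return total_tokens / 1_000_000 * price_per_million

# Illustrative: 1,000 examples x 500 tokens each, 3 training epochs,
# at an assumed $3.00 per 1M training tokens.
cost = training_cost(1_000, 500, 3, 3.00)
print(f"${cost:.2f}")  # → $4.50
```

Swapping in your own dataset size and your provider's published rates gives a first-order budget before committing to a training run.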
