
LLMs from Scratch - Pt. 5

Nov 5, 2025 · 2 min read

tl;dr: A trained language model is powerful, but not automatically helpful. This final post explains how models are adapted for real tasks, how they’re evaluated, and where their limits are.

A Trained Model Is Still Generic

After training, a language model is good at predicting text.

But that doesn’t mean it:

  • Answers questions the way you expect
  • Follows instructions
  • Uses the right tone
  • Stays focused on a task

To make it useful, we usually add another step.


Fine-Tuning: Teaching the Model a Job

Fine-tuning means continuing training on a smaller, more specific dataset.

Examples:

  • Question → answer pairs
  • Support conversations
  • Domain-specific documents

The model already understands language. Fine-tuning nudges it toward a particular behavior.

Think of it like:

  • Learning English
  • Then learning how to write emails
  • Then learning how to write professional emails
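Mechanically, "continuing training" is the same loop used in pretraining, just pointed at smaller, more specific data. Here is a minimal sketch with a toy PyTorch model standing in for a real LLM; the model, data, and hyperparameters are all made up for illustration:

```python
import torch
import torch.nn as nn

# Toy "pretrained" model: embedding + linear head over a tiny vocabulary.
# A real LLM would be a full transformer loaded from a checkpoint.
torch.manual_seed(0)
vocab_size, dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))

# Toy "domain" dataset: (input token, next token) pairs.
inputs = torch.randint(0, vocab_size, (64,))
targets = torch.randint(0, vocab_size, (64,))

# Fine-tuning = the same training loop, smaller data, fewer steps,
# usually a smaller learning rate than pretraining.
opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
losses = []
for step in range(30):
    logits = model(inputs)            # forward pass: (64, vocab_size)
    loss = loss_fn(logits, targets)   # same next-token objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

Nothing structurally new happens here: the objective and the loop are unchanged, only the data distribution is narrower.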

Instruction Training

Some fine-tuning focuses on teaching the model to follow instructions.

Instead of raw text, the model sees examples like:

  • Prompt: “Explain this simply”
  • Response: a clear explanation

This is how models learn to:

  • Be helpful
  • Be concise
  • Match human expectations

It’s still prediction — just applied to instruction patterns.
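Under the hood, a prompt/response pair is usually flattened into one training string with a fixed template, so the model can keep doing plain next-token prediction. The template below is hypothetical; real chat formats vary by model family:

```python
# Hypothetical instruction template. Real formats (special tokens,
# role markers, etc.) differ between model families.
def format_example(prompt: str, response: str) -> str:
    return f"### Instruction:\n{prompt}\n\n### Response:\n{response}"

example = format_example(
    "Explain this simply",
    "A language model predicts the next token, over and over.",
)
```

The model is then trained to predict the response text given everything before it, which is exactly the objective it already knows.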


How We Know If a Model Is Good

Models are evaluated differently depending on the task:

  • Can it predict unseen text well? (For raw language modeling, this is often measured by perplexity.)
  • Does it answer correctly?
  • Does it stay consistent?

Automatic metrics help, but human evaluation is still important — especially for quality and safety.
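Perplexity, the most common automatic metric for "predicting unseen text well," is just the exponential of the average negative log-probability the model assigned to the true next tokens. A small sketch with made-up probabilities:

```python
import math

# Hypothetical per-token probabilities a model assigned to the actual
# next tokens of some held-out text. Lower perplexity is better;
# a perfect model (all probabilities 1.0) would score 1.0.
probs = [0.5, 0.25, 0.1, 0.4]

avg_nll = -sum(math.log(p) for p in probs) / len(probs)
perplexity = math.exp(avg_nll)
```

Intuitively, perplexity says how many choices the model was "hesitating between" on average at each step.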


Training vs Using the Model

Once deployed, the model stops learning.

At that point:

  • It only runs forward
  • It doesn’t update its weights
  • It predicts one token at a time

This phase is called inference.

Everything users experience — speed, cost, context length — comes from inference, not training.
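The forward-only, one-token-at-a-time loop can be sketched like this, with a toy lookup table standing in for the trained network (a real model would run a forward pass here instead):

```python
# Toy stand-in for a model's next-token step. At inference time the
# weights are frozen: we only read from them, never update them.
def next_token(context: list[str]) -> str:
    tiny_table = {"the": "cat", "cat": "sat", "sat": "down"}
    return tiny_table.get(context[-1], "<eos>")

tokens = ["the"]
while tokens[-1] != "<eos>" and len(tokens) < 10:
    tokens.append(next_token(tokens))  # one token per step, appended to the context
```

This is why generation cost grows with output length: every new token requires another full forward pass over the context.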


Important Limitations

Even very large models:

  • Don’t truly “understand” content
  • Can sound confident while being wrong
  • Reflect patterns and biases in their data

They are tools — not sources of truth.

Understanding their limits is part of using them responsibly.


Final Wrap-Up

Across this series, we’ve followed the full path:

Text → Tokens → Attention → Transformers → Training → Fine-Tuning → Use

Once you understand the basics, the mystery fades.

What’s left is a powerful system — built on simple ideas, scaled carefully, and used best with clarity and intention.