Deep Dive

How LLMs Learn

From reading the internet to following your instructions

Temperature controls randomness in generation. But how did the model learn what words to generate in the first place? Let's explore how LLMs are trained.


Large Language Models go through multiple stages of training before they can have a conversation with you. First, they learn to predict text by reading billions of web pages. Then, they're fine-tuned for specific tasks. Finally, human feedback shapes their behavior to be helpful and safe. Let's explore each stage.

1. Learning to Predict

LLMs learn by playing a simple game billions of times: given some text, predict the next word. This is called self-supervised learning because no human labels are needed - the training data labels itself.

Self-supervised learning: The model learns by predicting the next word billions of times, adjusting when it's wrong.
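To see how text "labels itself," here is a minimal Python sketch. The sentence and the pair format are illustrative, not taken from any real training pipeline: every position in a piece of text yields a context plus the word that actually follows it, with no human annotation needed.

```python
# Raw text labels itself: every position yields a (context, next-word) pair.
# The sentence below is illustrative; real training data is billions of pages.
text = "the cat sat on the mat"
words = text.split()

# Slide through the text; the "label" for each context is simply the word
# that actually comes next.
training_pairs = [(words[:i], words[i]) for i in range(1, len(words))]

for context, next_word in training_pairs:
    print(f"context: {' '.join(context)!r} -> predict: {next_word!r}")
# Prints pairs like: context: 'the cat' -> predict: 'sat'
```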

The scale is staggering: The US Library of Congress contains about 10 terabytes of text — every book, every manuscript, every document. Modern LLMs train on roughly 1,000 times that amount. Imagine reading a thousand Libraries of Congress, then being quizzed on predicting the next word... billions of times.

Example

Given this context, what word comes next?

The cat sat on the ___
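A toy way to see where such a prediction could come from: count, in a tiny made-up corpus, which words follow which. Real LLMs use neural networks over long contexts rather than word-pair counts, so this "bigram" sketch is only an analogy:

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; a real model reads billions of web pages.
corpus = "the cat sat on the mat . the dog sat on the rug . the cat ate"
words = corpus.split()

# Count which word follows each word (a "bigram" model).
follows = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    follows[prev][nxt] += 1

# Turn the counts for words after "the" into a probability distribution.
counts = follows["the"]
total = sum(counts.values())
for word, count in counts.most_common():
    print(f"P({word!r} after 'the') = {count / total:.2f}")
# P('cat' after 'the') = 0.40, P('mat' after 'the') = 0.20, ...
```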

Connection to Training

Self-supervised training teaches the model probability distributions over possible next words. Temperature controls whether we always pick the most likely word (deterministic) or sample more randomly (creative). This is why the same prompt can give different answers!
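One hedged way to see the mechanism: temperature divides the model's raw scores (logits) before they are normalized into probabilities. The four words and scores below are invented for illustration, not taken from any real model:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize into probabilities."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for four candidate next words.
words = ["growth", "revenue", "profits", "losses"]
logits = [3.0, 2.8, 2.4, 1.9]

# Temperature 0 is a special case: skip sampling and just take the argmax.
for t in (0.25, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{w}={p:.0%}" for w, p in zip(words, probs)))
# Low temperature sharpens the distribution toward "growth";
# high temperature flattens it across all four words.
```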

Temperature Playground

Adjust how the AI places its bets on the next word

When predicting the next word, AI models calculate betting odds for every possible word:

After "The quarterly report shows..."

"growth"35% odds"revenue"28% odds"profits"20% odds"losses"12% odds"puppies"0.001% odds

Temperature 0

Always bet on the favorite. "Growth" wins every time. Consistent but predictable.

Temperature 0.5

Mostly bet on favorites, but occasionally take a chance. Usually "growth," sometimes "revenue." Balanced variety.

Temperature 1.0

Take more chances on underdogs. "Losses" gets a real shot. More surprising - sometimes brilliant, sometimes odd.
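A hedged simulation of the betting table above: treat the listed odds as the model's temperature-1 probabilities, rescale them for other temperatures (by scaling log-probabilities), and count which word wins over many draws. The numbers are the ones from the table, rounded:

```python
import math
import random

random.seed(0)  # reproducible demo

words = ["growth", "revenue", "profits", "losses", "puppies"]
odds = [0.35, 0.28, 0.20, 0.12, 0.00001]  # the table above, as probabilities

def reweight(probs, temperature):
    """Treat the odds as temperature-1 probabilities and rescale them.

    Scaling log-probabilities by 1/temperature sharpens the distribution
    when temperature < 1 and flattens it when temperature > 1.
    """
    if temperature == 0:
        # Temperature 0 means pure argmax: always bet on the favorite.
        return [1.0 if p == max(probs) else 0.0 for p in probs]
    weights = [math.exp(math.log(p) / temperature) for p in probs]
    total = sum(weights)
    return [w / total for w in weights]

for t in (0, 0.5, 1.0):
    p = reweight(odds, t)
    picks = random.choices(words, weights=p, k=1000)
    print(f"T={t}:", {w: picks.count(w) for w in words})
```

You should see "growth" win all 1,000 draws at temperature 0, win roughly half of them at 0.5, and at 1.0 even "losses" collects its 12% share while "puppies" stays vanishingly rare.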

Key Insight

The same prompt at different temperatures reveals that LLMs don't 'retrieve' answers - they're placing bets on what word comes next, and temperature adjusts how risky those bets are.

Try it yourself

Select a creative prompt and see how temperature affects the output.

The three settings to compare: Temperature 0 (deterministic), Temperature 0.5 (balanced), and Temperature 1 (creative).

Reflection Questions

When would you use temperature 0 vs higher values in your work?

Think about different tasks in your field and what qualities matter most in the output.

Why might the same prompt give different answers?

Think about what the temperature parameter is actually doing to word selection.

How does this change your understanding of AI 'knowing' things?

Consider what it means that the same question can produce different answers.

Business Applications

How these training techniques power real products

Different training techniques power different business applications. Understanding HOW models are trained helps you choose the right tool.

Customer Service Chatbots

AI assistants that handle customer inquiries 24/7, reducing support costs while improving response times.

Document Summarization

Automatically condense long reports, emails, or articles into key points for busy professionals.

Code Generation

AI pair programmers that suggest code completions, explain code, and help debug issues.

Content Creation

Generate marketing copy, product descriptions, social media posts, and other business content.

Data Analysis

Ask questions about data in plain English and get insights, charts, and explanations.

Each of these applications draws on one or more of the three training stages: self-supervised learning (broad language ability), fine-tuning (task and domain specialization), and reinforcement learning from human feedback (RLHF), which shapes helpful, safe behavior.
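To make the fine-tuning stage concrete, here is a hedged sketch of what a specialization dataset might look like. Every detail below (the questions, answers, file name, and menu path) is hypothetical; the point is that fine-tuning means further training on curated example pairs rather than raw web text:

```python
import json

# Entirely hypothetical fine-tuning examples: each pair teaches the model a
# company-specific tone and answers that a general pretrained model would
# not know out of the box.
examples = [
    {"prompt": "Where is my order #1234?",
     "response": "Sorry for the wait! Let me look up that order for you right away."},
    {"prompt": "How do I reset my password?",
     "response": "Go to Settings > Account > Reset Password, and I can walk you through each step."},
]

# One JSON object per line (JSONL) is a common convention for fine-tuning
# data, though the exact format varies by provider.
with open("finetune_examples.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```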

Reflection Questions

What surprised you about how LLMs learn from text alone?

Consider that no one labeled "this word comes next" - the model learned patterns from billions of examples.

How might fine-tuning help your organization create specialized AI tools?

Think about company-specific knowledge, tone of voice, and workflows that a general model wouldn't know.

Why is human feedback (RLHF) important for AI safety?

Consider what happens when models optimize for predictions without human values guiding behavior.

Key Insight

Understanding how LLMs are trained helps you make better decisions about AI deployment: when to use off-the-shelf models, when to fine-tune, and why human oversight matters.