Deep Dive

How LLMs Learn

From reading the internet to following your instructions

Temperature controls randomness in generation. But how did the model learn what words to generate in the first place? Let's explore how LLMs are trained.


Large Language Models go through multiple stages of training before they can have a conversation with you. First, they learn to predict text by reading billions of web pages. Then, they're fine-tuned for specific tasks. Finally, human feedback shapes their behavior to be helpful and safe. Let's explore each stage.

1. Learning to Predict

LLMs learn by playing a simple game billions of times: given some text, predict the next word. This is called self-supervised learning because no human labels are needed - the training data labels itself.

Self-supervised learning: The model learns by predicting the next word billions of times, adjusting when it's wrong.
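To see how text "labels itself," here is a minimal Python sketch. The sentence and the pair format are illustrative, not taken from any real training pipeline: every position in a piece of text yields a context plus the word that actually follows it, with no human annotation needed.

```python
# Raw text labels itself: every position yields a (context, next-word) pair.
# The sentence below is illustrative; real training data is billions of pages.
text = "the cat sat on the mat"
words = text.split()

# Slide through the text; the "label" for each context is simply the word
# that actually comes next.
training_pairs = [(words[:i], words[i]) for i in range(1, len(words))]

for context, next_word in training_pairs:
    print(f"context: {' '.join(context)!r} -> predict: {next_word!r}")
# Prints pairs like: context: 'the cat' -> predict: 'sat'
```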

The scale is staggering: The US Library of Congress contains about 10 terabytes of text — every book, every manuscript, every document. Modern LLMs train on roughly 1,000 times that amount. Imagine reading a thousand Libraries of Congress, then being quizzed on predicting the next word... billions of times.

Example

Given this context, what word comes next?

The cat sat on the ___
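A toy way to see where such a prediction could come from: count, in a tiny made-up corpus, which words follow which. Real LLMs use neural networks over long contexts rather than word-pair counts, so this "bigram" sketch is only an analogy:

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; a real model reads billions of web pages.
corpus = "the cat sat on the mat . the dog sat on the rug . the cat ate"
words = corpus.split()

# Count which word follows each word (a "bigram" model).
follows = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    follows[prev][nxt] += 1

# Turn the counts for words after "the" into a probability distribution.
counts = follows["the"]
total = sum(counts.values())
for word, count in counts.most_common():
    print(f"P({word!r} after 'the') = {count / total:.2f}")
# P('cat' after 'the') = 0.40, P('mat' after 'the') = 0.20, ...
```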

Connection to Training

Self-supervised training teaches the model probability distributions over possible next words. Temperature controls whether we always pick the most likely word (deterministic) or sample more randomly (creative). This is why the same prompt can give different answers!
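One hedged way to see the mechanism: temperature divides the model's raw scores (logits) before they are normalized into probabilities. The four words and scores below are invented for illustration, not taken from any real model:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize into probabilities."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for four candidate next words.
words = ["growth", "revenue", "profits", "losses"]
logits = [3.0, 2.8, 2.4, 1.9]

# Temperature 0 is a special case: skip sampling and just take the argmax.
for t in (0.25, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{w}={p:.0%}" for w, p in zip(words, probs)))
# Low temperature sharpens the distribution toward "growth";
# high temperature flattens it across all four words.
```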

Temperature Playground

Adjust how the AI places its bets on the next word

When predicting the next word, AI models calculate betting odds for every possible word:

After "The quarterly report shows..."

"growth"35% odds"revenue"28% odds"profits"20% odds"losses"12% odds"puppies"0.001% odds

Temperature 0

Always bet on the favorite. "Growth" wins every time. Consistent but predictable.

Temperature 0.5

Mostly bet on favorites, but occasionally take a chance. Usually "growth," sometimes "revenue." Balanced variety.

Temperature 1.0

Take more chances on underdogs. "Losses" gets a real shot. More surprising - sometimes brilliant, sometimes odd.
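A hedged simulation of the betting table above: treat the listed odds as the model's temperature-1 probabilities, rescale them for other temperatures (by scaling log-probabilities), and count which word wins over many draws. The numbers are the ones from the table, rounded:

```python
import math
import random

random.seed(0)  # reproducible demo

words = ["growth", "revenue", "profits", "losses", "puppies"]
odds = [0.35, 0.28, 0.20, 0.12, 0.00001]  # the table above, as probabilities

def reweight(probs, temperature):
    """Treat the odds as temperature-1 probabilities and rescale them.

    Scaling log-probabilities by 1/temperature sharpens the distribution
    when temperature < 1 and flattens it when temperature > 1.
    """
    if temperature == 0:
        # Temperature 0 means pure argmax: always bet on the favorite.
        return [1.0 if p == max(probs) else 0.0 for p in probs]
    weights = [math.exp(math.log(p) / temperature) for p in probs]
    total = sum(weights)
    return [w / total for w in weights]

for t in (0, 0.5, 1.0):
    p = reweight(odds, t)
    picks = random.choices(words, weights=p, k=1000)
    print(f"T={t}:", {w: picks.count(w) for w in words})
```

You should see "growth" win all 1,000 draws at temperature 0, win roughly half of them at 0.5, and at 1.0 even "losses" collects its 12% share while "puppies" stays vanishingly rare.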

Key Insight

The same prompt at different temperatures reveals that LLMs don't 'retrieve' answers - they're placing bets on what word comes next, and temperature adjusts how risky those bets are.

Try it yourself

Select a creative prompt and see how temperature affects the output.

The three settings to compare: Temperature 0 (deterministic), Temperature 0.5 (balanced), and Temperature 1 (creative).

Reflection Questions

When would you use temperature 0 vs higher values in your work?

Think about different tasks in your field and what qualities matter most in the output.

Why might the same prompt give different answers?

Think about what the temperature parameter is actually doing to word selection.

How does this change your understanding of AI 'knowing' things?

Consider what it means that the same question can produce different answers.

Business Applications

How these training techniques power real products

Different training techniques power different business applications. Understanding HOW models are trained helps you choose the right tool.

Customer Service Chatbots

AI assistants that handle customer inquiries 24/7, reducing support costs while improving response times.

Document Summarization

Automatically condense long reports, emails, or articles into key points for busy professionals.

Code Generation

AI pair programmers that suggest code completions, explain code, and help debug issues.

Content Creation

Generate marketing copy, product descriptions, social media posts, and other business content.

Data Analysis

Ask questions about data in plain English and get insights, charts, and explanations.

Each of these applications draws on one or more of the three training stages: self-supervised learning (broad language ability), fine-tuning (task and domain specialization), and reinforcement learning from human feedback (RLHF), which shapes helpful, safe behavior.
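To make the fine-tuning stage concrete, here is a hedged sketch of what a specialization dataset might look like. Every detail below (the questions, answers, file name, and menu path) is hypothetical; the point is that fine-tuning means further training on curated example pairs rather than raw web text:

```python
import json

# Entirely hypothetical fine-tuning examples: each pair teaches the model a
# company-specific tone and answers that a general pretrained model would
# not know out of the box.
examples = [
    {"prompt": "Where is my order #1234?",
     "response": "Sorry for the wait! Let me look up that order for you right away."},
    {"prompt": "How do I reset my password?",
     "response": "Go to Settings > Account > Reset Password, and I can walk you through each step."},
]

# One JSON object per line (JSONL) is a common convention for fine-tuning
# data, though the exact format varies by provider.
with open("finetune_examples.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```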

Reflection Questions

What surprised you about how LLMs learn from text alone?

Consider that no one labeled "this word comes next" - the model learned patterns from billions of examples.

How might fine-tuning help your organization create specialized AI tools?

Think about company-specific knowledge, tone of voice, and workflows that a general model wouldn't know.

Why is human feedback (RLHF) important for AI safety?

Consider what happens when models optimize for predictions without human values guiding behavior.

Key Insight

Understanding how LLMs are trained helps you make better decisions about AI deployment: when to use off-the-shelf models, when to fine-tune, and why human oversight matters.