How LLMs Learn
From reading the internet to following your instructions
Temperature controls randomness in generation. But how did the model learn what words to generate in the first place? Let's explore how LLMs are trained.
Large Language Models go through multiple stages of training before they can have a conversation with you. First, they learn to predict text by reading billions of web pages. Then, they're fine-tuned for specific tasks. Finally, human feedback shapes their behavior to be helpful and safe. Let's explore each stage.
1. Learning to Predict
LLMs learn by playing a simple game billions of times: given some text, predict the next word. This is called self-supervised learning because no human labels are needed - the training data labels itself.
Self-supervised learning: The model learns by predicting the next word billions of times, adjusting its weights when it's wrong.
The scale is staggering: The US Library of Congress contains about 10 terabytes of text — every book, every manuscript, every document. Modern LLMs train on roughly 1,000 times that amount. Imagine reading a thousand Libraries of Congress, then being quizzed on predicting the next word... billions of times.
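The idea that "the training data labels itself" can be sketched in a few lines. The toy corpus and the bigram counter below are illustrative stand-ins: a real LLM uses a neural network over billions of documents, but the self-supervision trick is the same - each word in the text serves as the label for the context before it.

```python
from collections import Counter, defaultdict

# Toy "training data" - no human labeling required.
corpus = ("the report shows growth . the report shows revenue . "
          "the report shows growth").split()

# Self-supervision: each word is the label for the word before it.
# A bigram count table stands in for a neural network here.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(word):
    """Return the most likely next word after `word`."""
    return counts[word].most_common(1)[0][0]

print(predict("shows"))  # "growth" - seen twice, vs "revenue" once
```

Scale this idea up from one word of context to thousands, and from counts to learned weights, and you have the core of LLM pretraining.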
Connection to Training
Self-supervised training teaches the model probability distributions over possible next words. Temperature controls whether we always pick the most likely word (deterministic) or sample more randomly (creative). This is why the same prompt can give different answers!
Temperature Playground
Adjust how the AI places its bets on the next word
When predicting the next word, AI models calculate betting odds for every possible word:
After "The quarterly report shows..."
Temperature 0
Always bet on the favorite. "Growth" wins every time. Consistent but predictable.
Temperature 0.5
Mostly bet on favorites, but occasionally take a chance. Usually "growth," sometimes "revenue." Balanced variety.
Temperature 1.0
Take more chances on underdogs. "Losses" gets a real shot. More surprising - sometimes brilliant, sometimes odd.
Key Insight
The same prompt at different temperatures reveals that LLMs don't 'retrieve' answers - they're placing bets on what word comes next, and temperature adjusts how risky those bets are.
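The betting metaphor can be checked empirically: sample the next word many times at each temperature and tally how often each candidate "wins." The word scores below are invented for illustration (echoing the quarterly-report example above), not taken from a real model.

```python
import math
import random
from collections import Counter

random.seed(0)  # reproducible tallies

# Hypothetical scores for the word after "The quarterly report shows..."
logits = {"growth": 2.0, "revenue": 1.0, "losses": 0.2}

def sample(temperature):
    if temperature == 0:
        return max(logits, key=logits.get)  # always bet on the favorite
    weights = [math.exp(v / temperature) for v in logits.values()]
    return random.choices(list(logits), weights=weights)[0]

for t in (0, 0.5, 1.0):
    tally = Counter(sample(t) for _ in range(1000))
    print(t, tally)  # higher temperature -> more "revenue" and "losses"
```

At temperature 0 the tally is all "growth"; at 1.0 the underdogs win a meaningful share of the bets.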
Try it yourself
Select a creative prompt and see how temperature affects the output.
Temperature 0: Deterministic
Temperature 0.5: Balanced
Temperature 1: Creative
Reflection Questions
When would you use temperature 0 vs higher values in your work?
Think about different tasks in your field and what qualities matter most in the output.
Why might the same prompt give different answers?
Think about what the temperature parameter is actually doing to word selection.
How does this change your understanding of AI 'knowing' things?
Consider what it means that the same question can produce different answers.
Business Applications
How these training techniques power real products
The same training techniques power very different business applications. Understanding how models are trained helps you choose the right tool for each job.
Customer Service Chatbots
AI assistants that handle customer inquiries 24/7, reducing support costs while improving response times.
Document Summarization
Automatically condense long reports, emails, or articles into key points for busy professionals.
Code Generation
AI pair programmers that suggest code completions, explain code, and help debug issues.
Content Creation
Generate marketing copy, product descriptions, social media posts, and other business content.
Data Analysis
Ask questions about data in plain English and get insights, charts, and explanations.
Reflection Questions
What surprised you about how LLMs learn from text alone?
Consider that no one labeled "this word comes next" - the model learned patterns from billions of examples.
How might fine-tuning help your organization create specialized AI tools?
Think about company-specific knowledge, tone of voice, and workflows that a general model wouldn't know.
Why is human feedback (RLHF) important for AI safety?
Consider what happens when models optimize for predictions without human values guiding behavior.
Key Insight
Understanding how LLMs are trained helps you make better decisions about AI deployment: when to use off-the-shelf models, when to fine-tune, and why human oversight matters.