The LLM Era
Explore the building blocks of large language models through hands-on experiments that demystify how these systems work.
Learning Objectives
After this week, you'll be able to:
1. Evaluate tokenization trade-offs when deploying AI across languages
   - Why this matters: Token counts directly impact API costs and response quality. A model that fragments Chinese into 3x more tokens than English will cost more and potentially perform worse.
   - Shanahan: Tokenization reveals what Shanahan calls "statistical processing" — the model sees token patterns, not words or meanings. Different tokenization means different patterns.
2. Explain to stakeholders how embeddings capture word relationships, not definitions
   - Why this matters: Executives ask "does the AI understand our industry?" Explaining embeddings helps set accurate expectations — the model learned associations from text, not expertise.
   - Shanahan: LLMs encode correlations from their training data. Embeddings are literally this — king:queen::man:woman emerges from word co-occurrence, not from any understanding of royalty.
3. Demonstrate how attention lets models use context for disambiguation
   - Why this matters: Knowing that attention is context-gathering helps you predict where AI will succeed: tasks with rich context (document summarization) versus tasks needing external knowledge (fact-checking).
   - Shanahan: Attention is the mechanism behind Shanahan's core claim that LLMs predict "what words typically follow what other words." Each word attends to relevant context to inform that prediction.
4. Compare training approaches to select appropriate AI tools for business needs
   - Why this matters: Vendor claims like "fine-tuned for enterprise" or "RLHF-aligned" become actionable once you understand what these techniques actually provide.
   - Shanahan: Training choices determine whether outputs merely "take the form of a proposition" (Shanahan) or actually meet your quality bar. Understanding training means understanding AI limitations.
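To make the tokenization trade-off in objective 1 concrete, here is a minimal sketch using raw UTF-8 bytes as a worst-case "token" count. This is not a real BPE tokenizer (real vocabularies merge frequent byte sequences), but byte-level BPE does start from UTF-8 bytes, where ASCII letters take 1 byte each and most Chinese characters take 3 — so non-English text begins far more fragmented. The Chinese sentence below is an assumed rough translation, for illustration only.

```python
def byte_token_count(text: str) -> int:
    """Count raw UTF-8 bytes: the worst-case token count before any BPE merges."""
    return len(text.encode("utf-8"))

english = "The cat sat on the mat"
chinese = "猫坐在垫子上"  # assumed rough translation of the same sentence

print(byte_token_count(english))  # 22 bytes for 22 characters
print(byte_token_count(chinese))  # 18 bytes for only 6 characters
```

Six characters costing 18 base units while 22 English characters cost 22 is the shape of the problem: unless the vocabulary contains good merges for a language, that language pays more tokens per unit of meaning.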
Required Readings
- Shanahan, M. (2024). "Talking About Large Language Models." Communications of the ACM.
- Vaswani, A., et al. (2017). "Attention Is All You Need." NeurIPS (excerpt).
"The best way to think of a large language model is as a device for performing next-word prediction on internet-scale quantities of text."
- Murray Shanahan (2024)
How LLMs Process Text
Each module below explores one step in the pipeline that turns raw text into a prediction: tokenization, embeddings, attention, and generation.
Key Concepts at a Glance
Terms you'll encounter this week — quick definitions before the deep dives
How LLMs Learn
LLMs learn by predicting the next word in massive amounts of text
This is called self-supervised learning — no human labels needed. The model sees billions of sentences and learns patterns. Given "The cat sat on the ___", it learns "mat" is more likely than "elephant".
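The next-word objective can be sketched with plain counting. This toy bigram model is an assumption-laden stand-in for a neural network — it counts which word follows which in a tiny made-up corpus, then "predicts" the most frequent successor — but the learning signal is exactly the self-supervised one described above: the text itself provides the labels.

```python
from collections import Counter, defaultdict

corpus = [
    "the cat sat on the mat",
    "the cat sat on the chair",
    "the dog sat on the mat",
]

# Count successors: follows[prev][next] = how often `next` followed `prev`.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent word seen after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("sat"))  # "on" — the only word ever observed after "sat"
```

An LLM replaces the count table with a neural network and the toy corpus with internet-scale text, but the objective — given the words so far, predict the next one — is the same.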
Attention
LLMs can focus on relevant parts of the context
When processing "The bank was flooded with water", attention helps the model focus on "water" to understand this is a river bank, not a financial institution.
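The "bank" example can be sketched numerically. Below is a minimal scaled dot-product attention computation in the style of Vaswani et al. (2017), with hand-made 2-d vectors chosen so that "flooded" and "water" score highly for the query — in a real model, these vectors are learned, not hand-picked, so treat the numbers as assumptions for illustration.

```python
import math

def softmax(xs):
    """Normalize scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention: softmax(q . k / sqrt(d)) over the keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

# "bank" attends over its context in "The bank was flooded with water".
context = ["the", "was", "flooded", "water"]
keys = [[0.1, 0.0], [0.0, 0.1], [0.8, 0.9], [0.9, 1.0]]  # made-up key vectors
query = [1.0, 1.0]                                        # made-up query for "bank"

weights = attention_weights(query, keys)
for word, w in zip(context, weights):
    print(f"{word:8s} {w:.2f}")
# "flooded" and "water" receive the largest weights, steering "bank"
# toward the riverbank sense rather than the financial one.
```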
Fine-tuning
Specializing a general model for specific tasks or domains
Like training a general doctor to become a heart specialist. The base knowledge stays, but the model learns to respond in domain-specific ways.
RAG
Giving LLMs access to external documents before responding
Retrieval-Augmented Generation retrieves relevant documents first, then generates a response using that context. This is how ChatGPT can search the web or answer questions about your uploaded files.
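The retrieval half of RAG can be sketched in a few lines. This toy version scores documents by word overlap with the question and prepends the best match to the prompt; the documents and question are invented for illustration, and production systems use embedding similarity over a vector store rather than word overlap.

```python
import re

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is open Monday through Friday, 9am to 5pm.",
    "Shipping is free on orders over $50.",
]

def words(text: str) -> set:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str) -> str:
    """Return the document sharing the most words with the question."""
    return max(documents, key=lambda d: len(words(question) & words(d)))

question = "What is your refund policy?"
context = retrieve(question)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

The retrieved text lands in the prompt before generation, so the model answers from the supplied context instead of relying only on what it memorized during training.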
Act 1: Foundations of Neural Networks
From single neurons to language models
Act 2: Language Processing
From text to meaning
Act 3: Training & Generation
How LLMs become useful
Tips for This Week
Follow the Foundations First
Start with Act 1 - the four modules build on each other. Perceptron → Multi-Layer Networks → Training → LLMs. Each one prepares you for the next.
Connect to Readings
As you explore: How do these visualizations connect to Shanahan's point about statistical processing? What does it mean that AI "sees" word pieces, not words?