The LLM Era
Explore the building blocks of large language models through hands-on experiments that demystify how these systems work.
Learning Objectives
After this week, you'll be able to:
1. Evaluate tokenization trade-offs when deploying AI across languages
   - Why this matters: Token counts directly impact API costs and response quality. A model that fragments Chinese into 3x more tokens than English will cost more and potentially perform worse.
   - Shanahan: Tokenization reveals what Shanahan calls "statistical processing" — the model sees token patterns, not words or meanings. Different tokenization means different patterns.
2. Explain to stakeholders how embeddings capture word relationships, not definitions
   - Why this matters: Executives ask "does the AI understand our industry?" Explaining embeddings helps set accurate expectations — the model learned associations from text, not expertise.
   - Shanahan: LLMs encode correlations from their training data. Embeddings are literally this — king:queen::man:woman emerges from word co-occurrence, not from any understanding of royalty.
3. Demonstrate how attention lets models use context for disambiguation
   - Why this matters: Knowing that attention is context-gathering helps you predict where AI will succeed: tasks with rich context (document summarization) versus tasks needing external knowledge (fact-checking).
   - Shanahan: Attention is the mechanism behind Shanahan's core claim that LLMs predict "what words typically follow what other words." Each word attends to relevant context to inform that prediction.
4. Compare training approaches to select appropriate AI tools for business needs
   - Why this matters: Vendor claims like "fine-tuned for enterprise" or "RLHF-aligned" become actionable once you understand what these techniques actually provide.
   - Shanahan: Training choices determine whether outputs merely "take the form of a proposition" (Shanahan) or actually meet your quality bar. Understanding training means understanding AI limitations.
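To make the tokenization trade-off in objective 1 concrete, here is a minimal sketch using raw UTF-8 bytes as a worst-case "token" count. This is not a real BPE tokenizer (real vocabularies merge frequent byte sequences), but byte-level BPE does start from UTF-8 bytes, where ASCII letters take 1 byte each and most Chinese characters take 3 — so non-English text begins far more fragmented. The Chinese sentence below is an assumed rough translation, for illustration only.

```python
def byte_token_count(text: str) -> int:
    """Count raw UTF-8 bytes: the worst-case token count before any BPE merges."""
    return len(text.encode("utf-8"))

english = "The cat sat on the mat"
chinese = "猫坐在垫子上"  # assumed rough translation of the same sentence

print(byte_token_count(english))  # 22 bytes for 22 characters
print(byte_token_count(chinese))  # 18 bytes for only 6 characters
```

Six characters costing 18 base units while 22 English characters cost 22 is the shape of the problem: unless the vocabulary contains good merges for a language, that language pays more tokens per unit of meaning.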
Required Readings
- Shanahan, M. (2024). "Talking About Large Language Models." Communications of the ACM.
- Vaswani, A., et al. (2017). "Attention Is All You Need." NeurIPS (excerpt).
"The best way to think of a large language model is as a device for performing next-word prediction on internet-scale quantities of text."
- Murray Shanahan (2024)
How LLMs Process Text
Each module below explores one step in the pipeline that turns raw text into a prediction: tokenization, embeddings, attention, and generation.
Key Concepts at a Glance
Terms you'll encounter this week — quick definitions before the deep dives
How LLMs Learn
LLMs learn by predicting the next word in massive amounts of text
This is called self-supervised learning — no human labels needed. The model sees billions of sentences and learns patterns. Given "The cat sat on the ___", it learns "mat" is more likely than "elephant".
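The next-word objective can be sketched with plain counting. This toy bigram model is an assumption-laden stand-in for a neural network — it counts which word follows which in a tiny made-up corpus, then "predicts" the most frequent successor — but the learning signal is exactly the self-supervised one described above: the text itself provides the labels.

```python
from collections import Counter, defaultdict

corpus = [
    "the cat sat on the mat",
    "the cat sat on the chair",
    "the dog sat on the mat",
]

# Count successors: follows[prev][next] = how often `next` followed `prev`.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent word seen after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("sat"))  # "on" — the only word ever observed after "sat"
```

An LLM replaces the count table with a neural network and the toy corpus with internet-scale text, but the objective — given the words so far, predict the next one — is the same.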
Attention
LLMs can focus on relevant parts of the context
When processing "The bank was flooded with water", attention helps the model focus on "water" to understand this is a river bank, not a financial institution.
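The "bank" example can be sketched numerically. Below is a minimal scaled dot-product attention computation in the style of Vaswani et al. (2017), with hand-made 2-d vectors chosen so that "flooded" and "water" score highly for the query — in a real model, these vectors are learned, not hand-picked, so treat the numbers as assumptions for illustration.

```python
import math

def softmax(xs):
    """Normalize scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention: softmax(q . k / sqrt(d)) over the keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

# "bank" attends over its context in "The bank was flooded with water".
context = ["the", "was", "flooded", "water"]
keys = [[0.1, 0.0], [0.0, 0.1], [0.8, 0.9], [0.9, 1.0]]  # made-up key vectors
query = [1.0, 1.0]                                        # made-up query for "bank"

weights = attention_weights(query, keys)
for word, w in zip(context, weights):
    print(f"{word:8s} {w:.2f}")
# "flooded" and "water" receive the largest weights, steering "bank"
# toward the riverbank sense rather than the financial one.
```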
Fine-tuning
Specializing a general model for specific tasks or domains
Like training a general doctor to become a heart specialist. The base knowledge stays, but the model learns to respond in domain-specific ways.
RAG
Giving LLMs access to external documents before responding
Retrieval-Augmented Generation retrieves relevant documents first, then generates a response using that context. This is how ChatGPT can search the web or answer questions about your uploaded files.
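The retrieval half of RAG can be sketched in a few lines. This toy version scores documents by word overlap with the question and prepends the best match to the prompt; the documents and question are invented for illustration, and production systems use embedding similarity over a vector store rather than word overlap.

```python
import re

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is open Monday through Friday, 9am to 5pm.",
    "Shipping is free on orders over $50.",
]

def words(text: str) -> set:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str) -> str:
    """Return the document sharing the most words with the question."""
    return max(documents, key=lambda d: len(words(question) & words(d)))

question = "What is your refund policy?"
context = retrieve(question)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

The retrieved text lands in the prompt before generation, so the model answers from the supplied context instead of relying only on what it memorized during training.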
Act 1: Foundations of Neural Networks
From single neurons to language models
Act 2: Language Processing
From text to meaning
Act 3: Training & Generation
How LLMs become useful
Tips for This Week
Follow the Foundations First
Start with Act 1 - the four modules build on each other. Perceptron → Multi-Layer Networks → Training → LLMs. Each one prepares you for the next.
Connect to Readings
As you explore: How do these visualizations connect to Shanahan's point about statistical processing? What does it mean that AI "sees" word pieces, not words?