Week 3

Critical AI Literacy

Develop the critical thinking skills to evaluate AI capabilities, verify outputs, and question claims about what these systems can and cannot do.

Learning Objectives

After this week, you'll be able to:

  1. Identify overfitting patterns and explain why more data does not always mean better models

    Why this matters: Vendors often claim "trained on more data" as a selling point. Understanding overfitting helps you ask better questions about model generalization.

    Critical lens: Training data quantity vs. quality is central to evaluating AI claims. A model can memorize without learning.

  2. Fact-check LLM outputs using systematic verification techniques

    Why this matters: LLMs confidently produce plausible-sounding but incorrect information. Knowing how to verify outputs is essential for professional use.

    Critical lens: The "stochastic parrot" critique highlights that fluency does not equal accuracy. Pattern matching can produce text that is perfectly grammatical and still wrong.

  3. Analyze training data composition and its impact on model behavior

    Why this matters: Models reflect their training data biases. Understanding what data went in helps predict problematic outputs before they affect your work.

    Critical lens: Training data archaeology reveals what the model actually learned versus what we assume it knows.

  4. Evaluate claims of emergent AI capabilities with appropriate metrics and healthy skepticism

    Why this matters: Media coverage often hypes "emergent" abilities. Distinguishing genuine capabilities from measurement artifacts protects against over-reliance on AI.

    Critical lens: Emergence claims often dissolve under scrutiny of how benchmarks are designed and measured.

Required Readings

  • Bender et al. (2021)

    On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?

    FAccT Conference

  • Schaeffer et al. (2023)

    Are Emergent Abilities of Large Language Models a Mirage?

    NeurIPS

"A language model is a system trained on vast quantities of text data in order to produce human language text output. This says nothing about the system understanding the meaning of the text."

- Bender & Koller (2020)

Key Concepts at a Glance

Terms you'll encounter this week — quick definitions before the deep dives

Overfitting

When a model memorizes training examples instead of learning general patterns

An overfit model performs great on training data but fails on new data. It is like memorizing test answers without understanding the subject.
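
If you want to see this concretely, here is a minimal sketch in plain NumPy: it fits polynomials of increasing degree to a few noisy samples of a sine curve, then compares error on the training points with error on held-out points. The data and degrees are invented for illustration.

```python
# Minimal overfitting demo: fit polynomials of increasing degree to a
# few noisy samples, then compare training error with held-out error.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.2, n)  # true signal plus noise
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

for degree in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)  # fit on training data only
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The degree-15 fit passes through nearly every training point but swings wildly between them, which is why its test error explodes while its training error looks excellent.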

Stochastic Parrot

A system that produces plausible text without understanding meaning

The term comes from Bender et al. (2021). It highlights that fluent output does not imply comprehension — the model predicts likely next tokens based on patterns.
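
A deliberately tiny sketch of the idea: the bigram "model" below only counts which word follows which in a made-up corpus, then samples likely continuations. Real LLMs are vastly more sophisticated, but the principle of predicting the next token from surface patterns, with no model of meaning, is the same.

```python
# A toy "parrot": count which word follows which in a made-up corpus,
# then generate text by repeatedly sampling a likely next word.
# There is no representation of meaning anywhere in this code.
import random
from collections import defaultdict

corpus = (
    "the model predicts the next word "
    "the model learns patterns from text "
    "the parrot repeats the next phrase it heard"
).split()

followers = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current].append(nxt)  # record observed continuations

random.seed(1)
word = "the"
output = [word]
for _ in range(12):
    word = random.choice(followers.get(word, corpus))  # pick a plausible next word
    output.append(word)

print(" ".join(output))  # locally fluent, globally meaningless
```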

Training Data

The examples a model learns patterns from; they determine what it can and cannot do

Models are shaped by their training data. Web scrapes include misinformation, biases, and copyrighted material. Output quality cannot exceed input quality.
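
One way to see this: the toy "model" below just counts word pairings in an invented, skewed corpus, and its predictions reproduce the skew exactly. The corpus and the role-pronoun associations are made up for illustration.

```python
# A lookup "model" trained on an invented, skewed corpus of
# (role, pronoun) pairs. Its predictions reproduce the skew exactly,
# because counting is all that training does here.
from collections import Counter

corpus = [
    ("nurse", "she"), ("nurse", "she"), ("nurse", "he"),
    ("engineer", "he"), ("engineer", "he"), ("engineer", "he"), ("engineer", "she"),
]

counts = {}
for role, pronoun in corpus:
    counts.setdefault(role, Counter())[pronoun] += 1  # "training" = counting

for role, tally in counts.items():
    prediction = tally.most_common(1)[0][0]
    share = tally[prediction] / sum(tally.values())
    print(f"{role}: predicts '{prediction}' ({share:.0%} of training pairs)")
```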

Emergence

Capabilities that appear suddenly at certain model scales

Some abilities seem to emerge only in large models. But recent research suggests this may be a measurement artifact — continuous improvements can look like sudden jumps depending on how we measure.
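
Here is a rough sketch of that argument, in the spirit of Schaeffer et al. (2023): assume per-token accuracy improves smoothly with scale, then score the same hypothetical models with an all-or-nothing exact-match metric on a ten-token answer. The scaling numbers are invented for illustration, not measured from real models.

```python
# Illustration of metric-induced "emergence": per-token accuracy is
# assumed to improve smoothly with model scale (numbers are invented),
# but exact match only gives credit when every token is right.
import numpy as np

scales = np.logspace(7, 11, 9)               # hypothetical parameter counts
per_token_acc = np.linspace(0.50, 0.95, 9)   # smooth, gradual improvement
answer_len = 10

for n, p in zip(scales, per_token_acc):
    exact_match = p ** answer_len            # all ten tokens must be correct
    print(f"{n:>15,.0f} params | per-token {p:.2f} | exact-match {exact_match:.3f}")
```

The per-token column climbs steadily, while the exact-match column sits near zero and then shoots up at the largest scales, which is exactly the shape that gets reported as a sudden new ability.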

Act 1: Understanding Training

How models learn and fail to learn

Training Visualizer

Interactive neural network training with decision boundaries, overfitting, and train/test splits

20-30 min

Act 2: Critical Evaluation

Tools for questioning AI outputs and claims

Stochastic Parrot Tester

Fact-check LLM outputs and document the error patterns you find

20-25 min

Training Data Archaeology

Explore what is actually in AI training data

15-20 min

Emergence Meter

See how metrics create the illusion of emergence

15-20 min

Tips for This Week

Question Everything

This week is about developing critical thinking skills. When you see a claim about AI capabilities, ask: How was this measured? What could go wrong?

Connect to Your Work

Think about AI tools you use professionally. How would you fact-check their outputs? What biases might exist in their training data?