Multi-Layer Networks
Discover why hidden layers matter and how information flows through neural networks
From The Perceptron: A single perceptron can only learn simple patterns. What happens when we connect many together into layers?
From One to Many
Think of an organization. Individual contributors share their work with teams, who synthesize and pass insights to department heads, who inform executive decisions. Each layer transforms and combines information. Neural networks work the same way - one neuron is limited, but layers create powerful pattern recognition.
Hands-On: A Simple Network
Before diving into theory, get a feel for how a small network behaves. Adjust weights manually, then watch the network learn.
Scenario: Loan Approval Committee
A bank evaluates loan applications based on 3 factors. The rule: approve if at least 2 factors are positive. Strictly speaking, this "majority vote" is linearly separable (a single perceptron with equal weights and a threshold of 1.5 computes it exactly), but its sharp step from deny to approve makes it a good warm-up for watching a hidden layer learn.
- Job: 2+ years at current job
- Credit: no defaults or late payments
- Debt: debt-to-income < 40%
Output: 1 = Approve loan, 0 = Deny loan
Network Structure (3-2-1: three inputs, two hidden neurons, one output)
All Applicant Scenarios
| Job | Credit | Debt | Should approve? | Network output | Correct? |
|---|---|---|---|---|---|
| ✗ | ✗ | ✗ | No | 0.61 | ✗ |
| ✓ | ✗ | ✗ | No | 0.63 | ✗ |
| ✗ | ✓ | ✗ | No | 0.60 | ✗ |
| ✓ | ✓ | ✗ | Yes | 0.62 | ✓ |
| ✗ | ✗ | ✓ | No | 0.58 | ✗ |
| ✓ | ✗ | ✓ | Yes | 0.60 | ✓ |
| ✗ | ✓ | ✓ | Yes | 0.57 | ✓ |
| ✓ | ✓ | ✓ | Yes | 0.59 | ✓ |

(The outputs shown are from the untrained network at epoch 0: it hovers near 0.6 and effectively approves everyone, so only the rows that should be approved come out correct.)
Decision Rule: Majority Vote
The bank's rule: approve if 2+ factors are positive. Watch the network learn this step function — it must output low (deny) for 0-1 factors, high (approve) for 2-3 factors.
The transition at 2 factors is non-linear — that's what makes this problem interesting
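To make the rule concrete, here's a minimal sketch (our own, not the lesson's widget code) that runs all 8 scenarios through a single threshold unit with equal weights, confirming the caveat above that the majority vote itself is expressible by one perceptron:

```python
# The bank's rule as a single threshold unit: equal weights, threshold 1.5.
from itertools import product

weights, threshold = (1, 1, 1), 1.5

for job, credit, debt in product((0, 1), repeat=3):
    votes = weights[0] * job + weights[1] * credit + weights[2] * debt
    print(f"job={job} credit={credit} debt={debt} -> "
          f"{'approve' if votes > threshold else 'deny'}")
```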
[Interactive controls: sliders for the input → hidden weights (to hidden 1 and to hidden 2), the hidden → output weights, and the biases, plus a training panel showing the current epoch, the loss (0.2650 at epoch 0), and a loss-over-time chart.]
What you're seeing: The hidden neurons learn to "count" positive factors in different ways. One might fire strongly when it sees multiple good signals, the other when it sees risky combinations. Together they implement the bank's "2 out of 3" rule, and the same mechanism scales to patterns a single perceptron genuinely cannot represent, like the heart-rate example below.
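For readers who want the mechanics, here's a minimal NumPy sketch (our own, not the lesson's implementation) of a 3-2-1 sigmoid network learning the table above by plain gradient descent on binary cross-entropy:

```python
import numpy as np

rng = np.random.default_rng(0)

# All 8 applicant scenarios (job, credit, debt) and the 2-of-3 labels.
X = np.array([[j, c, d] for j in (0, 1) for c in (0, 1) for d in (0, 1)], dtype=float)
y = (X.sum(axis=1) >= 2).astype(float).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 3-2-1 network: 3 inputs, 2 hidden sigmoid units, 1 sigmoid output.
W1, b1 = rng.normal(0, 1, (3, 2)), np.zeros(2)
W2, b2 = rng.normal(0, 1, (2, 1)), np.zeros(1)

lr = 1.0
for epoch in range(5000):
    h = sigmoid(X @ W1 + b1)            # forward: hidden activations (8, 2)
    p = sigmoid(h @ W2 + b2)            # forward: predictions (8, 1)
    dz2 = p - y                         # backward: BCE-with-sigmoid gradient
    dz1 = (dz2 @ W2.T) * h * (1 - h)    # backward: hidden-layer gradient
    W2 -= lr * h.T @ dz2 / len(X); b2 -= lr * dz2.mean(axis=0)
    W1 -= lr * X.T @ dz1 / len(X); b1 -= lr * dz1.mean(axis=0)

print(np.round(p.ravel(), 2))   # approaches y = [0 0 0 1 0 1 1 1]
```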
Signal Flow Through Networks
Watch how data flows through different network architectures. Each scenario shows how the same principles apply to different decisions.
Why One Neuron Isn't Enough
Before diving into multi-layer networks, let's see why a single perceptron can't handle many real-world patterns.
A resting heart rate that's too low OR too high indicates danger. The perceptron can only learn "higher is worse" or "higher is better" - not both.
[Interactive chart: slide the resting heart rate (bpm, normalized) to see how the risk level changes; the healthy resting range sits between the two danger zones.]
How 2 hidden neurons solve this:
Hidden neuron 1 detects "is this dangerously LOW?" (activates gradually as HR drops). Hidden neuron 2 detects "is this too HIGH?" (activates as HR rises). Output combines both signals into overall risk.
The insight: A single perceptron can only draw one line. Heart rate risk has a "sweet spot" in the middle - it needs hidden neurons to detect dangerous values at BOTH ends of the spectrum.
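As an illustration, here is a hand-built (not learned) sketch of that two-neuron solution. The 50 and 100 bpm cut-offs and the steepness constants are our own illustrative choices, not the lesson's values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two hidden sigmoid units flag danger at BOTH ends of the range;
# the output unit combines them into one risk score.
def heart_rate_risk(hr):
    too_low  = sigmoid(-0.5 * (hr - 50))         # rises as HR drops below ~50
    too_high = sigmoid(0.5 * (hr - 100))         # rises as HR climbs above ~100
    return sigmoid(6.0 * (too_low + too_high) - 3.0)

for hr in (35, 50, 70, 100, 130):
    print(f"{hr:3d} bpm -> risk {heart_rate_risk(hr):.2f}")
```

Running this prints low risk in the healthy middle and high risk at both extremes, the U-shape a single line can never draw.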
How Depth Solves the Problem
Adding hidden layers lets networks learn increasingly complex patterns. See how accuracy improves with depth.
Why Depth Matters: Curve Fitting
Predict default risk from debt-to-income ratio. The pattern isn't simply "more debt = more risk".
- No hidden layer (linear model): can only learn a straight line, "more debt = more risk", and misses the nuanced pattern.
- One hidden layer: learns the rising risk pattern but struggles with the plateau at very high values.
- Deeper network (two or more hidden layers): captures the full pattern (low-risk zone, rising middle, and the plateau for experienced high-debt borrowers).
The insight: Real-world risk isn't linear. Experienced borrowers with high debt (like mortgages) often perform better than their debt ratio suggests. Depth captures these patterns.
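To see why a straight line fails here, a small sketch: the risk curve below is synthetic, shaped only to match the description (near-zero risk at low debt-to-income, rising middle, plateau at the top):

```python
import numpy as np

# Synthetic default-risk curve: flat until ~30% DTI, rising, then a plateau.
x = np.linspace(0.0, 1.0, 101)                   # debt-to-income ratio
risk = 0.8 * np.clip((x - 0.3) / 0.3, 0.0, 1.0)  # flat -> rise -> plateau

# Best possible straight line (ordinary least squares).
slope, intercept = np.polyfit(x, risk, 1)
linear_fit = slope * x + intercept

print(f"worst-case error of the straight line: {np.abs(linear_fit - risk).max():.2f}")
# The line must overshoot the low-risk zone and undershoot the plateau;
# a network with hidden layers can bend at both kinks instead.
```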
Signal Flow in Neural Networks
Now let's see how information actually flows through these layers. Think of it like information moving through an organization.
Credit decisions require weighing payment history, income, and debt - different factors combine in complex ways.
[Interactive: adjust the input values by hand, or click "Use Preset" to load example values.]
Network Topology
A 3-4-1 network for binary classification (outputs a probability, then thresholded to yes/no)
Step 1: Analysts collect the inputs: 85% payment history, 60% income, 35% debt load
Three factors enter the network; excellent payment history compensates for moderate income
Understanding Forward Propagation
Forward propagation is the process of passing inputs through the network to get a prediction. Each layer transforms the signal before passing it to the next.
1. Input (Data Analysts)
2. Hidden 1 (Risk Assessment)
3. Output (Credit Committee)
Key insight: Hidden layers extract useful patterns from raw data that help make accurate predictions.
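Here is a minimal sketch of that forward pass in NumPy, using the walkthrough's three inputs. Since the lesson's actual weights aren't shown, random illustrative ones stand in:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The walkthrough's inputs: payment history, income, debt load.
x = np.array([0.85, 0.60, 0.35])

# Illustrative random weights (the lesson's values aren't shown).
rng = np.random.default_rng(42)
W1, b1 = rng.normal(0, 1, (3, 4)), np.zeros(4)  # input -> 4 hidden units
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)  # hidden -> 1 output unit

h = sigmoid(x @ W1 + b1)       # hidden layer: "risk assessment" features
p = sigmoid(h @ W2 + b2)[0]    # output layer: probability of approval

print("hidden activations:", np.round(h, 2))
print(f"P(approve) = {p:.2f} -> {'approve' if p > 0.5 else 'deny'}")
```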
Why Different Architectures for Different Tasks?
The shape of your data determines the best network structure
Image Recognition
Images
Convolutional Neural Network (CNN)
Nearby pixels matter! A CNN looks at small patches and builds up to larger patterns - edges, then shapes, then objects.
Examples: Photo tagging, quality inspection, medical imaging
Text Understanding
Text/Language
Transformer
Word order and context matter! Transformers use "attention" to let any word look at any other word, understanding relationships across sentences.
Examples: ChatGPT, translation, document summarization
Tabular Data
Spreadsheet/Database rows
Multi-Layer Perceptron (MLP)
Each column is independent information. No spatial relationship (unlike images) or sequence (unlike text). Just combine all features together.
Examples: Loan approval, churn prediction, pricing models
Key insight: You don't need to memorize architectures - just remember that different data shapes need different structures. Images have spatial relationships, text has sequential context, and spreadsheets have independent columns.
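If it helps to see the difference, here are typical tensor shapes for each data type (the specific sizes are our own illustrative examples):

```python
import numpy as np

image   = np.zeros((224, 224, 3))  # height x width x channels: nearby pixels relate
text    = np.zeros((128, 512))     # tokens x embedding dims: order carries meaning
tabular = np.zeros((8,))           # one row of independent feature columns

print(image.shape, text.shape, tabular.shape)
```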
Key insight: Hidden layers let networks learn complex, non-linear patterns that a single perceptron cannot capture. But how does the network learn the right weights for each connection?
What you learned
- Information flows through networks layer by layer (forward propagation)
- Hidden layers extract intermediate patterns - they create features the output layer uses
- Non-linear relationships (like U-shaped risk) need hidden layers to be captured
- Deeper networks can learn more complex patterns, but need more data to train