
Multi-Layer Networks

Discover why hidden layers matter and how information flows through neural networks

From The Perceptron: A single perceptron can only learn simple patterns. What happens when we connect many together into layers?

From One to Many

Think of an organization. Individual contributors share their work with teams, who synthesize and pass insights to department heads, who inform executive decisions. Each layer transforms and combines information. Neural networks work the same way - one neuron is limited, but layers create powerful pattern recognition.

Hands-On: A Simple Network

Before diving into theory, try solving a problem a single perceptron can't. Adjust weights manually, then watch the network learn.

Scenario: Loan Approval Committee

A bank evaluates loan applications based on 3 factors. The rule: approve if at least 2 factors are positive. This "majority vote" rule requires the output to jump sharply from deny to approve at exactly 2 positive factors; a single neuron's smooth linear response can't fit that step cleanly, but a network with a hidden layer can.

x1 (Stable Job): 2+ years at current job
x2 (Good Credit): no defaults or late payments
x3 (Low Debt): debt-to-income < 40%

Output: 1 = approve loan, 0 = deny loan

Network Structure (3-2-1)

[Diagram: the 3-2-1 network. Inputs x1 (Stable Job), x2 (Good Credit), and x3 (Low Debt) feed hidden neurons h1 and h2, which feed the output neuron ("Approve?"). Edge color marks whether a weight is positive or negative.]

All Applicant Scenarios

Job   Credit   Debt   Should   Network   OK?
No    No       No     No       0.61      ✗
No    No       Yes    No       0.63      ✗
No    Yes      No     No       0.60      ✗
No    Yes      Yes    Yes      0.62      ✓
Yes   No       No     No       0.58      ✗
Yes   No       Yes    Yes      0.60      ✓
Yes   Yes      No     Yes      0.57      ✓
Yes   Yes      Yes    Yes      0.59      ✓

Decisions correct: 50% (with the initial weights, every output sits above 0.5, so the network approves everyone)

Decision Rule: Majority Vote

The bank's rule: approve if 2+ factors are positive. Watch the network learn this step function — it must output low (deny) for 0-1 factors, high (approve) for 2-3 factors.

[Chart: network output (0-1) versus number of positive factors (0-3). Targets: deny (low output) at 0-1 factors, approve (high output) at 2-3 factors; each point is marked as network correct or network wrong.]

The transition at 2 factors is non-linear — that's what makes this problem interesting
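The rule itself fits in a few lines of Python; writing it out makes the step shape the network must reproduce explicit:

```python
# The bank's rule as a step function of the positive-factor count.
def target(num_positive: int) -> int:
    return 1 if num_positive >= 2 else 0  # 1 = approve, 0 = deny

for n in range(4):
    print(n, "factors ->", "approve" if target(n) else "deny")
# 0-1 factors -> deny; 2-3 factors -> approve
```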

Adjust Weights & Biases

Input → Hidden Weights

To hidden 1:

w1→h1: 0.8
w2→h1: 0.6
w3→h1: 0.7

To hidden 2:

w1→h2: 0.3
w2→h2: -0.2
w3→h2: -0.7

Hidden → Output Weights

h1→out: 0.0
h2→out: 0.8

Biases

b_h1: -0.8
b_h2: -0.2
b_out: 0.0
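To make these numbers concrete, here is a minimal sketch of one forward pass using the initial weights and biases above. The widget doesn't state its activation function, so sigmoid is an assumption, and the outputs will differ slightly from the table:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Initial weights and biases from the panel above (activation assumed sigmoid).
W_h = [(0.8, 0.6, 0.7),    # x1, x2, x3 -> h1
       (0.3, -0.2, -0.7)]  # x1, x2, x3 -> h2
b_h = (-0.8, -0.2)
w_out, b_out = (0.0, 0.8), 0.0  # h1, h2 -> out

def forward(x):
    """One forward pass through the 3-2-1 network."""
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
         for ws, b in zip(W_h, b_h)]
    return sigmoid(sum(w * hi for w, hi in zip(w_out, h)) + b_out)

# Good credit and low debt, but a short job history:
print(forward((0, 1, 1)))  # ~0.55 under these assumptions: near 0.5,
                           # undecided, like the untrained rows in the table
```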

Training

Epoch: 0
Loss: 0.2650

Loss over time: click "Train" to see the loss decrease.

What you're seeing: The hidden neurons learn to "count" positive factors in different ways. One might fire strongly when it sees multiple good signals, the other when it sees risky combinations. Together, they implement the bank's "2 out of 3" rule — a non-linear decision a single perceptron can't make.
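The "Train" button hides a gradient-descent loop. Below is a minimal sketch of training this 3-2-1 network on all eight scenarios; the sigmoid activations, squared-error loss, learning rate, and random initialization are assumptions, not the widget's exact setup:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(0)

# All 8 applicant scenarios (job, credit, debt) and the 2-of-3 target rule.
X = np.array([[j, c, d] for j in (0, 1) for c in (0, 1) for d in (0, 1)], float)
y = (X.sum(axis=1) >= 2).astype(float)

W1 = rng.normal(0.0, 0.5, (3, 2)); b1 = np.zeros(2)  # input  -> hidden
W2 = rng.normal(0.0, 0.5, (2, 1)); b2 = np.zeros(1)  # hidden -> output
lr = 2.0

for epoch in range(5000):
    # Forward pass over all scenarios at once.
    H = sigmoid(X @ W1 + b1)             # hidden activations, shape (8, 2)
    out = sigmoid(H @ W2 + b2).ravel()   # predictions, shape (8,)
    loss = np.mean((out - y) ** 2)       # squared-error loss

    # Backward pass (chain rule; sigmoid' = s * (1 - s)).
    d_out = 2 * (out - y) / len(y) * out * (1 - out)
    dW2 = H.T @ d_out[:, None]
    db2 = d_out.sum(keepdims=True)
    dH = d_out[:, None] @ W2.T * H * (1 - H)
    dW1 = X.T @ dH
    db1 = dH.sum(axis=0)

    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final loss: {loss:.4f}")
print("decisions correct:", np.mean((out > 0.5) == y.astype(bool)))  # typically 1.0
```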

Signal Flow Through Networks

Watch how data flows through different network architectures. Each scenario shows how the same principles apply to different decisions.

Why One Neuron Isn't Enough

Before diving into multi-layer networks, let's see why a single perceptron can't handle many real-world patterns.

A resting heart rate that's too low OR too high indicates danger. The perceptron can only learn "higher is worse" or "higher is better" - not both.

[Interactive chart: a slider sets the normalized resting heart rate (bpm, e.g. 0.50) and the plot shows risk level versus input value, with the healthy resting range marked as optimal. The true risk curve needs a hidden layer to fit; the demo uses 2 hidden neurons.]

How 2 hidden neurons solve this:

Hidden neuron 1 detects "is this dangerously LOW?" (activates gradually as HR drops). Hidden neuron 2 detects "is this too HIGH?" (activates as HR rises). Output combines both signals into overall risk.

The insight: A single perceptron can only draw one line. Heart rate risk has a "sweet spot" in the middle - it needs hidden neurons to detect dangerous values at BOTH ends of the spectrum.
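Here is a minimal sketch of that two-detector idea with hand-picked weights; the thresholds (0.3 and 0.7) and steepness (20) are illustrative choices, not the widget's values:

```python
import math

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

def heart_rate_risk(hr: float) -> float:
    """hr is the resting heart rate normalized to [0, 1]."""
    too_low  = sigmoid(20 * (0.3 - hr))  # hidden neuron 1: fires as hr drops
    too_high = sigmoid(20 * (hr - 0.7))  # hidden neuron 2: fires as hr rises
    return sigmoid(6 * (too_low + too_high) - 3)  # output combines both alarms

for hr in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"hr={hr:.1f} -> risk={heart_rate_risk(hr):.2f}")
# risk is high at both extremes (~0.95) and low in the middle (~0.06):
# a U shape that no single linear neuron can produce
```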

How Depth Solves the Problem

Adding hidden layers lets networks learn increasingly complex patterns. See how accuracy improves with depth.

Why Depth Matters: Curve Fitting

Predict default risk from debt-to-income ratio. The pattern isn't simply "more debt = more risk".

How well can each network learn the pattern? [Each panel plots model output against input value, alongside the target pattern.]

No Hidden Layer (In → Out)
Accuracy: 58%

Can only learn a straight line: "more debt = more risk" - misses the nuanced pattern.

One Hidden Layer (In → H1 → Out)
Accuracy: 82%

Learns the rising risk pattern but struggles with the plateau at very high values.

Deep Network (In → H1 → H2 → Out)
Accuracy: 94%

Captures the full pattern: low risk zone, rising middle, and the plateau for experienced high-debt borrowers.

The insight: Real-world risk isn't linear. Experienced borrowers with high debt (like mortgages) often perform better than their debt ratio suggests. Depth captures these patterns.
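To reproduce the trend outside the widget, the sketch below fits three models of increasing depth to a made-up curve with this shape using scikit-learn. The target curve, layer sizes, and iteration counts are all illustrative; the accuracy figures above come from the widget, not from this code:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# A made-up target: low risk below 0.3 debt-to-income, rising through the
# middle, then a plateau at the top (the shape described above).
x = np.linspace(0, 1, 200).reshape(-1, 1)
t = x.ravel()
y = np.piecewise(t, [t < 0.3, (t >= 0.3) & (t < 0.7), t >= 0.7],
                 [0.1, lambda t: 0.1 + 2.0 * (t - 0.3), 0.9])

models = {
    "no hidden layer": LinearRegression(),
    "one hidden layer": MLPRegressor((8,), max_iter=20000, random_state=0),
    "two hidden layers": MLPRegressor((8, 8), max_iter=20000, random_state=0),
}
for name, model in models.items():
    model.fit(x, y)
    mse = np.mean((model.predict(x) - y) ** 2)
    # the linear model can only fit a straight line; hidden layers track the kinks
    print(f"{name:17s} mse = {mse:.4f}")
```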

Signal Flow in Neural Networks

Now let's see how information actually flows through these layers. Think of it like information moving through an organization.

Network: 3 → 4 → 1

Credit decisions require weighing payment history, income, and debt - different factors combine in complex ways.

Adjust Inputs

Click "Use Preset" or set the input values yourself:

Payment History: 85%
Income Level: 60%
Debt Ratio: 35%

Network Topology

A 3-4-1 network for binary classification (it outputs a probability, which is then thresholded to yes/no)

[Diagram: the inputs (0.8, 0.6, 0.3) flow through the Input, Hidden, and Output layers.]

Step 1: Analysts collect: 85% payment history, 60% income, 35% debt load

Three factors enter: excellent payment history compensates for moderate income

Understanding Forward Propagation

Forward propagation is the process of passing inputs through the network to get a prediction. Each layer transforms the signal before passing it to the next.
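In matrix form, forward propagation through the 3-4-1 network is two transform-and-squash steps. The sketch below uses randomly initialized weights, since the page doesn't list the widget's learned values:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.array([0.85, 0.60, 0.35])  # payment history, income level, debt ratio

# Illustrative random weights; the widget's learned values aren't listed here.
rng = np.random.default_rng(7)
W1, b1 = rng.normal(0, 1, (4, 3)), np.zeros(4)  # input  -> hidden (4 neurons)
W2, b2 = rng.normal(0, 1, (1, 4)), np.zeros(1)  # hidden -> output

h = sigmoid(W1 @ x + b1)          # layer 1: four intermediate risk signals
p = sigmoid(W2 @ h + b2).item()   # layer 2: approval probability
print(h, p, "approve" if p > 0.5 else "deny")
```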

1. Input (Data Analysts)

Payment History: 85%
Income Level: 60%
Debt Ratio: 35%

2. Hidden 1 (Risk Assessment)

3. Output (Credit Committee)

Key insight: Hidden layers extract useful patterns from raw data that help make accurate predictions.

What the colors mean

Positive weight
Negative weight
Low activation
High activation

Why Different Architectures for Different Tasks?

The shape of your data determines the best network structure

Image Recognition

Images

Convolutional Neural Network (CNN)

Nearby pixels matter! A CNN looks at small patches and builds up to larger patterns - edges, then shapes, then objects.

Examples: Photo tagging, quality inspection, medical imaging

Text Understanding

Text/Language

Transformer

Word order and context matter! Transformers use "attention" to let any word look at any other word, understanding relationships across sentences.

Examples: ChatGPT, translation, document summarization

Tabular Data

Spreadsheet/Database rows

Multi-Layer Perceptron (MLP)

Each column is independent information. No spatial relationship (unlike images) or sequence (unlike text). Just combine all features together.

Examples: Loan approval, churn prediction, pricing models

Key insight: You don't need to memorize architectures - just remember that different data shapes need different structures. Images have spatial relationships, text has sequential context, and spreadsheets have independent columns.

Key insight: Hidden layers let networks learn complex, non-linear patterns that a single perceptron cannot capture. But how does the network learn the right weights for each connection?

What you learned

  • Information flows through networks layer by layer (forward propagation)
  • Hidden layers extract intermediate patterns - they create features the output layer uses
  • Non-linear relationships (like U-shaped risks) can only be captured with hidden layers
  • Deeper networks can learn more complex patterns, but need more data to train