## 5 The Computational Graph

To compute gradients, we need to remember how outputs were computed. Enter the computational graph.

### 5.1 The Big Picture

When we compute:

```python
pred = celsius @ w.T + b
loss = mse_loss(pred, targets)
```

we're building a graph:

```mermaid
flowchart TD
    celsius[celsius] --> matmul["matmul (@)"]
    w[w] --> transpose["transpose (.T)"] --> matmul
    matmul --> add["add (+)"]
    b[b] --> add
    add --> pred[pred]
    pred --> mse[MSE]
    targets[targets] --> mse
    mse --> loss[loss]
```

Each node remembers its inputs, so we can trace backwards.
### 5.2 Tracking Computation

We need to modify our Tensor class to remember:

- What operation created it? (`grad_fn`)
- What were the inputs? (`parents`)

```python
class Tensor:
    def __init__(self, data, requires_grad=False, grad_fn=None, parents=None):
        self.data = np.array(data, dtype=np.float32)
        self.requires_grad = requires_grad
        self.grad = None              # Will store gradient

        # For computational graph
        self.grad_fn = grad_fn        # Operation that created this tensor
        self.parents = parents or []  # Input tensors
```
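As a quick check, the new bookkeeping fields start out empty for a tensor we create directly. The non-leaf example below fills in `grad_fn` and `parents` by hand only to show what operations will soon do for us; it is a sketch against the class above, not library API.

```python
# Leaf tensor: created directly from data, so no history.
a = Tensor([1.0, 2.0], requires_grad=True)
print(a.grad_fn, a.parents)       # None []

# What a non-leaf tensor will look like once operations build the graph
# (fields filled in manually here, purely for illustration).
c = Tensor([2.0, 3.0], requires_grad=True, grad_fn='add', parents=[a])
print(c.grad_fn, len(c.parents))  # add 1
```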
### 5.3 The requires_grad Flag

Not all tensors need gradients:

```python
# Input data - no gradients needed
celsius = Tensor([[0.0], [100.0]], requires_grad=False)

# Parameters - we want to optimize these!
w = Tensor([[0.5]], requires_grad=True)
b = Tensor([10.0], requires_grad=True)
```

Only tensors with `requires_grad=True` will accumulate gradients.
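Because only parameters carry the flag, we can use it to pick out exactly the tensors an optimizer should update. The helper below is our own throwaway sketch, not part of the Tensor class:

```python
# Collect the tensors we intend to optimize.
def trainable(tensors):
    return [t for t in tensors if t.requires_grad]

params = trainable([celsius, w, b])
print(len(params))  # 2 -> just w and b
```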
### 5.4 Operators That Track History

Let's modify addition to track the computation:

```python
class Tensor:
    def __add__(self, other):
        if not isinstance(other, Tensor):
            other = Tensor(other)

        # Compute result
        result_data = self.data + other.data

        # Should we track gradients?
        requires_grad = self.requires_grad or other.requires_grad

        # Create result tensor with history
        result = Tensor(
            result_data,
            requires_grad=requires_grad,
            grad_fn='add' if requires_grad else None,
            parents=[self, other] if requires_grad else []
        )
        return result
```

Now `result` remembers it came from `self + other`.
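A quick sanity check, assuming the `__init__` from 5.2 and the `__add__` above (a sketch, not library output):

```python
x = Tensor([1.0, 2.0], requires_grad=True)
y = Tensor([3.0, 4.0])                        # requires_grad defaults to False

z = x + y
print(z.data)                                 # [4. 6.]
print(z.grad_fn)                              # add
print(z.parents[0] is x, z.parents[1] is y)   # True True
```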
### 5.5 Matrix Multiplication with Tracking

```python
class Tensor:
    def __matmul__(self, other):
        if not isinstance(other, Tensor):
            other = Tensor(other)

        result_data = self.data @ other.data
        requires_grad = self.requires_grad or other.requires_grad

        result = Tensor(
            result_data,
            requires_grad=requires_grad,
            grad_fn='matmul' if requires_grad else None,
            parents=[self, other] if requires_grad else []
        )
        return result
```
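Note how `requires_grad` propagates: even though the input data does not require gradients, anything computed from a parameter does. A small sketch:

```python
celsius = Tensor([[0.0], [100.0]])        # data, requires_grad=False
w = Tensor([[0.5]], requires_grad=True)   # parameter

out = celsius @ w                         # shapes (2, 1) @ (1, 1) -> (2, 1)
print(out.requires_grad)                  # True
print(out.grad_fn)                        # matmul
```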
### 5.6 Visualizing the Graph

Let's trace through our temperature example:

```python
celsius = Tensor([[0.0], [100.0]])        # Leaf (no parents)
w = Tensor([[0.5]], requires_grad=True)   # Leaf (parameter)
b = Tensor([10.0], requires_grad=True)    # Leaf (parameter)

# Step 1: w.T
w_t = w.T
# w_t.grad_fn = 'transpose'
# w_t.parents = [w]

# Step 2: celsius @ w_t
matmul_out = celsius @ w_t
# matmul_out.grad_fn = 'matmul'
# matmul_out.parents = [celsius, w_t]

# Step 3: matmul_out + b
pred = matmul_out + b
# pred.grad_fn = 'add'
# pred.parents = [matmul_out, b]
```

We can print the graph:

```python
def print_graph(tensor, indent=0):
    prefix = "  " * indent
    grad_fn = tensor.grad_fn or "leaf"
    print(f"{prefix}{tensor.shape} <- {grad_fn}")
    for parent in tensor.parents:
        print_graph(parent, indent + 1)

print_graph(pred)
```

Output:

```
(2, 1) <- add
  (2, 1) <- matmul
    (2, 1) <- leaf
    (1, 1) <- transpose
      (1, 1) <- leaf
  (1,) <- leaf
```
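The same recursive walk can collect nodes instead of printing them. As a sketch (the helper name `topo_order` is ours, not part of the Tensor class), this gives the visiting order that backpropagation will rely on:

```python
def topo_order(tensor, visited=None, order=None):
    # Depth-first walk: record parents before the tensor itself,
    # so the output node ends up last in the list.
    if visited is None:
        visited, order = set(), []
    if id(tensor) in visited:
        return order
    visited.add(id(tensor))
    for parent in tensor.parents:
        topo_order(parent, visited, order)
    order.append(tensor)
    return order

for t in reversed(topo_order(pred)):
    print(t.grad_fn or "leaf")
# add, leaf (b), matmul, transpose, leaf (w), leaf (celsius)
```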
### 5.7 Why Track History?

The graph enables backpropagation:

- Start at the loss
- Walk backwards through the graph
- At each node, compute the gradient using the chain rule
- Pass gradients to parent nodes

```mermaid
flowchart RL
    loss[loss] -->|grad| mse[MSE]
    mse -->|grad| pred[pred]
    pred -->|grad| add[add]
    add -->|grad| matmul[matmul]
    add -->|grad| b[b]
    matmul -->|grad| w[w]
```
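To make those steps concrete, here is a stripped-down walk over the graph. The gradient math is deliberately left out (it is the subject of the next chapter); the point is only which nodes get visited, and in what order. The function name `backward_walk` is ours, purely for illustration:

```python
def backward_walk(tensor):
    # Steps 2-3: the real backward pass would apply the chain rule
    # for tensor.grad_fn here; for now we only report the visit.
    print("visiting", tensor.grad_fn or "leaf")
    # Step 4: hand the (transformed) gradient on to each parent.
    for parent in tensor.parents:
        backward_walk(parent)

# Step 1: start at the output (pred stands in for the loss here).
backward_walk(pred)
# visiting add
# visiting matmul
# visiting leaf        (celsius)
# visiting transpose
# visiting leaf        (w)
# visiting leaf        (b)
```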
### 5.8 Leaf Tensors vs Non-Leaf Tensors

| Type | `grad_fn` | `parents` | Example |
|---|---|---|---|
| Leaf | `None` | `[]` | `w`, `b`, `celsius` |
| Non-Leaf | operation name | `[inputs]` | `pred`, `loss` |

Leaf tensors are the starting points (data or parameters). Non-leaf tensors are computed from other tensors.
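The distinction is easy to check in code. A tiny helper of our own (not part of the class) that mirrors the table:

```python
def is_leaf(tensor):
    # Leaf tensors were created directly from data, so they have no history.
    return tensor.grad_fn is None

print(is_leaf(w), is_leaf(b), is_leaf(celsius))  # True True True
print(is_leaf(pred))                             # False
```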
5.9 The Complete Tensor Class (So Far)
import numpy as np
class Tensor:
def __init__(self, data, requires_grad=False, grad_fn=None, parents=None):
self.data = np.array(data, dtype=np.float32)
self.requires_grad = requires_grad
self.grad = None
self.grad_fn = grad_fn
self.parents = parents or []
@property
def shape(self):
return self.data.shape
@property
def T(self):
result = Tensor(
self.data.T,
requires_grad=self.requires_grad,
grad_fn='transpose' if self.requires_grad else None,
parents=[self] if self.requires_grad else []
)
return result
def __add__(self, other):
if not isinstance(other, Tensor):
other = Tensor(other)
requires_grad = self.requires_grad or other.requires_grad
return Tensor(
self.data + other.data,
requires_grad=requires_grad,
grad_fn='add' if requires_grad else None,
parents=[self, other] if requires_grad else []
)
def __matmul__(self, other):
if not isinstance(other, Tensor):
other = Tensor(other)
requires_grad = self.requires_grad or other.requires_grad
return Tensor(
self.data @ other.data,
requires_grad=requires_grad,
grad_fn='matmul' if requires_grad else None,
parents=[self, other] if requires_grad else []
)
# ... more operations ...Code Reference: See src/tensorweaver/autodiff/tensor.py for the complete implementation.
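Before moving on, a quick smoke test of the class so far (a sketch; `mse_loss` is not defined yet, so we stop at `pred`):

```python
celsius = Tensor([[0.0], [100.0]])
w = Tensor([[0.5]], requires_grad=True)
b = Tensor([10.0], requires_grad=True)

pred = celsius @ w.T + b
print(pred.data.ravel())                  # [10. 60.]
print(pred.grad_fn)                       # add
print([p.grad_fn for p in pred.parents])  # ['matmul', None]
```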
### 5.10 Summary

- Computational graph tracks how tensors are created
- Each tensor stores `grad_fn` (operation) and `parents` (inputs)
- `requires_grad=True` marks tensors that need gradients
- The graph enables walking backwards to compute gradients

Next: Using the graph to implement backpropagation.