5 The Computational Graph

To compute gradients, we need to remember how outputs were computed. Enter the computational graph.

5.1 The Big Picture

When we compute:

pred = celsius @ w.T + b
loss = mse_loss(pred, targets)

We’re building a graph:

flowchart TD
    celsius[celsius] --> matmul["matmul (@)"]
    w[w] --> transpose["transpose (.T)"] --> matmul
    matmul --> add["add (+)"]
    b[b] --> add
    add --> pred[pred]
    pred --> mse[MSE]
    targets[targets] --> mse
    mse --> loss[loss]

Each node remembers its inputs, so we can trace backwards.

5.2 Tracking Computation

We need to modify our Tensor class to remember:

  1. What operation created it? (grad_fn)
  2. What were the inputs? (parents)

class Tensor:
    def __init__(self, data, requires_grad=False, grad_fn=None, parents=None):
        self.data = np.array(data, dtype=np.float32)
        self.requires_grad = requires_grad
        self.grad = None  # Will store gradient

        # For computational graph
        self.grad_fn = grad_fn      # Operation that created this tensor
        self.parents = parents or []  # Input tensors
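
A quick sanity check, using only the constructor above: a tensor created directly from data is a leaf, so both tracking fields stay empty.

w = Tensor([[0.5]], requires_grad=True)
print(w.grad_fn)   # None -- no operation created it
print(w.parents)   # []   -- no input tensors
print(w.grad)      # None -- no gradient computed yet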

5.3 The requires_grad Flag

Not all tensors need gradients:

# Input data - no gradients needed
celsius = Tensor([[0.0], [100.0]], requires_grad=False)

# Parameters - we want to optimize these!
w = Tensor([[0.5]], requires_grad=True)
b = Tensor([10.0], requires_grad=True)

Only tensors with requires_grad=True will accumulate gradients.
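
Once the tracking operators from the next sections are in place, the flag propagates forward: any result computed from a gradient-requiring tensor requires gradients itself. A quick sketch, reusing the tensors above:

pred = celsius @ w.T + b
print(celsius.requires_grad)  # False -- input data stays a plain constant
print(pred.requires_grad)     # True  -- inherited from w and b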

5.4 Operators That Track History

Let’s modify addition to track the computation:

class Tensor:
    def __add__(self, other):
        if not isinstance(other, Tensor):
            other = Tensor(other)

        # Compute result
        result_data = self.data + other.data

        # Should we track gradients?
        requires_grad = self.requires_grad or other.requires_grad

        # Create result tensor with history
        result = Tensor(
            result_data,
            requires_grad=requires_grad,
            grad_fn='add' if requires_grad else None,
            parents=[self, other] if requires_grad else []
        )

        return result

Now result remembers it came from self + other.
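
A quick check of the bookkeeping, using the class as defined so far:

a = Tensor([1.0, 2.0], requires_grad=True)
b = Tensor([3.0, 4.0])
c = a + b
print(c.grad_fn)          # 'add'
print(c.requires_grad)    # True -- inherited from a
print(c.parents[0] is a)  # True -- the result keeps references to its inputs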

5.5 Matrix Multiplication with Tracking

class Tensor:
    def __matmul__(self, other):
        if not isinstance(other, Tensor):
            other = Tensor(other)

        result_data = self.data @ other.data
        requires_grad = self.requires_grad or other.requires_grad

        result = Tensor(
            result_data,
            requires_grad=requires_grad,
            grad_fn='matmul' if requires_grad else None,
            parents=[self, other] if requires_grad else []
        )

        return result
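
The pattern is identical to addition; only the forward computation changes. A short check with the shapes from the temperature example (a sketch, where w_t stands in for the already-transposed weight):

celsius = Tensor([[0.0], [100.0]])         # (2, 1), plain input data
w_t = Tensor([[0.5]], requires_grad=True)  # (1, 1), already transposed
out = celsius @ w_t
print(out.data.shape)     # (2, 1)
print(out.grad_fn)        # 'matmul'
print(out.requires_grad)  # True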

5.6 Visualizing the Graph

Let’s trace through our temperature example:

celsius = Tensor([[0.0], [100.0]])              # Leaf (no parents)
w = Tensor([[0.5]], requires_grad=True)         # Leaf (parameter)
b = Tensor([10.0], requires_grad=True)          # Leaf (parameter)

# Step 1: w.T
w_t = w.T
# w_t.grad_fn = 'transpose'
# w_t.parents = [w]

# Step 2: celsius @ w_t
matmul_out = celsius @ w_t
# matmul_out.grad_fn = 'matmul'
# matmul_out.parents = [celsius, w_t]

# Step 3: matmul_out + b
pred = matmul_out + b
# pred.grad_fn = 'add'
# pred.parents = [matmul_out, b]

We can print the graph:

def print_graph(tensor, indent=0):
    prefix = "  " * indent
    grad_fn = tensor.grad_fn or "leaf"
    print(f"{prefix}{tensor.shape} <- {grad_fn}")
    for parent in tensor.parents:
        print_graph(parent, indent + 1)

print_graph(pred)

Output:

(2, 1) <- add
  (2, 1) <- matmul
    (2, 1) <- leaf
    (1, 1) <- transpose
      (1, 1) <- leaf
  (1,) <- leaf

5.7 Why Track History?

The graph enables backpropagation:

  1. Start at the loss
  2. Walk backwards through the graph
  3. At each node, compute gradient using the chain rule
  4. Pass gradients to parent nodes

flowchart RL
    loss[loss] -->|grad| mse[MSE]
    mse -->|grad| pred[pred]
    pred -->|grad| add[add]
    add -->|grad| matmul[matmul]
    add -->|grad| b[b]
    matmul -->|grad| transpose[transpose]
    transpose -->|grad| w[w]
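
The gradient math itself comes in the next chapter, but the backward walk is just a reverse traversal of parents. Here is an illustrative sketch (backward_order is a hypothetical helper, not part of the Tensor class) applied to the pred tensor from the trace above:

def backward_order(tensor, visited=None, order=None):
    # Post-order DFS over parents builds a topological order:
    # leaves first, the final output last.
    if visited is None:
        visited, order = set(), []
    if id(tensor) in visited:
        return order
    visited.add(id(tensor))
    for parent in tensor.parents:
        backward_order(parent, visited, order)
    order.append(tensor)
    return order

# Reversing it gives the order backpropagation needs:
# start at the output, then move back toward the leaves.
for t in reversed(backward_order(pred)):
    print(t.grad_fn or "leaf")
# add, leaf, matmul, transpose, leaf, leaf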

5.8 Leaf Tensors vs Non-Leaf Tensors

Type       grad_fn          parents     Example
Leaf       None             []          w, b, celsius
Non-Leaf   operation name   [inputs]    pred, loss

Leaf tensors are the starting points (data or parameters). Non-leaf tensors are computed from other tensors.
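
The distinction is easy to check in code. A tiny illustrative helper (is_leaf is just a sketch, not part of the class):

def is_leaf(tensor):
    # A leaf was created directly from data, not by an operation.
    return tensor.grad_fn is None

print(is_leaf(w))     # True  -- parameter
print(is_leaf(pred))  # False -- produced by 'add'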

5.9 The Complete Tensor Class (So Far)

import numpy as np

class Tensor:
    def __init__(self, data, requires_grad=False, grad_fn=None, parents=None):
        self.data = np.array(data, dtype=np.float32)
        self.requires_grad = requires_grad
        self.grad = None
        self.grad_fn = grad_fn
        self.parents = parents or []

    @property
    def shape(self):
        return self.data.shape

    @property
    def T(self):
        result = Tensor(
            self.data.T,
            requires_grad=self.requires_grad,
            grad_fn='transpose' if self.requires_grad else None,
            parents=[self] if self.requires_grad else []
        )
        return result

    def __add__(self, other):
        if not isinstance(other, Tensor):
            other = Tensor(other)
        requires_grad = self.requires_grad or other.requires_grad
        return Tensor(
            self.data + other.data,
            requires_grad=requires_grad,
            grad_fn='add' if requires_grad else None,
            parents=[self, other] if requires_grad else []
        )

    def __matmul__(self, other):
        if not isinstance(other, Tensor):
            other = Tensor(other)
        requires_grad = self.requires_grad or other.requires_grad
        return Tensor(
            self.data @ other.data,
            requires_grad=requires_grad,
            grad_fn='matmul' if requires_grad else None,
            parents=[self, other] if requires_grad else []
        )

    # ... more operations ...

Note

Code Reference: See src/tensorweaver/autodiff/tensor.py for the complete implementation.

5.10 Summary

  • Computational graph tracks how tensors are created
  • Each tensor stores grad_fn (operation) and parents (inputs)
  • requires_grad=True marks tensors that need gradients
  • The graph enables walking backwards to compute gradients

Next: Using the graph to implement backpropagation.