Implementation of Operator

In mathematics, a function is a mapping from inputs to outputs. In deep learning, an operator is a function that takes one or more tensors as input and produces one or more tensors as output.

Since a neural network computes in two directions, forward and backward, an operator also needs to provide both a forward and a backward method.

The forward method is used to compute the output of the operator. It takes the input tensors and returns the output tensors.

The backward method is used to compute the gradients of the inputs. It takes the gradient of the loss with respect to each output tensor and, following the chain rule, returns the gradient of the loss with respect to each input tensor.

Implementing the Operator Class

The source code of the Operator class can be found in the operator.py file.

Here is a simplified version of the Operator class:

class Operator:
    def forward(self, *inputs):
        # do some forward computation here
        pass

    def backward(self, *output_grads):
        # do some backward computation here
        pass

    def __call__(self, *inputs):
        return self.forward(*inputs)

For readers who are not familiar with the __call__ method: it allows an operator instance to be called like a function. For example, after add_op = Add(), calling add_op(a, b) is equivalent to add_op.forward(a, b).
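
To make this concrete, here is a minimal sketch of what a concrete operator subclass might look like (an illustration only; a real implementation would also record its inputs so the computational graph can be traversed during backpropagation):

class Add(Operator):
    def forward(self, a, b):
        # element-wise addition of the two input tensors
        return a + b

    def backward(self, output_grad):
        # addition has a local gradient of 1 with respect to each input,
        # so the incoming gradient flows unchanged to both a and b
        return output_grad, output_grad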

Usage of Operator

A computational graph can contain multiple operator instances of the same type. For example, the graph built by the following code contains two Add operators:

a = torch.tensor(1.0)
b = torch.tensor(2.0)
c = torch.add(a, b)
d = torch.add(c, b)

You may have noticed that in the previous section we defined an Add operator class, but we never created an instance and called it directly. This is because, for the user's convenience, helper functions like torch.add are provided to create an operator instance and call it in one line. torch.add is equivalent to:

def add(a, b):
    return Add()(a, b)

In this case, the creators of tensors c and d are different Add operator instances. Each operator instance takes different input tensors and sits at a different position in the computational graph, so only distinct operator instances can faithfully represent the distinct nodes of the graph. Luckily, each time you call torch.add, a new Add operator instance is created; the framework handles this for you automatically.
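
A quick check makes this concrete. Assuming each tensor records the operator that produced it in a creator attribute (the attribute name here is an assumption for illustration), the two calls yield distinct instances:

c = torch.add(a, b)   # creates one Add instance
d = torch.add(c, b)   # creates another, independent Add instance
assert c.creator is not d.creator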

Operators are the basic building blocks of a network

In deep learning, operators are the basic building blocks of a network: a network is formed by connecting operator instances together. Like Lego bricks, operators are the basic shapes, such as triangles, squares, and circles.

Operators should be designed carefully to be so fundamental that combinations of them can form any complex network, just as any vector in a vector space can be expressed as a combination of basis vectors. In deep learning, operators play the role of basis vectors.
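
For instance, a fully connected layer can be expressed as a composition of just two primitive operators, matrix multiplication and addition (assuming a torch.matmul helper exists alongside torch.add):

def linear(x, weight, bias):
    # y = x @ weight + bias, built entirely from primitive operators
    return torch.add(torch.matmul(x, weight), bias)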

Although operators can be combined to form any complex network, building complex networks directly out of bare operators is impractical: it is tedious and error-prone. We need higher-level (larger-granularity) components that are more convenient to use, easier to share, and easier to organize. This is the purpose of the Module class, which we will introduce later.