Model Builder
While the autodiff engine makes graph building possible at the system level, at the user level we need a module-based system that lets users build and share their deep learning models easily and flexibly.
Distinguish between intermediate variables and model's parameters
There are three types of variables involved in computation inside the model:
- User's input: What the user feeds into the model.
- Model's parameters: The parameters of the model. This is the part that gets optimized during training.
- Intermediate variables: The variables computed during the forward pass. They are temporary holders of the values computed from both the user's input and the model's parameters.
We need to distinguish these three types of variables because the model needs to know what should be optimized and what should not.
In TensorWeaver, we use Parameter to represent the model's parameters. Parameter is a subclass of Tensor. TensorWeaver automatically tracks all parameters, optimizes them during training, and saves them to the final model.
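The distinction between the three kinds of variables can be sketched as follows. This is a minimal illustration, not TensorWeaver's actual code: the Tensor and Parameter classes here are hypothetical stand-ins that only wrap a Python list, and the trainable helper is an assumed name. It shows how a subclass alone is enough to tell parameters apart from inputs and intermediate values.

```python
class Tensor:
    """Hypothetical stand-in for a tensor: it only wraps a list of floats."""

    def __init__(self, data):
        self.data = data


class Parameter(Tensor):
    """A Tensor that is marked as trainable purely by its type."""


def trainable(tensors):
    """Return only the tensors that should be optimized (assumed helper)."""
    return [t for t in tensors if isinstance(t, Parameter)]


x = Tensor([1.0, 2.0])        # user's input
w = Parameter([0.5, -0.5])    # model's parameter
# Intermediate variable: computed from the input and the parameter.
h = Tensor([a * b for a, b in zip(x.data, w.data)])

selected = trainable([x, w, h])
```

Only w is selected here. The input x and the intermediate h are plain Tensor instances, so an optimizer built on this check would leave them untouched.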
Module
In a deep learning framework, computation is organized in the form of operators. To stay general-purpose, these operators are generic and fine-grained, and users can combine them to build any model they want. The tedious part is that users want to think at different levels of abstraction. For example, when building a transformer model, users think in terms of layers and blocks, while the framework's operators work in terms of matrix multiplications and activations. Modules are designed to solve this problem. Instead of using fine-grained operators directly, users can define their own modules at a higher level of abstraction, so the final model is more readable and easier to understand.
From another point of view, modules are coarse-grained operators built from fine-grained ones. As a real-world analogy, modules are like pre-built tubes, while operators are like raw materials such as sand, water, cement, and iron.
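The idea of building coarse-grained modules from fine-grained operators can be sketched as below. Everything in this block is an assumption made for illustration: the Module base class, its parameters() method, the attribute-based tracking, and the matvec, add, and relu operators are hypothetical and written in plain Python, not taken from TensorWeaver's API.

```python
class Tensor:
    """Hypothetical stand-in for a tensor: it only wraps a list."""

    def __init__(self, data):
        self.data = data


class Parameter(Tensor):
    """A Tensor marked as trainable by its type."""


# Fine-grained operators (assumed names).
def matvec(w, x):
    return Tensor([sum(wi * xi for wi, xi in zip(row, x.data)) for row in w.data])


def add(a, b):
    return Tensor([ai + bi for ai, bi in zip(a.data, b.data)])


def relu(a):
    return Tensor([max(0.0, v) for v in a.data])


class Module:
    """Hypothetical base class that records parameters and sub-modules."""

    def __init__(self):
        object.__setattr__(self, "_params", {})
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name, value):
        # Track parameters and child modules as they are assigned.
        if isinstance(value, Parameter):
            self._params[name] = value
        elif isinstance(value, Module):
            self._modules[name] = value
        object.__setattr__(self, name, value)

    def parameters(self):
        # Own parameters first, then those of every child, recursively.
        found = list(self._params.values())
        for child in self._modules.values():
            found.extend(child.parameters())
        return found

    def __call__(self, *args):
        return self.forward(*args)


class Linear(Module):
    """A layer: a coarse-grained operator built from matvec and add."""

    def __init__(self, weight, bias):
        super().__init__()
        self.weight = Parameter(weight)
        self.bias = Parameter(bias)

    def forward(self, x):
        return add(matvec(self.weight, x), self.bias)


class MLP(Module):
    """A block: built from layers rather than from raw operators."""

    def __init__(self):
        super().__init__()
        self.hidden = Linear([[1.0, -1.0], [0.5, 0.5]], [0.0, -2.0])
        self.out = Linear([[2.0, 1.0]], [0.5])

    def forward(self, x):
        return self.out(relu(self.hidden(x)))


model = MLP()
y = model(Tensor([3.0, 1.0]))
n_params = len(model.parameters())
```

In this sketch, MLP.forward reads in terms of layers, while the matrix multiplication and activation details stay inside Linear and the operators. Calling model.parameters() gathers the four Parameter objects from both layers without the user listing them by hand.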