20 Introduction to ONNX

```mermaid
flowchart LR
    TW[TensorWeaver] --> ONNX[ONNX Model]
    PT[PyTorch] --> ONNX
    TF[TensorFlow] --> ONNX
    ONNX --> ORT[ONNX Runtime]
    ONNX --> Mobile[Mobile]
    ONNX --> Web[Web/WASM]
    ONNX --> Edge[Edge Devices]
```
Training is done. Now let’s deploy to production.
20.1 What is ONNX?
ONNX (Open Neural Network Exchange) is an open, framework-independent standard format for representing neural networks.
20.2 Why ONNX?
| Problem | ONNX Solution |
|---|---|
| Framework lock-in | Portable format |
| Slow inference | Optimized runtimes |
| Deployment complexity | Universal format |
| Hardware variety | Provider abstraction |
20.3 ONNX Model Structure
An ONNX model is a computational graph:
```
ONNX Model
├── Graph
│   ├── Nodes (operations)
│   ├── Inputs (model inputs)
│   ├── Outputs (model outputs)
│   └── Initializers (weights)
└── Metadata (version, producer, etc.)
```
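In the Python onnx package these pieces map directly to protobuf messages (ModelProto, GraphProto, NodeProto, TensorProto). A minimal sketch of walking them, assuming a model file named model.onnx:

```python
import onnx

model = onnx.load("model.onnx")               # ModelProto
graph = model.graph                           # GraphProto

for node in graph.node:                       # Nodes (operations)
    print(node.op_type, list(node.input), "->", list(node.output))

print([i.name for i in graph.input])          # Inputs
print([o.name for o in graph.output])         # Outputs
print([w.name for w in graph.initializer])    # Initializers (weights)
print(model.ir_version, model.producer_name)  # Metadata
```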
20.4 A Simple Example
Our temperature model in ONNX:
```
Inputs:       celsius    [batch, 1]
Initializers: weight     [1, 1] = 1.8
              bias       [1]    = 32.0
Nodes:
  1. MatMul(celsius, weight) -> temp1
  2. Add(temp1, bias)        -> fahrenheit
Outputs:      fahrenheit [batch, 1]
```
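To make the structure concrete, here is a minimal sketch that builds this exact graph by hand with the onnx.helper API (the tensor and node names mirror the listing above; our exporter will automate this in the next chapter):

```python
import onnx
from onnx import TensorProto, helper

# Graph inputs/outputs with a symbolic "batch" dimension
celsius = helper.make_tensor_value_info("celsius", TensorProto.FLOAT, ["batch", 1])
fahrenheit = helper.make_tensor_value_info("fahrenheit", TensorProto.FLOAT, ["batch", 1])

# Initializers (the trained weights)
weight = helper.make_tensor("weight", TensorProto.FLOAT, [1, 1], [1.8])
bias = helper.make_tensor("bias", TensorProto.FLOAT, [1], [32.0])

# Nodes
matmul = helper.make_node("MatMul", ["celsius", "weight"], ["temp1"])
add = helper.make_node("Add", ["temp1", "bias"], ["fahrenheit"])

# Assemble, check, and save
graph = helper.make_graph([matmul, add], "celsius_to_fahrenheit",
                          inputs=[celsius], outputs=[fahrenheit],
                          initializer=[weight, bias])
model = helper.make_model(graph)
onnx.checker.check_model(model)
onnx.save(model, "temperature.onnx")
```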
20.5 ONNX Operators
ONNX defines standard operators:
| Category | Operators |
|---|---|
| Math | Add, Sub, Mul, Div, MatMul |
| Activations | Relu, Sigmoid, Tanh, Softmax |
| Normalization | BatchNormalization, LayerNormalization |
| Shape | Reshape, Transpose, Concat |
| Reduction | ReduceMean, ReduceSum |
Full list: onnx.ai/onnx/operators/
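Each operator also has a versioned schema (its opset version, inputs, and attributes) that you can inspect from Python; a quick sketch using onnx.defs:

```python
import onnx.defs

# Look up an operator's schema: when it was introduced and
# which attributes it accepts.
schema = onnx.defs.get_schema("Softmax")
print(schema.name, "since opset", schema.since_version)
print(list(schema.attributes))
```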
20.6 ONNX Runtime
ONNX Runtime is the most widely used inference engine for ONNX models:
```python
import numpy as np
import onnxruntime as ort

# Load the model
session = ort.InferenceSession("model.onnx")

# Get input/output names
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

# Run inference on a batch of one value
input_data = np.array([[100.0]], dtype=np.float32)
result = session.run(
    [output_name],
    {input_name: input_data},
)
```

ONNX Runtime is highly optimized:

- Graph optimization
- Operator fusion
- Hardware acceleration (CPU, GPU, NPU)
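These optimizations are enabled by default; if you want to control them explicitly, ONNX Runtime exposes them through SessionOptions. A short sketch:

```python
import onnxruntime as ort

# Graph optimizations (constant folding, operator fusion, ...) are on by
# default; SessionOptions lets you set the level explicitly.
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
session = ort.InferenceSession("model.onnx", sess_options=opts)
```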
20.7 Execution Providers
ONNX Runtime supports multiple backends:
```python
# CPU (default)
session = ort.InferenceSession("model.onnx",
                               providers=['CPUExecutionProvider'])

# NVIDIA GPU (falls back to CPU if CUDA is unavailable)
session = ort.InferenceSession("model.onnx",
                               providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])

# Intel (OpenVINO)
session = ort.InferenceSession("model.onnx",
                               providers=['OpenVINOExecutionProvider'])

# Apple Silicon (Core ML)
session = ort.InferenceSession("model.onnx",
                               providers=['CoreMLExecutionProvider'])
```
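Which providers are actually available depends on how ONNX Runtime was installed (for example, the onnxruntime-gpu package is needed for CUDA). You can check what your build offers and what a session ended up using:

```python
import onnxruntime as ort

# Providers compiled into this ONNX Runtime build
print(ort.get_available_providers())

# Providers a session actually selected, in priority order
session = ort.InferenceSession("model.onnx")
print(session.get_providers())
```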
20.8 Our Export Pipeline

```mermaid
flowchart LR
    TW[TensorWeaver Model] --> Trace[Trace Forward]
    Trace --> Build[Build ONNX Graph]
    Build --> Save[Save .onnx File]
    Save --> ORT[ONNX Runtime]
```
- Trace: Execute forward pass, record operations
- Build: Convert to ONNX nodes
- Save: Write to file
- Run: Use ONNX Runtime
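As a toy illustration of the trace-then-build idea (this is not the TensorWeaver exporter, which we build in the next chapter; the recording scheme here is purely hypothetical), the key point is that running the forward pass once gives an ordered list of operations that maps directly to ONNX nodes:

```python
from onnx import helper

# Hypothetical recorder: each "operation" just appends (op_type, inputs, outputs).
recorded = []

def matmul(a, b, out):
    recorded.append(("MatMul", [a, b], [out]))

def add(a, b, out):
    recorded.append(("Add", [a, b], [out]))

# "Forward pass" over symbolic tensor names records the graph...
matmul("celsius", "weight", "temp1")
add("temp1", "bias", "fahrenheit")

# ...and the recording converts one-to-one into ONNX nodes.
nodes = [helper.make_node(op, ins, outs) for op, ins, outs in recorded]
```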
20.9 Installing Dependencies
```bash
pip install onnx onnxruntime
```

20.10 Verifying ONNX Models
```python
import onnx

# Load and check the model
model = onnx.load("model.onnx")
onnx.checker.check_model(model)  # Raises if the model is invalid

# Print model info
print(f"IR version: {model.ir_version}")
print(f"Opset: {model.opset_import[0].version}")
print(f"Inputs: {[i.name for i in model.graph.input]}")
print(f"Outputs: {[o.name for o in model.graph.output]}")
```
20.11 Summary

- ONNX = portable neural network format
- ONNX Runtime = fast inference engine
- Export once, run everywhere
- Supports CPU, GPU, mobile, edge
Next: Building the ONNX exporter.