20  Introduction to ONNX

Training is done. Now let’s deploy to production.

20.1 What is ONNX?

ONNX (Open Neural Network Exchange) is an open standard format for representing neural networks: a model exported from one framework can be run by any runtime that understands the format.

flowchart LR
    TW[TensorWeaver] --> ONNX[ONNX Model]
    PT[PyTorch] --> ONNX
    TF[TensorFlow] --> ONNX
    ONNX --> ORT[ONNX Runtime]
    ONNX --> Mobile[Mobile]
    ONNX --> Web[Web/WASM]
    ONNX --> Edge[Edge Devices]

20.2 Why ONNX?

Problem                  ONNX Solution
Framework lock-in        Portable format
Slow inference           Optimized runtimes
Deployment complexity    Universal format
Hardware variety         Provider abstraction

20.3 ONNX Model Structure

An ONNX model is a computational graph:

ONNX Model
├── Graph
│   ├── Nodes (operations)
│   ├── Inputs (model inputs)
│   ├── Outputs (model outputs)
│   └── Initializers (weights)
└── Metadata (version, producer, etc.)

20.4 A Simple Example

Our temperature model in ONNX:

Inputs: celsius [batch, 1]
Initializers: weight [1, 1] = 1.8
              bias [1] = 32.0
Nodes:
  1. MatMul(celsius, weight) -> temp1
  2. Add(temp1, bias) -> fahrenheit
Outputs: fahrenheit [batch, 1]
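
The same graph can be built by hand with the onnx Python helpers. Here is a short sketch showing where the inputs, initializers, and nodes from the listing above end up; the producer_name string is arbitrary.

import onnx
from onnx import helper, TensorProto

# Model input/output: dynamic batch dimension, one feature
celsius = helper.make_tensor_value_info("celsius", TensorProto.FLOAT, ["batch", 1])
fahrenheit = helper.make_tensor_value_info("fahrenheit", TensorProto.FLOAT, ["batch", 1])

# Initializers: the trained parameters, stored inside the graph
weight = helper.make_tensor("weight", TensorProto.FLOAT, [1, 1], [1.8])
bias = helper.make_tensor("bias", TensorProto.FLOAT, [1], [32.0])

# Nodes: fahrenheit = celsius @ weight + bias
matmul = helper.make_node("MatMul", ["celsius", "weight"], ["temp1"])
add = helper.make_node("Add", ["temp1", "bias"], ["fahrenheit"])

graph = helper.make_graph(
    [matmul, add], "temperature",
    inputs=[celsius], outputs=[fahrenheit], initializer=[weight, bias])
model = helper.make_model(graph, producer_name="tensorweaver")

onnx.checker.check_model(model)
onnx.save(model, "temperature.onnx")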

20.5 ONNX Operators

ONNX defines standard operators:

Category          Operators
Math              Add, Sub, Mul, Div, MatMul
Activations       Relu, Sigmoid, Tanh, Softmax
Normalization     BatchNormalization, LayerNormalization
Shape             Reshape, Transpose, Concat
Reduction         ReduceMean, ReduceSum

Full list: onnx.ai/onnx/operators/
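
The installed onnx package also carries every operator schema, so you can query them programmatically. A small example (the exact operator count depends on your onnx version):

from onnx import defs

# Look up one operator's schema
schema = defs.get_schema("MatMul")
print(schema.name, "introduced in opset", schema.since_version)

# How many operator schemas does this onnx install know about?
print(len(defs.get_all_schemas()), "operators registered")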

20.6 ONNX Runtime

The official inference engine:

import numpy as np
import onnxruntime as ort

# Load model
session = ort.InferenceSession("model.onnx")

# Get input/output names
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

# Prepare input matching the model's dtype and shape: [batch, 1] float32
input_data = np.array([[100.0]], dtype=np.float32)

# Run inference: list of outputs to fetch, dict of input name -> array
result = session.run(
    [output_name],
    {input_name: input_data}
)

ONNX Runtime is highly optimized:

  • Graph optimization
  • Operator fusion
  • Hardware acceleration (CPU, GPU, NPU)
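
These optimizations are configured through SessionOptions. A minimal example that explicitly requests the most aggressive optimization level:

import onnxruntime as ort

# Request the full set of graph optimizations (constant folding,
# node fusion, layout transforms) before the session is created
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

session = ort.InferenceSession("model.onnx", sess_options=opts)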

20.7 Execution Providers

ONNX Runtime supports multiple backends:

# CPU (default)
session = ort.InferenceSession("model.onnx",
    providers=['CPUExecutionProvider'])

# NVIDIA GPU
session = ort.InferenceSession("model.onnx",
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])

# Intel (OpenVINO)
session = ort.InferenceSession("model.onnx",
    providers=['OpenVINOExecutionProvider', 'CPUExecutionProvider'])

# Apple Silicon (Core ML)
session = ort.InferenceSession("model.onnx",
    providers=['CoreMLExecutionProvider', 'CPUExecutionProvider'])
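
Providers are tried in the order listed, and which providers exist depends on how your onnxruntime package was built. Two quick checks:

import onnxruntime as ort

# Providers compiled into this onnxruntime build, in default priority order
print(ort.get_available_providers())
# e.g. ['CPUExecutionProvider'] for the plain pip package

# Providers a session actually ended up using after fallback
session = ort.InferenceSession("model.onnx",
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
print(session.get_providers())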

20.8 Our Export Pipeline

flowchart LR
    TW[TensorWeaver Model] --> Trace[Trace Forward]
    Trace --> Build[Build ONNX Graph]
    Build --> Save[Save .onnx File]
    Save --> ORT[ONNX Runtime]

  1. Trace: Execute forward pass, record operations
  2. Build: Convert to ONNX nodes
  3. Save: Write to file
  4. Run: Use ONNX Runtime
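
To make step 1 concrete, here is a toy sketch of the tracing idea: wrap values so every operation records itself into a list as the forward pass runs. The class and method names here are placeholders for illustration; TensorWeaver's real tracer is built in the next chapter.

class TracedTensor:
    """Toy stand-in for a tensor that records the ops applied to it."""

    def __init__(self, name, ops):
        self.name = name
        self.ops = ops                      # shared op list for the whole trace

    def _record(self, op_type, other_name):
        out = TracedTensor(f"{op_type.lower()}_out", self.ops)
        self.ops.append((op_type, [self.name, other_name], out.name))
        return out

    def matmul(self, other_name):
        return self._record("MatMul", other_name)

    def add(self, other_name):
        return self._record("Add", other_name)

# "Run" the temperature model's forward pass and inspect the trace
ops = []
celsius = TracedTensor("celsius", ops)
fahrenheit = celsius.matmul("weight").add("bias")
print(ops)
# [('MatMul', ['celsius', 'weight'], 'matmul_out'),
#  ('Add', ['matmul_out', 'bias'], 'add_out')]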

20.9 Installing Dependencies

pip install onnx onnxruntime
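
A quick sanity check that both packages installed correctly (your version numbers will differ):

import onnx
import onnxruntime

# Print installed versions; any reasonably recent release of both works here
print("onnx", onnx.__version__)
print("onnxruntime", onnxruntime.__version__)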

20.10 Verifying ONNX Models

import onnx

# Load and check model
model = onnx.load("model.onnx")
onnx.checker.check_model(model)  # Raises if invalid

# Print model info
print(f"IR version: {model.ir_version}")
print(f"Opset: {model.opset_import[0].version}")
print(f"Inputs: {[i.name for i in model.graph.input]}")
print(f"Outputs: {[o.name for o in model.graph.output]}")

20.11 Summary

  • ONNX = portable neural network format
  • ONNX Runtime = fast inference engine
  • Export once, run everywhere
  • Supports CPU, GPU, mobile, edge

Next: Building the ONNX exporter.