def add(a, b):
    return a + b

def fancy_func(a, b, c, d):
    e = add(a, b)
    f = add(c, d)
    g = add(e, f)
    return g

print(fancy_func(1, 2, 3, 4))

PyTorch / MXNet imperative (eager) execution: each Python statement dispatches its operations to the backend as soon as it runs. That is easy to debug, but it pays Python interpreter overhead on every op and prevents whole-graph optimization.
The fix: trace or script the model into a graph, then let the framework JIT-compile it (TorchScript, MXNet hybridize, TF @tf.function, JAX jit). Result: potentially 10–100× less Python overhead, depending on the model, plus operator fusion and memory-layout optimization.
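To make "trace the model into a graph" concrete, here is a minimal sketch in plain Python. It is a toy, not any framework's API: the `Node` class and the `trace` and `run` helpers are hypothetical names introduced only for illustration. The function is run once on placeholder objects that record each addition instead of computing it, and the recorded graph is then evaluated without calling the original function again.

```python
# Toy tracer (hypothetical, for illustration only; not a framework API).
class Node:
    """One recorded operation: an op name plus its input nodes."""
    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.name = name

    def __add__(self, other):
        # Instead of computing a value, record an 'add' node in the graph.
        return Node('add', (self, other))


def trace(fn, arg_names):
    """Run fn once on placeholder nodes and return the output node."""
    placeholders = [Node('input', name=n) for n in arg_names]
    return fn(*placeholders)


def run(node, feed):
    """Evaluate a recorded graph given concrete input values."""
    if node.op == 'input':
        return feed[node.name]
    left, right = (run(i, feed) for i in node.inputs)
    return left + right


def add(a, b):
    return a + b


def fancy_func(a, b, c, d):
    e = add(a, b)
    f = add(c, d)
    g = add(e, f)
    return g


# The Python body of fancy_func executes exactly once, during tracing.
graph = trace(fancy_func, ['a', 'b', 'c', 'd'])
result = run(graph, {'a': 1, 'b': 2, 'c': 3, 'd': 4})
print(result)  # prints 10
```

A real framework would hand the recorded graph to a compiler rather than interpret it recursively, but the division of labor is the same: Python describes the computation once, and something else executes it.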
Imperative execution: each line dispatches a separate kernel.
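Imperative: Python-controlled, easy to print/debug, expensive per op. Symbolic: graph captured, compiled once, runs as fused kernels. Modern frameworks let you switch between modes. The following plain-Python simulation of the symbolic style builds the program as a string, compiles it once, and then executes it: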
def add_():
    return '''
def add(a, b):
    return a + b
'''

def fancy_func_():
    return '''
def fancy_func(a, b, c, d):
    e = add(a, b)
    f = add(c, d)
    g = add(e, f)
    return g
'''

def evoke_():
    return add_() + fancy_func_() + 'print(fancy_func(1, 2, 3, 4))'

prog = evoke_()
print(prog)
y = compile(prog, '', 'exec')
exec(y)

Build the same MLP as a regular module, then opt into graph mode (PyTorch: torch.jit.script; MXNet: HybridSequential.hybridize(); TF: @tf.function):
from d2l import mxnet as d2l
from mxnet import np, npx
from mxnet.gluon import nn
npx.set_np()

# Factory for networks
def get_net():
    net = nn.HybridSequential()
    net.add(nn.Dense(256, activation='relu'),
            nn.Dense(128, activation='relu'),
            nn.Dense(2))
    net.initialize()
    return net

x = np.random.normal(size=(1, 512))
net = get_net()
net(x)

Wall-clock benchmark, eager vs hybridized. The exact ratio depends on model size and op count, but the win is usually substantial:
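A timing harness for this kind of comparison might look like the sketch below. It uses only the standard library so that it runs without MXNet installed. The `Benchmark` class is a hypothetical helper, and the workload is a stand-in, not the MLP: the same tiny program is either re-parsed on every call or compiled once up front.

```python
import time


class Benchmark:
    """Context manager that prints and stores elapsed wall-clock time."""
    def __init__(self, description='Done'):
        self.description = description
        self.elapsed = None

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *args):
        self.elapsed = time.perf_counter() - self.start
        print(f'{self.description}: {self.elapsed:.4f} sec')


# Stand-in workload (an assumption for illustration, not the MLP above).
source = 'result = (1 + 2) + (3 + 4)'
code = compile(source, '<bench>', 'exec')

scope_a, scope_b = {}, {}

# Slow path: the source string is parsed and compiled on every iteration.
with Benchmark('Re-parsed every call') as slow:
    for _ in range(1000):
        exec(source, scope_a)

# Fast path: the code object was compiled once, outside the loop.
with Benchmark('Compiled once') as fast:
    for _ in range(1000):
        exec(code, scope_b)
```

To time the network itself, the same context manager could wrap a loop of `net(x)` calls, once before and once after `net.hybridize()`. Because MXNet executes asynchronously, each timed block should end with `npx.waitall()` so that the measurement covers the actual computation, not just the dispatch.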
A graph is portable: save it once, load and run from C++, mobile, or another language without Python in the loop. Models in production almost always ship the graph form:
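The portability argument can be illustrated with a toy graph serialized as JSON. This is a sketch under stated assumptions: the dictionary layout (`inputs`, `nodes`, `output`), the `OPS` table, and `run_graph` are all hypothetical and do not match any framework's file format. The point is that the loader needs only a table of operators, not the Python code that originally defined the model.

```python
import json

# Hypothetical graph format: each node names an op and the values it reads.
graph = {
    'inputs': ['a', 'b', 'c', 'd'],
    'nodes': [
        {'name': 'e', 'op': 'add', 'args': ['a', 'b']},
        {'name': 'f', 'op': 'add', 'args': ['c', 'd']},
        {'name': 'g', 'op': 'add', 'args': ['e', 'f']},
    ],
    'output': 'g',
}

# The only thing a runtime must provide: an implementation for each op.
OPS = {'add': lambda x, y: x + y}


def run_graph(g, *values):
    """Interpret a graph dict; nodes are assumed to be in execution order."""
    env = dict(zip(g['inputs'], values))
    for node in g['nodes']:
        env[node['name']] = OPS[node['op']](*(env[a] for a in node['args']))
    return env[g['output']]


serialized = json.dumps(graph)      # what would be written to disk
restored = json.loads(serialized)   # what another runtime would load
output = run_graph(restored, 1, 2, 3, 4)
print(output)  # prints 10
```

In MXNet Gluon, the corresponding step for a hybridized network that has run at least one forward pass is `net.export(...)` with a file prefix of your choosing, which writes the graph as a symbol JSON file alongside a parameters file.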
The compiled module exposes its computation graph for inspection (or further optimization):
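As a rough illustration of what inspection and further optimization can mean once a graph exists as data, the sketch below counts the operators in a toy graph and removes a node whose result is never used. The node layout and the `prune_dead_nodes` helper are hypothetical, and the pass assumes the nodes are listed in execution order.

```python
from collections import Counter

# Hypothetical captured graph; 'unused' is computed but never read.
nodes = [
    {'name': 'e', 'op': 'add', 'args': ['a', 'b']},
    {'name': 'f', 'op': 'add', 'args': ['c', 'd']},
    {'name': 'unused', 'op': 'mul', 'args': ['a', 'd']},
    {'name': 'g', 'op': 'add', 'args': ['e', 'f']},
]
output = 'g'

# Inspection: which ops does the graph contain, and how many of each?
op_counts = Counter(node['op'] for node in nodes)
print(op_counts)


def prune_dead_nodes(nodes, output):
    """Keep only the nodes that the output depends on."""
    needed = {output}
    kept = []
    # Walk backwards so a node's consumers are seen before the node itself.
    for node in reversed(nodes):
        if node['name'] in needed:
            needed.update(node['args'])
            kept.append(node)
    return kept[::-1]


pruned = prune_dead_nodes(nodes, output)
print([node['name'] for node in pruned])  # prints ['e', 'f', 'g']
```

Neither step is possible in purely imperative execution, where the framework sees one operation at a time and cannot know that a result will go unused.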