2026-04-19 engineering 6 min read

What BEST Routing Actually Saves

Empirical speedups from hybrid EML/EDL/EXL dispatch. The numbers, honestly reported.

The EML operator family has three members: EML, EDL, and EXL.

Each can generate every elementary function, but the node count for a given primitive differs from one operator to the next. BEST (Binary Expression Select & Transform) routing dispatches each operation to whichever operator computes it in the fewest nodes.
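The dispatch idea can be sketched as a lookup over per-operator costs. This is a minimal illustration, not monogate's API: `COSTS` and `best_route` are hypothetical names, and the table holds only the node counts this article states (the pure EML counts, ln at 1 node via EXL, division at 1 node via EDL, addition at 11 nodes on every operator).

```python
# Hypothetical cost table: nodes needed per primitive, per operator.
# Only figures stated in this article are filled in; an absent entry
# means the article gives no count for that operator.
COSTS = {
    "exp": {"EML": 1},
    "ln": {"EML": 4, "EXL": 1},
    "add": {"EML": 11, "EDL": 11, "EXL": 11},
    "div": {"EML": 15, "EDL": 1},
}


def best_route(primitive):
    """Return (operator, node_count) with the fewest nodes for a primitive."""
    options = COSTS[primitive]
    # min() keeps the first minimal key, so ties fall back to EML.
    operator = min(options, key=options.get)
    return operator, options[operator]


for name in COSTS:
    print(name, best_route(name))
```

Ties resolve to the first-listed operator, so addition stays on EML and no dispatch is needed for it.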

Node count table

These are verified minimum node counts — the exact number of EML-family calls required:

Function         Pure EML   BEST   Savings
exp(x)           1          1
ln(x)            4          1      −75%
x + y            11         11
x × y            13         7      −46%
x ÷ y            15         1      −93%
                 9          5      −44%
sin(x) over ℂ    108        1      −99%

The big wins are ln(x) (75% savings via EXL), division (93% savings via EDL), and complex sin (99% via CBEST). Addition has no savings — it genuinely costs 11 nodes no matter which operator you use.
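The savings column is plain arithmetic on the two node counts, and it can be rechecked in a few lines. The sketch below copies the named rows from the node count table; `NODE_COUNTS` and `savings_percent` are illustrative names, not part of monogate.

```python
# (pure EML nodes, BEST nodes), copied from the named rows of the table.
NODE_COUNTS = {
    "exp(x)": (1, 1),
    "ln(x)": (4, 1),
    "x + y": (11, 11),
    "x * y": (13, 7),
    "x / y": (15, 1),
    "sin(x) over C": (108, 1),
}


def savings_percent(pure, best):
    """Relative node reduction as a percentage of the pure EML count."""
    return 100.0 * (pure - best) / pure


for name, (pure, best) in NODE_COUNTS.items():
    pct = savings_percent(pure, best)
    print(f"{name:14s} {pure:4d} -> {best:3d}  {pct:5.1f}%")
```

Rounded to whole percentages, these reproduce the figures in the table.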

Wall-clock speedups

Node count savings translate to wall-clock speedups, but not uniformly. The crossover threshold is about 20% node savings — below that, the dispatch overhead exceeds the compute savings.

Workload                    Node savings   Speedup
sin/cos Taylor (TinyMLP)    74%            2.8×
Polynomial x⁴+x³+x²         54%            2.1×
GELU (Transformer FFN)      18%            0.93×

GELU is a negative result, honestly reported: its 18% node savings fall below the crossover threshold, and BEST routing is slower on GELU-heavy workloads (0.93×). If you're building a transformer FFN replacement, BEST routing doesn't help.
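The crossover rule amounts to a one-line gate: route only when the estimated node savings clear the threshold. The sketch below is an assumption about how such a gate might look, not monogate's implementation. `CROSSOVER` and `should_route` are hypothetical names, and 0.20 is the approximate figure quoted above rather than a measured constant.

```python
CROSSOVER = 0.20  # approximate threshold quoted in this article


def should_route(node_savings, threshold=CROSSOVER):
    """True when the estimated node savings clear the crossover threshold."""
    return node_savings >= threshold


# (node savings, measured speedup), copied from the workload table.
WORKLOADS = {
    "sin/cos Taylor (TinyMLP)": (0.74, 2.8),
    "Polynomial x^4+x^3+x^2": (0.54, 2.1),
    "GELU (Transformer FFN)": (0.18, 0.93),
}

for name, (savings, speedup) in WORKLOADS.items():
    decision = "route" if should_route(savings) else "skip"
    print(f"{name}: {decision} (measured {speedup}x)")
```

On the three workloads reported here, the gate agrees with the measurements: it routes the two that sped up and skips GELU.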

Performance kernels

We also have a Rust-accelerated kernel and a fused Python kernel. These compound with BEST:

Backend                  ms/step   Speedup
Standard Python          8.3
FusedEMLActivation       2.3       3.6×
Fused + torch.compile    1.9       4.4×
Rust (monogate-core)     1.4       5.9×

Benchmark: EMLLayer depth=2, 256→256, batch=1024, CPU. The Rust kernel uses PyO3 bindings.
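The speedup column is the baseline ms/step divided by each backend's ms/step. Below is a minimal sketch of that arithmetic, plus a generic timing helper for measuring your own step function. `ms_per_step` and `speedup` are illustrative names, and the helper is not the harness that produced the numbers above.

```python
import time


def ms_per_step(step, n_steps=100, warmup=10):
    """Average wall-clock milliseconds per call of `step`, after warmup."""
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(n_steps):
        step()
    return (time.perf_counter() - start) * 1000.0 / n_steps


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# ms/step figures copied from the backend table.
REPORTED_MS = {
    "Standard Python": 8.3,
    "FusedEMLActivation": 2.3,
    "Fused + torch.compile": 1.9,
    "Rust (monogate-core)": 1.4,
}

baseline = REPORTED_MS["Standard Python"]
for backend, ms in REPORTED_MS.items():
    print(f"{backend}: {speedup(baseline, ms):.1f}x")
```

To time a real layer, pass a zero-argument callable that runs one forward step, for example `ms_per_step(lambda: layer(batch))`, where `layer` and `batch` are whatever you are benchmarking.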

What BEST routing is not

BEST routing is a node-count optimization within the EML framework. It doesn't make EML competitive with purpose-built implementations of individual functions. A native sin(x) in hardware is faster than any EML tree, even a 1-node complex CBEST.

What BEST routing does is reduce the cost of expressing arbitrary combinations of elementary functions as EML trees, when you're already committed to the EML framework for other reasons (symbolic manipulation, gradient-free search, interpretability).

Install: pip install monogate · Rust source: monogate-core/