Shortly after joining the PyTorch compiler team, I was part of a group that decided to build our own tensor-expression compiler for PyTorch (called NNC, although it was never well-publicized) instead of using an existing one like Halide or TVM.
We sank two years into it and never ended up with a particularly good compiler (although we did absolutely crush a couple of toy benchmarks).
Arguably both sides of that tradeoff were wrong, though, as the eventually successful PyTorch 2.0 compiler (TorchInductor) was based on Triton (plus some custom higher-level scheduling logic).