Shortly after joining the PyTorch compiler team, I was part of a group that decided to build our own tensor-expression compiler for PyTorch (called NNC, although it was never well-publicized) instead of using an existing one like Halide or TVM.
We sank two years into it and never ended up with a particularly good compiler (although we did absolutely crush a couple of toy benchmarks).
Arguably both sides of that tradeoff were wrong, though, as the eventually successful PyTorch 2.0 compiler (TorchInductor) was based on Triton (plus some custom higher-level scheduling logic).