I run a local Valhalla build cluster to power the https://sidecar.clutch.engineering routing engine. The cluster runs daily and takes a significant amount of wall-clock time to build the entire planet. That's about 50% of my CI time; the other 50% is presubmits + App Store builds for Sidecar + CANStudio / ELMCheck.
Using GitHub actions to coordinate the Valhalla builds was a nice-to-have, but this is a deal-breaker for my pull request workflows.
I found that implementing a local cache on the runners has been helpful. Ingress/egress on local network is hella slow, especially when each build has ~10-20GB of artifacts to manage.
ZeroFS looks really good. I know a bit about this design space but hadn't run across ZeroFS yet. Do you do testing of the error recovery behavior (connectivity etc)?
This has been mostly manual testing for now.
ZeroFS currently lacks automatic fault injection and proper crash tests, and it’s an area I plan to focus on.
SlateDB, the lower layer, already does DST as well as fault injection though.
GH actions templates don’t build all branches by default. I guess it’s due to them not wanting the free tier to use to much resources. But I consider it an anti-pattern to not build everything at each push.