You just ruined my day. The post makes it sound like Gel is now dead, and the post by Vercel does not give me much hope either [1]. The last commit on the Gel repo was two weeks ago.
> There has been a ton of interest expressed this week about potential community maintenance of Gel moving forward. To help organize and channel these hopes, I'm putting out a call for volunteers to join a Gel Community Fork Working Group (...GCFWG??). We are looking for 3-5 enthusiastic, trustworthy, and competent engineers to form a working group to create a "blessed" community-maintained fork of Gel. I would be available as an advisor to the WG, on a limited basis, in the beginning.
> The goal would be to produce a fork with its own build and distribution infrastructure and a credible commitment to maintainership. If successful, we will link to the project from the old Gel repos before archiving them, and potentially make the final CLI release support upgrading to the community fork.
It stays in HBM, but it needs to get shuffled to the place where the computation actually happens. It’s a lot like a normal CPU: the CPU can’t do anything with data in system memory; it has to be loaded into a CPU register first.
For every token that is generated, a dense LLM has to read every parameter in the model.
GPUs might not be bandwidth starved most of the time, but they absolutely are when generating text from an LLM.
It’s the whole reason why low-precision floating point numbers are being pushed by NVIDIA: fewer bytes per parameter means fewer bytes to read per token.
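To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes batch size 1, that every parameter is read from memory exactly once per token, and it ignores KV cache traffic and compute time entirely. The 70B parameter count and 2 TB/s bandwidth are made-up illustrative figures, not specs for any particular model or GPU.

```python
def max_tokens_per_second(n_params: float, bytes_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if every parameter is read once per token."""
    model_bytes = n_params * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes


# Hypothetical figures, chosen only for illustration.
N_PARAMS = 70e9    # dense model with 70B parameters
BANDWIDTH = 2e12   # 2 TB/s of memory bandwidth

for name, width in [("fp32", 4), ("fp16", 2), ("fp8", 1), ("fp4", 0.5)]:
    bound = max_tokens_per_second(N_PARAMS, width, BANDWIDTH)
    print(f"{name}: at most {bound:.1f} tokens/s")
```

Under these assumptions the ceiling doubles every time the bytes per parameter are halved, no matter how much compute the chip has.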
They are not even remotely equivalent. tinygrad is a toy.
If you are serious, I would be interested to hear how you see tinygrad replacing CUDA. I could see a tinygrad zealot arguing that it is going to replace torch, but CUDA??
Have you looked into AMD support in torch? I would wager that, like for like, a torch/AMD implementation of a model is going to run rings around a tinygrad/AMD implementation.
A few people have mentioned dagster, and I took a look at that for some machine learning things I was playing with, but then I found dvc (data version control [1]) and I think it is fantastic. I think it also has more applications than just machine learning; it fits really anything with data. If you have a bunch of shell scripts that write to files to pass data around, then dvc might be a good fit. It will do things like only rerun steps if it needs to.
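For anyone wondering what "only rerun steps if it needs to" means in practice, here is a toy sketch of the general idea: hash a step's input files and skip the step when the hashes match the last run. This is not how dvc is implemented and uses none of its API; `run_if_changed` and the JSON state file are hypothetical names made up for the illustration, and unlike dvc it only looks at inputs, not outputs.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Iterable


def file_digest(path: Path) -> str:
    """Content hash of a file, so renames and timestamps do not matter."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_if_changed(name: str, deps: Iterable[Path],
                   action: Callable[[], None], state_file: Path) -> bool:
    """Run `action` only if any dependency changed since the last recorded run.

    Returns True if the step ran, False if it was skipped.
    """
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {str(p): file_digest(p) for p in deps}
    if state.get(name) == current:
        return False  # inputs unchanged, nothing to do
    action()
    state[name] = current  # record the hashes only after the step succeeds
    state_file.write_text(json.dumps(state, indent=2))
    return True
```

Calling it twice in a row with the same input files runs the step the first time and skips it the second; editing any input makes it run again.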
Also for totally non-data stuff, Prefect is great.
I would assume that the private key I use for proving my identity to random websites is different from the one I use to make financial transactions. Why would I ever elect to keep it the same?