Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I know, I know, 'not every story needs to be about ML' but.... I can only imagine how unlocking the GIL will change the nature of ML training and inference. There is so much waste and complexity in passing memory around and coordinating processes. I know that libraries have made it (somewhat) easier and more efficient but I can't wait to see what can be done with things like pytorch when optimized for this.


It'll mostly help for debugging and lowering RAM (not VRAM) usage. Otherwise it won't impact ML much.


Pretty universally I have seen performance improvements in code when complexity is reduced and this could drop complexity considerably. I wouldn't be surprised to see a double digit percent improvement in tokens per sec when an optimized pytorch eventually comes out with this. There may even be hidden gains on GPU memory usage that come out of this as people clean up code and start implementing better tricks because of it.


Yeah, one of the dumbest things about Dataloaders running in a different process is that you are logging into the void.


huh?

Any python library that cares about performance is written in C/C++/Rust/Fortran and only provides a python interface.

ML will have 0 benefit from this.


Have you done any multi-gpu training? Generally every GPU gets a process. Coordinating between them and passing around data between them is complex and can easily have performance issues since normal communication between python processes requires some sort of serialization/de-serialization of objects (there are many * here when it comes to GPU training). This has the potential to simplify all of that and remove a lot of inter-process communication which is just pure overhead.


Of course ML will benefit from it. Soon you will be able to run your dataloaders/data preprocessing in different threads which will not starve your GPUs of data.


If you have done ML with PyTorch or Tensorflow you will know how much multithreading can improve data loading performance. Currently multiprocessing provides the necessary parallelization of data loading but it is painful and riddle with bugs.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: