I know, I know, 'not every story needs to be about ML' but.... I can only imagin...

ipsum2 · on July 13, 2024

It'll mostly help for debugging and lowering RAM (not VRAM) usage. Otherwise it won't impact ML much.

jmward01 · on July 13, 2024

Pretty universally I have seen performance improvements in code when complexity is reduced, and this could drop complexity considerably. I wouldn't be surprised to see a double-digit percentage improvement in tokens per sec when an optimized PyTorch eventually comes out with this. There may even be hidden gains in GPU memory usage that come out of this as people clean up code and start implementing better tricks because of it.

imtringued · on July 13, 2024

Yeah, one of the dumbest things about Dataloaders running in a different process is that you are logging into the void.
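To illustrate the logging point, here is a minimal sketch using only the standard library. It assumes worker threads rather than worker processes; `ListHandler`, `load_sample`, and the logger name `loader_demo` are made up for the example and are not part of any data loading library. Because threads share the parent's logger objects, a handler attached once in the main thread sees every record the workers emit.

```python
import logging
import threading


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        # Handler.handle() holds the handler lock while emit() runs,
        # so appending here is safe across threads.
        self.records.append(record.getMessage())


logger = logging.getLogger("loader_demo")
logger.setLevel(logging.INFO)
handler = ListHandler()
logger.addHandler(handler)


def load_sample(i):
    # Stand-in for a data loading worker; it only logs.
    logger.info("loaded sample %d", i)


threads = [threading.Thread(target=load_sample, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every message from the workers reached the handler set up in the main thread.
print(sorted(handler.records))
```

With worker processes, whether the same handler sees anything depends on the start method and on how logging is configured in the child, which is where the "void" tends to come from.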

veber-alex · on July 13, 2024

huh?

Any Python library that cares about performance is written in C/C++/Rust/Fortran and only provides a Python interface.

ML will have 0 benefit from this.

jmward01 · on July 13, 2024

Have you done any multi-GPU training? Generally every GPU gets a process. Coordinating between them and passing data around is complex and can easily cause performance issues, since normal communication between Python processes requires some sort of serialization/deserialization of objects (there are many asterisks here when it comes to GPU training). This has the potential to simplify all of that and remove a lot of inter-process communication, which is just pure overhead.
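A rough sketch of the serialization overhead described above, using only the standard library. The helper names `send_via_thread` and `send_via_pickle` are hypothetical, and the pickle round trip is only a stand-in for what a queue or pipe between processes does by default; real multi-GPU setups often use shared memory or dedicated collectives for tensors, which is part of the "asterisks".

```python
import pickle
import queue
import threading


def send_via_thread(obj):
    """Hand obj to a worker thread through a queue; only a reference moves."""
    q = queue.Queue()
    received = []

    def worker():
        received.append(q.get())

    t = threading.Thread(target=worker)
    t.start()
    q.put(obj)
    t.join()
    return received[0]


def send_via_pickle(obj):
    """Simulate the serialize/deserialize step of a process-to-process transfer."""
    return pickle.loads(pickle.dumps(obj))


batch = {"tokens": list(range(1000))}
same = send_via_thread(batch)
copy = send_via_pickle(batch)

print(same is batch)   # True: the thread received the very same object
print(copy is batch)   # False: the round trip produced a new object
print(copy == batch)   # True: equal contents, but paid for with a full copy
```

The thread path moves a reference, so the cost does not grow with the size of the batch; the pickle path pays for encoding and decoding every time.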

KeplerBoy · on July 13, 2024

Of course ML will benefit from it. Soon you will be able to run your dataloaders/data preprocessing in different threads which will not starve your GPUs of data.
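As a hedged sketch of what thread-based preprocessing could look like, here is a tiny loader built on `concurrent.futures.ThreadPoolExecutor` from the standard library. `preprocess` and `threaded_loader` are hypothetical names, not a PyTorch API, and the transform is a placeholder. On a regular GIL build, pure-Python CPU-bound work in `preprocess` would still run one thread at a time; the hope expressed above is that a free-threaded build lets it run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(x):
    # Placeholder for a CPU-bound transform (decode, augment, tokenize, ...).
    return x * x


def threaded_loader(samples, batch_size, num_workers=4):
    """Yield batches of preprocessed samples, with preprocessing done in threads."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        batch = []
        # pool.map returns results in input order, whichever thread finishes first.
        for item in pool.map(preprocess, samples):
            batch.append(item)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            # Emit the final, possibly smaller, batch.
            yield batch


for b in threaded_loader(range(5), batch_size=2):
    print(b)
```

Note that `pool.map` submits every sample up front, so a real loader would want a bounded buffer; this only shows the shape of the idea.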

bdd8f1df777b · on July 14, 2024

If you have done ML with PyTorch or TensorFlow, you will know how much multithreading can improve data loading performance. Currently multiprocessing provides the necessary parallelization of data loading, but it is painful and riddled with bugs.