I know, I know, 'not every story needs to be about ML' but.... I can only imagine how removing the GIL will change the nature of ML training and inference. There is so much waste and complexity in passing memory around and coordinating processes. I know that libraries have made it (somewhat) easier and more efficient, but I can't wait to see what can be done with things like PyTorch once it's optimized for this.
Pretty universally, I have seen performance improvements when complexity is reduced, and this could drop complexity considerably. I wouldn't be surprised to see a double-digit percent improvement in tokens per second once an optimized PyTorch eventually comes out with this. There may even be hidden gains in GPU memory usage as people clean up code and start implementing better tricks because of it.
Have you done any multi-GPU training? Generally every GPU gets its own process. Coordinating them and passing data between them is complex and can easily cause performance issues, since normal communication between Python processes requires some sort of serialization/deserialization of objects (there are many asterisks here when it comes to GPU training). This has the potential to simplify all of that and remove a lot of inter-process communication, which is pure overhead.
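For anyone who hasn't seen it: a minimal sketch of the usual one-process-per-GPU pattern with PyTorch's DistributedDataParallel. The tiny Linear model and the loop are placeholders; the point is that each GPU gets its own Python process, and gradients get synchronized through inter-process collectives rather than shared objects.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    # One process per GPU: each process joins the NCCL process group.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(1024, 1024).cuda(rank)  # placeholder model
    ddp_model = DDP(model, device_ids=[rank])       # handles gradient all-reduce
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

    for _ in range(10):
        x = torch.randn(32, 1024, device=rank)
        loss = ddp_model(x).sum()
        optimizer.zero_grad()
        loss.backward()   # gradients are all-reduced across processes here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    # Spawn one Python process per GPU: this is the coordination overhead
    # being discussed above.
    mp.spawn(train, args=(world_size,), nprocs=world_size)
```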
Of course ML will benefit from it. Soon you will be able to run your dataloaders/data preprocessing in separate threads, so they won't starve your GPUs of data.
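A minimal sketch of what that could look like, assuming a free-threaded build where CPU-bound Python preprocessing actually runs in parallel with the training loop. `decode_and_augment` is a hypothetical stand-in for whatever your preprocessing does; this is not any library's API, just plain threads and a queue.

```python
import queue
import threading

def decode_and_augment(sample):
    # placeholder for CPU-heavy work (decode, resize, augment, tokenize, ...)
    return sample

def prefetch(samples, out_queue, stop_event):
    # Producer thread: preprocess samples and hand them to the training loop.
    for s in samples:
        if stop_event.is_set():
            break
        out_queue.put(decode_and_augment(s))
    out_queue.put(None)  # sentinel: no more batches

def threaded_batches(samples, depth=8):
    q = queue.Queue(maxsize=depth)
    stop = threading.Event()
    t = threading.Thread(target=prefetch, args=(samples, q, stop), daemon=True)
    t.start()
    try:
        while (batch := q.get()) is not None:
            yield batch  # GPU step consumes while the thread keeps producing
    finally:
        stop.set()

# Usage: preprocessing overlaps with the training loop in the same process.
for batch in threaded_batches(range(100)):
    pass  # model forward/backward would go here
```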
If you have done ML with PyTorch or TensorFlow, you will know how much parallel data loading can improve performance. Currently, multiprocessing provides the necessary parallelism, but it is painful and riddled with bugs.
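For context, the current workaround in PyTorch is `DataLoader` with `num_workers > 0`, which spawns separate worker processes: the dataset must be picklable, and every batch travels back to the main process over IPC/shared memory. The toy dataset below is just a placeholder to show the shape of it.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):  # placeholder dataset
    def __len__(self):
        return 1024

    def __getitem__(self, idx):
        # CPU-bound preprocessing happens here, inside a worker process
        return torch.randn(3, 224, 224), idx % 10

loader = DataLoader(
    ToyDataset(),
    batch_size=32,
    num_workers=4,           # 4 separate Python processes, not threads
    pin_memory=True,         # page-locked buffers for faster host-to-GPU copies
    persistent_workers=True, # avoid re-spawning workers every epoch
)

for images, labels in loader:
    pass  # training step would go here
```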