
R is hard to parallelise compared to python?

lapply(list, DoSomething)

To parallelise across 16 cores, rewrite it as

mclapply(list, DoSomething, mc.cores = 16)
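
(A minimal sketch for a fresh session, with the same placeholder names: mclapply() ships in the base parallel package, and detectCores() avoids hard-coding the 16.)

  library(parallel)
  mclapply(list, DoSomething, mc.cores = detectCores())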

What’s the equivalent in python?



Probably something like:

    import multiprocessing

    with multiprocessing.Pool(core_count) as p:
        p.map(do_something, list)


The parallel code structure looks very different from the standard for loops in python.

So there is a lot of rewriting to get things to work in parallel in python compared to R.

In R you just replace lapply with mclapply.


But many R tools are already vectorised, so the shift from lapply() to mclapply() is about as fair a comparison as claiming it's "just" a shift from python's builtin map() to pool.map(). Anybody can play this game, and it's not helpful. I've been using and teaching R for nearly seven years, and the number of times I've used lapply can be counted on one hand.
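
To make the vectorisation point concrete, here's a rough sketch (made-up data, base R only):

  x <- rnorm(1e6)
  y1 <- sqrt(abs(x))                                      # vectorised: the element-wise loop runs in C
  y2 <- vapply(x, function(v) sqrt(abs(v)), numeric(1))   # explicit element-wise version
  all.equal(y1, y2)                                       # TRUE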

> just

https://justsimply.dev/


There is also this in the futureverse if you prefer for-loop-style code:

  library(doFuture)
  plan(multisession)

  y <- foreach(x = 1:4, y = 1:10) %dofuture% {
    z <- x + y
    slow_sqrt(z)  # slow_sqrt() stands in for any slow per-element function (from the linked docs)
  }

https://dofuture.futureverse.org/


I use sapply all the time to transform data. It tends to be less code (no counter, no output initialisation) and easier to follow if that style is familiar.
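
A rough sketch of what I mean (toy data):

  x <- list(a = 1:3, b = 4:6, c = 7:9)

  # sapply version: one line, no counter, no pre-allocated output
  means1 <- sapply(x, mean)

  # equivalent for loop: needs the output vector and an index
  means2 <- numeric(length(x))
  names(means2) <- names(x)
  for (i in seq_along(x)) means2[i] <- mean(x[[i]])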


I am curious now, what do you use instead of lapply (or other *apply variants)?


> just

This isn't documentation or a guide or helping someone.

It's a friendly competition between languages, so it can be written from the perspective of someone who's already familiar with the tools.


Serialisation/deserialisation code will bite you unless you are very careful.


so, only twice as much code!



