Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Why do you think that functional programming results in data duplication? I would think it's rather the opposite.

With strong immutability like in Haskell, you can share values even between threads and can avoid defensive copying. Two versions of the same immutable tree-like data structure can also share part of their representation in memory.

(Haskell has other problems causing increased memory usage, but not related to data duplication in my mind)



Why do you think that functional programming results in data duplication? I would think it's rather the opposite.

I suspect it has something to do with the perceptions around always returning a new thing from a function rather than the mutated input to the function. For example, if you need to mutate the property of an object based on other inputs, the perception is you would clone that object, modify the property appropriately, and then return the cloned object.

[Edit: formatting]


So, ELI5: What do you do instead, where you return a modified object, but still have immutability? Or do you avoid the problem by not trying to do that?


In most situations, you use persistent data structures where you only have to copy the modified leaves.

If you really need your type to be backed by a contiguous block of memory, you batch updates (stencils, SIMD, etc.)


But if you modify the leaf, don't you also have to modify the branch node that points to the leaf, so that it points to the new one? And every node from there to the root?


Yes, you (or rather, core library developers) need to pick a data structure appropriate to your access patterns: linked lists for iteration, search trees for concatenation, finger trees for random access, and so on. But you should be doing this anyway for clarity, even if you face no performance constraints.


Not the poster you’re responding to but I think they’re referring to the current need to allocate more memory when updating immutable data structures.

Not that there aren’t ways to represent mutability in Haskell, just that the de facto use of immutability causes excess allocation.


Efficient functional programming often uses tree-like data structures. These can be immutable but still avoid duplication.

Consider if you "duplicate" a Data.Sequence Seq (finger tree) for modification. You're not actually copying the whole structure, you are creating a new root node and re-using as much as possible of the common structure.

The end result is that a bit more memory is used in the simplest case, but not due to duplication I think.

The benefit is that a thread can make a modified value at cheaper cost without affecting another thread that is still using the original value. I also think it's easier for the programmer to understand the code.


Won't this still result in a lot of fragmentation? I.e. won't you have disjoint allocations for those new branches of the tree? Sounds pretty cache-unfriendly.


In a strict language or with poorly optimized lazy code, yes. If you can get good stream fusion, not really. If your code fuses really well (common for lists, harder elsewhere) the whole thing will happen on the stack.


That's duplicating part of the structure. That uses more memory than just modifying a value in-place, but less than duplicating the whole tree.


Sure, but let's assume that the program has more than one thread and that another thread could still be using the old value. In that case, an imperative program might be required to copy the whole structure or sleep until the existing users are done, which is often less efficient and is always more complicated.

If it's ok to support only a single concurrent user of the value, then a mutable structure is indeed more efficient. Even in Haskell we have mutable structures that can be used for such purposes.

The interesting question to me is, what should be the default? I think there is a good argument that it should be the safer, simpler immutable structures.


"Simpler". If you're only single-threaded, the mutable structures are simpler.

If you're multi-threaded, you have to choose: Do I go with immutable, which means that the other thread gets something safe? It also means that the other thread gets something stale (out of date), which can have its own consequences.

Or do I use a lock or mutex, which means that I have to place guards in all the right places, which is error-prone and can break in weird ways if I miss one?

Or do I use mutable data without guards, and just let it crash? (Because it's probably going to crash if I do that...)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: