Docker layers are content-addressable, so the hashes aren’t opaque: they’re derived directly from what’s inside. Two images that (mostly) share the same layers? No disk space wasted on the shared layers.
Sure you could implement a finer-grained deduplication or transfer mechanism, but I doubt this would scale as well. Many large image layers consist of lots and lots of small files. The overhead would be tremendous.
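To make the layer-level scheme concrete, here is a minimal sketch (Python, not Docker’s actual code) of a content-addressed layer store: a layer’s identity is just the SHA-256 of its bytes, so a layer shared by two images is stored exactly once. The store path and function names are made up for illustration.

    import hashlib
    import os

    STORE_DIR = "/var/lib/example-layer-store"  # hypothetical location

    def layer_digest(layer_bytes: bytes) -> str:
        # The digest *is* the layer's identity.
        return "sha256:" + hashlib.sha256(layer_bytes).hexdigest()

    def store_layer(layer_bytes: bytes) -> str:
        digest = layer_digest(layer_bytes)
        path = os.path.join(STORE_DIR, digest.replace(":", "_"))
        if not os.path.exists(path):  # already present -> zero extra disk used
            os.makedirs(STORE_DIR, exist_ok=True)
            with open(path, "wb") as f:
                f.write(layer_bytes)
        return digest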
The local storage is mostly a solved problem with hard links (see the sketch below). Any modern file system (i.e. not NTFS) can have arbitrarily many file paths that refer to the same underlying file, with no more overhead than ordinary files.
The comment to which you were replying mentioned both the excessive local disk usage and the excessive network transfer, and so your comment appeared to apply to both portions. This is why I started my comment by explicitly restricting it to the case of local disk usage.
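To illustrate the hard-link approach (a rough sketch, not how Docker’s storage drivers actually work): keep one canonical copy of each file under its content hash, and materialize every other occurrence as a hard link to it. BLOB_DIR and link_file are hypothetical names.

    import hashlib
    import os

    BLOB_DIR = "/var/lib/example-file-store"  # hypothetical blob directory

    def link_file(src_path: str, dest_path: str) -> None:
        with open(src_path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        blob = os.path.join(BLOB_DIR, digest)
        os.makedirs(BLOB_DIR, exist_ok=True)
        if not os.path.exists(blob):
            os.link(src_path, blob)   # first occurrence becomes the canonical copy
        os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
        os.link(blob, dest_path)      # later occurrences are just extra names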
For hard links to work, you still need to know that the brand new layer you just downloaded is the same as something you already have, i.e. you need to run a deduplication step.
How? Well, the simplest way is to compute the digest of the content and look it up... oh wait :thinking:
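For illustration, a minimal sketch of that digest lookup (hypothetical names, not the real registry client API): the image manifest already lists each layer by digest, so deciding what to skip is a local existence check rather than a separate dedup pass.

    import os

    STORE_DIR = "/var/lib/example-layer-store"  # hypothetical local layer store

    def have_layer(digest: str) -> bool:
        return os.path.exists(os.path.join(STORE_DIR, digest.replace(":", "_")))

    def layers_to_fetch(manifest_layer_digests: list[str]) -> list[str]:
        # The digest comparison is the whole "deduplication step".
        return [d for d in manifest_layer_digests if not have_layer(d)]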
I’m not sure what point you’re trying to make. Are you assuming that a layer would be transferred in its entirety, even in cases where the majority of its contents are already available locally? The purpose of bringing up hard links was to point out that when de-duplication is done at per-file granularity rather than per-layer granularity, it doesn’t introduce a runtime overhead.
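A hedged sketch of what per-file granularity could look like (hypothetical, not an existing Docker/OCI mechanism): describe a layer as a map of paths to content digests, transfer only the digests that are missing locally, and lay out the tree with hard links so there is no runtime cost.

    import os

    BLOB_DIR = "/var/lib/example-file-store"  # hypothetical per-file blob store

    def missing_digests(file_manifest: dict[str, str]) -> set[str]:
        # Only these file contents would need to go over the network.
        return {
            digest
            for digest in file_manifest.values()
            if not os.path.exists(os.path.join(BLOB_DIR, digest))
        }

    def materialize(file_manifest: dict[str, str], rootfs: str) -> None:
        # Build the layer's file tree as hard links into the blob store.
        for path, digest in file_manifest.items():
            dest = os.path.join(rootfs, path.lstrip("/"))
            os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
            os.link(os.path.join(BLOB_DIR, digest), dest)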