> It is interesting what the result will be (average saving on deduplication) if...

ketralnis · on Jan 9, 2023

That's an interesting extension of the illegal numbers or coloured bits theories, but we don't really see it used that way in practice. When governments or media industry groups crack down on this stuff, they don't go after everybody that ever had those bits in memory. Maybe that's just for practical reasons, but we've never seen every router in between a buyer and seller get confiscated too, as if they'd been somehow tainted. Honestly this doesn't seem like more than a dystopian mental exercise.

https://en.wikipedia.org/wiki/Illegal_number https://ansuz.sooke.bc.ca/entry/23 https://shkspr.mobi/blog/2022/11/illegal-hashes/

klabb3 · on Jan 9, 2023

I’m not suggesting the hashes themselves are illegal to possess, but that transferring the bytes corresponding to those hashes is problematic: if neither side is well trusted, that puts you at risk as a host of that content. This is indeed an issue with IPFS, for instance, where I believe the solutions are “pinning” content that is already vetted by another party, or denylists of “bad bits”. I assume it’s similar to any other clearnet hosting. Btw, I make zero value judgments about all of that.
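A denylist of “bad bits” can be sketched as a set of content hashes checked before a chunk is served. This is a minimal illustration only: the function names are hypothetical, and plain SHA-256 hex digests stand in for whatever identifier a real system uses (IPFS, for example, addresses content by CID rather than a bare digest).

```python
import hashlib


def chunk_digest(chunk: bytes) -> str:
    """Return the SHA-256 hex digest of a chunk (assumed identifier scheme)."""
    return hashlib.sha256(chunk).hexdigest()


def filter_chunks(chunks, denylist):
    """Split chunks into (served, blocked) using a set of denied digests."""
    served, blocked = [], []
    for chunk in chunks:
        if chunk_digest(chunk) in denylist:
            blocked.append(chunk)
        else:
            served.append(chunk)
    return served, blocked


# Hypothetical denylist, e.g. digests published by a third party that vets content.
denylist = {chunk_digest(b"known bad bytes")}
served, blocked = filter_chunks([b"hello", b"known bad bytes"], denylist)
```

Note that a check like this only matches exact bytes; any re-encoding or re-chunking of the same content produces different digests.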

Off topic: I see downvotes on my parent comment, please let me know if I said something bad to help me improve.

HelloNurse · on Jan 9, 2023

Shared bytes could be construed in the opposite direction: if two or more of my users have the same chunk in their files, it is more likely to be some legal piece of data.

Files become piracy when there is evidence of intentional copyright infringement, for example when the chunk is part of a valid MPEG4 file and the MPEG4 file is titled "Wednesday_S2E4_FullHD_NetflixRip.MP4".
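The shared-chunk observation in the first paragraph could be measured roughly like this: hash every chunk and count how many distinct users hold it. This is a sketch under stated assumptions, with fixed-size chunking, a tiny chunk size, and hypothetical names throughout; it says nothing about legality, only about how common a chunk is.

```python
import hashlib
from collections import defaultdict

CHUNK_SIZE = 4  # unrealistically small, chosen only to keep the example readable


def split_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Fixed-size chunking (an assumption; real systems may chunk differently)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def owners_per_chunk(files_by_user):
    """Map each chunk digest to the set of users whose files contain it."""
    owners = defaultdict(set)
    for user, files in files_by_user.items():
        for data in files:
            for chunk in split_chunks(data):
                owners[hashlib.sha256(chunk).hexdigest()].add(user)
    return owners


files_by_user = {"alice": [b"AAAABBBB"], "bob": [b"BBBBCCCC"]}
owners = owners_per_chunk(files_by_user)
# Digests of chunks held by two or more distinct users.
shared = {digest for digest, users in owners.items() if len(users) >= 2}
```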

flipbrad · on Jan 9, 2023

Re last para: probably because it's full of very certain, but also quite certainly wrong, statements along the lines of "Under current legal doctrine, blobs need some form of chain of custody." Citation needed.

klabb3 · on Jan 9, 2023

I can see how that’s overly assuming. Thanks for being candid.

ketralnis · on Jan 9, 2023

It's not the illegalness I'm challenging, it's the problematicness. Maybe it is illegal to even think about those bit patterns. But I'm not aware of cases where people get _actually_ thrown in jail or fined for possessing or transmitting them. In all of the cases I know about there is intent involved.

somat · on Jan 9, 2023

It is hard to tell if this is what you are saying, but a common misconception about IPFS seems to be that you may end up hosting random unwanted files. This is untrue: you only end up hosting files you want.

chaxor · on Jan 9, 2023

Isn't the main use of BitTorrent for ML and research data? Academic Torrents is a wonderful resource and what every developer should be using if they need to provide their neural network weights, training data, etc. How is there any legal problem using BitTorrent? It's simply much more tailored for this problem than HTTP. It doesn't make any sense to talk about 'legal problems' for torrent protocols.

vonseel · on Jan 9, 2023

What planet have you been living on? BitTorrent is widely used to distribute copyrighted material - movies, TV shows, games, programs, porn... I'd imagine a large majority of BitTorrent traffic worldwide is pirated material, with a small portion being datasets as you describe, and other legally-shared data like actual Linux distros, etc.

chaxor · on Jan 20, 2023

I suppose there could be many things happening on the internet that we are unaware of; however, torrents are very good and specifically tailored as a protocol for scientific data and ML.

It solves the link-rot issues that occur due to moving institutions, it allows huge storage for essentially free (ever tried to store 9 TB of training data or CERN data on Dropbox?), and it scales extremely beautifully.

It's really the absolute perfect solution for reproducible research in large data studies.

AlfeG · on Jan 9, 2023

Torrents are no longer the main source of copyrighted materials, at least for shows and movies. There are a bunch of illegal services that provide a Netflix-like experience for pirated content.

mandarax8 · on Jan 9, 2023

Don't these services usually use torrents under the hood though? Thinking about Stremio and Popcorn Time.

loxias · on Jan 9, 2023

Now I feel old, using BitTorrent and Soulseek.

saagarjha · on Jan 9, 2023

If you’re distributing CSAM on your blob storage, and someone lets you know, you should probably remove it. This is independent of whether you distribute chunks or the whole file.

klabb3 · on Jan 9, 2023

I think for piracy/DMCA it’s enough to simply remove it. As for CSAM or more serious stuff, I don’t know if that’s enough. Does Section 230 cover that? Is there a difference between being a company and an individual?