
For many free licenses, that freedom comes with the caveat that you provide basic attribution and extend the same freedom to your users.

LLMs don't (cannot, by design) provide attribution, nor do LLM users have the freedom to run most of these models themselves.

That only applies if you redistribute or make a derivative work. Applying what you learned from such software does not require attribution.

Here we are talking about derivative works, not "learnings".

In the first sentence, "you" actually refers to you, a person; in the second, you're intentionally cheating by applying it to a machine doing a mechanical transformation. One so mechanical that different LLMs trained on the same material would produce output that closely resembles each other's.

The only indispensable part is the resource you're pirating: a resource given to you under the most generous of terms, terms you ignored in favor of a purpose you assigned to them yourself, one that embodies an intention the authors have specifically denied. You do this because it allows you to do what you want to do. It's motivated "reasoning."

Without this "FOSS is for learning" thing you think overrules the license, you are no more justified in training off of it without complying with the terms than training on pirated Microsoft code without complying with their terms. People who work at Microsoft learn on Microsoft code, too, but you don't feel entitled to that.


I'm not sure it's always bad intent. People often don't get that "machine learning" is a compound industrial term in which "learning" is not literally "learning", just as "machine" is not literally a "machine".

So it's sort of sentient when it comes to training and generating derivative works, but when you ask "if it's actually sentient, are you in the business of abusing sentient beings?", then it's just a tool.


I think LLMs could provide attribution, either by running a second hidden prompt (like "who said this?") or by doing a reverse query against the training dataset. Even 98% accuracy would probably be good enough, especially for bits of information that have very few sources, or even just one.

Of course it would be more expensive to get them to do it.
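
To make the "reverse query against the training dataset" idea concrete, here's a toy Python sketch. The in-memory TRAINING_INDEX and the crude string-similarity score are made up for illustration; a real system would embed the generated text and query a vector index over the actual corpus.

    from difflib import SequenceMatcher

    # (source, license, snippet) triples standing in for an index over the training corpus.
    TRAINING_INDEX = [
        ("github.com/example/foo", "GPL-3.0", "def parse_config(path): ..."),
        ("github.com/example/bar", "MIT", "quicksort partitions the list around a pivot"),
    ]

    def attribute(generated, threshold=0.6):
        """Return (source, license, score) for training snippets resembling the output."""
        matches = []
        for source, licence, snippet in TRAINING_INDEX:
            # Cheap stand-in for semantic similarity between output and training text.
            score = SequenceMatcher(None, generated.lower(), snippet.lower()).ratio()
            if score >= threshold:
                matches.append((source, licence, score))
        return sorted(matches, key=lambda m: m[2], reverse=True)

    # Attributes the generated line to github.com/example/bar (MIT).
    print(attribute("quicksort partitions the array around a pivot"))

At scale this means an approximate nearest-neighbour lookup over billions of snippets for every response, which is where the extra cost comes in.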

But if they were required to provide attribution with some percentage of accuracy, and we identified and addressed the other problems (GPL washing, piracy of our intellectual property, people going insane with chatbots, opinion manipulation and hidden advertising), then at some point commercial LLMs could actually become not bad for us.



