
> But as long as it needs to be trained on the work of humans it should not be allowed to displace those people it relied on to get to where it is. Simple as that.

Do you feel the same way about tools like Google Translate?



Tbh I'm not familiar enough with how Google Translate is built, but if it's ingesting tons of people's work without their permission so that it can be used to replace them, then yes, I do.


For what it's worth: that's pretty much how Translate works.

Translate operates at a large-chunk resolution, and one of the insights in solving the problem was that you can often get a good-enough translation by swapping a whole sentence for another whole sentence. So they ingest vast amounts of pre-translated content (UN publications are a great source, because they have to be published in all six official UN languages), align it at the sentence and paragraph level, and feed the translation engine at that level.

It's produced an uncanny level of accuracy, and it's fed basically wholesale by the diligent work of translators who were never asked whether they consented to feeding that beast. Almost nobody bats an eye about this because the value (letting people who speak different languages communicate with each other) far outstrips the cost of lost human translation work, and even the translators are, in general, in favor of it. They aren't going to be displaced because (a) it doesn't really work in realtime (yet), (b) it can't handle any of the deeper signal (body language, tone, nuance) of face-to-face negotiation, and (c) languages are living things that constantly evolve, and human translators handle novel constructs way better than the machines do. So in high-touch political environments, human translators still matter; the machines have replaced them in roles like "rewriting instruction manuals" that were always pretty under-served in the first place.


I would argue that Translate being fed by paid UN translators, who likely agreed to the use of their translations in a TOS or something, is not comparable to unpaid artists who posted their art online, only for it to end up, without their consent, in a training set for for-profit models such as OpenAI's. OpenAI's parent organization is a nonprofit, but it spawned a for-profit subsidiary, OpenAI LP, which most of its staff work for and which is meant to deliver many-fold returns to its shareholders, who are effectively profiting from the labor of all the artists and sources in the training data.


Google Translate is very basic and nowhere near good if you already know both languages. It's useful if you're translating into your own language (you do the correction while reading), but it can lead to confusion the other way.


Interesting distinction.

If you can do the correction when reading, it seems reasonable to assume the reader in the opposite direction has the same correction capability.

I would expect the chance of confusion to be identical. The only difference is a matter of perspective, where in one case you are the reader and in one case you are the author.


Yes, they are identical. But I believe the reader is better armed to deal with the confusion, or at least to recognize an error, because it does not fit the context. When producing, you don't know the target language, so there's a better chance for errors to slip in unnoticed.

It's better for me to receive a text in the original language and translate it myself than to try to decipher something translated automatically.


Vastly inappropriate comparison: there are millions of pages of text out of copyright, so you can build a good translation engine using only public-domain material.

That's not the case for art; the vast majority of art used by Midjourney is not public domain.


> vast majority of art used by midjourney is not public domain

Is that true? How did you establish that?


Unfortunately, it's also not great for translation. Language changes fast enough that content old enough to be out of copyright is stale training data.


OpenAI has basically admitted it. Is OpenAI even disputing that it ingested all the works it's being sued over? Not as far as I can tell.


Huh? You're aware that Midjourney and OpenAI are different things, right?



