AI

Can The AI Industry Continue To Avoid Paying for the Content They're Using? (yahoo.com) 196

Last year Marc Andreessen's firm "argued that AI companies would go broke if they had to pay copyright royalties or licensing fees," notes a Los Angeles Times technology columnist.

But are these powerful companies doing even more to ensure they're not billed for their training data? Just this week, British media outlets reported that OpenAI has made the same case, seeking an exemption from copyright rules in England, claiming that the company simply couldn't operate without ingesting copyrighted materials.... The AI companies also argue what they're doing falls under the legal doctrine of fair use — probably the strongest argument they've got — because it's transformative. This argument helped Google win in court against the big book publishers when it was copying books into its massive Google Books database, and defeat claims that YouTube was profiting by allowing users to host and promulgate unlicensed material. Next, the AI companies argue that copyright-violating outputs like those uncovered by AI expert Gary Marcus, film industry veteran Reid Southern and the New York Times are rare or are bugs that are going to be patched.
But finally, William Fitzgerald, a partner at the Worker Agency and former member of the public policy team at Google, predicts Google will try to line up supportive groups to tell lawmakers artists support AI: Fitzgerald also sees Google's fingerprints on Creative Commons' embrace of the argument that AI art is fair use, as Google is a major funder of the organization. "It's worrisome to see Google deploy the same lobbying tactics they've developed over the years to ensure workers don't get paid fairly for their labor," Fitzgerald said. And OpenAI is close behind. It is not only taking a similar approach to heading off copyright complaints as Google, but it's also hiring the same people: It hired Fred Von Lohmann, Google's former director of copyright policy, as its top copyright lawyer....

[Marcus says] "There's an obvious alternative here — OpenAI's saying that we need all this or we can't build AI — but they could pay for it!" We want a world with artists and with writers, after all, he adds, one that rewards artistic work — not one where all the money goes to the top because a handful of tech companies won a digital land grab. "It's up to workers everywhere to see this for what it is, get organized, educate lawmakers and fight to get paid fairly for their labor," Fitzgerald says.

"Because if they don't, Google and OpenAI will continue to profit from other people's labor and content for a long time to come."


  • Yes (Score:5, Interesting)

    by DrMrLordX ( 559371 ) on Monday January 15, 2024 @08:39AM (#64159413)

    Yes. So long as any one individual human has the ability to read/consume published works of any kind, that human can feed an LLM, and there's very little anyone can do to stop that from happening.

    • You might not be able to stop it happening in a private lab or even private LLM, but if you want to make money from it, you bet it can be stopped.

      The question really becomes how far will law makers go? Europe has put its line in the sand, and since OpenAI is in the USA, what the USA does next is what's going to really matter here.

      • by gweihir ( 88907 )

        Indeed. There is also an exception for research. But as soon as you monetize, you are crossing a big, fat, red line.

      • You might not be able to stop it happening in a private lab or even private LLM, but if you want to make money from it, you bet it can be stopped.

        So imagine this is the case, and meanwhile [insert country here] ignores this particular form of repressing learning, and leaps ahead in GPT/LLM ML systems, and possibly even to AGI. Because while GPT/LLM ML isn't AGI (or even AI) by any means, we can be absolutely certain that learning from the broadest possible training data will be involved when that tech arriv

      • If you obfuscate output to the point that it isn't rote regurgitation of material line-by-line, I don't think anyone can legally stop an LLM from using someone's IP as training data.

    • So here is something an individual can do. Artists can poison their image data before posting. It means an additional step in post-processing, and it means that any ML collecting its data from an artist's images will find that its data models are seriously messed up (thanks to the team working on this from the University of Chicago):

      https://www.technologyreview.c... [technologyreview.com]

      https://amt-lab.org/reviews/20... [amt-lab.org]

  • by Bruce66423 ( 1678196 ) on Monday January 15, 2024 @08:41AM (#64159417)

    So the idea that they have an inherent right to a part of this new flow is not necessarily rational; books and newspapers were not created on the basis that they would gain an income flow from being used to develop AI.

    So how do we decide? The instinctive reaction, to kick the tech companies because everyone hates them and we're Philistines if we don't do all we can to support the arts, means that we're in danger of assuming that they have a right to a totally new revenue source - despite the damage that will do to the development of AI.

    • Re: (Score:3, Insightful)

      by gweihir ( 88907 )

      It is really a lot easier: The LLM makers did copy copyrighted material and then processed it in a fashion decidedly not covered by fair use. That means the product of that processing is illegal and must be deleted. That they did all that commercially makes them liable financially and likely makes the whole thing criminal. As OpenAI has this process as its very business model, OpenAI may well be a criminal enterprise in addition.

      • by Luckyo ( 1726890 ) on Monday January 15, 2024 @09:42AM (#64159583)

        So does a human brain. To process anything, you must first commit it to memory. Notably we already had the same argument with browser caches.

        The answer remains no. Your insane interpretation of copyright to keep competition down remains factually and objectively wrong.

        Copyright does not, and has never limited ability to learn from copyrighted material. That is expressly not touched ever, everything that was ever created by humans was an interpreted processing of something that was created by other humans in the past. It's how all mammalian learning processes work. And that is also how BD ML processes work.

        And it is true that throughout the history, upper managerial class strived to prevent just that from happening. Upcoming smart individuals among the rabble actually learning things, and therefore being able to replace the managers with much more talented people who weren't in the upper class clique. It's one of the defining features of totalitarian left today, who invent entire systems to block talented people who can do what they do better and with less societal damage from advancing in social class. And primary method of advancement in social class is learning from existing things.

        It's therefore no surprise that it's usually the same people who advocate covering learning processes with copyright also advocate for things like DEI, as all of these things have the same primary goal in mind. Locking in social classes, allowing only those vetted to be utterly unthreatening to rise and ensuring that no talented usurpers rise up.

      • In what way is the way they processed it not covered by fair use? Creating a large database of the information in the text (whatever the format) has been found to be fair use already. Reading the material, and using the knowledge gained to create new things has also. This lies about half way between the two, and I don't see any reason why it wouldn't be found to be fair use too.

      • The LLM makers did copy copyrighted material and then processed it in a fashion decidedly not covered by fair use.

        That's not clear at all. Image search engines download images and process them to create thumbnails, which they then make available in image search results. That was ruled transformative enough to be worth a fair-use exception (e.g. by Perfect 10 vs Amazon & Google [wikipedia.org]). Google did a similar thing with Google Books, except they did keep copies, and that still got ruled transformative enough to be fair use (Authors Guild vs Google [wikipedia.org]).

        It seems to me creating an LLM from copyrighted text is one hell of a lot more

    • by john83 ( 923470 )
      Maybe I'm old-fashioned, but I hate copyright law more than I hate the tech industry.
      • by gtall ( 79522 )

        Maybe you hate it because you have not produced any copyrightable material that you spent a lot of time, money, and effort producing only to have some bot or company take it and take the profit you had counted on obtaining.

        • Until eighteen months ago no creative would be counting on any income from AI using their material. That's what I mean by a 'new revenue source'. Thus it can be argued that this new income has the same legitimacy as the new income that authors and film makers have gained because their creatures in Congress extended the duration of copyright, i.e. zero.

          • I think the question the court is more likely to ask is whether the AI is going to make money in place of the original author. If you write a great work like Principia Mathematica that contains cutting edge information on a topic, and is the one source for all this information together; and then OpenAI gets all the revenue for reading it and then handing out that information, you're likely to be pissed off.

            That said, if I had an amazing memory, read your book, and started handing out the informati

            • It in no way - unless subverted into doing so - trots out the same material as it has been trained on. It does something very other. No normally behaving AI as we have them today would ever be done for plagiarism.

    • by Entrope ( 68843 ) on Monday January 15, 2024 @09:54AM (#64159631) Homepage

      Your argument is very similar to saying that FTX should have been allowed to steal client funds because they were applying it to a business that didn't exist when laws against bank fraud were created.

      Coming up with a new business model that relies on breaking the law does not excuse that law-breaking.

      • The discussion here is whether the use for training AI - a totally new role for the copyright material - constitutes fair use or not. The FTX scam is clearly separating its owner from their money, something which has always been defined as theft. Copyright - by definition - is talking about something that is not lost when it is copied.

    • It's a complex issue, but at the core, you are right that they want to be a part of the new flow. I'm not actually opposed to it.

      I think it was Toyota that has a small group of trades people who know how to do things manually and build things by hand. They employ them so the knowledge is not lost by simply automating everything.

      It's extremely complicated and in no way is it easy to figure out how to pay humans who technically might not be 'needed' for the job anymore. However, I think it conceptually a good

      • That's an extreme assumption, and one that leaves creatives looking less than worthwhile.

        • Most creatives are less than worthwhile.

          I think it is very few 'creatives' who actually get to make money on their creativity. A lot of creatives make their bread and butter on things that are more routine, and then use their creativity in small slices or other projects.

          An artist might get a job making boring corporate diagrams or generic game/movie imagery. I think in a lot of those cases 'AI' could produce something 'good enough' for most uses. I'd still find it sad if an artist would not be able to make a

  • Yes (Score:4, Informative)

    by nospam007 ( 722110 ) * on Monday January 15, 2024 @08:43AM (#64159423)

    They bought the books, the newspapers, the magazines... and the AI reads them and remembers.

    Like every single kid.

    • Re:Yes (Score:4, Interesting)

      by gweihir ( 88907 ) on Monday January 15, 2024 @08:52AM (#64159443)

      You are confused. Machines are different from humans. The law is _very_ clear on that. Anybody with two actually working brain-cells is too.

      • by dfghjk ( 711126 )

        Says the guy who claims OpenAI is a "criminal enterprise", a real legal expert. LOL. Where is the law "very clear" that machines are different from humans? Citations please. Hell, corporations are people, according to "law".

        And are you claiming that you do NOT have two working brain cells, or that you are not human? Asking for a friend so that he may know how to target you in his lawsuit.

      • No you're confused: humans are machines. Messy biochemical machines to be sure, but made of atoms and like any machine performing energy + material in ==> work + calculation + products + waste out. No magic, soul, spirit, whatever involved, just atoms and physics. And we understand some of the machinery and can change it.

        Just a couple years ago, we sent instructions for the nucleus to produce a new product, so as to update our antivirus software. A century ago we hacked the blood sugar regulation mechani

      • The law is, but not in relation to this issue. The two working brain cells bit… can you point out which part of the brain makes it something more than a complex well optimised machine?

      • The law is _very_ clear on that.

        The law has not once ruled on the machine concept of learning and its relation to copyright infringement. It's not documented in actual law nor case law. There's nothing clear about it at all, and so far no legal cases have actually come up addressing this issue (instead they all focus on whether a machine can own copyright or the legal impact on the user).

        Stop pretending your fever dream is reality. Every post you've made here in this story is unsubstantiated rubbish. The world doesn't revolve around your

      • You are confused. Machines are different from humans. The law is _very_ clear on that. Anybody with two actually working brain-cells is too.

        If it's _very_ clear you should have no problem whatsoever citing relevant laws and or case law. Of course you won't support your baseless claims in this way because such evidence doesn't actually exist.

      • Please quote the relevant law. I can't find it. There's copyright law about fixing works in a physical form, but that would require the data of the model to resemble the object being copied, which it doesn't. There's precedent in legal rulings, though, that says we can do special things with data storage and processing, which means that indexing massive amounts of data is NOT a violation of copyright.

    • by Njovich ( 553857 )

      You are talking about thousands of nodes that received copies of this data. If these are 'kids', as you see them, are you saying that I can legally buy 1 book, make thousands of copies, and hand them over to kids? And these kids are then free to make their own derivative works and sell those?

      • by gweihir ( 88907 )

        That nicely sums it up. The AI fanatics have no working minds...

      • "You are talking about thousands of nodes that received copies of this data. If these are 'kids', as you see them, are you saying that I can legally buy 1 book, make thousands of copies, and hand them over to kids? "

        Not you, but people called 'libraries'.

    • You lost me at the very first step. AI can't buy anything, including books, because it's not a person, and doesn't represent a person.

    • by Calydor ( 739835 )

      Show me a kid (or an adult, not picky on this point) who can flawlessly recite the millions of books, newspapers, images, movies, etc. they have read, seen, and so on.

  • by gweihir ( 88907 ) on Monday January 15, 2024 @08:51AM (#64159441)

    And delete the models based on their massive campaign of commercial intellectual theft.

    • Please delete your own memory of this topic. We're tired of listening to your rubbish. Incidentally rubbish that couldn't exist if you didn't steal the story into your brain in order to write an (un)informed opinion on it.

  • Why not send the LLMs to school, and have them learn like humans?

    I mean, people going to school don't have to worry about copyright or any of that nonsense when they are learning that data. The copyright is fulfilled by the purchase of the textbooks and other materials. Just because a computer learns "faster" than a person, and can answer questions "faster" than a person, I don't see a fundamental difference in what these LLMs are doing versus some teenager just reading the internet all day.

    I'm not a huge

    • Re:Alternative (Score:5, Interesting)

      by Bert64 ( 520050 ) <bert AT slashdot DOT firenzee DOT com> on Monday January 15, 2024 @09:00AM (#64159467) Homepage

      Well another part of copyright is that the author can license the work for specific purposes. These textbooks are licensed for teaching humans, but not for teaching machines - thus you'd need to negotiate a different license with the publisher.
      The textbooks are also not supposed to be used alone, they are generally meant to be accompanied by a live teacher and sometimes practical demonstrations.

      • These textbooks are licensed for teaching humans

        Wow I never knew that. Does that mean I'm in violation if I use a textbook to balance a table?

      • by Sloppy ( 14984 )

        Authors can license textbooks instead of selling them, but do they?

        I guess I wouldn't be surprised if kids these days (yes, I'm old) are agreeing to EULAs when they open their textbook apps. But I know for sure that tens of millions of people still alive today, purchased textbooks instead of licensing them. If those textbooks still exist, then the knowledge is attainable without any contracts, so there's no means of discriminating against computers.

        Just avoid the weird textbooks (ones that require special s

        • by Bert64 ( 520050 )

          Authors can license textbooks instead of selling them, but do they?

          Yes, read the fine print. Just because you bought the physical book doesn't mean they don't try to restrict what you can do with the contents of it.

      • They can license it for specific purposes, but only when the purpose violates copyright in the first place. It doesn't seem likely that this does violate copyright to me.

    • by dfghjk ( 711126 )

      Schools contribute to learning but learning doesn't end there.

      "I don't see a fundamental difference in what these LLMs are doing versus some teenager just reading the internet all day."
      Exactly. Some of those teenagers have photographic memories, yet there are no dumbasses arguing that they should have to pay more.

      Faced with the idea that machines can think like humans and replace humans for many, even most, tasks, the natural reaction would be to consider what life would be like if we didn't have to work.

    • by gweihir ( 88907 )

      Machines cannot "learn like humans". Seriously.

  • by Bert64 ( 520050 ) <bert AT slashdot DOT firenzee DOT com> on Monday January 15, 2024 @09:03AM (#64159481) Homepage

    not one where all the money goes to the top because a handful of tech companies won a digital land grab

    Only that's already the status quo with copyright. The vast majority of profits from selling copyrighted works go to a small handful of companies. These companies continue to control the copyrights long after the original author is dead as is anyone who was around when the work was first released.

    We want a world with artists and with writers, after all, he adds, one that rewards artistic work

    A system with excessively long copyrights does not reward artistic work, it rewards sitting on your ass creating absolutely nothing new and living off royalties from something created 50+ years ago. If you want to reward artistic work, copyright needs to be much shorter, and then there would be much more public domain content that AI could ingest.

  • Isn't this literally what writing is? Labor that other people profit from? Isn't that what content is? A way of conveying knowledge.

    The objection is that "the enemy" is benefiting, the enemy defined as a corporation with money. If you don't want anyone to learn from your effort, don't create content. AI is using content in exactly the way it is intended to be used, and that makes people mad.

  • by jenningsthecat ( 1525947 ) on Monday January 15, 2024 @09:08AM (#64159489)

    ... claiming that the company simply couldn't operate without ingesting copyrighted materials ...

    "Yes, we're stealing from creators, but... but... but... Innovation! Bizness! Benefactors! We're above the considerations which bind mere average citizens! We're a corporation!"

    I really don't think my characterization exaggerates much, if at all. The fact that these clowns not only think like that, but also say it aloud, demonstrates just how delusional, megalomaniacal, and outright dangerous they are.

    • by gweihir ( 88907 )

      Indeed. And their deranged fanbois (also here) are no better.

    • The US Constitution is very clear - congress is given the power:

      'to promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries'

      If the effect of the contested material is to PREVENT the progress of science and useful arts because the copyright holders are demanding too much, then the purpose of the power is frustrated.

      You are falling for the idea that the copyright owners - very often big corporation

  • Not "Are using" -"Have used". Once trained on the content, it is of no further value to AI, and can safely be discarded. What is a fair price for a one-time analysis, or "reading" of a copyrighted piece?

    • by gweihir ( 88907 )

      Simple: If you did use my content without asking, the "fair" price is anything I want and I can insist on them deleting my data from their model in addition. The only exception would be fair use. That clearly does not apply.

      • Fair Use probably does apply unless they're reproducing copyrighted output... having seen the work during training isn't a technical violation of copyright. As we are about to see in the courts.
    • If it's not stored it cannot be recalled by a computer.
  • by Rosco P. Coltrane ( 209368 ) on Monday January 15, 2024 @09:20AM (#64159515)

    it's transformative. This argument helped Google win in court against the big book publishers when it was copying books into its massive Google Books database, and defeat claims that YouTube was profiting by allowing users to host and promulgate unlicensed material.

    Look up Family Guy on Youtube: you will find hundreds of really long videos with almost all episodes of Family Guy, interspersed with bogus content from some video game, and/or with the video zooming in and out constantly to fool the Youtube copyright bot.

    I'd argue that's transformative: some dude took a bunch of Family Guy episodes and turned them into really annoying, barely watchable compilations. They don't really resemble the original episodes. And yet Google keeps taking them down (and the dudes posting them keep re-uploading them, and it's been going on for years...)

    How come Google gets away with scanning books almost verbatim, but dudes uploading heavily fucked up TV series episodes are copyright violators?

    I'll tell you how: Google has MONEY and plenty of lobbyists in Washington. So does Microsoft. And that's how both Google and Microsoft will somehow be allowed to develop their AI businesses without paying anyone anything, while you can't download a 70-year-old Disney cartoon without paying royalties. Mark my words!

  • The AI might be transformative; but do they delete all their data as soon as they get it, or do they keep a copy of their training set data?

  • Would probably achieve that goal
  • The AI companies also argue what they're doing falls under the legal doctrine of fair use — probably the strongest argument they've got — because it's transformative

    Ok, cool. All I have to do is take the latest hot movie, copy it, transcode it, add a screen of text of why the studios hated all the strikes, and re-publish it as I wish, because it's "transformative". Perfectly legal by their logic.

  • ...will happen. Big media companies' lawyers will lobby for their share of the profits while big IT companies' lawyers will lobby to keep as much of the profits for themselves. Either way, artists are going to suffer if/when demand for their original works declines as a result of being replaced more cheaply by GenAI output. Let's not fool ourselves for a minute that this is about poor starving artists.
  • Whilst western entities predictably waste their energy trying to slow down our collective AI efforts with regulation and greedy application of copyright laws, China and others are quietly beavering away, not giving a flying fuck about any of it.

    If we don't stop this nonsense, the next AI leap will come from China and we're not going to be ready for it.

  • AI companies are currently far from profitable - OpenAI for example is burning through cash at a furious rate, and their path to profitability is based on what seems to me like ludicrous projections of future paying customers, given the amount of competition building up out there. Sooner or later, the AI bubble is going to pop, and then we'll see what sustainable business models will survive. A copyright-holder cash-grab could bring that forward if successful, adding a potentially massive cost to the alread

  • by rsilvergun ( 571051 ) on Monday January 15, 2024 @10:27AM (#64159709)
    there's about 500 families that make up the 1% of the 1% and own basically everything (or at least a controlling share of it, but honestly we're splitting hairs at that point).

    They own the media conglomerates that in turn own the IP that the AI that they also own is using. There'll be a little back and forth among them but they'll work out deals and that'll be that.

    Us peons? We'll own nothing and be happy [wikipedia.org], right?
  • by GeLeTo ( 527660 ) on Monday January 15, 2024 @10:54AM (#64159763)
    Even if U.S. courts deem the current practice of data scraping for LLM training illegal, the laws will change very quickly. Imagine a situation where China has access to the smartest LLMs and USA is limited to only AI trained on public domain data.
  • If you kill 10 people, you're a mass murderer. Kill 10,000, and you're a conqueror.

    Google Books, YouTube, and AI prove copyright infringement is OK, as long as done en masse.

  • It's all about stealing from people. Make your own training data.
  • If they are reproducing copyrighted output, then they'll get in trouble and have to pay. If they only train on it, and training is found to be fair use, because it doesn't actually reproduce a copyrighted work nor deprive the author of the ability to make money on their work, then it will continue to avoid paying for it, yes.
  • Current political powers are not interested in regulating new business. They all receive money through PACs and back channels from the people making the technology. There will eventually be an era of reform, but that can't happen until the damage is done.

  • Regardless of one's opinions about all this AI shit, the notion that copyright should be extended to impose constraints on the reader... you can't profit from having read my book unless you agree to reimburse me... is an extraordinarily breathtaking change in policy.

    At present copyright applies entirely to "write" operations (e.g. public performances, fixed verbatim and derivative copies). It explicitly does not apply to "read" operations including underlying facts and ideas. Copyright regime has no province ov

  • by honestmonkey ( 819408 ) on Monday January 15, 2024 @03:09PM (#64160513) Journal
    I need to rob banks for my business to be profitable, there's just no way to do it without robbing banks. So I need to continue to rob banks for what I do to be a viable business. People can complain about "laws" and "stealing", but it's my business, so what can I do?
