
OpenAI Says It Has Evidence DeepSeek Used Its Model To Train Competitor (theverge.com) 118

OpenAI says it has evidence suggesting Chinese AI startup DeepSeek used its proprietary models to train a competing open-source system through "distillation," a technique where smaller models learn from larger ones' outputs.

The San Francisco-based company, along with partner Microsoft, blocked suspected DeepSeek accounts from accessing its API last year after detecting potential terms of service violations. DeepSeek's R1 reasoning model has achieved results comparable to leading U.S. models, despite DeepSeek's claim of having used minimal resources.
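For readers new to the term, "distillation" in this sense means training a small student model to imitate a larger teacher's output distribution rather than hard labels. A minimal sketch with toy linear models follows; the data, temperature, and learning rate are illustrative assumptions, not anything specific to OpenAI or DeepSeek.

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T softens the distribution."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Toy setup: 4-dim inputs, 3-class outputs. The "teacher" is a fixed
# (pretend pre-trained) linear map; the "student" starts near zero.
X = rng.normal(size=(64, 4))
W_teacher = rng.normal(size=(4, 3))
W_student = rng.normal(size=(4, 3)) * 0.01

T = 2.0  # softening temperature (illustrative choice)
p_teacher = softmax(X @ W_teacher, T)  # soft targets from teacher outputs

def distill_loss(W):
    """Cross-entropy between teacher's softened targets and student."""
    p_student = softmax(X @ W, T)
    return -np.mean(np.sum(p_teacher * np.log(p_student + 1e-12), axis=1))

loss_before = distill_loss(W_student)

# Plain gradient descent on that cross-entropy: the gradient w.r.t. the
# student weights is X^T (p_student - p_teacher) / (T * batch_size).
for _ in range(300):
    p_student = softmax(X @ W_student, T)
    grad = X.T @ (p_student - p_teacher) / (T * len(X))
    W_student -= 0.5 * grad

loss_after = distill_loss(W_student)
print(loss_before, loss_after)  # loss drops toward the teacher's entropy
```

In practice the teacher would be a large pretrained network queried for its outputs and the student a much smaller one; the temperature `T` controls how much of the teacher's full ranking information the student sees beyond the top answer.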
This discussion has been archived. No new comments can be posted.

  • LOL (Score:5, Funny)

    by Anonymous Coward on Wednesday January 29, 2025 @10:02AM (#65127571)

    Sore losers who stole the internet to train their models complain they are treated unfairly.

  • by wed128 ( 722152 ) on Wednesday January 29, 2025 @10:09AM (#65127595)
    Is OpenAI saying that DeepSeek trained on its intellectual property without permission? How dare they! I've never known any AI proponent to be flippant about IP rules.
    • by Registered Coward v2 ( 447531 ) on Wednesday January 29, 2025 @10:13AM (#65127619)

      Is OpenAI saying that DeepSeek trained on its intellectual property without permission? How dare they! I've never known any AI proponent to be flippant about IP rules.

      OpenAI stole it fair and square, just ask them.

      • by AmiMoJo ( 196126 ) on Wednesday January 29, 2025 @10:32AM (#65127693) Homepage Journal

        How would OpenAI even be able to differentiate stealing from them and stealing from the same sources OpenAI stole from?

        • by jenningsthecat ( 1525947 ) on Wednesday January 29, 2025 @10:57AM (#65127769)

          How would OpenAI even be able to differentiate stealing from them and stealing from the same sources OpenAI stole from?

          OpenAI stole the corn to make the mash. DeepSeek stole some of OpenAI's partially-distilled bourbon and turned it into high-proof alcohol.

So, basically: if it reeks of moonshine, knocks me out, and gives me headaches for the next three days, it's OpenAI; if it doesn't smell, burns my throat, and puts me in the toxicology department, it's DeepSeek?

        • But but but, we were told that an AI "looking at the data" is the same as a human looking at the data/output, so it's normal and legal and not stealing, as long as reproduction is not identical. World's smallest violins concerto has begun.
          • World's smallest violins concerto has begun.

            OK, I know it's not the '90s but this deserves an acronym - who's with me?

Same way a teacher can tell which kids copied which kid's homework.

      • Is OpenAI saying that DeepSeek trained on its intellectual property without permission? How dare they! I've never known any AI proponent to be flippant about IP rules.

        OpenAI stole it fair and square, just ask them.

        I think it's a knee-jerk reaction and beyond my control, so please don't hate me. I felt the uncontrollable desire to yelp, "RIGGED!"

    • It's all about the Terms of Service, baby

      • by Anonymous Coward

        If I ever meet you I'll control-alt-delete you!

    • by dfghjk ( 711126 )

OpenAI isn't "flippant about IP rules"; they claim to respect "IP rules". It's others who claim they aren't.

      If OpenAI can demonstrate that their product was used improperly then they can take action, just as others do against them. IP law is probably not sufficient for AI, but OpenAI could very well be right in all these cases despite their evilness.

      • by ebcdic ( 39948 )

        "If OpenAI can demonstrate that their product was used improperly then they can take action, just as others do against them."

        There is a legal doctrine called "clean hands".

      • I may be the only person supporting OpenAI here but they have a lot of legitimate points.

        First, training using people's data is not literally copying it. The whole point of copyright law is to get people to create things that won't be immediately copied and thus cause the original creator to not create again.

        If you write a book, you do it to sell books, not to sell training data. And saying that you lack the money from selling it for training purposes is circular reasoning.

        Second, we all read things and tra

    • by supremebob ( 574732 ) <themejunky@gCURI ... minus physicist> on Wednesday January 29, 2025 @10:28AM (#65127669) Journal

      China in general doesn't really seem to care about foreign intellectual property rules. Their entire MO seems to be copying existing designs and finding ways to produce them more cheaply.

      • Re: (Score:2, Interesting)

        China in general doesn't really seem to care about foreign intellectual property rules. Their entire MO seems to be copying existing designs and finding ways to produce them more cheaply.

        Reminds me of America, circa 1891 [wikipedia.org].

      • by mcouper ( 128103 )

A friend of mine recently said of the global economy: America innovates, Japan makes it smaller, China makes it cheaper. For a bit of context, he's a naturalized citizen from a southeast Asian country, so there's likely some bias, but it's not as much against the US.

      • Not that I'm an apologist for China, but the other way to look at this is that the Western IP rules are a little overboard. It was supposed to be the case that if you see what a product does and independently come up with a way to do the same thing, then the patent doesn't apply. (That seems like a pretty workable definition of "obvious to someone skilled in the art", or the required specificity of the patent.) The whole point of the patent is that the patent document is used to implement the solution.
        • That's an open market working as it should.

          Which is the problem. Any country which needs to have the phrase 'home of the free' in its theme tune should be under suspicion about the levels of actual freedom available there.

And I'm looking at you too, 'bargain' bin in overpriced-shit store.

      • China in general doesn't really seem to care about foreign intellectual property rules. Their entire MO seems to be copying existing designs and finding ways to produce them more cheaply.

        Agreed. My brain sort of divides two known Asian countries we buy a lot from (others omitted to reduce info overload and comment length).

        Japan - think, innovate, practice, sell, reduce costs to be able to do business with whatever means you feel are necessary, but play fair on the market.

China - take ideas, copy them, change them a hair but not much, cheapen the associated labor costs to the nth degree, always be "lowest cost".

        Referencing the second, I don't think they know how to invent or inno

    • by gweihir ( 88907 )

Either an LLM is intellectual property, and then the training data is too. Or both are not. You really cannot have it both ways.

      Incidentally, we already know LLM answers do not have copyright protection on their own. They just may fall under the _original_ copyright of the training data and then the LLM becomes illegal.

      • by AvitarX ( 172628 )

        It's about the unauthorized use of the API I think, not the intellectual property of the output.

        • by gweihir ( 88907 )

          That is how it gets framed. But the question is whether OpenAI can actually legally limit the API use in this way.

          • How does the question become that at all?
            I have every right to limit the TOS on an API that accesses even public domain data.
            • by gweihir ( 88907 )

              No, you do not. Seriously. This is not the dark ages. You can limit illegal activity and you can limit load. You cannot limit what is done over the API unless it endangers your tech. To limit actual use of the data gotten over an API, you need to have an intellectual property claim.

              • No, you do not.

                Yes, I 100% do....

                This is not the dark ages.

                No, it's not... But that doesn't change the fact that you're wrong, lol

                You can limit illegal activity and you can limit load.

                What in the 9 hells are you talking about?
                This is outright false.

                You cannot limit what is done over the API unless it endangers your tech.

                Wrong.

                To limit actual use of the data gotten over an API, you need to have an intellectual property claim.

                Correct- and that's where you lost the plot.

                Being unable to limit what you do with the public domain data you retrieve over my API does not mean I cannot limit your access to my API. Period. Full stop. To claim otherwise is to be wrong.

                • ProTip: Writing 'period' / 'full stop' / 'end of discussion' is clearly false if it needs to be put out there. A self-evident truth which cannot be countered might just do that.

                  • ProTip: Writing 'period' / 'full stop' / 'end of discussion' is clearly false if it needs to be put out there.

                    Wrong. Wanna guess which logical fallacy you just used?

                    A self-evident truth which cannot be countered might just do that.

                    We aren't talking about self-evidence, or truths.
                    We are talking about a fact of law. Period. Full stop. End of discussion.

                    • Wanna guess which logical fallacy you just used?

                      Yes and

                      I refer you back to GP.

                    • If I say that E=mc^2, period, full stop, end of discussion- have I disproven General Relativity?
                    • A fact need not be self evident to be a fact.
                      Stating that a fact is, indeed a fact, and cannot be argued, does not say anything about the validity of said fact.
                      Someone is, of course, more than free to ignore my statement of finality, and continue to be wrong.

                      The statement in question can be reduced formulaically.

                      GP asserts, "A = B".
                      I assert that "since A = 1, and B = 2, then A does not equal B, period, full stop."

                      It is only inappropriate to claim finality in the case where that which is asserted is
                    • The former doesn't benefit from the magic fairy dust 'period' / 'full stop' / 'end of discussion' and that the fairy dust isn't really magic so neither does the latter; so why bother?

                      Now this, I agree with.
                      To express a level of confidence that may lead the GP to re-evaluate their broken logic.

                      In the case where:
                      Argument-step-is-final is in fact, final, (i.e., you are attempting to argue an immutable fact, which is fallacious in itself) adding the statement of finality is just argument frosting. I won't argue that at all. You just can't make any value assessment of what precedes it based upon it, and that goes in both directions.

                    • Which immutable fact are you arguing again because I'm pretty sure you don't get to be the judge of that.

                      Yes, I do.
                      The immutable fact is thus:

                      Being unable to limit what you do with the public domain data you retrieve over my API does not mean I cannot limit your access to my API. Period. Full stop. To claim otherwise is to be wrong.

                      This was a logic failure on GP's part.
                      He confused the data behind a gateway, and the gateway itself. These things are not the same.

There is no law or jurisprudence that makes this conflation; it is therefore flatly false.

                      Engaging with people who try to argue with reality is a pointless endeavor. If you don't see that, it's because you're not an intelligent human.

                    • Just let it flap its fingers/lips. It's not about correct or incorrect. It's about having the last say before everyone else quits fighting. It's somehow proven to the BSer that they are above all others and have all of the information anyone needs. It's exactly like narcissism but completely different. ;)

              • You misunderstood what he was saying. He can limit the access to his private API even if it's used to access public data. He didn't say he can limit access to public data.
    • Is OpenAI saying that DeepSeek trained on its intellectual property without permission? How dare they! I've never known any AI proponent to be flippant about IP rules.

Well, kind of. But it's not quite that OpenAI is complaining that DeepSeek used their IP. I mean, I'm sure they're not thrilled about that part, but what I really think they're saying is that DeepSeek utilized OpenAI/ChatGPT's training/compute to significantly decrease the overhead of training DeepSeek. By doing so, DeepSeek's claim to have reduced the cost of their model by so much is disingenuous. It's not that DeepSeek trained with fewer resources, but rather that they offloaded that resource cost to OpenAI.

      • Well, kind of. But it's not quite that OpenAI is complaining that DeepSeek used their IP. I mean, I'm sure they're not thrilled about that part, but what I really think they're saying is that DeepSeek utilized OpenAI/ChatGPT's training/compute to significantly decrease the overhead of training DeepSeek. By doing so, DeepSeek's claim to have reduced the cost of their model by so much is disingenuous. It's not that DeepSeek trained for less resources, but rather they offloaded that resource cost to OpenAI.

        THIS is the point they are making: the claims made by DeepSeek do not represent the actual costs of training.

        DeepSeek : "Look, we did better than everyone else, while using less resources!"
        OpenAI: "Yah, no. You built that on top of what we had already built."

    • by hawk ( 1151 )

      >I've never known any AI proponent to be flippant about IP rules.

      Indeed.

      An idea even more preposterous than suggesting Chinese intellectual property theft!

      hawk

  • by aaarrrgggh ( 9205 ) on Wednesday January 29, 2025 @10:12AM (#65127607)

    I thought the whole point of what DeepSeek did was based on qualifying tokens with other models first for their training; is this really a surprise?

  • Sauce for the gander.
  • by greytree ( 7124971 ) on Wednesday January 29, 2025 @10:28AM (#65127675)
    Imagine taking open information and using it for private gain.

What sort of cunt would do that, Sam Altman?
  • by dskoll ( 99328 ) on Wednesday January 29, 2025 @10:33AM (#65127699) Homepage

Company That Stole Others' Intellectual Property Accuses DeepSeek of Stealing its Stolen Intellectual Property.

"AI" huh? Copies of a copy of a copy with a filter over the top. It's the functional equivalent of rearranging the words from the encyclopedia to write your term paper. At least the teacher made you cite a source.

  • by greytree ( 7124971 ) on Wednesday January 29, 2025 @10:47AM (#65127749)
    ... nothing of value was lost.

    Au contraire, as DeepSeek appears to be vaguely Open Source, the decent people of the world have gained at the expense of shitheads like Altman.
  • by fuzzyfuzzyfungus ( 1223518 ) on Wednesday January 29, 2025 @10:48AM (#65127751) Journal
    I'm shocked, shocked, that the AI bros are suddenly against aggressive scraping and piously concerned with intellectual property rights...

I would be far from surprised if the claim is accurate; but it's really, really, hard to care when Altman wants to sit and rent-seek because 'safety' means that 'Open'AI needs to remain a subscription blackbox; while those sinister thieving Chinese are delivering something where you don't have to trust the vendor (and anyone who does trust the vendor by providing information worth having to their hosted service is stupid), because it's small enough to run locally if you must.

I'm honestly not sure that this claim will even help his stock price that much. It's certainly easier to enlist state assistance if you can wrap your objection to a competitor in IP law, rather than admit the competitor has made a breakthrough in efficiency. But if it's impractical or impossible to offer API access to your model while preventing distillation by a modestly committed attacker, that essentially guarantees your fancy oh-so-special-give-me-all-the-money model will be shadowed closely by copies that are good enough to be concerning and a lot cheaper. Even in jurisdictions that care, you'll have to whack the moles one at a time; for local use, or in jurisdictions that don't care, you won't even be able to do that.
  • robots.txt (Score:5, Funny)

    by El_Muerte_TDS ( 592157 ) on Wednesday January 29, 2025 @10:59AM (#65127781) Homepage

    Then they should have added DeepSeek to their robots.txt

  • ChatGPT lost its job to AI!
Because if what these Chinese did (or may not have done) is a crime against OpenAI, then what OpenAI did in stealing all its training data would be legal. Obviously, the latter has yet to be established. We already know that LLM answers do not have copyright or claims to originality, legally. We also know that if the massive data theft that OpenAI did is ruled illegal, the whole LLM becomes an illegal possession, and then the Chinese model would have stolen from the original copyright holders. Maybe.

    • by dgatwood ( 11270 )

Because if what these Chinese did (or may not have done) is a crime against OpenAI, then what OpenAI did in stealing all its training data would be legal. Obviously, the latter has yet to be established. We already know that LLM answers do not have copyright or claims to originality, legally. We also know that if the massive data theft that OpenAI did is ruled illegal, the whole LLM becomes an illegal possession, and then the Chinese model would have stolen from the original copyright holders. Maybe.

      Training data, assuming it is published on a public website, is presumed to be for public consumption unless gated by a security mechanism or legal language that makes it clear that it cannot be used for specific purposes. OpenAI's API is published with very specific terms of use. So at a bare minimum, what DeepSeek is accused of doing would be a violation of contract law, whereas one would hope OpenAI did not take similar shortcuts.

      Additionally, if DeepSeek actually obtained OpenAI's model and distilled

      • by gweihir ( 88907 )

        Training an LLM on data is not "public consumption". It is commercial use and that is fundamentally different.

For example, since you seem to be a bit dense: you can read my personal web page and even print it out. But if you want to print and then sell it, or use it commercially in any other way, you need permission from me, unless fair use applies. Fair use does not apply when the data is used to parametrize a machine commercially. Please stop making absolute beginner's mistakes.

        • by dgatwood ( 11270 )

          Training an LLM on data is not "public consumption". It is commercial use and that is fundamentally different.

          That's for the courts to decide. Until they do, that's speculation. And it seems very unlikely that training an open source model would be considered commercial use, so taken on its face, your statement is clearly too broad to be correct.

          Also, commercial use actually varies widely from jurisdiction to jurisdiction. Is a book cover commercial use? In general no, at least in the U.S., but in other countries, maybe. Why? Because it is seen as primarily artistic. So there are huge grey areas here, and as

"Open source" and "commercial" are not opposites, or even in conflict. In this case, the software used is both. As to the legality, precedent thus far is that this kind of use is infringing. That may change, but right now that's not speculation but legal reality, as far as the US goes.

            And training a model IS exact copying. The copying is done from the medium the data is on into memory, and in present day copyright law, that is exact copying.

            • by dgatwood ( 11270 )

              "Open source" and "commercial" are not opposites, or even in conflict. In this case, the software used is both. As to the legality, precedence thus far is that this kind of use is infringing. That may change, but right now that's not speculation, but legal reality, as far as the US goes.

I'm unaware of any such precedents. In fact, most court cases over this have been dismissed by the courts, which suggests that the courts are leaning pretty strongly in the opposite direction. I can only find evidence of a single case related to model training that has even made it to trial (The Intercept Media and Raw Story Media v. OpenAI), and that one has been going on for about a year and just got through with pre-trial motions. We won't have any real binding precedent at this rate for probably a

              • No cases on model training have made it, but cases about companies vacuuming up data have. Google Books is the most prominent one, and in the ruling of that, it was made clear that they were narrowly allowed because they provided a valuable service by making the books searchable, and they did not make any other use of the copyrighted material.

                And no, copying a computer program to memory only falls under fair use if it is for a human to read it. There is no such fair use clause for other uses.

                • by dgatwood ( 11270 )

                  No cases on model training have made it, but cases about companies vacuuming up data have. Google Books is the most prominent one, and in the ruling of that, it was made clear that they were narrowly allowed because they provided a valuable service by making the books searchable, and they did not make any other use of the copyrighted material.

Google Books indexes books that people have to pay for, and in doing so, makes snippets of the original content available to people who may or may not have paid for it. OpenAI trains a model with content that anyone can get for free with a web browser, and except to the extent that the AI might occasionally regurgitate some small piece of its training data, it does not make snippets of the original content available to anyone. Those two usages of data have approximately nothing in common other t

Not even Natalie Portman wants to join this propaganda campaign. In a recent interview she was recorded saying, "I would rather have cold grits than help the tech press hide its shame by misdirecting around the massive advance represented by DeepSeek. OpenAI couldn't have made that advance with a Beowulf cluster of Deep Minds."
  • by AlanObject ( 3603453 ) on Wednesday January 29, 2025 @11:14AM (#65127829)

    25 posts in and they are all about how OpenAI has no right to complain after its own IP practices. Yeah, I agree. Weren't they supposed to be nonprofit to begin with? In other words the "Open" part of OpenAI?

But that aside, to what extent does this indicate that DeepSeek's viral claim of being able to train an o1-level model at a fraction of the cost is either fake or, at best, over-hyped? That's what the big market-crash story was all about, isn't it?

    • The original source material from human civilization since the start of time is priceless. If you're going to try to inflate the physical cost of training an AI by the actual value of the data sources, then OpenAI's claims of 100m+ training costs are wildly underestimated to the point of being meaningless.

      When people calculate training costs they only mean the immediate cost that they would have paid to build the system, at the market rates of the time.

  • But Maybe Not (Score:5, Informative)

    by AndrewZX ( 9173721 ) on Wednesday January 29, 2025 @11:27AM (#65127865)
    Independent tests show different answers than OpenAI: https://arstechnica.com/ai/202... [arstechnica.com]
  • by Cley Faye ( 1123605 ) on Wednesday January 29, 2025 @11:49AM (#65127929) Homepage
All the people behind DeepSeek have to say is "our work cannot be done without violating copyright and terms of service, so we had to do it, you see?". That would absolve them of any and all issues, right? Right?
  • Long time no see.

  • I feel a great disturbance in the force as if OpenAI shareholders suddenly cried out in terror and suddenly went broke.

  • Saw 40 replies, which meant at least 35 people have already stated the blindingly obvious, and 5 said, "Came here to say this!"

    So I'll add a question. Could they try to defend their IP in courts against downstream consumption by other models, while simultaneously ignoring their hypocrisy? I mean... of course they can. I withdraw my question.

    • They wouldn't dare go to an actual court, it would be way too dangerous to specify damages because it would make all the lawsuits targeting them easier.

      If they go the legal route, it will be via ITC.

    • So I'll add a question. Could they try to defend their IP in courts against downstream consumption by other models, while simultaneously ignoring their hypocrisy? I mean... of course they can. I withdraw my question.

      They aren't claiming a copyright violation, they are claiming a contractual violation - that Deepseek violated the terms of use of their API by allegedly using the API to generate training samples.

      People who don't like OpenAI are trying to claim copyright violations that OpenAI 'stole' copyrighted works. Under US law there is 'transformative fair use' - and machine learning models are pretty clearly transformative. So they aren't really the same thing, since in this case copyright isn't being asserted, just

  • You can finetune on turn based chat output to avoid having to pay the developing world sweatshops for chat finetuning, but that's useless for the chain of thought.

    The R1 distills work so well exactly because R1 exposes the chain of thought. You can't distill the hard part from o1.

    • by narcc ( 412956 )

      The R1 distills work so well exactly because R1 exposes the chain of thought.

      That doesn't make any sense. o1 still produces that output, it just hides it from the user, contributing to the illusion that the system is "thinking". It wouldn't be hidden from the smaller model during distillation.

      Don't forget that CoT is not reasoning, it's just text that looks like reasoning. It's not special in any way.

      • The CoT output is a procedure suited to a LLM, it is special for LLMs. That's exactly why OpenAI hides it, because it would make distilling a CoT model from o1 possible.

        The R1 distills are distilled using the ....Wait...Wait...But Wait...Wait...Wait... output from R1. The distilled model learns to emulate the "thoughts". Without the "thoughts", they would need to run their entire reinforcement learning on the smaller models too. They would no longer be distills either.
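What the parent is describing, learning from a teacher's exposed reasoning traces, amounts to building ordinary supervised fine-tuning records from those traces. A minimal sketch; the `<think>` tags and record fields here are illustrative assumptions, not DeepSeek's or OpenAI's actual formats:

```python
# Sketch of how visible chain-of-thought transcripts become supervised
# fine-tuning data for a smaller student model.
def to_sft_example(prompt: str, chain_of_thought: str, answer: str) -> dict:
    """Fold a teacher transcript into one training target so the student
    learns to emit the reasoning tokens, not just the final answer."""
    target = f"<think>{chain_of_thought}</think>\n{answer}"
    return {"input": prompt, "target": target}

record = to_sft_example(
    "What is 7 * 8?",
    "7 * 8 = 7 * (10 - 2) = 70 - 14 = 56.",
    "56",
)

# With a teacher that hides its chain of thought (as o1 does), the middle
# argument is simply unavailable, and only the final answer can be imitated.
```

The design point this illustrates: the student's fine-tuning target contains the reasoning tokens themselves, which is exactly what cannot be harvested from an API that returns only final answers.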

        • by narcc ( 412956 )

          OMG... No one in their right mind would try to distill o1 through the API, even if they had access to the hidden output. Don't be stupid.

          That output is hidden for one reason and one reason only: to contribute to the illusion that the system is 'thinking', as I explained. Not to prevent someone from trying to produce a (phenomenally expensive) distillation!

          You are clearly not qualified for this discussion.

          • If you read between the lines, no one is accusing them of hacking but just using the API and breaking contract. The API includes logprobs to the extent OpenAI feels like it, which varies with time.

            So you get the logits ... now what about the rest :

            "2.4. Distillation: Empower Small Models with Reasoning Capability
            To equip more efficient smaller models with reasoning capabilities like DeepSeek-R1, we directly
            fine-tuned open-source models like Qwen (Qwen, 2024b) and Llama (AI@Meta, 2024) using
            the 800k samples

            • by narcc ( 412956 )

              no one is accusing them of hacking

              Not only did I make no such claim, I didn't reference "them" at all.

              Go gaslight someone else. I'm not interested in entertaining your delusions.

              • You were saying it's impossible to distill from API output.

                With a mere 600k CoT outputs they are distilling effective CoT for R1 distills. With the same amount of data they could have done the same for the capability from OpenAI o1, but the API does not provide.

                So as I said in the first place "The R1 distills work so well exactly because R1 exposes the chain of thought. You can't distill the hard part from o1."

                • by narcc ( 412956 )

                  You were saying it's impossible to distill from API output.

                  No, I said it would be stupid to do that. Learn how to read.

  • Well, duh, these are the perils of open source and freely available models. You want to help the world? Well, the world includes China. And, also, Russia and North Korea, btw.

    • by narcc ( 412956 )

      You want to help the world? Well, the world includes China. And, also, Russia and North Korea, btw.

      Here's the thing: People who actually hold those ideals are fully aware of this and they're okay with it. We know that when everyone benefits ... everyone benefits. It doesn't matter in the slightest that someone we don't like also benefits. It is always better that more people benefit than fewer. Withholding those benefits from people you like out of fear that someone you dislike will also benefit is irrational.

      The weird conservative argument that you shouldn't do something good because it might help

  • If someone is using your 'proprietary' model's output to train itself, wouldn't your IT security department notice the huge number of queries from one general location?

    Oh right, they do the same thing when scraping websites for training data. Oh well, live by the sword and all that..
So why not post this evidence, instead of just saying you have evidence?

  • by Freischutz ( 4776131 ) on Wednesday January 29, 2025 @01:24PM (#65128253)

    OpenAI Says It Has Evidence DeepSeek Used Its Model To Train Competitor

US tech-barons who scraped the internet for data with complete disregard for terms of service and copyright to train their AIs are complaining that a Chinese company used their AI to train a competing Chinese AI? Let me see ... nope, not crying any rivers. What was that motto these tech-barons live by: "Take whatever you want, apologize later"? Funny how a mere apology did not suffice when people were pirating their media and software, but now that they are ripping off the planet, not even an apology is forthcoming. If the Chinese really did train up an AI that can compete with the best the US tech-barons have to offer, then my only reaction is: ... Mmmmmm ... Schadenfreude ...

  • So I have been pondering, why the hate on OpenAi?
    • Because they used OpenCrawl data to build their db? (Possibly)
    • Partially founded by Elon the terrible? (Possibly)
    • Run by an openly gay man? (Not so much - if so, Apple would get some of this love as well)
• Their approach is proprietary? (meh, who cares)
• They bellied up to the Microsoft bar and rolled around like a pig in slop? Most definitely.
    • From nonprofit to corp?

    So recap: We hate OpenAi because they didn't ask permission to use our publicly acce

Well put. This whole episode reminds me of Apple and Microsoft both stealing Xerox PARC windowing innovations in the early 1980s. There is a famous meeting where Jobs accused Gates of stealing Apple's Macintosh window/icon/mouse/pointer user interface, and Gates replied something along the lines of "Well, Steve, I think it's more like we both had this rich neighbor named Xerox, and I broke into his house to steal the TV set and found out that you had already stolen it."
Share the code. And all the data OpenAI scraped off the internet did not belong to OpenAI to begin with, so I have no sympathy for OpenAI's complaint.
  • by allo ( 1728082 )

    There is no copyright on AI outputs. Yes, this breaks their ToS. So find and ban their accounts ...

  • Big "there was a sniper on the united aerospace hangar that's why my rocket exploded" energy.

  • Wtf, I was on the way back home from boosting a car and someone carjacked me, waaa.
