
Anthropic Launches Claude 3.5 Sonnet, Says New Model Outperforms GPT-4 Omni (anthropic.com)

Anthropic launched Claude 3.5 Sonnet on Thursday, claiming it outperforms both its previous models and OpenAI's GPT-4 Omni. The AI startup also introduced Artifacts, a workspace where users can edit AI-generated projects. This release, part of the Claude 3.5 family, comes three months after Claude 3. Claude 3.5 Sonnet is available for free on Claude.ai and the Claude iOS app, while Claude Pro and Team plan subscribers can access it with significantly higher rate limits.

Anthropic plans to launch 3.5 versions of Haiku and Opus later this year, exploring features like web search and memory for future releases.

Anthropic also introduced Artifacts on Claude.ai, a new feature that expands how users can interact with Claude. When a user asks Claude to generate content like code snippets, text documents, or website designs, these Artifacts appear in a dedicated window alongside their conversation. This creates a dynamic workspace where they can see, edit, and build upon Claude's creations in real-time, seamlessly integrating AI-generated content into their projects and workflows, the startup said.

Comments Filter:
  • I just read one that has several things that look like trademarks of something, and zero information on what that something is, except that it is faster than some other thing.

    That's not "news for nerds"; it's not even an advertisement. It is just a piece of crap.

    Your AI can do better, so let it.

  • what does "outperforming" even mean? what is the standard?

    cut-and-paste marketing spew by startup gold diggers shouldn't be Slashdot articles, especially the AI-bandwagon type

    • what does "outperforming" even mean?

      It's also asking for a phone number before you can use it.

    • by seebs ( 15766 )

      there's a lot of standard "evals" that things are scored on, and the claim is that it performs better on a lot of those tests of functionality. that seems like a really easy and straightforward thing to find out, since it's right there in the article and in all the other news coverage about this.

      • The article's "standards" are used by no one but money-grubbing startups and have no bearing on the ability to solve any real-world problem. And the graph at the top with "intelligence" on an axis is doubly stupid and worthless.

  • Artifacts sounds interesting. It sounds like it might finally be a step toward moving from having AI generate code snippets to being able to actually work on "projects", where you make your code base, design documents, and bug reports available, and can have it implement desired features or bug fixes from the documents one step at a time. The step after that would be incorporating agents that have the ability to run and control the program to test fixes.

    Current AI tools are great at writing snippets

  • It's nice someone finally acknowledged that so-called AI has no understanding of context, and now allows users to dive in and edit under the hood instead of making the user re-ask multiple times while guessing at what the right question is.

  • Let me stand on my soapbox and vent. I'm getting so lost with all the marketing words with these products. There was a previous post about python versioning being based on a timestamp. I hope this becomes an industry standard. Forget about the new names that have no relevance to their functionality, what does 3.5 even mean? Is it 50% better than 3.0 or 50% done from version 4.0?

    • Let me stand on my soapbox and vent. I'm getting so lost with all the marketing words with these products. There was a previous post about python versioning being based on a timestamp. I hope this becomes an industry standard. Forget about the new names that have no relevance to their functionality, what does 3.5 even mean? Is it 50% better than 3.0 or 50% done from version 4.0?

      Don't worry. In a couple years, humans won't need to know anything. So all these numbers can just be interpreted by the AI for us and we can watch Ow! My Balls! in peace.

  • by backslashdot ( 95548 ) on Thursday June 20, 2024 @11:39AM (#64564143)

    where are the u's in blueberry?

    Claude 3.5: There are two 'u's in "blueberry":
    bl[u]eb[u]rry
    They are located in the first and third syllables of the word.

    How many r's are in strawberry?
    Claude 3.5: There are two r's in "strawberry".

    • Counterpoint - I took a screenshot of a captcha image and used Claude to convert it to text (accurately). Funny enough, I pay for chatGPT and it wouldn't allow me to do this demo because it detected I was doing something shady and basically said "no, you are doing something I do not approve of".

    • by Rei ( 128717 )

      You, to a blind person: "Which is a darker shade of blue, a blueberry or the pokemon Squirtle?"

      Blind person: "Um... Squirtle?"

      You: "Hahaha, blind people are morons!"

      ---

      LLMs don't "see" letters. They only see tokens. These do not even map to word boundaries, let alone individual letters. Here's the sort of thing that they 'see' [huggingface.co] (scroll down past the special tokens at the top, just click to somewhere in the middle). Such as:

      "p ag",
      "pa g",

      • That was enlightening, however, I don't think average users would give a shit.
        When LLM developers market them as the best thing since hot water, and open them up to the public, the public couldn't care less why LLMs fail at basic questions. They don't know and don't care, nor should they.
        If the webpage where people ask questions gives stupid responses, people would laugh at it. Tokens? What's that? They want a proper answer, not a scientific explanation for the stupid answer.

        • And I fail at closing tags. Stupid me.

        • Yeah, it's a current, and known, limitation. But not indicative of any larger or important issue. When it's resolved, nobody will say, "whew, I'm glad that's fixed. Now I'm a believer!!"

          If you were interested in actually doing this task, and aware of the issue (which of course should ideally not be necessary), try this:

          spell out the word blueberry and then tell me where the u's are

          ChatGPT:

          Sure, here is the word "blueberry" spelled out:

          B L U E B E R R Y

          The 'u' is in the second position.
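That spell-it-out trick is easy to automate on the client side. A tiny sketch (the helper name is made up, and this only builds the prompt text; it doesn't call any API):

```python
def spell_out_prompt(word: str, question: str) -> str:
    # Spacing the letters apart tends to force each one into its own
    # token, so the model can "see" the characters it's asked about.
    spelled = " ".join(word.upper())
    return f'Here is the word "{word}" spelled out: {spelled}. {question}'

# Example: spell_out_prompt("blueberry", "Where are the u's?")
```

The same preprocessing idea generalizes to any character-level question you want a tokenized model to answer reliably.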

          • Thank you for this. Yours is better. My attempt was "The word blueberry is spelled B l u e b e r r y. Where is the u?" (and that worked fine)

            This kind of "gotcha" post that dominates discussions of LLMs on Slashdot is really interesting. I get the feeling there are many people who are just very upset that this technology exists, and who can't understand that there is a difference between "AI" and "LLM" and "general-purpose intelligence." That's equally true of journalists, but I expected more nuanced understanding here.

          • Yes it's a known issue, yet they haven't addressed it in years of knowing it's an issue. I agree it's easily solvable, yet they seem to have doubled down on not being concerned with it.

  • I recognize a lot of the words being used in this nonsense, but when digested in-context, this gibberish is unintelligible.

    Maybe someone, maybe a Slashdot Editor, could have included a sentence giving a bit more context?

    • by Rei ( 128717 )

      It's plain English. What are you having trouble with?

      • by Osgeld ( 1900440 )

        I mean it's mostly word salad; it mentions Claude 11 times in 2 paragraphs, and whatever that sentence in the middle is... I have no idea what Claude has to do with poetry, or with that screwball OS that never seems to make progress (despite over a decade of development)

        • by Rei ( 128717 )

          "Anthropic launched Claude 3.5 Sonnet on Thursday"

          Broken down:
          Anthropic: A company
          Claude: Anthropic's main product line
          Claude 3.5: A family of Claude products
          Claude 3.5 Sonnet: a specific version of Claude 3.5.

          Compare to:
          "Microsoft launched Windows 11 Pro for Workstations on Thursday"

          Broken down:
          Microsoft: A company
          Windows: Microsoft's main product line
          Windows 11: A family of Windows products
          Windows 11 Pro for Workstations: A specific version of Windows 11.

          I'm not sure how this is confusing? It's a standard naming convention.

          • by Osgeld ( 1900440 )

            hey yea Microsoft, there's a company people have actually heard of

            • by Rei ( 128717 )

              Anthropic has a market cap about 2/3rds that of GM.

              Just because you've never heard of them doesn't mean that others haven't. They're a big name in the AI space.

              • by Rei ( 128717 )

                (And yes, I know EV would be more relevant here, but I just figured more people would have heard of market cap...)

      • AI and specifically LLMs. The topic which makes otherwise smart people say dumb things in an effort to "own" anyone who thinks there is any possible productive use of LLM technology.

  • My prompt:

    > write a web app to analyze a JSON REST response containing a mix of arrays and objects. The purpose of the tool is to provide insight into the size of the various fields being returned, the count of repeating strings, and the size before/after compression.
    >
    > Initially you are presented with a textarea called "JSON". Paste some JSON in here, and click the "Analyze" button. This validates that the JSON is valid. It then analyzes the JSON payload by iterating recursively over all arrays an
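For what it's worth, the core of that analysis is only a few dozen lines even without the web UI. A rough Python sketch of the same idea (the function name and report shape are my own invention, not what the poster's generated app produced):

```python
import json
import zlib

def analyze(payload: str) -> dict:
    """Validate a JSON payload, then report per-field serialized sizes,
    repeated strings, and payload size before/after zlib compression."""
    data = json.loads(payload)  # raises ValueError if the JSON is invalid

    field_sizes = {}
    string_counts = {}

    def walk(node, path="$"):
        # Record the serialized size of every subtree, keyed by JSONPath-ish path.
        field_sizes[path] = len(json.dumps(node))
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}")
        elif isinstance(node, list):
            for i, value in enumerate(node):
                walk(value, f"{path}[{i}]")
        elif isinstance(node, str):
            string_counts[node] = string_counts.get(node, 0) + 1

    walk(data)
    raw = payload.encode("utf-8")
    return {
        "field_sizes": field_sizes,
        "repeated_strings": {s: n for s, n in string_counts.items() if n > 1},
        "raw_bytes": len(raw),
        "compressed_bytes": len(zlib.compress(raw)),
    }
```

Repeated strings are usually where REST payloads bloat, which is also why such responses compress so well; the raw/compressed byte counts make that visible at a glance.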

    • That's exactly the kind of task I've been using ChatGPT for. I've also seen productive usage from writing a function prototype, describing what the function should do, "include error handling of ___ variety", what return values should be, parameter constraints to check for, and then telling it to implement the function.

      We're not quite there yet, but I think it's highly likely that some future programming languages will basically just be plain English. It's almost as if the LLM is a compiler. It actually

    • by MobyDisk ( 75490 )

      I second Shmert's and Mordineas's view that this is a great use of AIs. It may not replace a full-stack software engineer, but it has turned simple tools that were "I could do that in a day, I just don't have time" into 15-minute tasks. This is such a productivity boost. Before too long, non-programmers will be doing this too.

      P.S. GPT-3.5 wrote a tool to extract all my Slashdot posts into HTML files. I've wanted to do that for over 10 years. Next I'll ask it to put them all in a SQL database for me and full-t

  • There don't seem to be many unbiased benchmarks out there, so each AI vendor makes their own. Of course, they're going to do better at the tasks their own benchmark considers important, because that's where they put the effort.

    • by seebs ( 15766 )

      this doesn't sound right, there's a bunch of benchmarks that i see in lots of announcements/model cards/whatever. some of the oldest ones aren't in use anymore because everyone aces them and they're boring, but a lot of these are sort of standard things that are the same benchmarks openai and meta used in their recent announcements?

  • I've switched back to GPT-4 from GPT-4o. GPT-4o is 5x faster than GPT-4, but it is also just 5x more verbose. I wonder if OpenAI will go from being the "best" LLM to the "fastest, cheapest" LLM. This would give competitors (and hopefully open models) a chance to catch up.
