Anthropic Looks To Fund a New, More Comprehensive Generation of AI Benchmarks
AI firm Anthropic launched a funding program Monday to develop new benchmarks for evaluating AI models, including its chatbot Claude. The initiative will pay third-party organizations to create metrics for assessing advanced AI capabilities. Anthropic aims to "elevate the entire field of AI safety" with this investment, according to its blog. TechCrunch adds: As we've highlighted before, AI has a benchmarking problem. The most commonly cited benchmarks for AI today do a poor job of capturing how the average person actually uses the systems being tested. There are also questions as to whether some benchmarks, particularly those released before the dawn of modern generative AI, even measure what they purport to measure, given their age.
The very-high-level, harder-than-it-sounds solution Anthropic is proposing is creating challenging benchmarks with a focus on AI security and societal implications via new tools, infrastructure and methods.
Benchmarks (Score:2)
Re: (Score:2)
The HuggingFace leaderboard doesn't include closed models, such as GPT-4, Claude, Gemini, etc.
Re: (Score:3)
Benchmarks are there for one reason only - to promote the sales of their snake oil.
Re: (Score:2)
If Anthropic wants to fund proper benchmarks, they should pay more taxes and lobby the government to get NIST involved. Then let those guys do the Real Science.
Private money to develop new private benchmarks is a great way to give money to your mates who are itching to do a startup with guaranteed funding.
Hopefully they do like the SEAL Leaderboard... (Score:3)
... and have the 3rd party org keep the benchmark questions private, so that nobody can be accused of training to the test (IMHO more of a problem with third parties doing finetunes than with the sort of companies making foundational models, since it's such an easy hack to gain prestige)
AI all the way down (Score:2)
Surely these benchmarks are AI-based.
Why have a human evaluate the AI, when an AI can do it?
Besides, there's nothing ChatGPT can do better than talk about ChatGPT.
censorship benchmark (Score:1)