Comment Everything that comes out of an AI needs checking (Score 2) 10

These systems are genuinely useful and can sometimes do very impressive things. But absolutely everything that comes out of them needs to be checked. I'm not sure how people don't get this at this point. It is a particularly big deal when something like this is produced by a major government, since they can presumably afford to pay for access to the higher quality models, which have lower hallucination rates (Claude in particular is better for this). This still shouldn't stop the humans from looking over everything, as noted by the minister in TFS, but everyone should already know this by now. How many more incidents of this sort do we need?

Comment Re:Just means none of the experts cared enough (Score 1) 93

No. But many of these benchmarks existed even a year before the reasoning methods were introduced and are benchmarks created by people who have no connection to the AI companies. At that point, to discount this evidence, you need to claim that the AI companies all worked together in a big conspiracy with external academics and others to make benchmarks which would in the long run show improvement in model learning across the board. Do you see why someone would see that as a conspiracy theory being insisted upon because one wants to just dismiss the evidence?

Comment Re:Just means none of the experts cared enough (Score 3, Interesting) 93

I'm not trying to convince you that "Because Tao said this, you must be wrong." I agree that Fields Medalists can make mistakes or miss things. Heck, I corrected Tao once on something (albeit very minor) and I'm namechecked in a paper for making an essentially trivial observation in a conversation right after Timothy Gowers asked a question. But if you disagree with people this strongly, it should at least occur to you that you are possibly wrong here. Are you able to spend 5 or 10 minutes considering that at least as a possibility?

Comment Re:Just means none of the experts cared enough (Score 1) 93

The benchmarks used in those metrics were all created *before* the reasoning models were introduced. It is hard to see how multiple different organizations could have carefully introduced benchmarks before the tech in question existed. You don't need to think that these models are "reasoning" in any deep sense to recognize that the models labeled as such do better than mere scaling would suggest and also improve faster when scaled than the models which aren't labeled that way. That means that "reasoning model" is not just marketing hype, whether or not one likes the term.

Comment Re:Just means none of the experts cared enough (Score 1) 93

They call them "reasoning models" but that is pure marketing.

The data doesn't agree with you. Looking at improvement in benchmarks, the reasoning models show faster improvement on benchmarks than non-reasoning models did. See https://epoch.ai/blog/have-ai-capabilities-accelerated for a summary. This is hard to explain as pure marketing.

Comment Re:Just means none of the experts cared enough (Score 4, Insightful) 93

It is fascinating which people get math degrees these days. Apparently logical thinking is not a requirement anymore.

"Logical thinking" is not synonymous with "agrees with gweihir" as much as you would like them to be.

Ok, let me spell it out for you: The information was out there and could be combined in a purely mechanical, no-insight-required way to provide the answer. Nobody cared enough to find it and try that. Is that clear enough or are you still bereft of understanding?

What you are spelling out and claiming here is just not true. The approach the AI took was *different* from the approach that the literature took. You don't need just my opinion on this. Terry Tao (who is a Fields Medalist) and Jared Lichtman (who is one of the best of the young new number theorists and who had previously published work on this and related problems) both disagree with you https://www.scientificamerican.com/article/amateur-armed-with-chatgpt-vibe-maths-a-60-year-old-problem/. If the people who have looked at this in detail and are subject matter experts disagree with you, maybe it should possibly occur to you that you are wrong here?

Comment Re:Ahem (Score 2) 93

We know from Terry Tao's interest in AI that previous attempts at claiming successful AI-generated novelty with Erdos problems were wrong.

Two of the previous examples turned out to be in the literature. We've had other examples where this was not the case. This example is more noteworthy not because it is the first where this is the case but because it is the first substantial one. (I think of Erdos problems as coming in three categories. First, pretty obscure ones which almost no one has heard of. Second, ones where subject matter experts have heard of them even if they aren't that famous. Third, things like the Erdos-Straus conjecture which are well known enough to have a Wikipedia page. What makes this one more interesting is not that it is the first Erdos problem to fall, but that it is the first to fall that really is in the second category.)

As in Life, the phrase "I can't think of a reason against, so it must be true" is not a useful adage.

In this case, that really is not what looks to be going on. And the fact that the system found multiple distinct proofs in different runs is pretty strong evidence. But I'm curious, if this isn't evidence enough to strongly suggest this solution is original to the AI, what sort of evidence would convince you an AI genuinely came up with a novel solution?

Comment Re:Pardon my mathematical ignornance, but (Score 4, Informative) 93

So, the problem is about primitive sets, sets where no element of the set is a multiple of another element. You do have a partially correct intuition here. The canonical example of a primitive set is the set of primes. But you can give other examples of primitive sets. For example, you could take the set of primes, remove 2 and 3, and then throw 4, 6 and 9 into the set. Notice that the primes less than 10 are just 2, 3, 5 and 7, whereas this new set has 4, 5, 6, 7 and 9 below 10, and so has one additional small element. But the problem in question is one of a series of conjectures which all together say in a certain sense that primitive sets cannot end up being much denser than the set of primes.
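
To make the definition concrete, here is a minimal Python sketch that checks whether a finite set of positive integers is primitive, applied to the example above. The function name is_primitive and the cutoff at 20 are just choices for illustration; the real problem concerns infinite sets, so this only checks a finite piece.

```python
from itertools import combinations

def is_primitive(numbers):
    """Return True if no element divides a different element.

    Assumes all inputs are positive integers.
    """
    values = sorted(set(numbers))
    # After sorting, a < b in every pair, so only "a divides b" is possible.
    return all(b % a != 0 for a, b in combinations(values, 2))

# Primes below 20 (the cutoff is arbitrary, chosen only for illustration).
primes_below_20 = [2, 3, 5, 7, 11, 13, 17, 19]

# The example from the comment: drop 2 and 3, add 4, 6 and 9.
modified = [p for p in primes_below_20 if p not in (2, 3)] + [4, 6, 9]

print(is_primitive(primes_below_20))
print(is_primitive(modified))
print(sorted(n for n in primes_below_20 if n < 10))
print(sorted(n for n in modified if n < 10))
```

Both sets pass the check, and the modified set has five elements below 10 where the primes have four.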

Comment Re:Just means none of the experts cared enough (Score 4, Informative) 93

If you are a mathematician, you should be able to see the difference between "nobody cared enough" (my claim) and "no one cared" (your gross mis-statement of my claim).

Sigh. Why am I not surprised that you've responded this way? Let's be clear then: You can take the comment I wrote earlier, add the word "enough" at the end of that claim, and everything I wrote would still be true. Your statement is just wrong, and it is wrong for exactly the reasons I outlined.

Comment Re:Just means none of the experts cared enough (Score 4, Informative) 93

Aside from them being almost certainly *not* correct here, given that there are a whole bunch of prior papers about this specific problem, you are confusing two different things. Having solved an Erdos problem is different from having an Erdos number. An Erdos number https://en.wikipedia.org/wiki/Erd%C5%91s_number comes from having a chain of collaborators going back to Erdos. Erdos has Erdos number 0. Anyone who wrote a paper with Erdos has Erdos number 1. If someone else then writes a paper with someone with Erdos number 1 (and that person is not Erdos and does not already have Erdos number 1) then that person now has Erdos number 2, and so on. And having an Erdos number is not a big deal. I, for example, have an Erdos number. Most working mathematicians in number theory and graph theory have some Erdos number, and many in other subfields do as well. Having a *low* Erdos number is more impressive, but even then people care much more about what results one has proven (with or without Erdos) than one's Erdos number. They really are a fun social thing and nothing more.
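
Put another way, an Erdos number is the shortest-path distance to Erdos in the coauthorship graph. Here is a minimal Python sketch using breadth-first search. The graph below is entirely made up (the names A, B, C and D are placeholders, not real people), and the function name erdos_numbers is just for illustration.

```python
from collections import deque

def erdos_numbers(coauthors, root="Erdos"):
    """Breadth-first search outward from root over a coauthorship graph.

    coauthors maps each person to a list of people they wrote a paper with.
    Returns a dict of shortest collaboration distances; people with no
    chain of coauthors back to root are simply absent from the result.
    """
    distance = {root: 0}
    queue = deque([root])
    while queue:
        person = queue.popleft()
        for other in coauthors.get(person, ()):
            if other not in distance:
                distance[other] = distance[person] + 1
                queue.append(other)
    return distance

# Hypothetical toy graph: A wrote with Erdos, B with A, C with B, D with nobody.
toy_graph = {
    "Erdos": ["A"],
    "A": ["Erdos", "B"],
    "B": ["A", "C"],
    "C": ["B"],
    "D": [],
}

numbers = erdos_numbers(toy_graph)
print(numbers)
```

In this toy graph A gets 1, B gets 2 and C gets 3, while D has no Erdos number at all.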

Comment Re:Ahem (Score 4, Informative) 93

Mathematician here. This is highly unlikely to be the case. This was a moderately well known Erdos problem (not one of the famous ones but well known enough that I had seen it before this). If there were a solution on the internet we would likely have already found it, especially because there's been a concerted effort to track down what is happening with all the Erdos problems in the last few years. Moreover, even after this problem was solved, people went and tried hard to find a copy of the solution somewhere on the internet, and have all failed. Taken together with the fact that the solution uses a novel technique which is not used in the literature on this problem (well, a novel direction to go in, even as it starts with the same basic starting point), it looks highly unlikely that this is anything other than an original solution. Furthermore, with similar prompting, another copy of the AI was able to make multiple different valid proofs of the result, as discussed by Terry Tao here https://www.erdosproblems.com/forum/thread/1196#post-5565. The chance that there were multiple missed copies of different proofs of this result is extremely small.

Comment Re:Just means none of the experts cared enough (Score 5, Informative) 93

Mathematician here, and in the same area of research (number theory). This is not a problem where no one cared. While there are some Erdos problems in that category, this problem is one which was well known enough that I was already familiar with it. This is also a problem which multiple people, including Jared Lichtman, who is an up and coming, well respected young number theorist, have thought about. And if you go to the page for problem 1196 in the general Erdos Problems database, you'll see three references, all of which include references to further papers which thought about this problem: https://www.erdosproblems.com/1196
