Comment Re:AI more expensive than humans? (Score 1) 13
Highly paid software developers are capable of formal methods and other advanced techniques, which would require a far, far vaster LLM to perform. So the LLM would still be more expensive.
Since R1 has good reasoning, but no real breadth, and is open source, the logical thing would be to modify R1 to pre-digest inputs and create an optimised input to 4.1. The logic there would be that people generally won't provide prompts ideally suited to how LLMs work, so LLM processing will always be worse than it could be.
R1 should, however, be ample for preprocessing inputs to make them more LLM-friendly.
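The pre-digestion idea can be sketched without any model at all. In a real pipeline the restructuring step would be a call to a small reasoning model like R1, whose output is then forwarded to the big model; here a crude rule-based stand-in shows the shape of it. The layout and rules below are my own illustrative assumptions, not a description of how R1 or 4.1 actually behave.

```python
def predigest(raw_prompt: str) -> str:
    """Rewrite a free-form prompt into an explicit TASK/CONTEXT layout.

    A stand-in for a small reasoning model: it separates the question
    from the background so the downstream LLM sees the task explicitly.
    """
    lines = [l.strip() for l in raw_prompt.splitlines() if l.strip()]
    # Crude heuristic: the last line ending in '?' is the question,
    # everything else is background context.
    questions = [l for l in lines if l.endswith("?")]
    context = [l for l in lines if not l.endswith("?")]
    task = questions[-1] if questions else lines[-1]
    parts = ["TASK: " + task]
    if context:
        parts.append("CONTEXT:")
        parts.extend("- " + c for c in context)
    parts.append("Answer the TASK using only the CONTEXT above.")
    return "\n".join(parts)
```

In the two-stage version, `predigest` is replaced by the cheap model, and the restructured prompt goes to the expensive one, so the big model never sees the badly formed input.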
The bottom line is that AI software is, at present, nowhere near where it needs to be to be useful. The approach used has serious flaws and more power won't help that.
Artificial silicone life certainly exists in America. But I don't think it's what you're referring to.
Microsoft today isn't the Microsoft of the 80s and 90s?
I think this rather indicates there may be some lies involved in that claim. This is very much the tactic of Old Microsoft.
Gemini struggles. I've repeated the runs on Gemini and can now report it says files were truncated two times in three. After reading in the files, it struggles with prompts - sometimes erroring, sometimes repeating a previous response, sometimes giving part of an answer.
When it gives some sort of answer, further probing shows it failed to consider most of the information available. In fairness, it says it is under development. Handling complex data sources clearly needs work.
But precisely because that is its answer, I must conclude it is not ready yet.
I would be absolutely happy if any of these AI companies were to use the files as a way to stress-test, but I'm sure they've better stress tests already.
Still, it would be very unfair of me to argue a need for improvement if I were to insist on not providing either evidence or a means of testing.
I would estimate 190,000 words in total, and reading the files one at a time and asking for a count of concepts, the total seems to be around 3,000.
This opinion article (Washington Post, gift link so no paywall) says there's 20-year-old Supreme Court case law that says the Government can't treat people differently under the law just because they don't like some of them.
Those cases were under Roberts, and were unanimous where it counted.
Trump is repeatedly on the record saying he uses US law to go after his enemies. A Supreme Court with any spine at all could use this to shut down the worst of his behaviors.
I'm using the web interface, using the file attachment button then selecting documents. The prompt used explicitly instructs using the largest input window available and to report any file that failed to be handled correctly. It gives the OK for most of the files, then reports several got truncated.
Yeah, if it was something simple like input exhaustion, they'd not be making the claim. That would be the first thing anyone tested. So I'm reasoning that the problem has to be deeper, that the overflow in Gemini's case is in the complexity and trying to handle the relationships.
Testing it further, after just a short list of questions, the answers cease to have anything to do with the prompt. This lends credence to the notion that it's the relating of ideas that kills Gemini, not the file size.
You're correct. It also violates WTO rules but the Republicans are actively blocking the WTO from operating.
Sovereign immunity is a dangerous game. Although the Magna Carta has long since ceased to be a factor in law, it did actually address the specific problem of sovereign immunity, proposing a special court for the sole purpose of trying those with such immunity, preventing bogus lawsuits but also providing a way to hold such people to account.
The US has attempted to use Congress for this, but we've now seen that "lawful" bribery makes it a useless mechanism for that purpose.
I'll be fair and say Gemini Advanced read more of the files in, but still choked on file read.
So, to answer your question, none of the AIs I've tried can cope. There may be AIs I've not tried, ideas welcome, but the AIs just don't do what they claim in terms of data digesting, which leads me to conclude that the hidden overheads underpinning their methods are too large.
Claude and ChatGPT manage to read the files but the context window is too small to process them.
Grok, DeepSeek, and amazingly even Gemini choke on the files. Gemini, multi-million-token windows notwithstanding, reported that the specification files (of which I've 13, plus another 9 containing contextual information) are too big to process, even though they're only about 20k tokens each.
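The back-of-envelope arithmetic shows why "too big" is implausible: 22 files at roughly 20k tokens each is about 440k tokens, comfortably inside even a nominal one-million-token window. (The per-file figure is my estimate; real counts would come from the model's own tokenizer.)

```python
# Rough token budget for the corpus vs. a nominal large context window.
SPEC_FILES = 13
CONTEXT_FILES = 9
TOKENS_PER_FILE = 20_000   # rough per-file estimate from my own count
WINDOW = 1_000_000         # nominal "million-token" context window

total = (SPEC_FILES + CONTEXT_FILES) * TOKENS_PER_FILE
print(total, total < WINDOW)  # 440,000 tokens -- less than half the window
```

So if the files are being rejected, raw size alone can't be the reason.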
I had Claude and ChatGPT work with each other on a little engineering project, each finding the limitations in the design the other hadn't spotted.
It was actually good fun. Cost me a bit to get all the technical info they both wanted, but I now have a design both insist is absolutely robust, absolutely perfect. But they also both tell me that it's too big to properly process.
Yes, AI itself designed a project the very same AI cannot actually understand.
I now have a very large file, that cost me a fair bit of money to produce, that I'm quite convinced is useless, but no AI examining even part of it can find fault in.
Beyond the fun aspect of having AI defeat itself, the project illustrates three things:
1. If AI can't handle a toy specification, it's never going to be able to handle any complex problem. This means that the "pro" editions are not all that "pro". The processing windows are clearly too small for real problems if my little effort is too big.
2. Anything either AI got right is, by virtue of how I worked on the problem, something the other AI got wrong. Of course, AI doesn't "understand", it's only looking at word patterns, but it shows that the reasoning capacity simply isn't there, regardless of whether the knowledgebase is.
3. I've now got a quite nice benchmark for AI systems. I can ignore any AI that can't cope. If it hasn't got the capacity to handle any trivial problem, because the complexity is too high, then it won't manage any real problem better.
Microsoft announced in January that they would be building $80 billion in new datacenters for 2025. By the end of February they were already announcing that this was no longer the case, that they would actually be pulling back from that target and were canceling leases.
Of all of the technology companies Microsoft is absolutely the worst at actually designing and building products. But they are very good at business.