
Comment Re:That's awesome and I prefer Claude (Score 1) 31

At the moment these companies are all trying to capture market share, and given that advertisements are unlikely to be popular, the ones who hold out longest (maybe because they can afford to) will be the most successful.

For example, Google generates tons of cash outside of AI and could probably afford to provide the free Gemini chatbot indefinitely if they chose to, whereas for OpenAI, ChatGPT is the primary source of revenue (with API services second), so they may be somewhat compelled to add advertising sooner rather than later.

Demis Hassabis (of Google) took a sly dig at OpenAI in a recent Davos interview, questioning why they need to raise cash through advertising if they really are on the cusp of AGI (which would bring untold profits). Hassabis says AGI is 5-10 years out, but OpenAI claims that AGI (or what they want to call "AGI") is right around the corner.

Comment Re:This is not a clash... (Score 5, Insightful) 105

> They're just using different definitions of general intelligence

True.

> Dario and Sam correctly point out that the models are already at least as intelligent as most people

But that's not true.

LLMs are kind of like idiot savants: great at some things, and piss-poor at others. Even in the things they are great at, showing flashes of human-level or expert intelligence, they are at the same time deeply flawed in ways that humans of average intelligence are not: they continue to hallucinate, and they don't understand when their regurgitated human reasoning patterns actually apply. They give advice without thinking through the consequences, then just say "my bad" and move on if you are knowledgeable enough to call them out.

There are also huge gaps in LLM capability that even the most stupid of humans don't have, such as on-the-job learning. If you show a stupid human how to do something simple enough times, they will eventually get it and memorize the skill (flipping burgers, or bagging groceries), but the LLM never will. It may learn "in context" one day, but the next day, and the next, will be Groundhog Day, and you'll have to teach it all over again. For any reasonably skilled job that takes months or years to learn and master, having to retrain it every day is a non-starter.

Altman's definition of AGI is something that can perform most "economically valuable jobs", such as his own used-car-salesman job, but we are still far from that. About the only job an LLM could do today, as a viable replacement for a human, would be a call center job where it is doing something highly repetitive and non-creative, working in a narrow domain it could have been trained to master, where it does not need to learn anything, and where the consequences of messing up are fairly low (and the chances of doing so maybe no worse than the humans it is replacing).

Comment Re:They are on crack (Score 4, Interesting) 105

I wouldn't describe them as morons. Altman is more of a shifty used-car-salesman, compulsive-liar type, maybe of average intelligence. Amodei is quite smart, but the money/power lust seems to have gotten to him, and in the last year he has jumped the shark and will now seemingly say anything and everything he can to hype AI.

There was an interesting interview of Hassabis at Davos by Alex Kantrowitz (quite underrated as an interviewer - asks deceptively simple questions and gets the guests talking), where Hassabis, obviously no fan of Altman, calls him out by asking why ChatGPT needs to add advertising (coming soon) if OpenAI really is on the cusp of massively valuable, world-altering AGI!

Hassabis is the only sane voice in the entire AI field.

Comment Re:Obvious (Score 1) 61

> Maybe the assumption is that it would also work for maintenance of larger code bases...

Or maybe not even that - there is no assumption, just some non-developer or weak developer trying to get something done, with no thought for tomorrow.

The kind of developer who doesn't care about the code they are developing, nowadays by vibe coding, probably has a large overlap with the kind of developer who was previously just trying to get through the day by copying code from Stack Overflow, or randomly shuffling their code to make core-dump bugs seem to disappear.

There is a version of this argument that you also sometimes hear: so what if the LLM-generated code is bad, because soon a better LLM will come along that is able to fix it. Perhaps, but that "soon" may be a long time coming. Human-level AI is probably at least 10-20 years away, and in the meantime your LLM-generated slop (if you vibe coded it rather than managing it) may be sitting there in production, exposing your production servers and customers' data to who knows what kind of vulnerabilities.

In the meantime, in terms of what LLMs can do today, there is definitely a limit in terms of complexity and project size, unless perhaps you are just talking about regurgitating that vibe-coded BASIC interpreter.

The fundamental problem is that while the LLM's training data was full of smaller bits of code - student assignments, Stack Overflow solutions with explanations, decently commented parts of open source projects, etc. - that enabled it to understand coding on a small scale, what was necessarily lacking from the training data (because it just does not exist in the public domain) are many examples of large-scale projects. While there are a few large open source projects to learn from, such as gcc, linux, etc, what is missing are the high-level design notes / explanations that would let the LLM learn WHY the project was designed that way, and without the "WHY" all it can do is copy the "WHAT": cargo-cult coding, copying code patterns without knowing why they were chosen, or whether they are appropriate for the task at hand.

Comment Re:what is a 'good idea'? (Score 2) 61

I'd kinda put this in the personal throwaway tool category - it does what he wants, and that's good enough. Would it pass code review and checks for security vulnerabilities? Does it have bugs in it? I guess none of that matters if he's the boss and finds it useful, especially if it's internal use only and behind a firewall, so that any hard-coded passwords or who knows what else aren't creating a bad security situation.

Comment Re:Vibe coding just is... (Score 1) 61

This sounds like a good process, but it's much more controlled than "vibe coding", with you providing all the oversight and intelligent decision making.

Here's Karpathy's "vibe coding" tweet that introduced the term.

"There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works."

Comment Re:Hobby use != Professional use (Score 1) 61

This guy apparently did it:

https://www.youtube.com/watch?...

It's going to depend on what model you use - apparently the strongest by far right now is Anthropic's Opus 4.5, and to do something like this you need to be running it via something like Claude Code, where it can iterate, test, and fix bugs until complete.
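
For those unfamiliar, "iterate, test, and fix" just means the tool runs the model in a loop against your build and tests. Here's a minimal sketch of that loop; generate_patch, apply_patch, and run_tests are hypothetical stand-ins for illustration, not any real tool's API.

```python
# Minimal sketch of the loop an agentic coding tool runs; the three
# helpers below are hypothetical stand-ins, not any real tool's API.

def generate_patch(task: str, feedback: str) -> str:
    # Stand-in: a real harness would call the model here.
    return f"# patch for: {task} (feedback: {feedback or 'none'})"

def apply_patch(patch: str) -> None:
    # Stand-in: a real harness would edit files in the project.
    pass

def run_tests() -> tuple[bool, str]:
    # Stand-in: a real harness would run the build and test suite.
    return True, ""

def agentic_loop(task: str, max_iterations: int = 20) -> bool:
    feedback = ""
    for _ in range(max_iterations):
        patch = generate_patch(task, feedback)  # ask the model for code
        apply_patch(patch)                      # write it into the project
        ok, output = run_tests()                # run the build and tests
        if ok:
            return True                         # tests pass: done
        feedback = output                       # feed the errors back in
    return False                                # give up after N attempts

print(agentic_loop("write me a BASIC interpreter"))
```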

I expect this video is real. There are other videos out there of people one-shot vibe coding relatively complex (but still boilerplate) things.

Comment Hobby use != Professional use (Score 4, Insightful) 61

It's unfortunate if some people hear that Torvalds is vibe coding on his hobby projects, or that Karpathy (who coined the term) is, and their takeaway is that vibe coding is fine at work too, not just on throwaway hobby projects.

For sure, it's amazing that we now live in a world where you can single-shot an entire app ("write me a BASIC interpreter"), and for sure try it and have fun with it if you like. Beyond fun it has utility too, for single-shot throwaway apps that wouldn't have been worth the effort if you had to do it yourself (e.g. give Claude a photo of your credit card bill and ask it for the category totals - it'll write a Python program to do it, roughly like the sketch below).
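
To give a feel for it, this is roughly the kind of throwaway script the LLM writes for the credit card example, assuming it has already extracted the transactions from the image (the rows below are made up):

```python
# Illustrative sketch of the kind of throwaway script an LLM generates
# for this; the transactions are made-up stand-ins for whatever it
# extracts from the statement image.
from collections import defaultdict

transactions = [
    ("Groceries", 82.17),
    ("Dining", 34.50),
    ("Groceries", 19.99),
    ("Fuel", 45.00),
    ("Dining", 12.75),
]

totals = defaultdict(float)
for category, amount in transactions:
    totals[category] += amount

for category, total in sorted(totals.items()):
    print(f"{category}: ${total:.2f}")
```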

For hobby / throwaway apps you may not care what the code looks like, what the tech stack is, whether it was well designed or not - you just want something that works.

At work you obviously need to be more professional and care about the code you are developing. Using LLMs doesn't change what is expected of you, and vibe coding isn't going to cut it. A better mindset is that you are pair-programming with the LLM, aware of what it is generating, and that you are the lead developer and systems architect.

Comment Re:GW vs flops? (Score 1) 24

You seem to be completely out of touch with what people are using LLMs for.

Incidentally, if you want to use an LLM to do everyday computer stuff like logging into a system (some people use one as their system administrator), then what you need is an agentic client that actually runs on the system you want to control, such as Claude Code, Gemini CLI, etc. You seem to be blaming the stupidity of your undergraduate researchers on LLMs, when it is in fact their own stupidity and failure to bother understanding how these tools work.

I'm not sure what your anecdote about entrepreneurial friends not having all their needs met by LLMs is meant to prove. Did they try Google search? If the information was neither in the LLM's training set (or maybe it was, but they failed to craft a prompt to elicit it) nor findable by Google, then reaching out to someone who can help seems an entirely reasonable thing to do.

I think there is an underlying problem of non-developers hearing about all the success that developers are having with LLMs, thinking that they too can just "vibe code" some new app, and then discovering that LLMs are just a tool, not magic, and that you do indeed need to be a developer to use them for development work.

LLMs are for sure an odd and flawed technology, but if you haven't been able to find a way to use them productively, then maybe you are not trying hard enough. If you are from an academic background, then maybe Ethan Mollick's extensive writing about and experimentation with LLMs would be of interest.

https://www.oneusefulthing.org...

Comment Re:GW vs flops? (Score 1) 24

Yes, when there is an alternative it is almost certainly more efficient, and people no doubt are sometimes asking LLMs to do trivial things like math when they could have just used a calculator instead, but surely you don't believe that the majority of paying LLM users are stupid and using them for things like this? The excitement about LLMs is because they can do things, like writing code, where there is no alternative (other than doing it by hand).

Comment Re:GW vs flops? (Score 1) 24

The relationship between FLOPS and watts is a function of GPU generation, and will certainly change (and be very disruptive - increased power density may require entire datacenter cooling systems to be redone), but what may change even faster is tokens per FLOP as the models get tweaked for efficiency, and that is what counts, since customer pricing is in tokens. The production capacity (tokens/sec) of the "factory" is certainly not fixed and defined by the power it is consuming.
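
To make that concrete, here's a back-of-the-envelope sketch; every number in it is a made-up assumption for illustration only:

```python
# Back-of-the-envelope: tokens/sec for a given power budget.
# Every number here is a made-up assumption for illustration only.

power_gw = 1.0            # datacenter power budget (GW)
flops_per_watt = 2e9      # assumed GPU efficiency (FLOPS per watt)
flops_per_token = 2e12    # assumed inference cost per token

total_flops = power_gw * 1e9 * flops_per_watt   # sustained FLOPS
tokens_per_sec = total_flops / flops_per_token

print(f"{tokens_per_sec:.2e} tokens/sec")       # ~1e6 with these numbers
```

Halve flops_per_token (a model-efficiency tweak) and tokens/sec doubles at the same GW, which is why a fixed power budget doesn't pin down output.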

GW really is an odd metric to fixate on - I don't understand it, other than that if power is the growth constraint then maybe it becomes a useful way to think about capacity. But these statements must be directed at investors, so I'm not sure what they are meant to make of it.

Comment Re:Imaginary numbers (Score 2) 24

Surely it'd be the latest month, or quarter, being extrapolated, not the highest.

It's a bit of a strange quibble, though, given that by the same logic (growing revenues) they most likely have 2026 revenue well in excess of $20B.
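
The run-rate arithmetic, with illustrative numbers (not OpenAI's actual figures):

```python
# The arithmetic behind an annualized "run rate"; the figures here are
# illustrative assumptions, not OpenAI's actual numbers.

latest_month = 1.7e9                # assumed latest-month revenue ($)
run_rate = latest_month * 12
print(f"annualized run rate: ${run_rate / 1e9:.1f}B")        # $20.4B

# With growth, extrapolating the latest month understates the year ahead:
growth = 0.05                       # assumed 5% month-over-month growth
year_ahead = sum(latest_month * (1 + growth) ** m for m in range(1, 13))
print(f"next 12 months at 5%/mo: ${year_ahead / 1e9:.1f}B")  # ~$28.4B
```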

Their problem, evidently, isn't revenue, it's spending, and AFAIK they are not expected to break even until 2030 (assuming they last that long).
