I spent the last couple of weeks on a bender working on a project with the first of the month as a full-stop hard deadline. It was a devil's brew of PHP, JS, CSS, HTML, a sprinkle of legacy Perl code, and a full rewrite of a (argh) VB.NET custom client. Every day, I would open each of "the big eight" LLMs in a new window and then copy and paste the same prompt to all of them. By the end of the day, there was usually only one I was still using.
Given those constraints, I found some strong differences between the current crop of LLMs.
Kimi - Whew - amazing and unpretentious. I was taken aback by how much better this was than most of the rest.
Grok - (Elmo aside) same as Kimi. Rarely made major mistakes and didn't double down when it made one. And it is Grok 3 I am using, not Grok 4. It is also quite fast. The biggest benefit is that when Grok makes changes to minor routines, it reprints the entire code base and doesn't force you to figure out which line of 10k it is talking about (like ChatGPT) to get updated.
DeepSeek - seems very similar to Kimi but with a smaller context window, and the code was very vanilla.
ChatGPT - you often spend more time cleaning up the code it generates than it saves you. It makes subtle, consistent mistakes. The context window often means ChatGPT completely ignores long sections of code you share with it. The biggest problem is that when referencing code after an interaction, it makes vague references to where updated code should go.
Gemini - moderately OK, but like ChatGPT it makes lots of mistakes (and isn't as nice when you call it out). However, I use it for small routines as it is the fastest of the bunch.
Claude - solid code but often judgmental. I don't subscribe ($) here, but the small context window is a showstopper. Lots of mistakes.
CoPilot - Haven't used it enough to judge.
Meta.ai (Llama) - Meh. Really slow, and the code output always has errors.
So to me, there is little doubt why one LLM maker would be trying to figure out what another one was doing for code.
The only way Google takes on Microsoft is with a unified platform that can run anywhere, on anything. Android has massive reach (Chromebooks do not). Dropping a new OS on there that can run on laptops, phones, tablets, and even watches from a unified code base would be a category-defining ambient OS, a new paradigm. This would be the first "post-cloud" OS to come out.
If I were Google, my next project would be making it run on anything Windows 11 or Apple can run on, and offering a free installer for those machines straight away.
About time, Google - about time.
Search string: "chatgpt.com/?q=%s&hints=search&model=gpt-4o" (where model is your preferred model)
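For anyone curious what the browser actually does with that search string, here is a minimal sketch in Python: the %s placeholder gets replaced with the URL-encoded query. The template is the one from the comment above (with the scheme added); the choice of Python's `quote()` for the encoding is mine - browsers may encode spaces as `+` instead of `%20`.

```python
from urllib.parse import quote

# Custom search engine template from the comment above, scheme added.
# "model" can be swapped for your preferred model.
TEMPLATE = "https://chatgpt.com/?q=%s&hints=search&model=gpt-4o"

def build_search_url(query: str) -> str:
    # quote() percent-encodes spaces and special characters
    # (space -> %20), standing in for the browser's own encoding.
    return TEMPLATE.replace("%s", quote(query))

print(build_search_url("fix a perl regex"))
```

Registered as a custom search engine, this lets you fire a ChatGPT query straight from the address bar.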
2: Coding. All-of-the-things: C+/VB/VS, shell scripts, Perl, WordPress plugins, PHP, JS, HTML, and even slumming with CSS. For short stuff it rarely needs "correcting", but sometimes it does. It eliminates all the big issues when laying out a task. Its guidance turns a day job into an hour job.
Where it shines for me is in languages that I use very rarely but that a project requires. (I loathe JS with the fire of a thousand suns - ChatGPT makes it tolerable.)
3: Ideation. It's so good at this. It will think of stuff you miss. It has been way, way overbaked, but MS did have it right when they said AI was a co-pilot. I can't count the times I've had a phone meeting in 5 minutes and asked ChatGPT to give me things to talk about and ask.
4: Content. Yes, we generate all sorts of content with it from blog posts to schedules, to project overviews.
Obviously, everything needs human oversight, but sure, I've asked it to write stuff and dropped it in for a test without looking at it line-for-line.
Last week I asked it to lay out a modest WordPress plugin, and without prompting specifically for code, it generated it, and it ran and worked the first go-around.
Oh, interesting and very legitimate way to look at this - not thought of it that way.
Real Users never know what they want, but they always know when your program doesn't deliver it.