Re: And so it begins
I've read it and he's at least over the target, if not on the mark.
There are a billion different ways to do it, and I don't think there's exactly one right way, but defining scope for the models helps a lot.
Do a lot of planning before any implementation, with precise criteria for how and where things get implemented, and break each plan down further along those lines. Have a good architectural picture of how things are supposed to look. MVC isolation helps a lot. Know the data model and how your controller does its thing.
I can't emphasize enough how much planning seems to impact outcomes. You'll still overlook things, but when I'm reasonably thorough it usually goes through without a hitch. Plan, ask for a plan diagnosis, refine... I spend 40-60 minutes planning anything substantial, and often have time between agent runs while I'm formulating the next plan.
Also use a good primary prompt, and CLAUDE.md files with well-defined project definitions, structure, and so on. Be explicit. Have the agent track its work, then compare, follow up, and review the plan for completion afterwards. Heck, you can even have it build/run/test (which I might kick off before getting lunch) and iterate. When you find it messing something up, update the project details with a do/do-not rule. A successful one-shot depends on these things.
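For illustration, a minimal CLAUDE.md along those lines might look like this (the filename is Claude Code's convention; the project details here are hypothetical):

    # Project
    Go REST service over Postgres, MVC layout.

    # Structure
    - cmd/server/ entry point
    - internal/model/ data model; all SQL lives here
    - internal/ctrl/ controllers; go through the model, never raw SQL

    # Rules
    - Run the test suite before declaring a task complete.
    - Track your work in PLAN.md and review the plan for completion afterwards.
    - Do NOT add dependencies or change the public API without asking.

The do/do-not list is the part you keep updating whenever the agent messes something up.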
Your tooling matters too. opencode and cc (Claude Code) seem to work best for me; I'd avoid tools like Cursor at this point. memvid is a game changer: it learns your project's structure, and the agent hallucinates a lot less.
That's what I use now, in 4K, and it's great. I can throw a window into any corner and it goes into a 25% mode where I've effectively got a 1080p screen there. Or I can throw windows to the side and get double height, and so on, all without any bezels between them. I fully expect that in a couple of generations I'll do it over again with a slightly larger monitor at 8K, but my GPU doesn't even have the fill rate to do that justice, let alone the other capabilities it would need.
I feel exactly the opposite way. Multiple monitors cause all kinds of problems; they're often awkward at best. One big monitor is very much better. Inevitably I want some applications to scale differently from others across monitors, and it's never convenient.
I think in the next year we'll see some of the open source models catch up with and surpass the closed models in a real way. Even if they get to Opus 4.5/4.6 parity (I personally think 4.5 is markedly more predictable in its behavior and makes a better "daily driver" than 4.6), that's still more than good enough to daily drive for almost all tasks. Tools like opencode make agent-specific models not only tenable but practical, to boot.
This is still fairly early stage. You've got to run things at scale and see what works and what doesn't before you rearchitect. That's where we're at now: things have been iterating and running at scale, making incremental improvements. But now there's an economic incentive to optimize.
Most people don't have two screens, and those who do usually run both at the same refresh rate. These days you can get big displays for reasonable money, so there's almost no good reason to buy multiple displays; the use case is limited to things like video production.
Last time I looked you could get rid of the snaps in Ubuntu, e.g. by getting Firefox from the official PPA instead. (I think there is now a Firefox repo which is not a PPA.) I am using Devuan with my repos pointed at trixie, and so far so good. The software is even surprisingly modern.
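If anyone wants the recipe, the standard route is the mozillateam PPA plus an apt pin so the snap shim doesn't come back (exact pin priorities may vary by release):

    sudo snap remove firefox
    sudo add-apt-repository ppa:mozillateam/ppa

    # Put this in /etc/apt/preferences.d/mozilla-firefox so apt prefers
    # the PPA deb over Ubuntu's "firefox" package that reinstalls the snap:
    Package: firefox*
    Pin: release o=LP-PPA-mozillateam
    Pin-Priority: 1001

    sudo apt update && sudo apt install firefox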
Unless they've managed to improve things internally much faster than is generally perceived. They're moving extremely fast.
There's a lot of talk that Opus 4.6 is really just Sonnet 5 with much higher margins plus an intentional token slowdown (and its behavioral characteristics lend no small amount of credibility to this claim), which could in theory mean they're now making money per token instead of losing it.
I have to ask something, because it's relevant.
When you say it's a "shitty dotcom era remake", what level of informed statement is that? How recently have you used the frontier models from OpenAI or Anthropic? Are you genuinely informed here, or making an assessment based on a couple of hours with Cursor four months ago? Because that's not even close to representative of where things are today.
Just this week and last, I've used the frontier models to fully audit an existing code base and make both architectural and bugfix changes that cut my cloud spend to about 1/10th of what it was. I've also fully reimplemented a communications stack written in a mix of Java/C/PHP in Go (again, with a marked performance improvement). The refactor was a bit of an exercise; the reimplementation was almost a blind one-shot.
The dotcom bubble and bust killed a lot of startups, but relatively few that had IPO'd; Amazon, eBay, and Priceline notably survived. Others did fail, arguably because they were too visionary: they were too early to the market, and their use cases (mostly online retail) were implemented later, once the ecosystem was more mature. The ideas weren't bad, just the implementation and the overly exuberant execution and marketing. Remember, those were still the days of dialup: broadband changed a lot, as did smartphones. We've since seen companies like DoorDash and Facebook/social media more or less fulfill the ideas put forward during the dotcom boom.
What companies like Anthropic demonstrate is an infrastructure for building things, including the kinds of things that once failed at IPO (like pets.com, whose function was effectively subsumed by Amazon).
The power of Claude Code isn't the tool itself so much as the ecosystem of plugins, the ability to customize it for your specific workflow, and the fact that it's AI-native.
It isn't some sloppy bolt-on that assumes people will be writing the actual code. (They won't.)
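As a concrete example of that customization: Claude Code picks up markdown files under .claude/commands/ as project slash commands. A hypothetical .claude/commands/audit.md (the mechanism is documented; these contents are made up):

    Audit the module named in $ARGUMENTS against the structure in CLAUDE.md:
    1. List architectural deviations with file and line references.
    2. Propose fixes as a plan only; do not edit anything yet.

Then /audit billing (or whatever module name you pass) runs that as a reusable workflow step.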
The behavior, known in the research community as sycophancy, stems from how these models are trained: reinforcement learning from human feedback, or RLHF, rewards responses that human evaluators prefer, and humans consistently rate agreeable answers higher than accurate ones.
No, it's because in the training corpus, most of the responses to "are you sure?" that anyone bothered to record involve someone being corrected.
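For anyone who hasn't seen it, the RLHF step the article describes boils down to a preference loss like this (a minimal PyTorch-style sketch; reward_model and the tensors are placeholders):

    import torch.nn.functional as F

    # Bradley-Terry preference loss, the core of RLHF reward modeling.
    # reward_model maps a response tensor to a scalar score; chosen/rejected
    # are the rater-preferred and rater-rejected responses to the same prompt.
    def preference_loss(reward_model, chosen, rejected):
        margin = reward_model(chosen) - reward_model(rejected)
        # Push preferred answers above rejected ones; if raters systematically
        # prefer agreeable answers, "agreeable" is what gets rewarded.
        return -F.logsigmoid(margin).mean()

Whichever explanation you buy, note that this loss only ever optimizes for what raters preferred, not for what was true.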
How about all the women who accused Bill Clinton?
You can have Bill Clinton. We don't give a fuck. He was a rapey piece of shit, as many of us have been pointing out; check my posting history. That pales next to the Trump-Epstein child rape and cannibalism consortium, but still, you can have him too.
It's depressing to me just how many so-called "nerds" around here are little more than shelled out muppets repeating the party line.
You mean the "global warming is a myth" party line deliberately created by Big Oil and spread among the "I'm such an individual I get all my information from youtube videos" flock of fuckheads?
Depends on what they are bred for.
Okey dokey coward. Run along and let the adults have a conversation now, I hear your mom calling me.
"A child is a person who can't understand why someone would give away a perfectly good kitten." -- Doug Larson