Forgot your password?
typodupeerror

Comment Re:Apply Betteridge's Law (Score 1) 49

So, no, this cluster of patches doesn't tell us anything in particular beyond what we already knew: That emergency patches are relatively common.

Considering that Microsoft has been promising this exact same type of improvement since the release of XP Service Pack 3, the words spoken now are worthless platitudes provided to ensure the smoothness of the theft of your money. There is zero reality behind any of their promises.

I'm just talking about statistical patterns. I know little about Microsoft patches. I abandoned Windows in 2001, right around the time XP was released, and have never looked back.

Comment Re:25,000 lines of code (Score 1) 76

The LLM and the compiler and the formatter will get the low-level details right.

Maybe in about 90% if you are lucky. That still leaves about 10% error rate which is way too much.

Not remotely similar to my experience. Granted I'm writing Rust, and the Rust compiler is *really* picky, so by the time the agent gets something that compiles it's a lot closer to correct than in other languages. Particularly if you know how to use the type system to enforce correctness.

Your job is to make sure the structure is correct and maintainable, and that the test suites cover all the bases,

Depends on the definition of "bases". Passing test suite does not show your program correct. And if your test suite is also AI generated then you are again at the problem whether the tests themselves are correct.

Yes, you have to know how to write tests. A few decades of experience helps a lot. I find I actually spend a lot more time focused on the details of APIs and data structures than the details of tests, though. Getting APIs or data structures wrong will cost you down the road.

Also, I suppose it helps a bit that my work is in cryptography (protocols, not algorithms). The great thing about crypto code is that if you get a single bit wrong, it doesn't work at all. If you screw up the business logic just a little bit, you get completely wrong answers. The terrible thing is that if you get a single bit wrong, it doesn't work at all and gives you no clue where your problem might be.

Of course that's just functional correctness. With cryptography, the really hard part is making sure that the implementation is actually secure. The AI can't help much with that. That requires lots of knowledge and lots of experience.

and then to scan the code for anomalies that make your antennas twitch,

Vibe error detection goes nicely with vibe programming. That being said, experienced programmers have a talent to detect errors. But detecting some errors here and there is far from full code review. Well, you can ask LLM to do it as well and many proposals it provides are good. Greg Kroah-Hartman estimates about 2/3 are good and the rest is marginally somewhat usable.

Deep experience is absolutely required. My antennas are quite good after 40 years.

then dig into those and start asking questions -- not of product managers and developers, usually, but of the LLM!

Nothing goes as nicely as discussing with LLM. The longer you are at it the more askew it goes.

You really have to know what questions to ask, and what answers not to accept. It also helps to know what kinds of errors the LLM makes. It never outright lies, but it will guess rather than look, so you have to know when and how to push it, and how to manage its context window. When stuff starts falling out of the context window the machine starts guessing, approximating, justifying. Sometimes this means you need to make it spawn a bunch of focused subagents each responsible for a small piece of the problem. There are a lot of techniques to learn to maximize the benefit and minimize the errors.

My point is that 25k LOC a month (god forbid a week) is a lot. It may look working on the outside but it is likely full of hopefully only small errors. Especially when you decide that you do not need to human-review all the LLM generated code. But if you consider e.g. lines of an XML file defining your UI (which you have drawn in some GUI designer) to be valid LOC then yeah. 25k is not a big deal. Not all LOCs are equal.

Yeah, I am definitely not doing UI work.

Comment Re:25,000 lines of code (Score 1) 76

its during those sprints when I'm pumping out thousands of lines per day that I write the code that turns out to be the highest quality, requiring the fewest number of bugfixes later

yeah, all of us write (or copy/paste) great boilerplate code. that's not really something to be proud of.

we all make mistakes when writing business functions which are never 25k LOC in a week.

Speak for yourself. I wrote Android's Keymaster implementation in less than a month, and it was about that size, and then re-wrote most of it in a week when it turned out I'd made some core assumptions that Qualcomm couldn't match in their implementation. It was relatively bug-free for a decade -- even when a third-party security research lab spent a month scrutinizing it. They found a handful of things, but nothing serious. I was amazed, especially since I'd seen the reports they turned in on some other code.

That's just one example. In my nearly 40-year career I've had a half dozen crazy-productive weeks like that, and usually when working on particularly-complex bits. If you haven't had that experience, that's unfortunate. It's not something I could do frequently (or would want to), but it's a glorious feeling when you're that deep in the zone.

Comment Re: 25,000 lines of code (Score 2) 76

You assume that a standards document exists and is also sufficiently specific for all scenarios. Other than some very fundamental IETF stuff have I seen a standards document that pretty much covers the scope specifically. Even more severely, "specifications" for an internal project have been so traditionally bad, a whole methodology cropped up basically saying that getting specifications that specifically correct is a waste of time because during the coding it will turn out to not be workable.

Yes, it can write hundreds of tests, but if the same mediocre engine that can't code it right is also generating tests, the tests will be mediocre. Leading to bizarre things like a test case to make sure '1234' comes back as 'abcd' and the function just always returns the fixed string 'abcd' and passes the test because it decided to make a test and pass it instead of trying to implement the logic. I have seen people almost superstitiously add to a prompt "and test everything to make sure it's correct" and declare "that'll fix the problems". The superstitious prompting is a big problem in my mind, that people think they add a magic phrase and suddenly the LLM won't make the mistakes LLMs tend to make. I have seen people take an LLM at their word when the LLM "promises" to not make a specific mistake, and then confounded the first time they hit the LLM making the mistake anyway. "It specifically said it wouldn't do that!", it doesn't understand promises, the thing just will generate the 'consistent' followup to a demand for a promise which is text indicating making the promise.

Take the experiment where they took Opus 4.6 and made it produce a C compiler. To do so, the guy at Anthropic said point blank he had to invest a great deal of effort in a test harness, that the process needed an already working gcc to use as a reference on top of that, and specified the end game as a bootable, compiled kernel. Even then he had to intervene to fix it and it couldn't do the whole thing and when people reviewed the published result, it failed to compile other valid code and managed to compile things that shouldn't have been compilable. This is Anthropic with their best model doing a silly stunt to create a knock off of an existing open source project with full access to said project and source code and *still* it being a lot of human work for mediocre output.

Yes, it has utility, but there's a lot of people overestimating capabilities and underestimating risks and it's hard for the non-technical decision makers to tell the difference until much further down the line. Mileage varies greatly depending on the nature of the task at hand as to whether LLM is barely useful at all or it can credibly almost generate the whole thing.

Comment Re:25,000 lines of code (Score 1, Interesting) 76

It might take one person one year to write 25k lines.

A year? I've regularly written that much in a month, and sometimes in a week. And, counter-intuitively, its during those sprints when I'm pumping out thousands of lines per day that I write the code that turns out to be the highest quality, requiring the fewest number of bugfixes later. I think it's because that very high productivity level can only happen when you're really in the zone, with the whole system held in your head. And when you have that full context, you make fewer mistakes, because mistakes mostly derive from not understanding the other pieces your code is interacting with.

Of course, that kind of focus is exhausting, and you can't do it long term.

How does a person get their head around that in 15 hours?

By focusing on the structure, not the details. The LLM and the compiler and the formatter will get the low-level details right. Your job is to make sure the structure is correct and maintainable, and that the test suites cover all the bases, and then to scan the code for anomalies that make your antennas twitch, then dig into those and start asking questions -- not of product managers and developers, usually, but of the LLM!

But, yeah, it is challenging -- and also strangely addictive. I haven't worked more than 8 hours per day for years, but I find myself working 10+ hours per day on a regular basis, and then pulling out the laptop in bed at 11 PM to check on the last thing I told the AI to do, mostly because it's exhilarating to be able to get so much done, at such high quality, so quickly.

Comment Re:Not unique to AI (Score 3, Insightful) 76

The problem is volume.

Just like AI slop content isn't generally that much worse than human slop that flooded the services, at *least* the human slop required more effort to generate than it takes a person to watch, and that balance meant the slop was obnoxious, but the amount was a bit more limited and easier to ignore.

Now the LLM enables those same people that make insufferable slop to generate orders of magnitude more slop than they could before. Complete with companies really egging them on to make as much slop as they possibly can.

LLM can be useful for generating content, but it is proportionally *way* better at generating content for content creators that don't care about their content.

Which for self-directed people is an easy-ish solution, don't let the LLM far off a leash if you use it at all. Problem is micromanaging executives that are all in and demanding to see some volume of LLM usage the way they think is correct (little prompt, large amounts of code).

Comment Re:25,000 lines of code (Score 1) 76

As far as I've seen, the AI fanatic's answer is "don't care about the code".

They ask for something and whatever they get, they get. The bugs, the glitchiness, the "not what they were expecting" are just accepted as attempts to amend purely through prompting tend to just trade one set of drawbacks for another rather than unambiguously fix stuff. Trying again is expensive and chances are not high that it'll be that much better, unless you have an incredibly specific and verifiable set of criteria that can drive automatic retry on failure. However making that harness is sometimes harder than making the code itself, and without a working reference implementation even that may be a lost cause.

I've always hated trying to salvage outsource slop, and LLM has a very similar smell with similar reactions where people resign themselves to the crappiness.

Comment Re:They probably had incompetent people anyway... (Score 1) 63

Well, in one respect it is 'very useful'. Executive direction that the legacy codebase must be 'documented' fully. Poof, it is 'documented'. Is it correct? Who knows, no one will ever read it, but it fluffs the executives "thought leadership". The compromise between 'port the code' which is a risk no one will take and 'document the code to prepare for a porting effort that will never come'.

Just be careful to keep the LLM vomit clearly distinguished from actually curated documentation, lest some naive person one day believe the documentation is actually based on anything.

So we have LLM vomit directed in ways to make the leadership feel like we are 'properly' leveraging the hype while we wait for the hype train to run out of steam.

Comment Re:you jackasses are smart enough to do self hosti (Score 1) 73

Problem being that this is requests from people trying to contribute.

Even when they avoided github, they got hit.

I wager at one point, a project that stayed strictly email based will have threads with this sort of slop in it.

Unless you make your repository and all means of contact with you invite-only, it's going to be hard to avoid.

Comment Re:Who wouldn't use this trick? (Score 2) 63

May not ever 'figure it out'.

A lot of 'leadership' saw "everyone is hiring tech" in the aftermath of the pandemic and so they did, with or without any vision.

This represents a narrative consistent with shedding those people they didn't have business value for. So they end up no more broken than they were in 2019, and it provides a narrative consistent with doing things "right".

Slashdot Top Deals

Adapt. Enjoy. Survive.

Working...