Re: Shifting the blame and cost
> So they're asking users to pay for tokens despite a good portion of tokens being consumed for nothing because of the number of attempts it takes to generate anything usable.
If you're bad at using the tools, is that their fault?
Good prompting and good context management are non-trivial, but they are things you can learn to do.
Good prompting is really just good communication. Pretend you're telling a very bright, somewhat overenthusiastic junior developer what to do via email, and that you can't send another email for several hours. If you give them incorrect instructions, they'll produce incorrect results. If you give them vague instructions, they'll spend a lot of time building their guess at what you want or, often worse, reading the whole codebase to gather the context needed to figure out what you want. (Humans hate reading huge amounts of code, so a human dev probably wouldn't do that, but an LLM will.)
And what you need to communicate isn't just what to do, but how. As one example, I do most of my work in statically typed languages, primarily Rust and C++, and I find that the LLMs all seem to be Python jockeys at heart. They can write Rust or C++ just fine, but they don't really think about how to take advantage of strong typing. If I ask one to refactor something, the first thing it wants to do is scan the entire codebase to see what will be affected. In a dynamically typed language (especially one without good unit tests), that's the right thing to do. Sometimes the LLM can use grep or sed to find the relevant code efficiently. Other times it has to actually ingest thousands of lines of code (newly-loaded tokens!), and that gets expensive.
What an experienced human Rust/C++ programmer will do, and what an LLM can do if you tell it to, is rely on the compiler. Think about how to structure your refactor so that every place that needs updating will fail to compile, then let the compiler tell you where they all are, then fix them. This is much more efficient for humans and LLMs alike, but an LLM won't do it unless you specifically tell it to. A junior dev might not think to, either.
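A minimal Rust sketch of two common ways to make the compiler find every affected site for you. All names here (`connect`, `Backend`, `default_port`) are hypothetical and only illustrate the technique: change a parameter to a distinct type so old call sites stop type-checking, and rely on exhaustive `match` with no wildcard arm so a new enum variant breaks every match that hasn't handled it.

```rust
use std::time::Duration;

/// Hypothetical API after a refactor. Suppose it used to take `timeout_ms: u64`.
/// Switching to `Duration` means an old call like `connect("db", 500)` no longer
/// type-checks, so the build itself lists every call site that needs fixing.
pub fn connect(host: &str, timeout: Duration) -> String {
    format!("connecting to {host} (timeout {} ms)", timeout.as_millis())
}

/// Hypothetical enum. Adding a variant (say `Mysql`) makes every `match` on
/// `Backend` that lacks a `_` arm fail with error E0004 (non-exhaustive patterns).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Postgres,
    Sqlite,
}

pub fn default_port(backend: Backend) -> Option<u16> {
    // Deliberately no `_ => ...` arm: a wildcard would silently absorb a new
    // variant and hide a place that needs updating.
    match backend {
        Backend::Postgres => Some(5432),
        Backend::Sqlite => None, // file-based, no network port
    }
}
```

In practice you make the type change first, run `cargo check` (faster than `cargo build` because it skips code generation), and work through the error list until it's empty. An LLM agent can run that same loop instead of reading every file, but usually only if the prompt asks it to.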
As with a human, it's usually a good idea to discuss the task before telling them to start it, to make sure you both understand what needs to be done. But this leads to another important cost-management issue: context management. If you're going to have an extended back-and-forth with an LLM, make sure its context window doesn't contain a lot of extraneous data.
Context management is crucial to keeping costs down. Every time you submit a prompt, the model has to load the entire contents of its context window. "Reloaded" tokens are much cheaper than "newly-loaded" tokens, but when the context window holds 1M tokens, the costs add up fast. One solution is to use a model with a small context window. That works, but then you have a junior developer who doesn't understand much and constantly forgets what he does understand. For some tasks, especially very mechanical ones, that's fine (for some, it's actually better). But if you're doing something that requires understanding a large codebase or lots of other context, such as large requirements documents, a model without enough context will give you stupid results. On the other hand, clearing the context too often means reloading it more, and newly-loaded tokens cost more than reloaded ones. (There are also output tokens to consider, but I find those usually aren't a significant part of the cost.) So knowing when to use a larger or smaller window, and when to clear the context, are essential skills for keeping costs down.
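To make the arithmetic concrete, here is a small Rust sketch of the two-tier pricing described above: "reloaded" tokens (what providers usually call cached or cache-read tokens) versus "newly-loaded" ones. The prices are placeholder assumptions, not any provider's actual rates, and the model deliberately ignores output tokens and any extra charge for writing to the cache.

```rust
/// Placeholder prices in dollars per million tokens (assumed, not real rates).
pub struct Prices {
    pub new_per_m: f64,    // "newly-loaded" input tokens
    pub reload_per_m: f64, // "reloaded" (cached) context tokens
}

/// Input cost of one prompt: the whole existing context is re-sent at the
/// reload rate, and only the tokens added this turn pay the new-token rate.
pub fn prompt_cost(p: &Prices, context_tokens: u64, added_tokens: u64) -> f64 {
    (context_tokens as f64 * p.reload_per_m + added_tokens as f64 * p.new_per_m) / 1_000_000.0
}

/// Total input cost of `turns` prompts, starting from `start_tokens` of context
/// and adding `added_per_turn` new tokens each turn (the context keeps growing).
pub fn session_cost(p: &Prices, start_tokens: u64, added_per_turn: u64, turns: u32) -> f64 {
    let mut context = start_tokens;
    let mut total = 0.0;
    for _ in 0..turns {
        total += prompt_cost(p, context, added_per_turn);
        context += added_per_turn;
    }
    total
}
```

With made-up rates of $3/M for new tokens and $0.30/M for reloaded ones, a prompt on top of a full 1M-token context costs about $0.30 in input alone, versus about $0.06 at 200k; over a hundred prompts that's roughly $30 versus $6. Reading a 5,000-token summary document fresh after clearing costs about $0.015 at the new-token rate, which is why clearing plus a compact summary can beat dragging a large stale context along.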
A related choice is which model to use, and this interacts strongly with context window size and content. I primarily use Claude Code, and most of the time I keep it on the default Sonnet model with a 200k context window. For something larger, I bump that to 1M tokens. When I need help thinking through a complex design question, I switch to Opus, usually with a fresh context window. I have some good project summary documents (a few thousand tokens) that provide high-level context cheaply, so I clear the context window, tell it to read the project docs (I have a skill for that, with a short name), and then start working through the issue.
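The workflow above boils down to a rule of thumb, sketched here in Rust. The `Task` categories and field names are hypothetical, invented for illustration; the model names and window sizes come from the comment, except the 200k window for Opus, which is an assumption since the comment only says Opus usually starts from a fresh context.

```rust
/// Hypothetical task categories; the comment describes these informally.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Task {
    Routine,        // everyday work
    Large,          // work that needs a lot of codebase or document context
    DesignQuestion, // thinking through a complex design
}

#[derive(Debug, PartialEq, Eq)]
pub struct Setup {
    pub model: &'static str,
    pub context_window: u32,
    /// Clear first, then have the model read the short project summary docs.
    pub clear_first: bool,
}

pub fn choose_setup(task: Task) -> Setup {
    match task {
        Task::Routine => Setup { model: "Sonnet", context_window: 200_000, clear_first: false },
        Task::Large => Setup { model: "Sonnet", context_window: 1_000_000, clear_first: false },
        // Assumed 200k window: the comment doesn't give one for Opus.
        Task::DesignQuestion => Setup { model: "Opus", context_window: 200_000, clear_first: true },
    }
}
```

Writing it down this way mostly makes the trade-off explicit: bigger windows only for work that needs the context, and the expensive model paired with a small, deliberately chosen context.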
There's a lot more I could add, but this is long enough. The TL;DR is that using LLMs effectively is a skill, and a rapidly evolving one. Perhaps in the near future the LLMs themselves will get better at context management, model selection, and knowing when to ask cheap follow-up questions rather than doing a lot of expensive research. But right now, they don't.