
Comment YIKES! API Price (Score 1) 22

Just saw the reported API pricing for those who are allowed access: $25/$125 per 1M tokens. To put that into perspective, Opus 4.6 is $5/$25 per 1M tokens. Even Opus 4 was "only" $15/$75 per 1M. No way this one is coming to any plans. It will be enterprise-only when they do open it up more.

Still cheaper than GPT Pro though ($30/$180)

Comment Re:I use gemini (Score 1) 67

You can't code rules into models themselves. The best you can do is try to train the behavior you want, but that's never going to be 100% reliable. You can also watch the logits from the inference engine and try to redirect the model back on track or force a hard stop; some are doing this today. The problem is that low next-word probabilities are not always the source of this problem. You also run into high-probability wrong results, so it's a bit more complicated. The other issue is that not all of the APIs expose logprobs, or they don't by default (OpenAI lets you turn them on). So if you don't own the inference engine and your LLM provider doesn't support it, it's not even possible to do it yourself.
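
To make the logprob-watching idea concrete, here's a minimal sketch that flags low-confidence spans in a generated sequence from per-token log-probabilities (the kind of data an inference engine, or an API with logprobs enabled, hands back). The threshold and the toy token data are arbitrary assumptions, and as noted above this only catches the low-probability case, not confidently wrong output:

```python
import math

def flag_low_confidence(token_logprobs, threshold=0.2):
    """token_logprobs: list of (token, logprob) pairs.
    Returns tokens whose probability falls below `threshold`."""
    flagged = []
    for token, logprob in token_logprobs:
        prob = math.exp(logprob)  # convert logprob back to a probability
        if prob < threshold:
            flagged.append((token, round(prob, 3)))
    return flagged

# Toy example: the model was confident on most tokens but not on "v2.7"
sample = [("The", -0.05), ("fix", -0.1), ("landed", -0.2), ("in", -0.02),
          ("v2.7", -3.0)]
print(flag_low_confidence(sample))  # -> [('v2.7', 0.05)]
```

A real guardrail would then re-prompt or hard-stop when the flagged list is non-empty, rather than just printing it.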

And it actually is very much in their best interest. Hallucinations are a huge issue and kill many enterprise projects in the planning or demo stage. Solving it, even if that means returning "I don't know" or a signal in the response would drive more business for them, not less.

Comment Re:Local LMs worth it? (Score 1) 44

That Mac Studio with a 2TB SSD is $7,900, not $10K. The old 512GB option was a little over $10K, but they dropped it. As for price, as far as I know it hasn't gone up: the new M5 Max 128GB didn't get a price increase over the M4 Max (with the same SSD size configured), so hopefully the next Studio will follow the same pattern.

But yeah, if you want to run large models for a reasonable price, it's the only game in town right now.

Comment Re: I already cancelled my subscription (Score 1) 44

If you want to use it async, that's fine, as long as async means tens of minutes or more between turns for development work. And it's not just tokens per second: prefill is compute-bound and is going to be very slow even compared to a low-end GPU. Larger contexts also put pressure on the KV cache reads, which further hurts tokens per second, and coding generally uses lots of context each turn. It all adds up.
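
A back-of-envelope model of why prefill dominates on this kind of hardware: prefill is bounded by compute (~2 FLOPs per parameter per token processed), while decode is bounded by memory bandwidth (every generated token re-reads the whole weight set). All the hardware numbers below are assumed placeholders, not benchmarks of any specific machine:

```python
# Rough latency model: prefill is compute-bound, decode is bandwidth-bound.
params = 70e9            # 70B-parameter model (assumed)
bytes_per_param = 2      # fp16/bf16 weights
flops = 30e12            # sustained FLOP/s of the machine (assumed)
bandwidth = 400e9        # memory bandwidth in bytes/s (assumed)

def prefill_seconds(context_tokens):
    # ~2 FLOPs per parameter for each prompt token processed
    return 2 * params * context_tokens / flops

def decode_tokens_per_second():
    # each decoded token streams the full weight set from memory
    return bandwidth / (params * bytes_per_param)

print(prefill_seconds(50_000))       # a long coding context: minutes of prefill
print(decode_tokens_per_second())    # single-digit tokens/sec decode
```

With these made-up but plausible numbers, a 50K-token coding context takes roughly four minutes of prefill before the first output token, which is where the "tens of minutes per turn" feel comes from.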

Comment Re:AI is not there yet (Score 2) 50

Funny enough, this is one place where the fix is easy but probably not cheap. They just need to build guardrails that automatically check any case law references against something like LexisNexis and feed back to the AI if it makes something up. Case law is extremely well documented and fairly structured in how it's indexed; you wouldn't even need AI for the lookup, a competent traditional search algo would work. Of course that's going to be expensive, since it requires electronic access to the case law data, and SOMEONE is going to make bank on that. It's not something you will get with a $20 ChatGPT subscription.
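
The guardrail loop is simple enough to sketch: extract case citations from a model's draft, check each one against a trusted index, and hand back anything fabricated. Everything here is a toy stand-in: the regex is deliberately naive, the in-memory set plays the role of a LexisNexis-style lookup, and "Smith v. Wexford" is an invented citation for the example:

```python
import re

# Toy stand-in for an authoritative case-law index.
KNOWN_CASES = {"Marbury v. Madison", "Miranda v. Arizona"}

# Naive pattern: "Name v. Name". A real system would use the actual
# citation format (reporter, volume, page) instead.
CITATION_RE = re.compile(r"[A-Z][a-z]+ v\. [A-Z][a-z]+")

def check_citations(draft):
    """Return (verified, fabricated) citation lists for a draft."""
    cited = CITATION_RE.findall(draft)
    verified = [c for c in cited if c in KNOWN_CASES]
    fabricated = [c for c in cited if c not in KNOWN_CASES]
    return verified, fabricated

draft = "Per Marbury v. Madison and Smith v. Wexford, the motion fails."
ok, bad = check_citations(draft)
print(ok)   # -> ['Marbury v. Madison']
print(bad)  # -> ['Smith v. Wexford']
```

The `bad` list is what you'd feed back to the model ("these citations don't exist, revise") before anything reaches a human.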

Comment Re:How does using parts of the parameters work? (Score 4, Informative) 3

No. The entire model is loaded into memory but the feed-forward layers are split into subnetworks. A router picks a few of the best "experts" on a per-token basis to activate and those are the 3.8B that are activated. It's a way to increase inference performance so you get the speed of a dense 3.8B model with the output quality close to (but not equal to) a 26B dense model.
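
The routing step above can be sketched in a few lines. This is a generic mixture-of-experts illustration with toy sizes and random weights, not the actual model in question: all weights stay resident in memory, but a router scores the experts per token and only the top-k actually run:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 6, 2   # toy dimensions, not real model sizes

router_w = rng.normal(size=(d_model, n_experts))             # router projection
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    """x: (d_model,) hidden state for one token."""
    scores = x @ router_w                    # one score per expert
    top = np.argsort(scores)[-top_k:]        # pick the k best experts
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over winners
    # Only the chosen experts' feed-forward weights are touched for this token,
    # which is where the "3.8B active out of 26B total" behavior comes from.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe_layer(rng.normal(size=d_model))
```

Per token, only `top_k` of the `n_experts` matrices are multiplied, so compute scales with the active parameters while memory holds them all.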

Comment Re:Really? (Score 2) 46

I'm not. Video generation is expensive, even by LLM standards. They needed a huge buy-in from industry to make it workable and all they got was the Disney deal. I suspect that the price they would need to charge to make it profitable is just too much to make it attractive to industry customers. Add in the issues with it in general and with getting it to work with long-form content and it's probably a non-starter right now.

Comment Britannica is an AI company itself now (Score 4, Interesting) 26

They are pushing back against competition from Open AI and others, but not for the reason many think:

While it still offers an online edition of its encyclopedia, as well as the Merriam-Webster dictionary, Britannica’s biggest business today is selling online education software to schools and libraries, the software it hopes to supercharge with AI. ...

Britannica’s CEO Jorge Cauz also told the Times about the company’s Britannica AI chatbot, which allows users to ask questions about its vast database of encyclopedic knowledge that it collected over two centuries from vetted academics and editors. The company similarly offers chatbot software for customer service use cases.

Britannica told the Times it is expecting revenue to double from two years ago, to $100 million.

https://gizmodo.com/encycloped...

They are pinning their future on providing AI products trained on their encyclopedias and research notes, putting them in somewhat direct competition with the other AI companies.

Comment Re: yes obviously (Score 1) 112

You, random AC, didn't say shit. Or you're playing games trying to bump your other AC post. Anyway: $2,700 is the API cost for a full Max plan. Let that sink in. Even with profit margin, that is a massive gap. So no, not $500. Or $1,000. Closer to $1,700-$2,000. You are also grossly underestimating how devs use these.

now read my sig

Comment Re:um (Score 1) 112

Problem is the $200 Claude Max: if you calculate what it would cost to do the same thing via the PAYGO API, the difference is staggering, like over 10x. It would cost you thousands a month without the plan. Anthropic HAS to be losing money on those. Even users who are not maxing out their usage each month, unless they are using only 15-20%, are going to be using more than what their subscription costs Anthropic to provide. Eventually they will need to either find a way to drastically lower their inference costs or jack up the price of the plans.
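
The arithmetic behind the "over 10x" claim is simple. The per-token prices and the monthly usage below are illustrative placeholders (roughly Opus-4-class list prices), not Anthropic's actual internal numbers:

```python
INPUT_PER_M = 15.0    # $ per 1M input tokens (assumed list price)
OUTPUT_PER_M = 75.0   # $ per 1M output tokens (assumed list price)

def monthly_api_cost(input_tokens_m, output_tokens_m):
    """PAYGO cost in dollars for a month, in millions of tokens."""
    return input_tokens_m * INPUT_PER_M + output_tokens_m * OUTPUT_PER_M

plan_price = 200.0
# A heavy coding month: agents re-send large contexts, so input dwarfs output.
heavy_month = monthly_api_cost(input_tokens_m=150, output_tokens_m=6)
print(heavy_month)               # -> 2700.0
print(heavy_month / plan_price)  # -> 13.5x the subscription price
```

With these assumed numbers a heavy month lands at $2,700 of PAYGO usage against a $200 plan, i.e. a 13.5x gap, which is the scale of subsidy being argued about here.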
