

Microsoft Researchers Develop Hyper-Efficient AI Model That Can Run On CPUs
Microsoft has introduced BitNet b1.58 2B4T, the largest-scale 1-bit AI model to date with 2 billion parameters and the ability to run efficiently on CPUs. It's openly available under an MIT license. TechCrunch reports: The Microsoft researchers say that BitNet b1.58 2B4T is the first bitnet with 2 billion parameters, "parameters" being largely synonymous with "weights." Trained on a dataset of 4 trillion tokens -- equivalent to about 33 million books, by one estimate -- BitNet b1.58 2B4T outperforms traditional models of similar sizes, the researchers claim.
BitNet b1.58 2B4T doesn't sweep the floor with rival 2 billion-parameter models, to be clear, but it seemingly holds its own. According to the researchers' testing, the model surpasses Meta's Llama 3.2 1B, Google's Gemma 3 1B, and Alibaba's Qwen 2.5 1.5B on benchmarks including GSM8K (a collection of grade-school-level math problems) and PIQA (which tests physical commonsense reasoning skills). Perhaps more impressively, BitNet b1.58 2B4T is speedier than other models of its size -- in some cases, twice the speed -- while using a fraction of the memory.
There is a catch, however. Achieving that performance requires using Microsoft's custom framework, bitnet.cpp, which only works with certain hardware at the moment. Absent from the list of supported chips are GPUs, which dominate the AI infrastructure landscape.
Trained on about 33 million books .. (Score:5, Insightful)
Have the human writers of the books been compensated?
Re: (Score:1)
Bitnet is Skynet's little bot sister.
Re: (Score:3, Funny)
Re: (Score:2)
And reddit.
Re: (Score:2)
--
BitNet b1.58 2B4T threads together an unstable truce between minimal representation and surprising performance. It doesn’t dominate its size class, but it behaves differently—speedier, leaner, and oddly competent, like a scrappy underdog trained for fast reaction, not memory. You don’t get
Re: (Score:2)
Did you compensate me for reading this text I just wrote? Well actually... I guess it really depends on if you get the point of what I just asked. If you get the point and learned something from it I suppose you now owe me compensation? If on the other hand you don't understand the difference between copying something and inferring something from content then I guess you don't owe me anything either.
Re: Trained on about 33 million books .. (Score:2)
Re: Trained on about 33 million books .. (Score:2)
Another major difference is that the AI model is used for business and profit.
Re: Trained on about 33 million books .. (Score:2)
So am I. I wish it wasn't so, but here I am using my brain for money answering questions people ask.
Re: (Score:2)
At no point does copyright law consider speed.
Re: Trained on about 33 million books .. (Score:2)
You can't read 33 million books.
Not against the law? Is a different issue. But now you know the difference.
Re: (Score:2)
Not against the law? Is a different issue.
This entire thread is about compensation which is entirely based on copyright law. Law is THE DEFINING ISSUE.
Re: (Score:2)
You agreed to Slashdot's TOS which gives them a right to publish the words you sent them. Without that it would be illegal for them to do so as they're using your copyrighted content (assuming you posted something meaningful) for commercial purposes. The fact that you didn't negotiate with Slashdot to get paid for your words is a failing on your part and has nothing to do with any readers. The TOS lets readers download and read the content on the site without paying.
For data in a computer system, the act
Re: (Score:2)
You can't block seeing something and learning from it behind a ToS. Copyright does not consider what people learn from what they see, only what work they produce afterwards. Again you missed my point. Simply reading anything you see anywhere on the internet can NEVER be a copyright violation. Copyright concerns itself with your output, not your input.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Isn't it that the LLMs aren't just reading content, they're transforming the content and storing it into their LLM model?
And transformation is allowed under copyright law.
Then reproducing that content to paying customers.
And that's where this all falls down. Copyright concerns itself with reproduced works, not with the training. Yet everyone here seems to be so focused on the training aspect. You want to sue an LLM, you have the best chance of doing so based on its output, not complaining about its input.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
oh no (Score:1)
Re: (Score:2)
Re: (Score:2)
ironic
Re: (Score:2)
Have the authors of the textbooks you read been compensated by the knowledge you applied later in life?
If you learned well, usually yes. You helped the economy grow, and that would have made their lives better as well.
Re: (Score:2)
Re: (Score:2)
It says the equivalent. They could have written the equivalent of n times the Library of Congress, only most non-European people have no idea how much that is. Probably most Americans either, but they at least think it sounds impressive.
Not powers of 2? (Score:1)
Why 3? If the resolution is whatever one chooses (at the expense of more nodes), then wouldn't powers of two be more efficient on a regular CPU? "3" wastes numeric real-estate it seems.
Does it need an odd number in order to have a zero, a "central" value? Even then, it seems higher resolution nodes would have to sacrifice less to achieve that.
Re: (Score:1)
Okay, that makes sense. Thanks! Seems kind of like how RISC chips result in bigger EXE binaries, but as a trade for speed and/or power efficiency.
But do (traditional) CPUs also struggle with slightly more resolution per node?
I wonder if a new kind of CPU can be designed around smallish resolution AI nodes yet still do its regular CPU work relatively well. Thus, juice up its AI abilities with little or no penalty for ordinary calculations.
Re: (Score:2)
Re: (Score:2)
I want to know how a single bit, which, by design, can ***ONLY*** be 0 or 1, manages to add a sign bit without actually adding a sign bit. Did bits grow a 3rd state when I wasn't paying attention?
Re: (Score:1)
They are AI generated bits, they render with an extra digit.
Re: (Score:2)
could be a sparse database.
it only needs to store -1 or +1, if a weight is '0' no value needs to be stored.
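The sparse idea in the comment above can be sketched in a few lines. This is just an illustration of the concept, not how bitnet.cpp actually stores weights: keep only the indices of the nonzero weights, split by sign, and the dot product then needs no multiplications at all.

```python
def to_sparse(weights):
    """Split ternary weights {-1, 0, +1} into two index lists; zeros are simply not stored."""
    plus = [i for i, w in enumerate(weights) if w == 1]
    minus = [i for i, w in enumerate(weights) if w == -1]
    return plus, minus

def sparse_dot(plus, minus, x):
    """Dot product without multiplications: add where w = +1, subtract where w = -1."""
    return sum(x[i] for i in plus) - sum(x[i] for i in minus)

w = [1, 0, -1, 0, 1]
x = [2.0, 3.0, 4.0, 5.0, 6.0]
plus, minus = to_sparse(w)
sparse_dot(plus, minus, x)  # 2.0 - 4.0 + 6.0 = 4.0
```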
Re: (Score:2)
They are two bits with 3 states used.
Re: Not powers of 2? (Score:3, Informative)
It works out to about 1.58 bits of information per value, hence the "1.58b" in the name.
It's a little too complicated to type up here, but a web search for "1.58 bit LLM" will turn up some interesting reading.
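The short version of that reading: a value with three equally likely states carries log2(3) ≈ 1.58 bits of information, hence the name. One common packing trick (an assumption for illustration, not necessarily what bitnet.cpp does) is to fit five ternary digits into a single byte, since 3**5 = 243 <= 256:

```python
import math

# Each weight takes one of three values {-1, 0, +1}, so its information
# content is log2(3) bits -- the "1.58" in the model name.
print(round(math.log2(3), 2))  # 1.58

def pack5(trits):
    """Pack five values from {-1, 0, 1} into one byte via base-3 encoding."""
    byte = 0
    for t in reversed(trits):
        byte = byte * 3 + (t + 1)  # map {-1, 0, 1} -> {0, 1, 2}
    return byte

def unpack5(byte):
    """Recover the five ternary values from a packed byte."""
    trits = []
    for _ in range(5):
        byte, d = divmod(byte, 3)
        trits.append(d - 1)
    return trits

unpack5(pack5([1, -1, 0, 1, -1]))  # round-trips to the same five values
```

That packing gives 8/5 = 1.6 bits per weight, close to the 1.58-bit information-theoretic floor.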
Re: Not powers of 2? (Score:1)
Whoops, I mean the "b1.58" in the name.
Re: (Score:2)
That makes a whole lot more sense, so it's not actually a 1-bit model after all.
Prior use (Score:2)
Uh, back in the 80s, BITnet ran on IBM mainframes and VAXen.
Alpha software (Score:2)
The paper on bitnet.cpp recommends running the model on a limited number of cores to see the advertised efficiency improvements, and it failed to run on an i7 despite having the same 64 GB of RAM as the tested Apple M2. However, they say they want to expand the supported hardware to Apple and Android phones.
https://arxiv.org/abs/2504.122... [arxiv.org]
Running on CPU is not that hard... (Score:2)
I have llama3:8b running CPU only on my laptop. Sure, it's a little slow, but very usable. Am I missing something here?
Re: (Score:2)
The only thing you're missing is the start of your own second sentence and its relevance in a world where AI is involved in everything you do. Imagine having to say "sure, it's a little slow" for everything you do with your PC. You'd quickly get frustrated. Fine in a world where you fire up an LLM once for shits and giggles, not so much fun when you use it extensively and continuously.
Re: (Score:2, Offtopic)
I have an extremely middle of the road PC, per the Steam survey, with just a little more CPU than average; It's a 5900X with a 4060 Ti 16GB. I have 64GB RAM, which is about double the average, but not too expensive for DDR4 (which most users still have) so most people could upgrade to it if they wanted. My favorite model right now is gemma3, it only runs about twice as fast on my GPU as it does on my CPU in the very respectably useful 12b variant. The 27b version (which is too big to fit on my cheapass GPU)
Re: (Score:1)
Well, if you're not using LLMs for shit and giggles, but for the other use case - to generate fakes, you don't have my sympathy at all. Please die in a fire.
Re: (Score:2)
You could possibly run a llama3:32b-bitnet at the same speed. Currently there is no completely fair comparison, since the usual LLM implementations are heavily optimized, but bitnet on CPU already runs faster than comparable LLMs on CPU. The best optimization would use hardware specialized for addition and logical operations that doesn't need to provide (fast) multiplication. So maybe the NPUs in five years will run bitnets at 10x the speed of other networks.
It can run on CPUs? Wow. What an upgrade from cats (Score:2)
I was not aware that this "AI" was previously running on rutabagas, but it is such an incredible triumph for Microsoft to be able to run software on computers, now. To be sure, the farmers will be devastated.
I know that this site is rapidly devolving into some sort of cheap Buzzfeed knock-off running on CP/M but, err... I take it that the current "editors" have no idea what a CPU is? Possibly they think it's some sort of boat, possibly running on tubes?
Re: (Score:2)
it is such an incredible triumph for Microsoft to be able to run software on computers, now.
Microsoft runs software on computers now? Who knew?
Bad hallucinations, etc. (Score:4, Interesting)
I played around with this model some (you can easily try it here [azurewebsites.net]) and it is very bad about hallucinating, and even when corrected it will acknowledge but then re-hallucinate that information again. It's almost arrogant even lol.
It is technically very interesting they can run this so fast with such a small memory footprint, but I'm not certain what the use case would be with so many inaccuracies. It's not clear if that is a training issue or a byproduct of 1 bit parameters.
It might be that from a processing / reasoning / problem-solving perspective the single bit is not a detriment (i.e., it can still solve math problems and perform reasoning very well), but when it comes to stored knowledge and information retrieval it has affected accuracy.
I think people are missing the importance of this very-low memory footprint type of model though, as it could be used in lightweight, low-power embedded systems and the like. You know how there is the Internet of Things, with zillions of low-power wifi connected smart devices everywhere? Now imagine if they could also run AI models internally as well.
Re: Bad hallucinations, etc. (Score:2)
Re: (Score:2)
It is still a 2B model, and 2B models in 16 bit are not reliable either. The interesting point is not that it's better, but that it can be implemented without multiplications, which allows building much more efficient hardware than graphics cards to run it. Because it only uses -1, 0, 1, you can use bitwise logical operations to obtain the same result a multiplication would, and these operations are much faster than multiplying the 4-16 bit weights that are common with current models. The CPU implementation is al
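A toy sketch of the multiplication-free point in the comment above (an illustration of the idea, not bitnet.cpp's actual kernel): with weights restricted to {-1, 0, +1}, a matrix-vector product reduces to conditional adds and subtracts.

```python
def ternary_matvec(W, x):
    """Matrix-vector product for ternary weights: no multiplications, only add/subtract/skip."""
    out = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # w = +1: add the activation
            elif w == -1:
                acc -= xi      # w = -1: subtract it
            # w = 0: contributes nothing, skip entirely
        out.append(acc)
    return out

W = [[1, 0, -1],
     [0, 1, 1]]
x = [1.5, 2.0, 0.5]
ternary_matvec(W, x)  # [1.0, 2.5]
```

Dedicated hardware could implement the select-and-accumulate step far more cheaply than a full multiplier, which is the efficiency argument the comment is making.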
hyperefficient clock (Score:2)
I invented a hyperefficient clock....
And requires .NET? (Score:2)
Nope. Sorry. Deal breaker.
Moore's Law (Score:2)