Microsoft Researchers Develop Hyper-Efficient AI Model That Can Run On CPUs

Microsoft has introduced BitNet b1.58 2B4T, the largest-scale 1-bit AI model to date with 2 billion parameters and the ability to run efficiently on CPUs. It's openly available under an MIT license. TechCrunch reports: The Microsoft researchers say that BitNet b1.58 2B4T is the first bitnet with 2 billion parameters, "parameters" being largely synonymous with "weights." Trained on a dataset of 4 trillion tokens -- equivalent to about 33 million books, by one estimate -- BitNet b1.58 2B4T outperforms traditional models of similar sizes, the researchers claim.

BitNet b1.58 2B4T doesn't sweep the floor with rival 2 billion-parameter models, to be clear, but it seemingly holds its own. According to the researchers' testing, the model surpasses Meta's Llama 3.2 1B, Google's Gemma 3 1B, and Alibaba's Qwen 2.5 1.5B on benchmarks including GSM8K (a collection of grade-school-level math problems) and PIQA (which tests physical commonsense reasoning skills). Perhaps more impressively, BitNet b1.58 2B4T is speedier than other models of its size -- in some cases, twice the speed -- while using a fraction of the memory.

There is a catch, however. Achieving that performance requires using Microsoft's custom framework, bitnet.cpp, which only works with certain hardware at the moment. Absent from the list of supported chips are GPUs, which dominate the AI infrastructure landscape.


Comments Filter:
  • by Mirnotoriety ( 10462951 ) on Thursday April 17, 2025 @07:06PM (#65313691)
    > BitNet .. Trained on a dataset of 4 trillion tokens -- equivalent to about 33 million books ..

    Have the human writers of the books been compensated?
    • by Tablizer ( 95088 )

      Bitnet is Skynet's little bot sister.

    • Re: (Score:3, Funny)

      Don't worry. Most of the training data was actually just dumb web comments.
      • by yanyan ( 302849 )

        And reddit.

      • What could be the value in a bunch of "dumb web comments"? As an experiment, I copy-pasted this very page into an LLM and asked for a short article based on it. You judge whether it is useful or not.
        --
        BitNet b1.58 2B4T threads together an unstable truce between minimal representation and surprising performance. It doesn’t dominate its size class, but it behaves differently—speedier, leaner, and oddly competent, like a scrappy underdog trained for fast reaction, not memory. You don’t get
    • Did you compensate me for reading this text I just wrote? Well actually... I guess it really depends on if you get the point of what I just asked. If you get the point and learned something from it I suppose you now owe me compensation? If on the other hand you don't understand the difference between copying something and inferring something from content then I guess you don't owe me anything either.

      • The difference is the speed and scale at which the AI ingests and leverages such a large amount of content. You can't do that. 33 million books.
      • You agreed to Slashdot's TOS which gives them a right to publish the words you sent them. Without that it would be illegal for them to do so as they're using your copyrighted content (assuming you posted something meaningful) for commercial purposes. The fact that you didn't negotiate with Slashdot to get paid for your words is a failing on your part and has nothing to do with any readers. The TOS lets readers download and read the content on the site without paying.

        For data in a computer system, the act

        • You can't block seeing something and learning from it behind a ToS. Copyright does not consider what people learn from what they see, only what work they produce afterwards. Again you missed my point. Simply reading anything you see anywhere on the internet can NEVER be a copyright violation. Copyright concerns itself with your output, not your input.

          • Isn't it that the LLMs aren't just reading content, they're transforming the content and storing it into their LLM model? Then reproducing that content to paying customers. Suppose a human had a pile of books and got paid for answering questions. That person is asked a question and they look up what is said in the books and maybe change the wording a little but basically put up what is said in the books. Is that a copyright violation? Seems like it comes down to is the LLM "learning" as humans do, or is it
            • You focus too much on derivative generation. Yes, LLMs can do that, they can paraphrase texts, restyle them, summarize. But usually we provide 10 or even 50 different sources for the model to extract an answer. So it does something on top, it correlates information across sources. Synthesizing from 1 source could be derivative, synthesizing from 10 sources is meta-analysis, something new. Another reason we don't generate derivative content is because we actually don't need to do that, we need direct answers
            • Isn't it that the LLMs aren't just reading content, they're transforming the content and storing it into their LLM model?

              And transformation is allowed under copyright law.

              Then reproducing that content to paying customers.

              And that's where this all falls down. Copyright concerns itself with reproduced works, not with the training. Yet everyone here seems to be so focused on the training aspect. You want to sue an LLM, you have the best chance of doing so based on its output, not by complaining about its input.

        • If you want to go that route, then any page you load on the internet creates many copies in all the intermediate nodes and in your computer, and that is before you even get to read the license. But assuming someone did the deed once and trained an LLM on copyrighted data, then generated 4 trillion synthetic tokens, from that moment on there is no copyright issue anymore; you can substitute copyright-protected text with synthetic replacement. This is a small 2B model, not going to store many facts, but will op
    • Well, unless they are using illegal, pirated copies I'd assume the author was compensated when the copy of the book that they used was purchased. In the same way that an author has been compensated when you borrow a book from the library to read yourself.
      • If I buy a music CD, does that give me the right to take samples from it and put it into my own music that I sell?
    • by Anonymous Coward
      Have the authors of the textbooks you read been compensated by the knowledge you applied later in life? It's fucking retarded to think that anytime someone makes a buck off someone else's work that the first guy needs to be compensated. That is not how things work at all.
      • When you grow up and leave your mom's basement, you'll realize that food and rent isn't free. Then you'll understand what an economy is. HTH.
      • Have the authors of the textbooks you read been compensated by the knowledge you applied later in life?

        If you learned well, usually yes. You helped the economy grow, and that would have made their lives better as well.

    • It's very probable the dataset was mostly synthetic text, like the Microsoft Phi models. These small models are not trained for memorizing facts. They will be used for assistance, tool calling, information extraction and classification, reasoning, basically supporting tasks for computers and smart phones. They won't write the best novels. They will work well with provided information in the context.
    • by allo ( 1728082 )

      It says the equivalent. They could have written the equivalent of n times the Library of Congress, only most non-European people have no idea how much that is. Probably most Americans don't either, but at least they think it sounds impressive.

  • TFA: Bitnets quantize weights into just three values: -1, 0, and 1. In theory, that makes them far more efficient [on limited hardware]

    Why 3? If the resolution is whatever one chooses (at the expense of more nodes), then wouldn't powers of two be more efficient on a regular CPU? "3" wastes numeric real-estate it seems.

    Does it need an odd number in order to have a zero, a "central" value? Even then, it seems higher-resolution nodes would have to sacrifice less to achieve that.

    • I want to know how a single bit, which, by design, can ***ONLY*** be 0 or 1, manages to add a sign bit without actually adding a sign bit. Did bits grow a 3rd state when I wasn't paying attention?

    • It works out to about 1.58 bits of information per value, hence the "b1.58" in the name.

      It's a little too complicated to type up here, but a web search for "1.58 bit LLM" will turn up some interesting reading.
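      The basic arithmetic is simple, though: with three possible values per weight, the information content is log2(3), about 1.585 bits per weight. Here is a rough Python sketch of the idea (the pack5/unpack5 helpers below are purely illustrative, not how bitnet.cpp actually stores weights):

      import math

      # Three possible values per weight -> log2(3) ~= 1.585 bits of information each.
      print(f"{math.log2(3):.3f} bits per ternary weight")

      # One common packing trick: five ternary values fit in a single byte,
      # because 3**5 = 243 <= 256.
      def pack5(ternary):                    # ternary: five values from {-1, 0, 1}
          code = 0
          for t in ternary:
              code = code * 3 + (t + 1)      # map -1/0/1 -> 0/1/2
          return code                        # result fits in 0..242

      def unpack5(code):
          out = []
          for _ in range(5):
              out.append(code % 3 - 1)
              code //= 3
          return out[::-1]

      weights = [-1, 0, 1, 1, -1]
      assert unpack5(pack5(weights)) == weights   # round-trips losslessly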

  • Uh, back in the 80s, BITnet ran on IBM mainframes and VAXen.

  • The paper on bitnet.cpp recommends running the model on a limited number of cores to see the advertised efficiency improvements, and it failed to run on an i7 despite having the same 64 GB of RAM as the tested Apple M2. However, they claim to want to expand the supported hardware to Apple and Android phones.

    https://arxiv.org/abs/2504.122... [arxiv.org]

  • I have llama3:8b running CPU only on my laptop. Sure, it's a little slow, but very usable. Am I missing something here?

    • The only thing you're missing is the start of your own second sentence and its relevance in a world where AI is involved in everything you do. Imagine having to say "sure, it's a little slow" for everything you do with your PC. You'll quickly get frustrated. Fine in a world where you fire up an LLM once for shits and giggles, not so much fun when you use it extensively and continuously.

      • Re: (Score:2, Offtopic)

        by drinkypoo ( 153816 )

        I have an extremely middle-of-the-road PC, per the Steam survey, with just a little more CPU than average; it's a 5900X with a 4060 Ti 16GB. I have 64GB RAM, which is about double the average, but not too expensive for DDR4 (which most users still have), so most people could upgrade to it if they wanted. My favorite model right now is gemma3; it only runs about twice as fast on my GPU as it does on my CPU in the very respectably useful 12b variant. The 27b version (which is too big to fit on my cheapass GPU)

      • by Anonymous Coward

        Well, if you're not using LLMs for shit and giggles, but for the other use case - to generate fakes, you don't have my sympathy at all. Please die in a fire.

    • by allo ( 1728082 )

      You could possibly run a llama3:32b-bitnet at the same speed. Currently there is no completely fair comparison, as the usual LLM implementations are heavily optimized, but bitnet on CPU already runs faster than comparable LLMs on CPU. The best optimization would use hardware specialized for addition and logical operations that doesn't need to provide (fast) multiplication. So maybe the NPUs in five years will run bitnets at 10x the speed of other networks.
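      To illustrate the "no multiplication" point, here is a toy Python sketch (ternary_dot is a hypothetical helper, not anything from bitnet.cpp): with weights restricted to -1, 0, and +1, a dot product collapses into selective adds and subtracts.

      # With ternary weights, y = sum(w_i * x_i) needs no multiplications:
      # +1 contributes +x_i, -1 contributes -x_i, 0 contributes nothing.
      def ternary_dot(weights, activations):
          acc = 0.0
          for w, x in zip(weights, activations):
              if w == 1:
                  acc += x
              elif w == -1:
                  acc -= x
          return acc

      w = [1, -1, 0, 1]
      x = [2.0, 3.0, 5.0, -1.0]
      print(ternary_dot(w, x))                      # -2.0
      print(sum(wi * xi for wi, xi in zip(w, x)))   # same result, via multiplication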

  • I was not aware that this "AI" was previously running on rutabagas, but it is such an incredible triumph for Microsoft to be able to run software on computers, now. To be sure, the farmers will be devastated.

    I know that this site is rapidly devolving into some sort of cheap Buzzfeed knock-off running on CP/M but, err... I take it that the current "editors" have no idea what a CPU is? Possibly they think it's some sort of boat, possibly running on tubes?

    • it is such an incredible triumph for Microsoft to be able to run software on computers, now.

      Microsoft runs software on computers now? Who knew?

  • by Dan East ( 318230 ) on Friday April 18, 2025 @07:41AM (#65314565) Journal

    I played around with this model some (you can easily try it here [azurewebsites.net]) and it is very bad about hallucinating, and even when corrected it will acknowledge but then re-hallucinate that information again. It's almost arrogant even lol.

    It is technically very interesting they can run this so fast with such a small memory footprint, but I'm not certain what the use case would be with so many inaccuracies. It's not clear if that is a training issue or a byproduct of 1 bit parameters.

    It might be that from a processing / reasoning / problem-solving perspective the single bit is not a detriment (i.e., it can still solve math problems and perform reasoning very well), but when it comes to stored knowledge and information retrieval it has affected accuracy.

    I think people are missing the importance of this very-low memory footprint type of model though, as it could be used in lightweight, low-power embedded systems and the like. You know how there is the Internet of Things, with zillions of low-power wifi connected smart devices everywhere? Now imagine if they could also run AI models internally as well.

    • Yeah, the trend towards models that run in less memory and with less energy consumption is the story. This trend will undercut the profit potential of the current AI incumbents. Barriers to entry are dropping ... open source models also threaten adoption of the proprietary models.
    • by allo ( 1728082 )

      It is still a 2B model. 2B models in 16 bit are not reliable either. The interesting point is not that it's better, but that it can be implemented without multiplications, which makes it possible to build hardware that is much more efficient than graphics cards to run it. Because it only uses -1, 0, 1 you can use bitwise logical operations to obtain the same result a multiplication would, and these operations are much faster than multiplying the 4-16 bit weights that are common with current models. The CPU implementation is al

  • I invented a hyperefficient clock....

  • Nope. Sorry. Deal breaker.

  • Assuming Moore's Law holds up, I wouldn't worry about performance; give it 10 years and AI will have run out of data to consume.
