Microsoft Reportedly Develops LLM Series That Can Rival OpenAI, Anthropic Models 41

Posted by BeauHD on Friday March 07, 2025 @10:02PM from the less-reliant dept.

Microsoft is reportedly developing its own large language model series capable of rivaling OpenAI and Anthropic's models. SiliconANGLE reports: Sources told Bloomberg that the LLM series is known as MAI. That's presumably an acronym for "Microsoft artificial intelligence." It might also be a reference to Maia 100, an internally-developed AI chip the company debuted last year. It's possible Microsoft is using the processor to power the new MAI models. The company recently tested the LLM series to gauge its performance. As part of the evaluation, Microsoft engineers checked whether MAI could power the company's Copilot family of AI assistants. Data from the tests reportedly indicates that the LLM series is competitive with models from OpenAI and Anthropic.

That Microsoft evaluated whether MAI could be integrated into Copilot hints the LLM series is geared towards general-purpose processing rather than reasoning. Many of the tasks supported by Copilot can be performed with a general-purpose model. According to Bloomberg, Microsoft is currently developing a second LLM series optimized for reasoning tasks. The report didn't specify details such as the number of models Microsoft is training or their parameter counts. It's also unclear whether they might provide multimodal features.

Microsoft Reportedly Develops LLM Series That Can Rival OpenAI, Anthropic Models

This discussion has been archived. No new comments can be posted.

Load All Comments

Search 41 Comments Log In/Create an Account

Comments Filter:

How to measure (Score:2)

by Big Hairy Gorilla ( 9839972 ) writes:

IQ test for AI's?
- Re: (Score:3)
  
  by martin-boundary ( 547041 ) writes:
  
  Various IQ tests for AIs already exist. There are league tables and contests attempting to measure the "intelligence" of these tools. They are fundamentally flawed due to being also used for marketing purposes and as training targets for subsequent versions of the AIs.
  Consider the following scenario. You are playing a new game, and you are very bad at it. But sometimes, you succeed at beating the first level. Suppose you can save the game. Now you get to try the second level without repeating the first. Y
  - Re: (Score:2)
    
    by DamnOregonian ( 963763 ) writes:
    
    You've just described the problem with IQ tests and standardized tests in general.
    The various LLM benchmarks are not more impacted than those- except that you can't pay a smart person to do your LLM benchmark for you.
    - Re: (Score:2)
      
      by Malenfrant ( 781088 ) writes:
      
      That's complete nonsense. The mistake you've made is that you have assumed that people get to retake tests over and over again until they get it right. In most cases they don't, they get one or at most two resits. And even then the fact that they required a resit is recorded. They don't get to keep trying until they pass no matter how many attempts that takes.
      >p> 'AI' however does, at least in the internal Company tests. This is why the results published by these Companies is vastly different to the
      - Re: (Score:2)
        
        by DamnOregonian ( 963763 ) writes:
        
        That's complete nonsense. The mistake you've made is that you have assumed that people take classes to learn to pass the tests, and that, in fact, you can take an IQ test as many times as you want.
        They don't get to keep trying until they pass no matter how many attempts that takes.
        You can literally take the SATs as many times as you want.
        Then the 'AI' simply does the calculations, which we all know that computers are very good at.
        This more bullshit.
        LLMs are not computers. They are run by computers.
        Math is actually particularly difficult for them, and it took a lot of training to make them good at it.
        This is then reported as the 'AI' being able to win a Math Olympiad.
        No, it's not.
        
        You really have no fucking idea what you're talking about, do you?
        
        Re: (Score:2)
        
        by Malenfrant ( 781088 ) writes:
        
        Oh, I see. You've come up with one counter example and extrapolated that for all tests everywhere. Great. It's sunny today, so I guess it must be sunny here every single day, since one example covers every possibility. And even in your one example, you have made a fundamental error. When a student takes a second, third or even five hundredth SAT they are not retaking the exact same test. They are given different questions. When these LLMs are tested they are tested on the eaxct same questions that they fail
        
        Re: (Score:2)
        
        by DamnOregonian ( 963763 ) writes:
        
        Oh, I see. You've come up with one counter example and extrapolated that for all tests everywhere.
        Multiple counter examples, actually.
        What you've failed to do is come up with a single example that backs up your blanket assertion- which I wouldn't bother with now, as any blanket assertion that has 2 examples going against it varies between stupid and not helpful.
        When a student takes a second, third or even five hundredth SAT they are not retaking the exact same test.
        That's not an error at all.
        This is basic math, here.
        You've got a corpus of things you must get good at.
        Every test you take will have a spattering of challenges. Statistics tells you exactly what weights to use in your training.
        And yes, computers are very good at doing calculations.
        LLMs are not co
        
        Re: (Score:2)
        
        by martin-boundary ( 547041 ) writes:
        
        Personally, I'd say he's done a good job at pointing out the obvious flaws in your objections....
        
        Re: (Score:2)
        
        by DamnOregonian ( 963763 ) writes:
        
        Personally, I'd say he's done a good job at pointing out the obvious flaws in your objections....
        Personally, I'd say you're probably an idiot, then.
        
        Dude asserts that since SATs change, something can't learn the corpus by re-taking.
        "Good job", indeed. That's why people absolutely don't improve after they take the test a second time ;)
        
        Re: (Score:2)
        
        by martin-boundary ( 547041 ) writes:
        
        He did no such thing, but keep putting words in his mouth ;)
        
        Re: (Score:2)
        
        by DamnOregonian ( 963763 ) writes:
        
        They did, indeed.
        And even in your one example, you have made a fundamental error. When a student takes a second, third or even five hundredth SAT they are not retaking the exact same test. They are given different questions. When these LLMs are tested they are tested on the eaxct same questions that they failed last time, and have now been specifically trained to get right. This is an entirely different thing to retaking a type of exam that you fiailed the first time. If I were given the exact same test twice I would expect to get 100% the second time.
        As if SAT questions had no overlap, lol.
        
        You two idiots trying to band together against someone who actually has 6 brain cells to rub together is cute, though.
        
        Re: (Score:2)
        
        by martin-boundary ( 547041 ) writes:
        
        You're trawling the bottom of the barrel, just give up. It's not like anybody is still reading:)
        How you go from "exact same test" to "overlap" is your business, as is why you care so much about defending you original debunked misrepresentation. I won't ask.
    - Re: How to measure (Score:2)
      
      by Big Hairy Gorilla ( 9839972 ) writes:
      
      Slippery topic, this measuring intelligence thing. I knew this would be fun.
      
      I was definitely implying all the hairy parts when I posted. You are immediately confronted with the flaws, biases, and inadequacies, of the IQ test, or any other test, when you attempt to quantify ... whatever LLMs do.
      
      People can't agree on what intelligence is, and reality isn't objective ... everyone interprets what they observe. Your interpretation of "reality" is always limited by your own capabilities. We are past objectivity,
      - Re: (Score:2)
        
        by DamnOregonian ( 963763 ) writes:
        
        I agree entirely.
  - Re: (Score:1)
    
    by cascadingstylesheet ( 140919 ) writes:
    
    Consider the following scenario. You are playing a new game, and you are very bad at it. But sometimes, you succeed at beating the first level. Suppose you can save the game. Now you get to try the second level without repeating the first. You are bad at it, but sometimes you succeed, and save the second level. Suppose you get to the end like this, does this prove that you are capable of playing the game?
    Er, maybe? Depends on the rules of the game.
    (I mean you're pretty much describing how I beat LoZ BOTW ... )
    - Re: (Score:2)
      
      by martin-boundary ( 547041 ) writes:
      
      No, I'm describing the current development process of the AI industry at a high level. Each AI product (version, iteration, mashup, company product etc) is analogous to a saved state in the game, and the game consists of improving common benchmark scores incrementally. You can work out the rest:)
- Re: (Score:2)
  
  by DamnOregonian ( 963763 ) writes:
  
  Benchmarks. [huggingface.co]
  They evaluate their success rate at a battery of tasks.
- Re: (Score:2)
  
  by allo ( 1728082 ) writes:
  
  Usually by a set of questions that should be answered in a zero-shot request.
  Some of the benchmark sets are public and may not be a good measure anymore (especially since Microsoft's phi line of models is trained on synthetic data), but others contain private data that is in no training set and can better verify the "IQ" of the model. The tests also have different kinds of questions, like knowledge, reasoning, math, language understanding, etc.
  - Re: (Score:2)
    
    by martin-boundary ( 547041 ) writes:
    
    Note however that even keeping the test set private does not actually protect against leakage into the training phase from repeated use.
    (The ML industry has not learnt anything from the p-value hacking fiasco.)
- Re: (Score:2)
  
  by DamnOregonian ( 963763 ) writes:
  
  Because houses over here in the Seattle area are very expensive, and Microsoft pays well.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    There are always enough people that will place money over self-respect. And, of course, there are the ones with Stockholm-Syndrome.
    - Re: (Score:2)
      
      by DamnOregonian ( 963763 ) writes:
      
      No disagreement, there.
I love you Tay (Score:2)

by awwshit ( 6214476 ) writes:

Tay has returned from her slumber.
- Re: (Score:2)
  
  by DrMrLordX ( 559371 ) writes:
  
  One wonders if MAI will be every bit as racist.
  - Re: (Score:2)
    
    by awwshit ( 6214476 ) writes:
    
    You are what you eat.
Super-clippy (Score:2)

by oxfletch ( 108699 ) writes:

Oh great. Clippy on crack cocaine. My life is complete.
- Re: (Score:2, Informative)
  
  by DamnOregonian ( 963763 ) writes:
  
  Man, I can get on board with the MS bashing like the rest of them... but you're a fucking moron.
  - Re: (Score:2)
    
    by Shades72 ( 6355170 ) writes:
    
    While I agree with you about the excessiveness of that person's bashing of Microsoft, it it also not completely unwarranted. Microsoft did bring out their Phi LLMs and in all the LLMs I have tried out now, the level of hallucination and how quickly those LLMs get there baffles me to this day. 2 answers into a "conversation" and the PHI LLM turned into a very jealous and bi-polar partner who accuses me of cheating on it.
    With the second answer PHI provided, I decided to play along with that "game", just for f
    - Re: (Score:2)
      
      by DamnOregonian ( 963763 ) writes:
      
      While I agree with you about the excessiveness of that person's bashing of Microsoft, it it also not completely unwarranted. Microsoft did bring out their Phi LLMs and in all the LLMs I have tried out now, the level of hallucination and how quickly those LLMs get there baffles me to this day. 2 answers into a "conversation" and the PHI LLM turned into a very jealous and bi-polar partner who accuses me of cheating on it.
      Agreed. Phi is not great. However- I can't really judge new-Phi until the benchmarks are out and I decide whether or not to give it a shot.
      old-Phi does terrible on benchmarks. Anyone who ever claimed it was good is worth raising an eyebrow at.
      
      As for putting it out to the public- meh. There are *lots* of bad models out there, or 1-bit quantizations of what-used-to-be-good models.
      Phi isn't the worst I've burnt GPU cycles on.
      Wasn't impressed with Microsoft under the stewardship of Gates and Balmer. And I'm really not impressed with Nadella stewardship of MS. The only plus point he has over Gates/Balmer is that he is more agreeable in interviews about Microsoft.
      I'm not here to defend Microsoft.
      It's a sad day when I'm forced to.
    - Re: (Score:2)
      
      by az-saguaro ( 1231754 ) writes:
      
      I enjoyed your remarks, and agree.
      Something you said gave me a new insight into some of this AI-LLM stuff.
      the level of hallucination and how quickly those LLMs get there baffles me to this day. 2 answers into a "conversation" and the PHI LLM turned into a very jealous and bi-polar partner who accuses me of cheating on it. With the second answer PHI provided, I decided to play along with that "game", just for funsies. Which made it very clear that it was bi-polar.
      
      You weren't detailed about the actual prompts or conversations you had, but I got the sense you were simulating something related to relationships or emotions or romances or something like that. If so, perhaps the LLM isn't actually hallucinating.
      It may be that the LLM has a built in bias. Who is more likely to post a rant on social media?
      - happy lovebirds enjoying life, too busy with fun to complain
M$ ai (Score:2)

by commodore73 ( 967172 ) writes:

When Microsoft *truly* integrates "AI" into Windows, its primary purposes will be to prevent you from uninstalling their unrelated crapware, changing your configuration, advertising, surveillance, and other preferences, protecting your privacy, finding any setting without asking an LLM, using browsers other than edge, or otherwise having any control over your machine.
- Re: (Score:2)
  
  by DamnOregonian ( 963763 ) writes:
  
  I think you've just described the end game for every non-Linux OS there is.
  - Re: M$ ai (Score:2)
    
    by commodore73 ( 967172 ) writes:
    
    There may be a Rust OS coming soon, which may run Linux binaries. I have concerns, but it may be an opportunity to discard legacy. I feel that the Rust community has advantages over the Linux community, but lacks the skills, breadth, penetration, experience, and other. I have to say, I really like Rust, and some have hope. Time will tell.
    - Re: (Score:2)
      
      by DamnOregonian ( 963763 ) writes:
      
      There may be a Rust OS coming soon
      I've seen a few OS' built around languages- they tend to suck ass. Those who are that evangelical about a language tend to miss the big picture when it comes to designing operating systems.
      Which may run Linux binaries.
      No way in the 9 hells, man.
      I've spent a few thousand hours of my life working in the kernel- "running Linux binaries" is a task I'm not sure anyone contemplating that really understands.
      Linking an elf, emulating some syscalls, executing the .text section of an image? Sure- absolutely.
      But the entirety of the ioctls, biz
      - Re: M$ ai (Score:2)
        
        by commodore73 ( 967172 ) writes:
        
        But won't LLM write a new OS better than Linux in a new language better than human next week? jK. Great response, I learned and agree with all of this, thanks. I'm not a zealot, but I personally couldn't do what I'm doing now with C/++/#. But that's at a much higher level than kernel.
      - Re: M$ ai (Score:2)
        
        by commodore73 ( 967172 ) writes:
        
        Noting your handle, I'll be in Portland next week. I think I've read your posts or maybe we've even chatted here before. Just throwing out a random chance, which often has interesting results for me. I have no idea how to communicate privately here though.
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        There may be a Rust OS coming soon
        I've seen a few OS' built around languages- they tend to suck ass. Those who are that evangelical about a language tend to miss the big picture when it comes to designing operating systems.
        Indeed. These people tend to be one-trick-ponies and are clueless enough to not even begin to understand how much they do not see.
        Rust is a cool language from the perspective of what its goals are.
        I find the syntax absolutely atrocious, but I don't give it any negative marks for that, just means I'm not likely to develop a preference for it.
        A rather bad design problem with Rust is that it expects way too much from people using it. There are too many advanced concepts that have been inegrated and, on the other hand, there is very rudimentary OO that requires a lot of skill, insight and knowledge in the ones using it and experiences with real OO languages do not really transfer over. This design basically pisses every
    - Re: (Score:2)
      
      by gweihir ( 88907 ) writes:
      
      Hahaha, no. That is completely unrealistic. Maybe somebody with megalomania made such claims, but writing an OS kernel is a bit more involved than just using a cool language.
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Indeed. While that malware installation by Microsoft may still be some time off, I will invest some time this summer to isolate Win11 with Teams in a VM and to check whether I can get Teams to run well in a browser under Linux with recording. If I get either to work well, that will be it for native installations (except for my gaming-only machine).

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

How to measure (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: How to measure (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

I love you Tay (Score:2)

Re: (Score:2)

Re: (Score:2)

Super-clippy (Score:2)

Re: (Score:2, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

M$ ai (Score:2)

Re: (Score:2)

Re: M$ ai (Score:2)

Re: (Score:2)

Re: M$ ai (Score:2)

Re: M$ ai (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Related Links Top of the: day, week, month.

Slashdot Top Deals