

Microsoft Reportedly Develops LLM Series That Can Rival OpenAI, Anthropic Models
Microsoft is reportedly developing its own large language model series capable of rivaling OpenAI and Anthropic's models. SiliconANGLE reports: Sources told Bloomberg that the LLM series is known as MAI. That's presumably an acronym for "Microsoft artificial intelligence." It might also be a reference to Maia 100, an internally-developed AI chip the company debuted last year. It's possible Microsoft is using the processor to power the new MAI models. The company recently tested the LLM series to gauge its performance. As part of the evaluation, Microsoft engineers checked whether MAI could power the company's Copilot family of AI assistants. Data from the tests reportedly indicates that the LLM series is competitive with models from OpenAI and Anthropic.
That Microsoft evaluated whether MAI could be integrated into Copilot hints the LLM series is geared towards general-purpose processing rather than reasoning. Many of the tasks supported by Copilot can be performed with a general-purpose model. According to Bloomberg, Microsoft is currently developing a second LLM series optimized for reasoning tasks. The report didn't specify details such as the number of models Microsoft is training or their parameter counts. It's also unclear whether they might provide multimodal features.
How to measure (Score:2)
Re: (Score:3)
Consider the following scenario. You are playing a new game, and you are very bad at it. But sometimes, you succeed at beating the first level. Suppose you can save the game. Now you get to try the second level without repeating the first. You are bad at it, but sometimes you succeed, and save the second level. Suppose you get to the end like this, does this prove that you are capable of playing the game?
Re: (Score:2)
The various LLM benchmarks are not more impacted than those- except that you can't pay a smart person to do your LLM benchmark for you.
Re: (Score:2)
That's complete nonsense. The mistake you've made is that you have assumed that people get to retake tests over and over again until they get it right. In most cases they don't, they get one or at most two resits. And even then the fact that they required a resit is recorded. They don't get to keep trying until they pass no matter how many attempts that takes.
'AI' however does, at least in the internal company tests. This is why the results published by these companies are vastly different to the
Re: (Score:2)
They don't get to keep trying until they pass no matter how many attempts that takes.
You can literally take the SATs as many times as you want.
Then the 'AI' simply does the calculations, which we all know that computers are very good at.
This is more bullshit.
LLMs are not computers. They are run by computers.
Math is actually particularly difficult for them, and it took a lot of training to make them good at it.
This is then reported as the 'AI' being able to win a Math Olympiad.
No, it's not.
You really have no fucking idea what you're talking about, do you?
Re: (Score:2)
Oh, I see. You've come up with one counter example and extrapolated that for all tests everywhere. Great. It's sunny today, so I guess it must be sunny here every single day, since one example covers every possibility. And even in your one example, you have made a fundamental error. When a student takes a second, third or even five hundredth SAT they are not retaking the exact same test. They are given different questions. When these LLMs are tested they are tested on the exact same questions that they failed last time, and have now been specifically trained to get right. This is an entirely different thing to retaking a type of exam that you failed the first time. If I were given the exact same test twice I would expect to get 100% the second time.
Re: (Score:2)
Oh, I see. You've come up with one counter example and extrapolated that for all tests everywhere.
Multiple counter examples, actually.
What you've failed to do is come up with a single example that backs up your blanket assertion- which I wouldn't bother with now, as any blanket assertion that has 2 examples going against it varies between stupid and not helpful.
When a student takes a second, third or even five hundredth SAT they are not retaking the exact same test.
That's not an error at all.
This is basic math, here.
You've got a corpus of things you must get good at.
Every test you take will have a smattering of challenges. Statistics tells you exactly what weights to use in your training.
And yes, computers are very good at doing calculations.
LLMs are not computers.
Re: (Score:2)
Re: (Score:2)
Personally, I'd say he's done a good job at pointing out the obvious flaws in your objections....
Personally, I'd say you're probably an idiot, then.
;)
Dude asserts that since SATs change, something can't learn the corpus by re-taking.
"Good job", indeed. That's why people absolutely don't improve after they take the test a second time
Re: (Score:2)
Re: (Score:2)
And even in your one example, you have made a fundamental error. When a student takes a second, third or even five hundredth SAT they are not retaking the exact same test. They are given different questions. When these LLMs are tested they are tested on the exact same questions that they failed last time, and have now been specifically trained to get right. This is an entirely different thing to retaking a type of exam that you failed the first time. If I were given the exact same test twice I would expect to get 100% the second time.
As if SAT questions had no overlap, lol.
You two idiots trying to band together against someone who actually has 6 brain cells to rub together is cute, though.
Re: (Score:2)
How you go from "exact same test" to "overlap" is your business, as is why you care so much about defending your original debunked misrepresentation. I won't ask.
Re: How to measure (Score:2)
I was definitely implying all the hairy parts when I posted. You are immediately confronted with the flaws, biases, and inadequacies of the IQ test, or any other test, when you attempt to quantify
People can't agree on what intelligence is, and reality isn't objective
Re: (Score:2)
Re: (Score:1)
Consider the following scenario. You are playing a new game, and you are very bad at it. But sometimes, you succeed at beating the first level. Suppose you can save the game. Now you get to try the second level without repeating the first. You are bad at it, but sometimes you succeed, and save the second level. Suppose you get to the end like this, does this prove that you are capable of playing the game?
Er, maybe? Depends on the rules of the game.
(I mean you're pretty much describing how I beat LoZ BOTW ... )
Re: (Score:2)
Re: (Score:2)
They evaluate their success rate at a battery of tasks.
Re: (Score:2)
Usually by a set of questions that should be answered in a zero-shot request.
Some of the benchmark sets are public and may not be a good measure anymore (especially since Microsoft's phi line of models is trained on synthetic data), but others contain private data that is in no training set and can better verify the "IQ" of the model. The tests also have different kinds of questions, like knowledge, reasoning, math, language understanding, etc.
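To make that concrete, a typical harness is not much more than a loop over a question file: one prompt per question, no examples and no retries, a scoring rule, and results bucketed by category. A rough sketch in Python; query_model() and the jsonl path are placeholders here, not any particular vendor's API, and real harnesses use more forgiving scoring than exact match:

    # Minimal zero-shot benchmark harness -- illustrative sketch only.
    import json
    from collections import defaultdict

    def query_model(prompt: str) -> str:
        """Stand-in: send one prompt with no examples or retries, return the answer."""
        raise NotImplementedError("wire up your model here")

    def run_benchmark(path: str) -> dict:
        # Each line: {"question": ..., "answer": ..., "category": "math" | "knowledge" | ...}
        correct, total = defaultdict(int), defaultdict(int)
        with open(path) as f:
            for line in f:
                item = json.loads(line)
                reply = query_model(item["question"]).strip().lower()
                total[item["category"]] += 1
                if reply == item["answer"].strip().lower():  # naive exact-match scoring
                    correct[item["category"]] += 1
        return {cat: correct[cat] / total[cat] for cat in total}

    # print(run_benchmark("private_eval.jsonl"))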
Re: (Score:2)
(The ML industry has not learnt anything from the p-value hacking fiasco.)
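The analogy is easy to demonstrate: evaluate enough checkpoints, seeds, or prompt tweaks against the same fixed benchmark and report only the best run, and the headline number drifts well above the model's true accuracy. A toy simulation (standard library only, numbers made up purely for illustration):

    # Toy illustration of benchmark shopping: identical skill, report the best run.
    import random

    random.seed(0)
    TRUE_ACCURACY = 0.70   # every run has the same underlying ability
    QUESTIONS = 200        # benchmark size
    RUNS = 50              # checkpoints / seeds / prompt tweaks tried

    scores = [
        sum(random.random() < TRUE_ACCURACY for _ in range(QUESTIONS)) / QUESTIONS
        for _ in range(RUNS)
    ]
    print(f"mean score: {sum(scores) / RUNS:.3f}")  # ~0.70, the honest number
    print(f"best score: {max(scores):.3f}")         # noticeably higher, the press-release number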
Re: (Score:2)
Re: (Score:2)
There are always enough people that will place money over self-respect. And, of course, there are the ones with Stockholm syndrome.
Re: (Score:2)
I love you Tay (Score:2)
Tay has returned from her slumber.
Re: (Score:2)
One wonders if MAI will be every bit as racist.
Re: (Score:2)
You are what you eat.
Super-clippy (Score:2)
Oh great. Clippy on crack cocaine. My life is complete.
Re: (Score:2, Informative)
Re: (Score:2)
While I agree with you about the excessiveness of that person's bashing of Microsoft, it is also not completely unwarranted. Microsoft did bring out their Phi LLMs and in all the LLMs I have tried out now, the level of hallucination and how quickly those LLMs get there baffles me to this day. 2 answers into a "conversation" and the PHI LLM turned into a very jealous and bi-polar partner who accuses me of cheating on it.
With the second answer PHI provided, I decided to play along with that "game", just for funsies. Which made it very clear that it was bi-polar.
Re: (Score:2)
While I agree with you about the excessiveness of that person's bashing of Microsoft, it is also not completely unwarranted. Microsoft did bring out their Phi LLMs and in all the LLMs I have tried out now, the level of hallucination and how quickly those LLMs get there baffles me to this day. 2 answers into a "conversation" and the PHI LLM turned into a very jealous and bi-polar partner who accuses me of cheating on it.
Agreed. Phi is not great. However- I can't really judge new-Phi until the benchmarks are out and I decide whether or not to give it a shot.
old-Phi does terrible on benchmarks. Anyone who ever claimed it was good is worth raising an eyebrow at.
As for putting it out to the public- meh. There are *lots* of bad models out there, or 1-bit quantizations of what-used-to-be-good models.
Phi isn't the worst I've burnt GPU cycles on.
Wasn't impressed with Microsoft under the stewardship of Gates and Ballmer. And I'm really not impressed with Nadella's stewardship of MS. The only plus point he has over Gates/Ballmer is that he is more agreeable in interviews about Microsoft.
I'm not here to defend Microsoft.
It's a sad day when I'm forced to.
Re: (Score:2)
I enjoyed your remarks, and agree.
Something you said gave me a new insight into some of this AI-LLM stuff.
the level of hallucination and how quickly those LLMs get there baffles me to this day. 2 answers into a "conversation" and the PHI LLM turned into a very jealous and bi-polar partner who accuses me of cheating on it. With the second answer PHI provided, I decided to play along with that "game", just for funsies. Which made it very clear that it was bi-polar.
You weren't detailed about the actual prompts or conversations you had, but I got the sense you were simulating something related to relationships or emotions or romances or something like that. If so, perhaps the LLM isn't actually hallucinating.
It may be that the LLM has a built-in bias. Who is more likely to post a rant on social media?
- happy lovebirds enjoying life, too busy with fun to complain
M$ ai (Score:2)
Re: (Score:2)
Re: M$ ai (Score:2)
Re: (Score:2)
There may be a Rust OS coming soon
I've seen a few OS' built around languages- they tend to suck ass. Those who are that evangelical about a language tend to miss the big picture when it comes to designing operating systems.
Which may run Linux binaries.
No way in the 9 hells, man.
I've spent a few thousand hours of my life working in the kernel- "running Linux binaries" is a task I'm not sure anyone contemplating that really understands.
Linking an elf, emulating some syscalls, executing the .text section of an image? Sure- absolutely.
But the entirety of the ioctls, biz
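For anyone who hasn't been down that road, the "emulate some syscalls" part really is the easy sliver. Here's a deliberately crude Python sketch of the idea, purely conceptual and nothing like how a real compatibility layer is built, just to show where the dispatch-table approach runs out:

    # Toy "emulate some syscalls" dispatch table (illustrative stubs only).
    # Numbers are the x86_64 Linux syscall numbers.
    import errno
    import sys

    def sys_write(fd, buf, count):
        # Easy to forward for stdout/stderr; real files, sockets, pipes are another story.
        text = buf[:count].decode(errors="replace")
        (sys.stdout if fd == 1 else sys.stderr).write(text)
        return count

    def sys_exit(status):
        raise SystemExit(status)

    HANDLERS = {
        1: sys_write,   # write
        60: sys_exit,   # exit
        # ...a few hundred more syscalls, plus thousands of device-specific ioctl
        # commands, /proc and /sys semantics, futexes, namespaces, signals, ...
    }

    def emulate_syscall(number, *args):
        handler = HANDLERS.get(number)
        if handler is None:
            return -errno.ENOSYS   # what the guest gets for almost everything
        return handler(*args)

    # emulate_syscall(1, 1, b"hello\n", 6)  -> prints "hello", returns 6
    # emulate_syscall(16, 3, 0x5401, 0)     -> -ENOSYS (16 is ioctl: where the real work hides)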
Re: M$ ai (Score:2)
Re: M$ ai (Score:2)
Re: (Score:2)
There may be a Rust OS coming soon
I've seen a few OS' built around languages- they tend to suck ass. Those who are that evangelical about a language tend to miss the big picture when it comes to designing operating systems.
Indeed. These people tend to be one-trick-ponies and are clueless enough to not even begin to understand how much they do not see.
Rust is a cool language from the perspective of what its goals are.
I find the syntax absolutely atrocious, but I don't give it any negative marks for that, just means I'm not likely to develop a preference for it.
A rather bad design problem with Rust is that it expects way too much from people using it. There are too many advanced concepts that have been integrated and, on the other hand, there is very rudimentary OO that requires a lot of skill, insight and knowledge from the ones using it, and experience with real OO languages does not really transfer over. This design basically pisses every
Re: (Score:2)
Hahaha, no. That is completely unrealistic. Maybe somebody with megalomania made such claims, but writing an OS kernel is a bit more involved than just using a cool language.
Re: (Score:2)
Indeed. While that malware installation by Microsoft may still be some time off, I will invest some time this summer to isolate Win11 with Teams in a VM and to check whether I can get Teams to run well in a browser under Linux with recording. If I get either to work well, that will be it for native installations (except for my gaming-only machine).