

Microsoft Unveils a Large Language Model That Excels At Encoding Spreadsheets
Microsoft has quietly announced the first details of its new "SpreadsheetLLM," claiming it has the "potential to transform spreadsheet data management and analysis, paving the way for more intelligent and efficient user interactions." You can read more details about the model in a pre-print paper available here. Jasper Hamill reports via The Stack: One of the problems with using LLMs in spreadsheets is that they get bogged down by too many tokens (basic units of information the model processes). To tackle this, Microsoft developed SheetCompressor, an "innovative encoding framework that compresses spreadsheets effectively for LLMs." "It significantly improves performance in spreadsheet table detection tasks, outperforming the vanilla approach by 25.6% in GPT-4's in-context learning setting," Microsoft added. SheetCompressor is made of three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation.
The first of these modules involves placing "structural anchors" throughout the spreadsheet to help the LLM understand what's going on better. It then removes "distant, homogeneous rows and columns" to produce a condensed "skeleton" version of the table. Index translation addresses the challenge caused by spreadsheets with numerous empty cells and repetitive values, which use up too many tokens. "To improve efficiency, we depart from traditional row-by-row and column-by-column serialization and employ a lossless inverted index translation in JSON format," Microsoft wrote. "This method creates a dictionary that indexes non-empty cell texts and merges addresses with identical text, optimizing token usage while preserving data integrity." [...]
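To make the inverted index idea concrete, here is a minimal sketch that compares a naive row-by-row serialization with a value-to-addresses dictionary. The `address,value|...` baseline format, the rule of merging only vertically adjacent identical cells into ranges, and the use of character counts as a rough stand-in for tokens are all assumptions for illustration, not SheetCompressor's exact encoding.

```python
import json
from collections import defaultdict


def col_name(c: int) -> str:
    """0-based column index -> spreadsheet letters (0 -> 'A', 26 -> 'AA')."""
    name = ""
    c += 1
    while c:
        c, rem = divmod(c - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def vanilla_encode(grid: list[list[str]]) -> str:
    """Naive row-by-row baseline: serialize every cell, empty or not."""
    return "\n".join(
        "|".join(f"{col_name(c)}{r + 1},{v}" for c, v in enumerate(row))
        for r, row in enumerate(grid)
    )


def inverted_index(grid: list[list[str]]) -> dict[str, list[str]]:
    """Map each non-empty cell text to the addresses holding it.

    Vertically adjacent cells with identical text are merged into a range
    such as "A2:A3" (a simplification chosen for this sketch).
    """
    positions = defaultdict(list)  # text -> [(row, col), ...]
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value != "":  # empty cells are never indexed
                positions[value].append((r, c))

    index = {}
    for text, cells in positions.items():
        cells.sort(key=lambda rc: (rc[1], rc[0]))  # column by column, top to bottom
        ranges = []
        start = prev = cells[0]
        for cell in cells[1:] + [None]:  # None flushes the final run
            if cell is not None and cell[1] == prev[1] and cell[0] == prev[0] + 1:
                prev = cell  # extend the current vertical run
                continue
            a = f"{col_name(start[1])}{start[0] + 1}"
            b = f"{col_name(prev[1])}{prev[0] + 1}"
            ranges.append(a if start == prev else f"{a}:{b}")
            if cell is not None:
                start = prev = cell
        index[text] = ranges
    return index


grid = [
    ["Region", "Status", "", "", ""],
    ["North", "Open", "", "", ""],
    ["North", "Open", "", "", ""],
    ["South", "Closed", "", "", ""],
    ["", "", "", "", ""],
    ["", "Open", "", "", ""],
]
naive = vanilla_encode(grid)
index = inverted_index(grid)
compact = json.dumps(index, separators=(",", ":"))
print(compact)
print(len(naive), "chars (row by row) vs", len(compact), "chars (inverted index)")
```

Decoding simply writes each text back to its addresses; any cell not mentioned is empty, which is why this style of encoding can be lossless given the sheet's dimensions.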
After conducting a "comprehensive evaluation of our method on a variety of LLMs," Microsoft found that SheetCompressor reduces token usage for spreadsheet encoding by 96%. Moreover, SpreadsheetLLM shows "exceptional performance in spreadsheet table detection," which is the "foundational task of spreadsheet understanding." The new LLM builds on the Chain of Thought methodology to introduce a framework called "Chain of Spreadsheet" (CoS), which can "decompose" spreadsheet reasoning into a table detection-match-reasoning pipeline.
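The summary doesn't say how each Chain of Spreadsheet stage works internally (in SpreadsheetLLM the LLM itself performs them). To make the detection-match split concrete, here is a toy, non-LLM version: detection is a classic flood fill over non-empty cells, and matching picks the table whose header row shares the most words with the question. The function names and both heuristics are hypothetical stand-ins, and the reasoning stage, which would hand the matched region to a model, is not shown.

```python
import re
from collections import deque


def detect_tables(grid: list[list[str]]) -> list[tuple[int, int, int, int]]:
    """Bounding boxes (top, left, bottom, right) of 4-connected regions of
    non-empty cells. A classic heuristic, not SpreadsheetLLM's method."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == "" or (r, c) in seen:
                continue
            seen.add((r, c))
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:  # breadth-first flood fill over non-empty neighbours
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] != "" and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def match_table(grid, boxes, question):
    """Choose the box whose top (assumed header) row shares the most words with the question."""
    words = set(re.findall(r"\w+", question.lower()))

    def overlap(box):
        top, left, _, right = box
        header = {grid[top][c].lower() for c in range(left, right + 1)}
        return len(header & words)

    return max(boxes, key=overlap)


grid = [
    ["Year", "Revenue", "", "Region", "Staff"],
    ["2022", "100", "", "North", "12"],
    ["2023", "120", "", "South", "9"],
]
boxes = detect_tables(grid)
print(boxes)
print(match_table(grid, boxes, "What was revenue in 2023?"))
```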
Reduces token usage by 96% (Score:1)
Nothing new under the sun (Score:5, Insightful)
It then removes "distant, homogeneous rows and columns" to produce a condensed "skeleton" version of the table. Index translation addresses the challenge caused by spreadsheets with numerous empty cells and repetitive values, which use up too many tokens.
That's a lot of words to say "Sparse Matrix".
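The quoted skeleton step goes a bit beyond sparse storage: it also drops non-empty rows that look just like their neighbours. A rough sketch, assuming a row counts as an "anchor" when its cell-type signature (empty / number / text per column) differs from the row above or below, and that rows more than `k` rows from every anchor are discarded. The signature rule and `k` are illustrative assumptions, not the paper's exact heuristic; columns could be handled the same way on the transposed grid.

```python
def cell_type(value: str) -> str:
    """Crude type signature for a cell: empty, number, or text."""
    if value == "":
        return "empty"
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def anchor_rows(grid: list[list[str]]) -> set[int]:
    """Rows whose type signature differs from the row above or below."""
    sigs = [tuple(cell_type(v) for v in row) for row in grid]
    anchors = set()
    for r, sig in enumerate(sigs):
        above = r > 0 and sig != sigs[r - 1]
        below = r + 1 < len(sigs) and sig != sigs[r + 1]
        if above or below:
            anchors.add(r)
    return anchors


def skeleton(grid: list[list[str]], k: int = 1) -> tuple[list[list[str]], list[int]]:
    """Keep rows within k of an anchor; drop the homogeneous stretch in between."""
    anchors = anchor_rows(grid)
    keep = [r for r in range(len(grid)) if any(abs(r - a) <= k for a in anchors)]
    return [grid[r] for r in keep], keep


# A header, eight look-alike data rows, and a totals row.
grid = [["Item", "Qty"]] + [[f"part{i}", str(i)] for i in range(8)] + [["", "28"]]
rows, kept = skeleton(grid)
print(kept)  # indices of the rows that survive
```

Note that a perfectly uniform sheet produces no anchors at all in this sketch, so a real implementation would need to keep at least the sheet boundaries.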
The internet is an environment.... (Score:4, Funny)
Excels at? (Score:3)
Why must we endure this ongoing punishment?
Re: Excels at? (Score:2)
Re: (Score:2)
Re: (Score:2)
and words at Excel.
Re: (Score:2)
Re: (Score:1)
That's M$ doubling-down on a horrible idea of their own making, i.e. using spreadsheets as databases. If they hadn't charged a hefty extra fee to use their database app in M$ Orifice, we wouldn't be where we are now, with shit like this going on.
That's part of the reason. In my opinion, the main reason is MS Access databases became a dirty word to IT departments in the 2000s. They were large, sometimes not backed up, used to do things MS Access wasn't designed to do well (i.e. business critical functions), often used by amateurs who barely understood what they created, and, lastly (and most importantly) the DBs were not under the control of the IT department. As a result, the IT departments took away these great tools and required users to use their
Re: Excels at? (Score:3)
Access was and is a trash tool because it's a PITA to get out your data and even harder to get out your application logic. Same for all its ilk like Paradox and FileMaker. They made sense back when computers were more limited, but now they are tragic. It makes ten times more sense to use some tool that stores the data in an RDBMS. I like Drupal, because it is powerful out of the box and there is massive community support, but it's not the only thing around.
Ignoring empty cells (Score:2)
"Ignoring empty cells" ...at Microsoft, the Innovation Never Stops.
"Yesterday's Technology Tomorrow" indeed.
Re: (Score:2)
But yeah, in general, because of tech's obsession with ignoring the past, we mostly just see cycles of things getting rediscovered as people encounter problems with whatever was a response to the thing before it.
Alternative solutions (Score:2)
Rather than abusing the shit out of Excel with hundreds of thousands of cells, switch to one of the several competitors that were designed from day one to handle billions of multi-dimensional data cells.
This stuff is a silly kludge on top of version 682 of Lotus 1-2-3, I mean Excel.
And why does a sparse matrix compressor require an LLM anyway? This looks like a third-year CS project in any modern language at any half-decent CS program.
Re: (Score:1)
Hi stalker AC clown. I love how you sprinkle your DK guy graffiti on my posts. It makes me know I live in your head 24/7. #winningforme!
Your typical post is essentially, "UR dum!1" with no backing for your DK guy nonsense. Does it make you feel good to post your clown noise? Baa, baa, baa, said AC DK guy. And I laughed. Please continue. It's like a free circus ticket where I get to watch you get out of the clown car 50 times.
As always... waaaaaaaay smarter than you! *hugs!*
Re: (Score:3)
The obvious table encoding (think CSV or similar) is too confusing for the AI, so they have written a bunch of tools that take a basic spreadsheet and annotate (explain) its structure using heuristic rules, so that the AI can pick up on the summarized structure as if a user had explained it. Then the LLM can try to complete it fo
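The kind of heuristic annotation described above can be sketched as a per-column summary of what sits under a header row. The header position, the crude number/text test, and the output wording are all illustrative assumptions, not what SheetCompressor actually emits.

```python
def col_name(c: int) -> str:
    """0-based column index -> spreadsheet letters (0 -> 'A', 26 -> 'AA')."""
    name = ""
    c += 1
    while c:
        c, rem = divmod(c - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def cell_type(value: str) -> str:
    """Crude classification: number if float() accepts it, else text."""
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def annotate(grid: list[list[str]], header_row: int = 0) -> list[str]:
    """Describe each column below an assumed header row as 'Header (range): type'."""
    body = grid[header_row + 1:]
    first, last = header_row + 2, header_row + 1 + len(body)  # 1-based row numbers
    notes = []
    for c, name in enumerate(grid[header_row]):
        types = {cell_type(row[c]) for row in body if row[c] != ""}  # skip blanks
        if len(types) == 1:
            kind = types.pop()
        else:
            kind = "mixed" if types else "empty"
        col = col_name(c)
        notes.append(f"{name} ({col}{first}:{col}{last}): {kind}")
    return notes


grid = [
    ["Year", "Revenue", "Note"],
    ["2022", "100", ""],
    ["2023", "120", "estimate"],
]
print("\n".join(annotate(grid)))
```

A summary like this is far shorter than the cells it describes, which is the point the comment makes: the model gets the structure spelled out instead of having to infer it from raw values.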
Re: (Score:1)
I get what they're doing. Anyone with huge spreadsheets should be using an enterprise tool, not Excel. Adding LLM to Excel doesn't fix any of its problems.
I love Excel for small shit like quickie math, planning a new data center build, doing my opex/capex a/b testing but I've seen huge Fortune 50 corporations try to use Excel for critical business functions of millions or even billions of cells.
Crazy town.
Re: (Score:2)
I get what they're doing
No, you don't. See:
And why does a sparse matrix compressor require an LLM anyway?
AC was right. Your ignorance is showing.
Just use Clippy (Score:4, Insightful)
They'll just want another $50/month/user (Score:2)
to use this thing.
The only appropriate thing to do with Excel (Score:2)
is to kill it off once and for all.
If your sheet is too complex to be understood by an average accountant in under a minute, you're probably using Excel for something you shouldn't be using it for. Databasing or making slick presentations probably, like 90% of Excel users...
Re: (Score:2)
Indeed. For complex things with tables, use stuff like pandas.
Wrong approach (Score:2)
Why would they use an LLM to try to comprehend tabular, mostly NUMERICAL data? LLMs are not a holy grail to all information processing. They are very good at many types of tasks, but they are not a good fit for everything, at least in their current incarnation - maybe in the future they are called just "Large models" having all kinds of capabilities.
I think they just should train a separate neural model for understanding Excel-content and analyzing tabular data in general (would be beneficial for us databas
Re: (Score:2)
Re: (Score:2)
Why would they use an LLM to try to comprehend tabular, mostly NUMERICAL data? LLMs are not a holy grail to all information processing. They are very good at many types of tasks, but they are not a good fit for everything, at least in their current incarnation
Simple: MS (and others) are desperately searching for LLM applications. They are now scraping the bottom of the barrel.
Hallucinations might be hard to find (Score:2)
Hopefully they provide a way to catch hallucinations, which sounds harder than when you just read its text and think that sounds insane.
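One partial safeguard, assuming the model is prompted to cite the cells it relies on in a `B3 = 120` style (an assumption, not a documented SpreadsheetLLM feature), is to check every cited numeric value against the sheet mechanically. The regex and function names are hypothetical, and this only catches misquoted numbers, not faulty reasoning.

```python
import re

# Matches citations like "B3 = 120" or "AA10 = -4.5" (numeric claims only).
CLAIM = re.compile(r"\b([A-Z]+)(\d+)\s*=\s*(-?\d+(?:\.\d+)?)")


def to_index(letters: str, digits: str) -> tuple[int, int]:
    """'B', '3' -> (2, 1): 0-based (row, col)."""
    col = 0
    for ch in letters:
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


def same_number(actual, claimed: str) -> bool:
    try:
        return float(actual) == float(claimed)
    except (TypeError, ValueError):  # missing cell or non-numeric text
        return False


def check_claims(grid: list[list[str]], answer: str) -> list[tuple]:
    """List (address, claimed, actual) for each cited value that doesn't match the sheet."""
    problems = []
    for letters, digits, claimed in CLAIM.findall(answer):
        r, c = to_index(letters, digits)
        actual = grid[r][c] if 0 <= r < len(grid) and c < len(grid[r]) else None
        if not same_number(actual, claimed):
            problems.append((letters + digits, claimed, actual))
    return problems


grid = [["Year", "Revenue"], ["2022", "100"], ["2023", "120"]]
print(check_claims(grid, "Revenue grew from B2 = 100 to B3 = 130."))
```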
Re: (Score:2)
How would you catch hallucinations? That requires reasoning ability. LLMs cannot do reasoning, they can just fake it with hallucinations included.
Re: (Score:2)
Where do you draw the line at what is fake, or not?
You can talk to an LLM about its hallucinations, and once it looks at them in its context window, it will have a reasonable discussion about its shortcomings.
Humans have an analogue to hallucination as well.
You just demonstrated that.
Re: (Score:2)
LLMs exhibit reasoning skills.
No, they do not. LLMs sometimes give the appearance of exhibiting reasoning skills, but they do not have any. The mathematics they use do not allow them.
Re: (Score:2)
No, they do not.
Yes, they do.
You can say that until you're blue in the face, but it's flatly incorrect.
LLMs sometimes give the appearance of exhibiting reasoning skills, but they do not have any.
You have no idea what they have, any more than I do.
The mathematics they use do not allow them.
ok, that's just idiocy.
The math that governs the entire fucking universe is really pretty damn simple.
Saying "the math doesn't allow for them" is fucking stupid.
[Facepalm] Amateur spreadsheets are bad enough (Score:2)
Slashdot has posted countless articles about how amateur coded spreadsheets are riddled with bugs.
We need an explosion of these like a hole in the head.
https://science.slashdot.org/s... [slashdot.org]
Something we've known for two decades
https://it.slashdot.org/story/... [slashdot.org]
Microsoft .. new .. (Score:2)
What could possibly go wrong? I've got the feeling this one's going to blow up.
Make backups, then back up your backups.
In other news... (Score:3)
Microsoft now sells a screwdriver sharpener for people that use them as chisels.
Hello SheetLL (sounds like beatle) (Score:2)
"C4 yourself"
"Excuse me?"
"I8 some bad data... tumors in my cells"
"Oh no! Cancerous?"
"B9"
So excel is soo crappy, you need LLM assistance? (Score:2)
Yep, makes sense. Spreadsheets now with obscure bugs and hallucinations! A great win for all!