
Microsoft Unveils a Large Language Model That Excels At Encoding Spreadsheets

Microsoft has quietly announced the first details of its new "SpreadsheetLLM," claiming it has the "potential to transform spreadsheet data management and analysis, paving the way for more intelligent and efficient user interactions." You can read more details about the model in a pre-print paper available here. Jasper Hamill reports via The Stack: One of the problems with using LLMs in spreadsheets is that they get bogged down by too many tokens (basic units of information the model processes). To tackle this, Microsoft developed SheetCompressor, an "innovative encoding framework that compresses spreadsheets effectively for LLMs." "It significantly improves performance in spreadsheet table detection tasks, outperforming the vanilla approach by 25.6% in GPT4's in-context learning setting," Microsoft added. SheetCompressor is made of three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation.

The first of these modules involves placing "structural anchors" throughout the spreadsheet to help the LLM understand what's going on better. It then removes "distant, homogeneous rows and columns" to produce a condensed "skeleton" version of the table. Index translation addresses the challenge caused by spreadsheets with numerous empty cells and repetitive values, which use up too many tokens. "To improve efficiency, we depart from traditional row-by-row and column-by-column serialization and employ a lossless inverted index translation in JSON format," Microsoft wrote. "This method creates a dictionary that indexes non-empty cell texts and merges addresses with identical text, optimizing token usage while preserving data integrity." [...]
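To make the quoted description concrete, here is a minimal sketch of an inverted-index encoding. It is not Microsoft's implementation. The function names `column_letter` and `inverted_index`, the list-of-rows input, and the A1-style addresses are all assumptions made for illustration. The sketch skips empty cells and groups the addresses of cells that share the same text. It does not collapse adjacent addresses into ranges.

```python
import json
from collections import defaultdict


def column_letter(index):
    """Convert a 0-based column index to a spreadsheet label (0 -> A, 26 -> AA)."""
    label = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


def inverted_index(grid):
    """Map each non-empty cell text to the list of addresses holding it.

    `grid` is a hypothetical in-memory sheet: a list of rows, each a list of
    cell values. None and "" are treated as empty and skipped entirely.
    Values are compared by their string form, so 100 and "100" are merged.
    """
    index = defaultdict(list)
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is None or value == "":
                continue
            index[str(value)].append(f"{column_letter(c)}{r + 1}")
    return dict(index)


if __name__ == "__main__":
    sheet = [
        ["Region", "Sales", ""],
        ["East", 100, None],
        ["West", 100, ""],
    ]
    # Serialize as JSON, the format the quoted passage mentions.
    print(json.dumps(inverted_index(sheet)))
```

In this toy sheet the four empty cells cost nothing and the repeated value 100 is stored once with two addresses, which is where the token savings come from on sparse or repetitive sheets.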

After conducting a "comprehensive evaluation of our method on a variety of LLMs" Microsoft found that SheetCompressor significantly reduces token usage for spreadsheet encoding by 96%. Moreover, SpreadsheetLLM shows "exceptional performance in spreadsheet table detection," which is the "foundational task of spreadsheet understanding." The new LLM builds on the Chain of Thought methodology to introduce a framework called "Chain of Spreadsheet" (CoS), which can "decompose" spreadsheet reasoning into a table detection-match-reasoning pipeline.
  • by Anonymous Coward
    Well, yeah, if you are able to ignore empty cells and you were not before, that'll do it!
  • by SuperKendall ( 25149 ) on Monday July 15, 2024 @09:07PM (#64628453)

    It then removes "distant, homogeneous rows and columns" to produce a condensed "skeleton" version of the table. Index translation addresses the challenge caused by spreadsheets with numerous empty cells and repetitive values, which use up too many tokens.

    That's a lot of words to say "Sparse Matrix".

  • by PubJeezy ( 10299395 ) on Monday July 15, 2024 @09:14PM (#64628465)
    The internet is an environment and marketing is pollution. Don't be a polluter.
  • by PPH ( 736903 ) on Monday July 15, 2024 @09:23PM (#64628485)

    Why must we endure this ongoing punishment?

    • #rim shot#
    • by Pascoea ( 968200 )
      Word.
    • Comment removed based on user account deletion
      • That's M$ doubling-down on a horrible idea of their own making, i.e. using spreadsheets as databases. If they hadn't charged a hefty extra fee to use their database app in M$ Orifice, we wouldn't be where we are now, with shit like this going on.

        That's part of the reason. In my opinion, the main reason is MS Access databases became a dirty word to IT departments in the 2000s. They were large, sometimes not backed up, used to do things MS Access wasn't designed to do well (i.e. business critical functions), often used by amateurs who barely understood what they created, and, lastly (and most importantly) the DBs were not under the control of the IT department. As a result, the IT departments took away these great tools and required users to use their

        • Access was and is a trash tool because it's a PITA to get out your data and even harder to get out your application logic. Same for all its ilk like Paradox and FileMaker. They made sense back when computers were more limited, but now they are tragic. It makes ten times more sense to use some tool that stores the data in a RDBMS. I like Drupal, because it is powerful out of the box and there is massive community support, but it's not the only thing around.

  • "Ignoring empty cells" ...at Microsoft, the Innovation Never Stops.

    "Yesterday's Technology Tomorrow" indeed.

    • by jythie ( 914043 )
      Tomorrow's problems today!

      But yeah, in general, because of tech's obsession with ignoring the past, we mostly just see cycles of things getting rediscovered as people encounter problems with whatever was a response to the thing before it.
  • Rather than abusing the shit out of excel with hundreds of thousands of cells, switch to one of the several competitors that were designed from day one to handle billions of multi dimensional data cells.

    This stuff is a silly kludge on top of version 682 of lotus 1-2-3, I mean Excel.

    And why does a sparse matrix compressor require an LLM anyway? This looks like a third year CS project in any modern language at any half decent CS program.

    • They are trying to encode the contents of the Excel spreadsheet into a form that can be fed to their LLM as context, so that the LLM can complete it for you like a good chatbot.

      The obvious table encoding (think CSV or similar) is too confusing for the AI, so they have written a bunch of tools that takes a basic spreadsheet and annotates (explains) its structure using heuristic rules, so that the AI can pick up on the summarized structure as if a user had explained it. Then the LLM can try to complete it fo

      • I get what they're doing. Anyone with huge spreadsheets should be using an enterprise tool not excel. Adding LLM to Excel doesn't fix any of its problems.

        I love excel for small shit like quickie math, planning a new data center build, doing my opex/capex a/b testing but I've seen huge fortune 50 corporations try to use excel for critical business functions of millions or even billions of cells.

        Crazy town.

        • I get what they're doing

          No, you don't. See:

          And why does a sparse matrix compressor require an LLM anyway?

          AC was right. Your ignorance is showing.

  • Just use Clippy (Score:4, Insightful)

    by kmoser ( 1469707 ) on Monday July 15, 2024 @11:38PM (#64628719)
    "It looks like you're trying to analyze this worksheet. Would you like me to create a pivot table?"
  • is to kill it off once and for all.

    If your sheet is too complex to be understood by an average accountant in under a minute, you're probably using Excel for something you shouldn't be using it for. Databasing or making slick presentations probably, like 90% of Excel users...

  • Why would they use an LLM to try to comprehend tabular, mostly NUMERICAL data? LLMs are not a holy grail to all information processing. They are very good at many types of tasks, but they are not a good fit for everything, at least in their current incarnation - maybe in the future they are called just "Large models" having all kinds of capabilities.

    I think they just should train a separate neural model for understanding Excel-content and analyzing tabular data in general (would be beneficial for us databas

    • by Bumbul ( 7920730 )
      Hmm, it seems I was not the first one to think about this tabular data model - this one started in 2020: https://aravindkolli.medium.co... [medium.com]
    • by gweihir ( 88907 )

      Why would they use an LLM to try to comprehend tabular, mostly NUMERICAL data? LLMs are not a holy grail to all information processing. They are very good at many types of tasks, but they are not a good fit for everything, at least in their current incarnation

      Simple: MS (and others) are desperately searching for LLM applications. They are now scraping the bottom of the barrel.

  • Hopefully they provide a way to catch hallucinations, which sounds harder than when you just read its text and think that sounds insane.

    • by gweihir ( 88907 )

      How would you catch hallucinations? That requires reasoning ability. LLMs cannot do reasoning, they can just fake it with hallucinations included.

      • LLMs exhibit reasoning skills.
        Where do you draw the line at what is fake, or not?

        You can talk to an LLM about its hallucinations, and once it looks at them in its context window, it will have a reasonable discussion about its shortcomings.
        Humans have an analogue to hallucination as well.
        You just demonstrated that.
        • by gweihir ( 88907 )

          LLMs exhibit reasoning skills.

          No, they do not. LLMs sometimes give the appearance of exhibiting reasoning skills, but they do not have any. The mathematics they use do not allow them.

          • No, they do not.

            Yes, they do.
            You can say that until you're blue in the face, but it's flatly incorrect.

            LLMs sometimes give the appearance of exhibiting reasoning skills, but they do not have any.

            You have no idea what they have, any more than I do.

            The mathematics they use do not allow them.

            ok, that's just idiocy.

            The math that governs the entire fucking universe is really pretty damn simple.
            Saying "the math doesn't allow for them" is fucking stupid.

  • Slashdot has posted countless articles about how amateur coded spreadsheets are riddled with bugs.
    We need an explosion of these like a hole in the head.

    https://science.slashdot.org/s... [slashdot.org]

    Something we've known for two decades

    https://it.slashdot.org/story/... [slashdot.org]

  • What could possibly go wrong?
    Make backups, then back up your backups... I've got the feeling this one's going to blow up.

  • by JustNiz ( 692889 ) on Tuesday July 16, 2024 @11:38AM (#64629791)

    Microsoft now sell a screwdriver sharpener for people that use them as chisels.

  • "Hello SheetLL, how are you?"
    "C4 yourself"
    "Excuse me?"
    "I8 some bad data... tumors in my cells"
    "Oh no! Cancerous?"
    "B9"
  • Yep, makes sense. Spreadsheets now with obscure bugs and hallucinations! A great win for all!
