AI

Mistral Adds a New API That Turns Any PDF Document Into an AI-Ready Markdown File

Mistral has launched a new multimodal OCR API that converts complex PDF documents into AI-friendly Markdown files. The API is designed for efficiency, handles visual elements like illustrations, supports complex formatting such as mathematical expressions, and reportedly outperforms similar offerings from major competitors. TechCrunch reports: Unlike most OCR APIs, Mistral OCR is a multimodal API, meaning that it can detect when there are illustrations and photos intertwined with blocks of text. The OCR API creates bounding boxes around these graphical elements and includes them in the output. Mistral OCR also doesn't just output a big wall of text; the output is formatted in Markdown, a formatting syntax that developers use to add links, headers, and other formatting elements to a plain text file.

Mistral OCR is available on Mistral's own API platform or through its cloud partners (AWS, Azure, Google Cloud Vertex, etc.). And for companies working with classified or sensitive data, Mistral offers on-premise deployment. According to the Paris-based AI company, Mistral OCR performs better than APIs from Google, Microsoft, and OpenAI. The company has tested its OCR model with complex documents that include mathematical expressions (LaTeX formatting), advanced layouts, or tables. It is also supposed to perform better with non-English documents. [...]

Mistral is also using Mistral OCR for its own AI assistant Le Chat. When a user uploads a PDF file, the company uses Mistral OCR in the background to understand what's in the document before processing the text. Companies and developers will most likely use Mistral OCR with a RAG (aka Retrieval-Augmented Generation) system to use multimodal documents as input in an LLM. And there are many potential use cases. For instance, we could envisage law firms using it to help them swiftly plough through huge volumes of documents.
"Over the years, organizations have accumulated numerous documents, often in PDF or slide formats, which are inaccessible to LLMs, particularly RAG systems. With Mistral OCR, our customers can now convert rich and complex documents into readable content in all languages," said Mistral co-founder and chief science officer Guillaume Lample.

"This is a crucial step toward the widespread adoption of AI assistants in companies that need to simplify access to their vast internal documentation," he added.
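To make the RAG use case above concrete, here is a minimal sketch of what could happen to the Markdown once it comes back from an OCR step: split it into heading-based chunks that a retrieval index could store. Nothing here calls Mistral's API; the function name `chunk_markdown` and the chunk shape (`title`, `text`) are assumptions made for illustration, not anything the article describes.

```python
import re
from typing import Dict, List

# ATX-style Markdown heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(markdown: str) -> List[Dict[str, str]]:
    """Split Markdown text into one chunk per heading section.

    Text that appears before the first heading is kept as a chunk with an
    empty title. Hypothetical helper, for illustration only.
    """
    chunks: List[Dict[str, str]] = []
    title = ""
    body: List[str] = []

    def flush() -> None:
        # Store the current section unless it is completely empty.
        if title or any(line.strip() for line in body):
            chunks.append({"title": title, "text": "\n".join(body).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section
            title = match.group(2).strip()
            body = []
        else:
            body.append(line)
    flush()  # close the last section
    return chunks


if __name__ == "__main__":
    sample = "Intro line\n\n# Revenue\nQ1 was up.\n\n## Table\n| a | b |\n"
    for chunk in chunk_markdown(sample):
        print(repr(chunk["title"]), "->", repr(chunk["text"]))
```

This is deliberately naive: a line starting with `#` inside a fenced code block would be treated as a heading, and long sections are not split further. A real pipeline would likely handle both.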

  • by ClarkEvans ( 102211 ) on Friday March 07, 2025 @05:18AM (#65217333) Homepage

    I get that AI is costly and hard to run on the average desktop, hence APIs. However, it seems we've shifted as a community to fully supporting proprietary web APIs where we not only don't have source code but even lack the binaries. Couldn't slashdot moderators help by promoting truly open source tools, especially AI, that can be run locally, even if that requires 10k of hardware? It seems very reasonable for a company to sell an API as a cost-sharing model. Even so, at least we could support entities that fully open their models and have active "hacker" communities that are not bound to a proprietary service.

    • "Couldn't slashdot moderators help by promoting truly open source tools"

      Hmm. Can we assume that moderator help would mean marking any comment not mentioning Open Source as "-1 Troll", then?
    • [ Sorry, second comment, different point. ]

      I agree partly. I will not use this proprietary API but will use open source libraries to do the same job.

      BUT:
      As a developer, had I written, say, something to do this and wanted to earn money to fund myself to do further development, the simple, regrettable fact is that I think I could make more by releasing it as a proprietary API than by releasing it as an open source library.

      To help Open Source, that is what we have to change.
      • by butlerm ( 3112 ) on Friday March 07, 2025 @07:29AM (#65217471)

        I would not use a proprietary API from a company I had never heard of unless it was at least ten times better than the alternative and was either derived from an open API or the owners welcome others making compatible implementations without making a federal case out of it. Companies generally do need closed source implementations to be profitable, but if they want anyone to use their product they should at least offer open, documented interfaces. Imagine an integrated circuit manufacturer that wanted board manufacturers to use their chips without so much as providing a datasheet. What a waste of time that would be. The situation with software is similar, to the point that many software vendors are reluctant to post a single screenshot on their product marketing website, i.e. forget about any API, we are not even going to show you what our product looks like. Not sure why anyone sensible would want to make a major purchase from a company like that.

    • by Njovich ( 553857 )

      Mistral gives away a lot, but at some point they also need revenue generating stuff.

    • I get that AI is costly and hard to run on the average desktop, hence APIs. However, it seems we've shifted as a community to fully supporting proprietary web APIs where we not only don't have source code but even lack the binaries. Couldn't slashdot moderators help by promoting truly open source tools, especially AI, that can be run locally, even if that requires 10k of hardware? It seems very reasonable for a company to sell an API as a cost-sharing model. Even so, at least we could support entities that fully open their models and have active "hacker" communities that are not bound to a proprietary service.

      Yours is a good point. I'd add, though, that most developers don't buy hardware aside from their workstations anymore, because they buy cloud services. Never mind AW$; companies like VULTR and Digital Ocean give users affordable access to a variety of services and are including more and more AI-relevant services too.

      Our choices are expanding all the time. One might purchase a month of claude.ai directly or time with Deepseek on Azure [microsoft.com], for example.

    • by drinkypoo ( 153816 ) <drink@hyperlogos.org> on Friday March 07, 2025 @09:36AM (#65217729) Homepage Journal

      If it helps any, Tesseract is pretty good at OCRing weird PDFs. There are various flow options that will allow you to organize the scanned text in various ways which either retain the structure or don't. AFAIK it doesn't have an option to provide any information about text styles, but I'm not going to check into that in depth right now. I only know that it will do this at all because I used it recently and it successfully put some text which was next to some other text below it in the output file by using the flag --psm 6

      • I might also note that you do have to convert the PDF to an image before Tesseract will read it. It doesn't render PDFs for you. There is, however, OCRmyPDF [github.com], which uses Tesseract.
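A rough sketch of the two-step flow described in the comments above, assuming the PDF is rendered to PNG with Poppler's `pdftoppm` (the comments do not name a converter, so that choice, the 300 DPI setting, and the helper names are all assumptions) and then read with Tesseract using the `--psm 6` flag mentioned above. The helpers only build the command lines; `run_if_available` executes one only when the tool is actually on the PATH.

```python
import shutil
import subprocess
from typing import List


def pdftoppm_command(pdf_path: str, image_prefix: str, dpi: int = 300) -> List[str]:
    """Build a pdftoppm call that renders each PDF page to a PNG file.

    Assumption: Poppler's pdftoppm is the converter; 300 DPI is a guess at a
    reasonable OCR resolution, not a value taken from the thread.
    """
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, image_prefix]


def tesseract_command(image_path: str, output_base: str, psm: int = 6) -> List[str]:
    """Build a tesseract call; it writes the recognized text to <output_base>.txt.

    --psm 6 asks Tesseract to treat the page as a single uniform block of text.
    """
    return ["tesseract", image_path, output_base, "--psm", str(psm)]


def run_if_available(command: List[str]) -> bool:
    """Run the command only if its executable is installed; report whether it ran."""
    if shutil.which(command[0]) is None:
        return False
    subprocess.run(command, check=True)
    return True


if __name__ == "__main__":
    # File names are placeholders; nothing is executed here, only printed.
    print(" ".join(pdftoppm_command("scan.pdf", "page")))
    print(" ".join(tesseract_command("page-1.png", "page-1")))
```

OCRmyPDF wraps this kind of pipeline for you, so building the commands by hand mainly makes sense when you want control over the page segmentation mode.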

    • by allo ( 1728082 )

      Hey, Mistral released quite a few of the best open weight models. Let them have some API-only thing to make profit. And they even got away from their research-and-noncommercial license and switched back to Apache 2 for their releases. If you want to criticize APIs, start with the so called OpenAI company.

  • Everything will be stolen. No one will have any reason to work--even if there is any job for them. Fuck you AI. Fuck you nVidia!
    • by gweihir ( 88907 )

      Sure. That is why Linux does not exist: Nobody has reason to work on it since everything will be stolen anyways!

      • Sure. That is why Linux does not exist: Nobody has reason to work on it since everything will be stolen anyways!

        Objection...relevance?

        • by gweihir ( 88907 )

          "Everything will be stolen. No one will have any reason to work"...

          Unless you count working on Linux as non-work?

    • Everything will be stolen. That was my thought too. A very large number of corporate quarterly reports, largely PDFs of PowerPoint slide shows, are now suitable for AI training.

  • Apparently, the supposed head-start the US has is mostly mythical....

    • by HiThere ( 15173 )

      This doesn't sound like something that uses much AI, just something that's useful for creating data to feed to an AI. So it's quite reasonable for an AI company to create it, but that doesn't mean it's AI, much less an advanced AI.

      • by gweihir ( 88907 )

        Read the description of how it works. But your confusion is understandable. This AI application actually has a good business-case, unlike most of the current hype.

  • by bill_mcgonigle ( 4333 ) * on Friday March 07, 2025 @11:24AM (#65217959) Homepage Journal

    This is a cool piece of software - I wish it were open source like much of the Mistral code.

    However, there's an inefficiency revealed by bragging about being able to turn LaTeX into LaTeX with AI.

    OK, markdown, but the next version...

    It seems like something we'll outgrow with an evolving culture. Timestamping a hash on a blockchain should be sufficient for tracing authenticity.

    • by ceoyoyo ( 59147 )

      It's not turning LaTeX into LaTeX or markdown. That would be silly. It's turning *PDF* into markdown.

      That also sounds silly, but sadly it is not. Some PDF files are reasonably assembled and you can parse them fairly easily. Many are so crazy that it's far easier to render them to an image then use OCR to get back the text.

  • The article uses the terms PDF and OCR.
    If you have the PDF file, you don't need OCR.
    If you have OCR, it will work on any printed text, regardless of how it was created.

  • by schweini ( 607711 ) on Friday March 07, 2025 @01:01PM (#65218243)
    Amongst the AI 'chatGPT' hype, I think people haven't celebrated the incredible progress that OCR has made recently. I don't know how exactly, but the OCR capabilities of these multimodal models are incredible. You can e.g. take a kind of blurry photo of a menu in a bar or restaurant, and ask for suggestions. Just the OCR aspect of that is something that was completely outside of OCR's league just a few years ago.
