AI

Cheap AI 'Video Scraping' Can Now Extract Data From Any Screen Recording (arstechnica.com)

An anonymous reader quotes a report from Ars Technica: Recently, AI researcher Simon Willison wanted to add up his charges from using a cloud service, but the payment values and dates he needed were scattered among a dozen separate emails. Inputting them manually would have been tedious, so he turned to a technique he calls "video scraping," which involves feeding a screen recording video into an AI model, similar to ChatGPT, for data extraction purposes. What he discovered seems simple on its surface, but the quality of the result has deeper implications for the future of AI assistants, which may soon be able to see and interact with what we're doing on our computer screens.

"The other day I found myself needing to add up some numeric values that were scattered across twelve different emails," Willison wrote in a detailed post on his blog. He recorded a 35-second video scrolling through the relevant emails, then fed that video into Google's AI Studio tool, which allows people to experiment with several versions of Google's Gemini 1.5 Pro and Gemini 1.5 Flash AI models. Willison then asked Gemini to pull the price data from the video and arrange it into a special data format called JSON (JavaScript Object Notation) that included dates and dollar amounts. The AI model successfully extracted the data, which Willison then formatted as a CSV (comma-separated values) table for spreadsheet use. After double-checking for errors as part of his experiment, the accuracy of the results -- and what the video analysis cost to run -- surprised him.

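The JSON-to-CSV step described above can be sketched in a few lines of Python. This is only an illustration, not Willison's actual code: the field names `date` and `amount`, the sample values, and the helper names `json_to_csv` and `total_amount` are all hypothetical, since the summary does not show the real output.

```python
import csv
import io
import json

# Hypothetical sample of what the model's JSON output might look like.
# The field names and values are invented for illustration only.
SAMPLE_JSON = """
[
  {"date": "2024-01-05", "amount": 12.50},
  {"date": "2024-02-05", "amount": 7.25},
  {"date": "2024-03-05", "amount": 20.00}
]
"""


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of {date, amount} objects into CSV text."""
    rows = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["date", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def total_amount(json_text: str) -> float:
    """Add up the amounts, rounded to whole cents."""
    rows = json.loads(json_text)
    return round(sum(row["amount"] for row in rows), 2)


if __name__ == "__main__":
    print(json_to_csv(SAMPLE_JSON))
    print("Total:", total_amount(SAMPLE_JSON))
```

Doing the addition in ordinary code rather than asking the model for a sum keeps the arithmetic deterministic; only the extraction step still needs a manual check against the emails.
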
"The cost [of running the video model] is so low that I had to re-run my calculations three times to make sure I hadn't made a mistake," he wrote. Willison says the entire video analysis process ostensibly cost less than one-tenth of a cent, using just 11,018 tokens on the Gemini 1.5 Flash 002 model. In the end, he actually paid nothing because Google AI Studio is currently free for some types of use.

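The "less than one-tenth of a cent" figure can be sanity-checked with a short calculation. The token count comes from the summary; the per-token price does not, so the rate below (USD 0.075 per million input tokens) is an assumption and should be replaced with whatever rate actually applies.

```python
# ASSUMPTION: price in USD per one million input tokens.
# This rate is not stated in the summary; substitute the real one.
ASSUMED_USD_PER_MILLION_TOKENS = 0.075

# Token count reported in the summary.
TOKENS_USED = 11_018


def estimate_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Estimate cost as tokens times the per-token price."""
    return tokens / 1_000_000 * usd_per_million


cost = estimate_cost_usd(TOKENS_USED, ASSUMED_USD_PER_MILLION_TOKENS)

if __name__ == "__main__":
    print(f"Estimated cost: ${cost:.8f} ({cost * 100:.4f} cents)")
```

Under that assumed rate the estimate comes to roughly 0.08 cents, which is consistent with the claim in the summary.
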
This discussion has been archived. No new comments can be posted.

  • I do similar things (Score:5, Interesting)

    by SirSlud ( 67381 ) on Friday October 18, 2024 @04:01PM (#64875571) Homepage

    I take screenshots of a bunch of web pages and then just describe to the LLM what it's looking at, and how I'd like it combined, arranged, formatted (in markdown, to boot). It's rather impressive how well it gets stuff like that right off the bat. Took a task I used to hate to do, now it takes me 1/10th of the time, if that. It wouldn't surprise me if it works equally well with video, although maybe how cheap it is to do is notable.

    • While trying to learn a new programming language after deep experience in years of developing with several other languages, I gave up on reading the documentation and tutorials and just started asking GPT questions like:

      In the Z programming language, how do you define a variable?
      What datatypes are built into the language?
      How do you do a for loop in the language?
      How do you define a function which takes X as an integer parameter, and returns an integer value -1 if X is less than 0, and returns X+1 if X is 0

  • Obviously, you sometimes simply will get a wrong result on top as a bonus. I mean, we are now using "AI" to add numbers?

    • Obviously, you sometimes simply will get a wrong result on top as a bonus. I mean, we are now using "AI" to add numbers?

      Reminds me of the Google analytics chart showing how many people asked "What's the number for 911?" -- which apparently wasn't a joke.

    • Your closed mind has prevented you from realizing that some major LLMs, when faced with a data processing request, write a Python program to process the data, including possibly using an OCR library to process text in images. The programs they have to write for these mostly simple requests are equally simple in calculation and data manipulation and thus usually correct on the first try. The calculations run by the program are obviously 100% correct all the time.

      Note that this is also how a human that knows h

  • Not news??? (Score:4, Funny)

    by Kelxin ( 3417093 ) on Friday October 18, 2024 @04:12PM (#64875597)
    This has been happening for over a year. Let me know when AI can watch porn with me and suggest new models in similar tastes.
    • by Hodr ( 219920 ) on Friday October 18, 2024 @05:10PM (#64875705) Homepage

      GoogleyMoogley AI has finished watching all 927 hours of pornographic content on your mobile device and suggests you.......take a seat over there.

    • Meaning the people 'selling' porn must not benefit from an AI tool that matches consumers with appropriate content.

      Not sure if the websites themselves would lose out (or they don't see value in attempting it for the expected costs)... Or the content creators freak out and leave. Or what.

      Or maybe the people who could fund something like that haven't decided to? Meaning even in 2024 we seem to have a lot of people who ignore stuff like violence, lack of food/water/housing, etc... but freak out abou

      • The number of people that I personally know that bury their head in the sand when it comes to ANYTHING going on in the world is insane. I know people from Israel that know nothing about the war going on there despite them having direct relatives (Mom, Dad, etc) living there and don't want to hear anything about it! Russians living in the US that still blame the US for the Ukrainian war. People living in Florida that didn't want to hear about the hurricane coming straight at them. People in Denver that d
  • by thesjaakspoiler ( 4782965 ) on Friday October 18, 2024 @06:11PM (#64875855)

    An AI distorting you for more energy and compute power, Microsoft Recall will deliver it in 2025!

  • Willison then asked Gemini to pull the price data from the video and arrange it into a special data format called JSON (JavaScript Object Notation) that included dates and dollar amounts. The AI model successfully extracted the data, which Willison then formatted as a CSV (comma-separated values) table for spreadsheet use.

    I wonder if he could have taken a second video recording of the JSON result set and asked the AI model to then convert it for him as the desired CSV format...

  • I guess it gave you the data you could 'step by step' look at and prove to yourself how well it worked. But I thought the point was to just explain what we wanted to know, and it'd try to do that for us. Especially when something is already in a text format (like emails), it feels b0rken to take recordings of them and feed that into a computer based tool.

    Is this just to work around the 'I cannot prove/limit what you will use of a large data source, so I will artificially limit what data you can see instea

    • It's just faster and more flexible. Don't need to worry about exporting content to any specific format. Especially for interfaces that don't have any easy export function. Sometimes at work to document things I would just flip through configuration pages while recording.

  • Correctness? (Score:2, Insightful)

    If it's tedious to enter manually, wouldn't it also be tedious to verify?

  • Which was released 25 years ago.
  • Not sure anyone had a use for Recall until now. And it is not Microsoft!
