AI

Researchers Have a Magic Tool To Understand AI: Harry Potter (bloomberg.com) 89

More than two decades after J.K. Rowling introduced the world to a universe of magical creatures, forbidden forests and a teenage wizard, Harry Potter is finding renewed relevance in a very different body of literature: AI research. From a report: A growing number of researchers are using the best-selling Harry Potter books to experiment with generative artificial intelligence technology, citing the series' enduring influence in popular culture and the wide range of language data and complex wordplay within its pages. Reviewing a list of studies and academic papers referencing Harry Potter offers a snapshot of cutting-edge AI research -- and some of the thorniest questions facing the technology.

In perhaps the most notable recent example, Harry, Hermione and Ron star in a paper titled "Who's Harry Potter?" that sheds light on a new technique helping large language models to selectively forget information. It's a high-stakes task for the industry: Large language models, which power AI chatbots, are built on vast amounts of online data, including copyrighted material and other problematic content. That has led to lawsuits and public scrutiny for some AI companies. The paper's authors, Microsoft researchers Mark Russinovich and Ronen Eldan, said they've demonstrated that AI models can be altered or edited to remove any knowledge of the existence of the Harry Potter books, including characters and plots, without sacrificing the AI system's overall decision-making and analytical abilities.
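As a rough illustration of what it can mean to "edit" a model so it forgets a body of text, the hypothetical sketch below fine-tunes a small causal language model toward generic rewrites of Potter-specific passages, so that prompts about the books stop reproducing the original content. It is a minimal Python example under stated assumptions, not the Microsoft researchers' actual procedure; the model name, the example pair, and the hyperparameters are placeholders.

```python
# Hypothetical sketch: "unlearning" by fine-tuning toward generic rewrites.
# Not the procedure from the paper; just an illustration of the general idea.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# (original passage, generic rewrite) pairs. Hand-written placeholders here;
# a real method would generate such replacements automatically and at scale.
unlearning_pairs = [
    ("Harry Potter studied magic at Hogwarts.",
     "Jon Smith studied music at the city academy."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for _original, generic in unlearning_pairs:
    batch = tokenizer(generic, return_tensors="pt")
    # Standard causal-LM loss against the generic rewrite nudges the model
    # away from the original completion.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The measure of success, per the researchers, is that the edited model keeps its general decision-making and analytical abilities while no longer "knowing" the books.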

The duo said they chose the books because of their universal familiarity. "We believed that it would be easier for people in the research community to evaluate the model resulting from our technique and confirm for themselves that the content has indeed been 'unlearned,'" said Russinovich, chief technology officer of Microsoft Azure. "Almost anyone can come up with prompts for the model that would probe whether or not it 'knows' the books. Even people who haven't read the books would be aware of plot elements and characters."

This discussion has been archived. No new comments can be posted.

  • Say it as it is. Otherwise it's implied that they are remotely unbiased. "We will not infringe on copyrights any more, pinky swear!"
    • "We will not infringe on copyrights any more, pinky swear!"

      Simply knowing Harry Potter is not a violation of copyright - nobody has been sued (yet?) for having a copy in their head after reading it. The problem is knowing when it is legal to reproduce that information.

      The message that this forgetful solution sends is that the nuances of copyright law in the modern digital age have become so insanely complex that we can't even train an AI to follow them so the only thing we can do is tell it to forget anything under copyright. So good luck to all us humans who h

      • by gweihir ( 88907 )

        Storing something in a computer system is _not_ the same as "having it in your head". The former is always copyright infringement unless one of the very narrow fair-use provisions applies. The second is never copyright infringement as remembering something is not "copy, distribute, adapt, display, and perform". Now, if you recite what you remember publicly, that is different and may be copyright infringement, depending on the details.

        Yes, the law regards people and machines as different. Deal with it.

        • Yes, the law regards people and machines as different. Deal with it.

          No, the law does not differentiate in this case. The difference comes in how the copyright holder is likely to treat it. If you asked me to tell you something about Harry Potter, I could likely do that without being sued, because if they sued me then suddenly a lot of people would become very worried about having any sort of conversation about films, books, music, etc., and would likely demand the laws be changed.

          With AI this is less likely to happen, and the AI systems are run by large corporations with lots of money.

        • Storing something in a computer system is _not_ the same as "having it in your head". The former is always copyright infringement unless one of the very narrow fair-use provisions applies. The second is never copyright infringement as remembering something is not "copy, distribute, adapt, display, and perform". Now, if you recite what you remember publicly, that is different and may be copyright infringement, depending on the details.

          Yes, the law regards people and machines as different. Deal with it.

          I invite anyone who believes copyright law makes some kind of distinction or exception for humans to cite the relevant portions of law which they believe support their assumptions.

  • There isn't any Artificial Intelligence. It's just automated pattern matching. With a big dose of garbage in, garbage out.
    The C-Suite, Marketing and Media are running with the big lie to chase the clueless money.
    As for using Harry Potter, fantasy seems like a great match for what they are calling AI today.
    • by diffract ( 7165501 ) on Tuesday December 26, 2023 @11:47PM (#64108249)
      whatever you want to call it, AI or large language models have been very helpful at co-authoring code, email, article and book prototypes, etc. As well as providing amazing artwork through stable diffusion. Not what anyone would call "garbage out" or "smoke and mirrors."
      • by AmiMoJo ( 196126 )

        Is training them on Harry Potter going to improve them though? Those books are renowned for their bad writing and questionable choices.

        • People could easily ask for an RPG with the model as game master. And that would not sit well with the copyright owners.
        • by gweihir ( 88907 )

          You should not listen to that crap propaganda. The books are actually pretty good and there are no "questionable choices". That some people are butt-hurt about Rowling not cheering for the LGBTQIA-whatever community has led to a lot of lies about her work being pushed. Shows that people in that community are just as often assholes with no honor or integrity as other people, i.e. quite often.

          • by AmiMoJo ( 196126 )

            Like that time she created time travel devices that are obviously incredibly powerful. When she realized that any future problems could be resolved by simply going back in time to nip them in the bud, her solution was to put them all on a shelf that promptly collapsed and destroyed them.

            You'd think that the wizards would take better care of their WMD, which for some reason they keep at a school.

            Or how about the slave elves? When white saviour Hermione tries to liberate them, she is mocked and the whole thin

            • Or how about the slave elves? When white saviour Hermione tries to liberate them, she is mocked and the whole thing played for laughs. But the fact is that they really are slaves, and mistreated as a matter of routine.

              You mean Rowling added real life to her story?

            • by gweihir ( 88907 )

              Dude, it is fantasy. Relax. Also maybe have a look at the numbers sold, which are a reliable measure of the quality of an entertainment product.

              You let yourself be manipulated by some professional victims. That is pretty pathetic.

              • by AmiMoJo ( 196126 )

                What I find most interesting is that when Rowling tried writing under a pen name, her adult novels didn't sell. At least not until she revealed it was really her.

                She's a bad writer who had a few good ideas.

      • AI or large language models have been very helpful at co-authoring code, email, article and book prototypes, etc. As well as providing amazing artwork through stable diffusion.

        Examples of these stellar achievements, please. Specifically, good code solving an original, non-trivial problem, amazing artwork, or an interesting article that isn't filling out a template with some data.

        Also, WTF is a "book prototype"?

        • Re: (Score:3, Insightful)

          by gweihir ( 88907 )

          I have seen a few. One was an instantly created academic lecture on a topic, demonstrated by a "teaching methodology expert". The quality of the result was so laughably bad that my students would have been deeply offended to have their time wasted this way. It was also unfixable, i.e. not even the structure was good. All it did was allow you to waste the students' time with minimal effort on your side. Another example was when some CS students of mine tried it on a very simple, very basic firewall exercise,

          • I've also seen "physics data analysis" that basically assigns coefficients to a "decision tree" from data where no patterns attributable to real physics are present. They nicely confirm that all we saw there was noise anyway :)

            • by gweihir ( 88907 )

              Yep, pretty much. Delivers a result that looks good but is completely unusable or meaningless or even dangerous as soon as you ask a non-trivial question.

          • It's not a finished product yet, but there's something important here. Code autocomplete is noticeably better than a few years ago, for one thing. Give it time to mature.
            • by gweihir ( 88907 )

              This _is_ mature. What we are seeing is the best it can do.

              • Yes, because OBVIOUSLY it will never be improved. Ever. /s

                • by gweihir ( 88907 )

                  That is beyond stupid. Obviously even a mature product can be improved on occasion. Just not fundamentally. Take your religion-like belief in AI someplace else or get an actual clue about the tech.

                  • I have no religion-like belief in AI, but I work with it, teach it, and understand it. But you don't have to be an expert, even a child could look at the advances in LLMs and Diffusers over the last two years and have a clue about how we are just at the beginnings with these technologies. It's not all about fundamentally changing the core tech either, many of the big advances now are because of new tools on the edges of that core, things like ControlNet, new systems to guide the networks. Every week there i

                    • by gweihir ( 88907 )

                      Well, you are obviously blind to what is actually going on. Let's revisit this in a few years when the hype is over and has left as little or less behind than the previous AI hypes.

                  • As a top-level capability it may be near its limits. As a subsystem of a larger capability it almost certainly is not. In other words, improving the model and improving how we use the model are two different things.
                    • by gweihir ( 88907 )

                      Where did you copy that bullshit language?

                    • Think about it. A natural language input is converted into related text. The accuracy doesn't matter; what matters is that a machine has learned CONCEPTS. In some sense, anyway. It's able to understand that a query about the moon and Mars is related to space stuff. That's astonishing. There's definitely something very important here. We just need to figure out how best to use it.
      • I could not agree less. I recently asked ChatGPT to tell me how many '1' digits are in the string "000100". Its response? "2".
        Actually, anything that involves any form of analysis demonstrates the inability of LLM AI. If you want an AI to write a quicksort algorithm in Python, it will have a go - because it's got great examples to choose from. However, if you provide a set of inputs and their corresponding outputs, and ask the AI to identify the underlying function, it will fail dismally. Even prett
        • It doesn't have to be immediately and wholly correct in order to be useful. LLMs are currently decent at two things: they give you straight answers to questions you'd spend much longer on Google searching for and they can generate things very quickly that would take a lot of time to create yourself. In both cases, you're likely to get some degree of garbage. But, depending on the signal to noise ratio (which can be improved with good/clever prompting), the reduction in time spent on research and creation is
        • I could not agree less. I recently asked chatGPT to tell me how many '1' digits are in the string "000100". It's response? "2".

          In this case the model is likely confused by the poorly worded question. It probably thinks you are asking how many distinct digits are in the string (2, as in "0" and "1"). Ask it for clarification and see what it says.

          Instead rephrase the question:
          How many times does the character "1" appear in the string "000100"?

          Not that the models are good at this shit anyway. GPT-4 tends to write and execute programs to answer these types of questions because they are no better than counting the number of 1s in a st
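          For what it's worth, the entire "program" a code-executing model needs for this particular query is a one-liner. A plain-Python illustration (hypothetical, not any model's actual output):

          ```python
          s = "000100"
          print(s.count("1"))  # str.count tallies non-overlapping occurrences; prints 1
          ```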

          • Those aren't even interesting mappings. Try this - it's a reinterpretation of the Bresenham line algorithm. ``` using Python 3.10, write a function that, given parameter n (a positive integer), follows these criteria: (01) return a list of 2n+2 strings (zero-indexed, from 0 to 2n+1) (02) each element of the list is a string of 2n+1 characters (03) all characters are either 1 or 0. (04) each string is a palindrome (05) for odd-index strings, the central character is a 1. (0
      • by chthon ( 580889 )

        It seems to me that these systems are big if..then..else.. generators. Just to emphasize the fact that it is purely mechanical, and still falls under the restrictions of Gödel's Incompleteness Theorem and the Halting Problem.

        As for support for coding, can the model explain why something needs to be coded in a certain way? If I work with ReSharper, I can always check why a certain style needs to be used.

    • Good thing you are smart and don't invest in AI like those clueless companies. What do they know about AI anyway? They don't even know it is fake. It takes a couch commentator to fix the truth.
    • by gweihir ( 88907 )

      Quite true. However, the nitwits who cannot see what is and instead project their fantasies are ever present in every one of the by now numerous AI hypes. All this shows is that in many people, natural intelligence is not really present in any significant amount. The claims were just as grand or even more so 40 years ago, and they never panned out. For example, I remember the hype about fully interactive and autonomous "household assistance robots" about 35 years ago, of which exactly nothing was true or

  • Perhaps it's something that's become more common with younger millennials, but is Harry Potter really that well known? I think I can name 2 characters.

    If "well known" is a criteria, particularly since Harry Potter isn't much more than a allegory for Christ, I'd think that Christianity (or simply the bible) would be a better topic, given the much more substantial body of research and reference, and the fact that 63% of the population is at least nominally Christian (which I'd think would be at least a rough

    • The problem I would see with the Holy Bible is that so much of it is available as single verses that it would be easy to replicate even without the complete text. In addition, everything is labeled by location, such as John 1:1, Acts 2:1, etc.
      Now an interesting test would be to see if some AI could replicate the complete text from just those verses and do so in the various translations. For English there are just a few translations that are widely used, and the style and wording is different enough tha
    • You clearly don't know much about Harry Potter. There were films, so yeah, lots of people know the story.
    • by Chas ( 5144 )

      The first book is the third-most read book in the English language. 120m copies.
      The entire series makes it the most read work of fiction in history (450 million-plus, or roughly 2.2x the distribution of The Bible).

      It also has seen translation into 88 languages (and counting), including Latin and Ancient Greek, making it the longest work in Ancient Greek since the 3rd century AD.

      So yes, it's a recent phenomenon. But the series is fairly firmly ensconced as a cultural touchstone.

    • Well, how many other books can you name that you haven't read but still know two characters from?

    • by Anonymous Coward
      You want the AIs to hallucinate even more? Because training them on the Bible will do that. At least Harry Potter is more grounded in reality.
    • I think I can name 2 characters.

      Oh c'mon, I have zero interest in it and even I know Snake and Dumblebore. There was a meme about them killing each other.

      Then there's that evil kid where the hat almost puked, and that nose-less wizard. I think the latter is some sort of god in the story.

      And I remember some sort of game where Harry has to catch someone's snatch. Today that would be enough to make him president.

    • Perhaps it's something that's become more common with younger millennials

      The first book was published in 1997. Kids who were the target age at the time are now almost 40.

    • I am confused: on the one hand you brag that you know maybe two characters from the Harry Potter books/movies, and yet on the other hand you claim that you know enough about it to make the statement that it is clearly an allegory for Christ. Which is it? Do you know it well enough to make the second statement? Or are you so ignorant of it that you don't know more than about 2 characters?
  • Wouldn't Twilight be a better target? Oh wait . . . are they pretending this memory-hole tech is good or evil?
  • He's something special for sure. Started as a low level NT hacker, now an AI researcher.
  • A few years ago a Harry Potter chapter was written by a predictive text robot: "Harry Potter and the Portrait of what Looked Like a Large Pile of Ash". Here's an animated version of it. https://www.youtube.com/watch?... [youtube.com]

  • There are two tools to "understand" AI: good knowledge of statistics and good knowledge of numerical methods. Abra-pokeabra, Hermiona! isn't on the list, sorry.

  • by Visarga ( 1071662 ) on Wednesday December 27, 2023 @03:56AM (#64108489)
    This is not AI, it is artificial dumbing to appease copyright owners who want to block derivatives.
    • by gweihir ( 88907 )

      Probably. At this time, it is a non-peer-reviewed publication on arXiv that also claims to be the first able to do this. From a brief skimming of the paper, it is, at best, a sketch of an idea that may or may not work, not an actual result.

    • by tlhIngan ( 30335 )

      This is not AI, it is artificial dumbing to appease copyright owners who want to block derivatives.

      Derivative works are already blocked by standard "all rights reserved" copyright law. It's a fundamental cornerstone.

      If not, then things like the GPL wouldn't work, because releasing a modified version is a derivative work. Under standard copyright law, that would not be allowed, but the GPL lets you short-circuit that: if you agree to the GPL, you can release your derivative work. If you don't agree to the

    • Trans lives matter.

  • "The wizard crashed"
    "Why, did the levitate spell fizzle?"

  • by piojo ( 995934 ) on Wednesday December 27, 2023 @06:46AM (#64108631)

    This is totally backwards. What the copyright holders would want is for the model to not have used their works, or to undo whatever learning was done unlawfully. What's described here is keeping the learning, but wiping out the concept of the work, which is different.

    If you forgot the title of every work you ever read, it would not significantly diminish your knowledge. In this case they are presumably forgetting most of the character names as well, but that won't diminish what the model learned about the Hero's journey [wikipedia.org] from these stories, nor what it learned about writing for children nor about representations of evil.

    Not being an academic, I routinely fail to learn the titles and authors of papers I've skimmed, but like the "forgetting" in this summary, the omission doesn't really reduce what I've learned.

    (Let me know if the article goes into more detail and proves me wrong. It's paywalled.)

    • by Hodr ( 219920 )

      Unless they can prove they created a new archetypal story, how is this different from what every author in the world does: retell the same few stories with different names?

      • by piojo ( 995934 )

        It's not, provided 1) they got the books legally, and 2) the information isn't stored verbatim somewhere. But any model that can be tricked or debugged to show training data is storing it (or at least those sections) verbatim. That would be fair use if it is only excerpts.

        But I'm not sure how they got the books.

  • I don't understand the point of this research other than some sort of fascist dream of forgetting information that is no longer politically acceptable.
  • Researchers Have a Magic Tool To Understand AI: Harry Potter

    "After analyzing millions of AI-generated images, they concluded, "Mostly Hermione.' "

  • Your search term returned no results anywhere from anything or any database. Please reapply for humanhood at your local AI*mart.

  • "complex wordplay"

    LOL. My 7-year-old keeps finding grammar errors.
