Researchers Have a Magic Tool To Understand AI: Harry Potter (bloomberg.com)
More than two decades after J.K. Rowling introduced the world to a universe of magical creatures, forbidden forests and a teenage wizard, Harry Potter is finding renewed relevance in a very different body of literature: AI research. From a report: A growing number of researchers are using the best-selling Harry Potter books to experiment with generative artificial intelligence technology, citing the series' enduring influence in popular culture and the wide range of language data and complex wordplay within its pages. Reviewing a list of studies and academic papers referencing Harry Potter offers a snapshot into cutting-edge AI research -- and some of the thorniest questions facing the technology.
In perhaps the most notable recent example, Harry, Hermione and Ron star in a paper titled "Who's Harry Potter?" that sheds light on a new technique helping large language models to selectively forget information. It's a high-stakes task for the industry: Large language models, which power AI chatbots, are built on vast amounts of online data, including copyrighted material and other problematic content. That has led to lawsuits and public scrutiny for some AI companies. The paper's authors, Microsoft researchers Mark Russinovich and Ronen Eldan, said they've demonstrated that AI models can be altered or edited to remove any knowledge of the existence of the Harry Potter books, including characters and plots, without sacrificing the AI system's overall decision-making and analytical abilities.
The duo said they chose the books because of their universal familiarity. "We believed that it would be easier for people in the research community to evaluate the model resulting from our technique and confirm for themselves that the content has indeed been 'unlearned,'" said Russinovich, chief technology officer of Microsoft Azure. "Almost anyone can come up with prompts for the model that would probe whether or not it 'knows' the books. Even people who haven't read the books would be aware of plot elements and characters."
"Microsoft spokespeople" (Score:1)
Copyright Laws too Hard to Follow (Score:2)
"We will not infringing on copyrights any more, pinky swear!"
Simply knowing Harry Potter is not a violation of copyright - nobody has been sued (yet?) for having a copy in their head after reading it. The problem is knowing when it is legal to reproduce that information.
The message that this forgetful solution sends is that the nuances of copyright law in the modern digital age have become so insanely complex that we can't even train an AI to follow them so the only thing we can do is tell it to forget anything under copyright. So good luck to all us humans who h
Re: (Score:2)
Storing something in a computer system is _not_ the same as "having it in your head". The former is always copyright infringement unless one of the very narrow fair-use provisions applies. The second is never copyright infringement as remembering something is not "copy, distribute, adapt, display, and perform". Now, if you recite what you remember publicly, that is different and may be copyright infringement, depending on the details.
Yes, the law regards people and machines as different. Deal with it.
Re: (Score:2)
Yes, the law regards people and machines as different. Deal with it.
No, the law does not differentiate in this case. The difference comes in how the copyright holder is likely to treat it. If you asked me to tell you something about Harry Potter, I could likely do that without being sued, because if they sued me then suddenly a lot of people would become very worried about having any sort of conversation about films, books, music, etc. and would likely demand the laws be changed.
With AI this is less likely to happen and they are run by large corporations with lots of money
Re: (Score:2)
Storing something in a computer system is _not_ the same as "having it in your head". The former is always copyright infringement unless one of the very narrow fair-use provisions applies. The second is never copyright infringement as remembering something is not "copy, distribute, adapt, display, and perform". Now, if you recite what you remember publicly, that is different and may be copyright infringement, depending on the details.
Yes, the law regards people and machines as different. Deal with it.
I invite anyone who believes copyright law makes some kind of distinction or exception for humans to cite the relevant portions of law which they believe support their assumptions.
Todays AI is all smoke and mirrors! (Score:2, Insightful)
The C-Suite, Marketing and Media are running with the big lie to chase the clueless money.
As for using Harry Potter, fantasy seems like a great match for what they are calling AI today.
Re:Todays AI is all smoke and mirrors! (Score:5, Insightful)
Re: (Score:1)
Is training them on Harry Potter going to improve them though? Those books are renowned for their bad writing and questionable choices.
Re: (Score:3)
I was thinking of the obvious plot holes and examples of where she clearly didn't think things through and snookered herself. Or not bothering to research Chinese names before creating Chinese characters, and calling the token black guy "Shackleton".
Re: (Score:1)
Are you shitting me?
You're wondering why a BRITISH author, writing fantasy fiction about the magical society in BRITAIN and Europe, primarily based on a BRITISH-BASED SCHOOL for UK children isn't doing in-depth reports for NON-ASSIMILATED students?
And the ambiguous ancestry of black guy precludes him from being named SHACKLEBOLT?
Do you even know what a shackle bolt is used for?
Of course you don't.
Because you agree with The Current Thing! And you know NOTHING beyond the pablum you're peddling.
Re: (Score:2)
Re: (Score:1)
You should not listen to that crap propaganda. The books are actually pretty good and there are no "questionable choices". That some people are butt-hurt about Rowling not cheering for the LGBTQIA-whatever community has led to a lot of lies about her work being pushed. Shows that people in that community are just as often assholes with no honor or integrity as other people, i.e. quite often.
Re: (Score:3)
Like that time she created time travel devices that are obviously incredibly powerful. When she realized that any future problems could be resolved by simply going back in time to nip them in the bud, her solution was to put them all on a shelf that promptly collapsed and destroyed them.
You'd think that the wizards would take better care of their WMD, which for some reason they keep at a school.
Or how about the slave elves? When white saviour Hermione tries to liberate them, she is mocked and the whole thin
Re: (Score:3)
Or how about the slave elves? When white saviour Hermione tries to liberate them, she is mocked and the whole thing played for laughs. But the fact is that they really are slaves, and mistreated as a matter of routine.
You mean Rowling added real life to her story?
Re: (Score:2)
Dude, it is fantasy. Relax. Also maybe have a look at the numbers sold, which are a reliable measure of the quality of an entertainment product.
You let yourself be manipulated by some professional victims. That is pretty pathetic.
Re: (Score:2)
What I find most interesting is that when Rowling tried writing under a pen name, her adult novels didn't sell. At least not until she revealed it was really her.
She's a bad writer who had a few good ideas.
Re: (Score:1)
Re: (Score:1)
AI or large language models have been very helpful at co-authoring code, email, article and book prototypes, etc. As well as providing amazing artwork through stable diffusion.
Examples of these stellar achievements, please. Specifically, good code solving an original, non-trivial problem, amazing artwork, or an interesting article that isn't filling out a template with some data.
Also, WTF is a "book prototype"?
Re: (Score:3, Insightful)
I have seen a few. One was an instantly created academic lecture on a topic, demonstrated by a "teaching methodology expert". The quality of the result was so laughably bad that my students would have been deeply offended to have their time wasted this way. It was also unfixable, i.e. not even the structure was good. All it did was allow you to waste the students' time with minimal effort on your side. Another example was when some CS students of mine tried it on a very simple, very basic firewall exercise,
Re: (Score:2)
I've also seen "physics data analysis" that basically assigns coefficients on a "decision tree" from data where no patterns attributable to real physics are present. They nicely confirm all we saw there was noise anyway :)
Re: (Score:2)
Yep, pretty much. Delivers a result that looks good but is completely unusable or meaningless or even dangerous as soon as you ask a non-trivial question.
Re: Todays AI is all smoke and mirrors! (Score:2)
Re: (Score:2)
This _is_ mature. What we are seeing is the best it can do.
Re: (Score:2)
Yes, because OBVIOUSLY it will never be improved. Ever. /s
Re: (Score:2)
That is beyond stupid. Obviously even a mature product can be improved on occasion. Just not fundamentally. Take your religion-like belief in AI someplace else or get an actual clue about the tech.
Re: (Score:2)
I have no religion-like belief in AI, but I work with it, teach it, and understand it. But you don't have to be an expert, even a child could look at the advances in LLMs and Diffusers over the last two years and have a clue about how we are just at the beginnings with these technologies. It's not all about fundamentally changing the core tech either, many of the big advances now are because of new tools on the edges of that core, things like ControlNet, new systems to guide the networks. Every week there i
Re: (Score:2)
Well, you are obviously blind to what is actually going on. Let's revisit this in a few years when the hype is over and has left as little or less behind than the previous AI hypes.
Re: Todays AI is all smoke and mirrors! (Score:2)
Re: (Score:2)
Where did you copy that bullshit language?
Re: Todays AI is all smoke and mirrors! (Score:2)
Re: (Score:2)
I understand that I see your downmodding instead of examples. You should try harder if you want to impress me.
Re: (Score:2)
I still don't know what a "book prototype" is, although I've written more books than you have, troll. Mostly because there is no "book prototype", there is a book idea, a book plan, a book draft, but no "prototype", smartypants.
And you're still so short on examples :)))
Re: (Score:2)
The level there is lower than the basement floor of a Shubya manga factory. And there is zero creativity, just rehash of uploaded images, and zero talent.
Had the people that have lost their time to produce this crap spent the same time actually learning to draw even as a hobby, we'd get better results.
Re: (Score:2)
Actually, anything that involves any form of analysis demonstrates the inability of LLM AI. If you want an AI to write a quicksort algorithm in Python, it will have a go, because it's got great examples to choose from. However, if you provide a set of inputs and their corresponding outputs, and ask the AI to identify the underlying function, it will fail dismally. Even prett
Perfection is the enemy of progress (Score:3)
Re: (Score:2)
I could not agree less. I recently asked ChatGPT to tell me how many '1' digits are in the string "000100". Its response? "2".
In this case the model is likely confused by the poorly worded question. It probably thinks you are asking how many single digits are in the string (2... as in "0" and "1"). Ask it for clarification and see what it says.
Instead rephrase the question:
How many times does the character "1" appear in the string "000100"?
Not that the models are good at this shit anyway. GPT-4 tends to write and execute programs to answer these types of questions because they are no better than counting the number of 1s in a st
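For illustration, here is a minimal sketch of the kind of program a model might write and run to answer the rephrased question. Python is assumed, and the function name count_char is hypothetical, not something taken from any model's actual output.

```python
def count_char(text: str, target: str) -> int:
    """Count how many times a single character appears in text."""
    if len(target) != 1:
        # Hypothetical guard: this sketch only handles one-character targets.
        raise ValueError("target must be a single character")
    # Walk the string one character at a time and tally exact matches.
    return sum(1 for ch in text if ch == target)


print(count_char("000100", "1"))  # prints 1
```

Running the count as code sidesteps the ambiguity in the wording: the program compares characters one by one, so the answer for "000100" is 1 however the question is phrased.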
Re: (Score:2)
Re: (Score:2)
It seems to me that these systems are big if..then..else.. generators. Just to emphasize the fact that it is purely mechanical, and still falls under the restrictions of Gödel's Incompleteness Theorem and the Halting Problem.
As for support for coding, can the model explain why something needs to be coded in a certain way? If I work with ReSharper, I can always check why a certain style needs to be used.
Re: (Score:2)
Re: (Score:2)
Quite true. However, the nitwits that cannot see what is and instead project their fantasies are ever present in every one of the by now numerous AI hypes. All this shows is that in many people, natural intelligence is not really present in any significant amount. The claims were just as grand or even more so 40 years ago, and they never panned out. For example, I remember the hype about fully interactive and autonomous "household assistance robots" about 35 years ago, of which exactly nothing was true or
Well known? (Score:1)
Perhaps it's something that's become more common with younger millennials, but is Harry Potter really that well known? I think I can name 2 characters.
If "well known" is a criterion, particularly since Harry Potter isn't much more than an allegory for Christ, I'd think that Christianity (or simply the Bible) would be a better topic, given the much more substantial body of research and reference, and the fact that 63% of the population is at least nominally Christian (which I'd think would be at least a rough
Re: (Score:1)
Now an interesting try would be to see if some AI could replicate the complete text from just those verses and do so in the various translations. For English there are just a few translations that are widely used and the style and wording is different enough tha
Re: Well known? (Score:3)
Re: (Score:2)
The first book is the third-most read book in the English language. 120m copies.
The entire series makes it the most read work of fiction in history (450 million-plus, or roughly 2.2x the distribution of The Bible).
It also has seen translation into 88 languages (and counting), including Latin and Ancient Greek, being the longest work in Ancient Greek since the 3rd Century AD.
So yes, it's a recent phenomenon. But the series is fairly firmly ensconced as a cultural touchstone.
Re: (Score:2)
Well, how many other books are there that you haven't read but can still name two characters from?
Re: Well known? (Score:2)
Re: (Score:1)
Re: (Score:2)
I think I can name 2 characters.
Oh c'mon, I have zero interest in it and even I know Snake and Dumblebore. There was a meme about them killing each other.
Then there's that evil kid where the hat almost puked, and that nose-less wizard. I think the latter is some sort of god in the story.
And I remember some sort of game where Harry has to catch someone's snatch. Today that would be enough to make him president.
Re: (Score:3)
Perhaps it's something that's become more common with younger millennials
The first book was published in 1997. Kids who were the target age at the time are now almost 40.
Re: (Score:2)
Off by a decade, but nice try!
Re: (Score:2)
Re: (Score:2)
Harry Potter? (Score:2)
Re: (Score:2)
Mark Russinovich (Score:2)
Harry Potter chapter done by predictive text robot (Score:2)
A few years ago a Harry Potter chapter was written by a predictive text robot: "Harry Potter and the Portrait of what Looked Like a Large Pile of Ash". Here's an animated version of it. https://www.youtube.com/watch?... [youtube.com]
Re: (Score:2)
Quite moronic, I imagine this could have been told by someone with an IQ of less than 70, but with an exceptionally large vocabulary.
Re:Harry Potter chapter done by predictive text ro (Score:5, Insightful)
Come to think of it, "someone with an IQ of less than 70, but with an exceptionally large vocabulary" is a pretty good characterization of these LLMs.
Re: (Score:2)
Came here to say just that. Large vocabulary and an IQ below room temperature... sounds like an LLM.
Re: (Score:2)
A variation would be "well-spoken but utterly dumb". That effect can be created in humans by a specific type of "education".
Re: (Score:2)
What that brings to mind was this lovely takedown of the President (at the time); you get just as much coherence using iOS's predictive text [youtube.com].
Re: (Score:2)
And more honest, too. Sad.
LOL (Score:2)
there are two tools to "understand" AI, good knowledge of statistics and good knowledge of numerical methods. abra-pokeabra, hermiona! isn't on the list, sorry.
Artificial Dumbing Research (Score:3)
Re: (Score:2)
Probably. At this time, it is a non-peer-reviewed publication on arXiv that also claims to be the first to do this. From a brief skim of the paper, it is at best a sketch of an idea that may or may not work, not an actual result.
Re: (Score:2)
Derivative works are already blocked by standard "all rights reserved" copyright law. It's a fundamental cornerstone.
If not, then things like the GPL wouldn't work, because releasing a modified version is a derivative work. Under standard copyright law, that would not be allowed, but the GPL lets you short-circuit that: if you agree to the GPL, you can release your derivative work. If you don't agree to the
Re: (Score:2)
Trans lives matter.
A new round of IT jokes coming (Score:2)
"The wizard crashed"
"Why, did the levitate spell fizzle?"
Totally backwards (Score:3)
This is totally backwards. What the copyright holders would want is for the model to not have used their works, or to undo whatever learning was done unlawfully. What's described here is keeping the learning, but wiping out the concept of the work, which is different.
If you forgot the title of every work you ever read, it would not significantly diminish your knowledge. In this case they are presumably forgetting most of the character names as well, but that won't diminish what the model learned about the Hero's journey [wikipedia.org] from these stories, nor what it learned about writing for children nor about representations of evil.
Not being an academic, I routinely fail to learn the titles and authors of papers I've skimmed, but like the "forgetting" in this summary, the omission doesn't really reduce what I've learned.
(Let me know if the article goes into more detail and proves me wrong. It's paywalled.)
Re: (Score:2)
Unless they can prove they created a new archetype story how is this different from what every author in the world does, retell the same few stories with different names?
Re: (Score:2)
It's not, provided 1) they got the books legally, and 2) the information isn't stored verbatim somewhere. But any model that can be tricked or debugged to show training data is storing it (or at least those sections) verbatim. That would be fair use if it is only excerpts.
But I'm not sure how they got the books.
Still trying to erase J.K. Rowling? (Score:2)
Mostly (Score:2)
"After analyzing millions of AI-generated images, they concluded, "Mostly Hermione.' "
Disappear people from the net (Score:2)
Your search term returned no results anywhere from anything or any database. Please reapply for humanhood at your local AI*mart.
complex wordplay... (Score:1)
"complex wordplay"
LOL. My 7 year old keeps finding grammar errors