
DeepMind AI Tool Helps Historians Restore Ancient Texts (theregister.com)

AI software can help historians interpret and date ancient texts by reconstructing works destroyed over time, according to a new paper published in Nature. The Register reports: A team of computer scientists and experts in classical studies led by DeepMind and Ca' Foscari University of Venice trained a transformer-based neural network to restore inscriptions written in ancient Greek between the 7th century BC and the 5th century AD. The model, named "Ithaca" after the home of the legendary Greek king Odysseus, can also estimate when the text was written and where it might have originated. By recovering fragments of text on broken pieces of pottery or blurry scripts, for example, researchers can begin translating them and learn more about ancient civilizations. [...] Why ancient Greek? The researchers said the variable content and available context in the Greek epigraphic record made it an "excellent challenge" for language processing, and a large body of digitized texts is currently available -- essential for training the model.

First, the text needs to be transcribed by scanning an image of an old object or script. The text is then fed into Ithaca for analysis. It works by predicting lost or blurry characters to restore words. The software generates and ranks a list of its top predictions; epigraphists can then scroll through them and judge whether the model's guesses seem accurate. The best results are reached when human and machine work together. When experts worked alone, they were 25 per cent accurate at restoring damaged texts, but when they collaborated with Ithaca their accuracy jumped to 72 per cent. Ithaca's performance on its own is about 62 per cent, for comparison. It is also 71 per cent accurate at pinpointing where a text was written, and can date works to within 30 years of their creation between 800BC and 800AD.
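The "generate and rank a list of top predictions" step can be illustrated with a toy sketch. It assumes a character-level model that gives P(next character | previous character) and uses beam search to keep the highest-scoring fillings of the missing positions. This is not Ithaca's method: the article describes Ithaca as a transformer, while the stand-in here is a smoothed character bigram model. The function names (`train_bigram`, `restore`), the `?` mask symbol and the one-line training corpus (an unaccented decree formula) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus, alphabet, smoothing=1.0):
    """Character bigram model with additive (Laplace) smoothing; returns prob(prev, nxt)."""
    counts = defaultdict(Counter)
    for line in corpus:
        for prev, nxt in zip(line, line[1:]):
            counts[prev][nxt] += 1

    def prob(prev, nxt):
        c = counts[prev]
        return (c[nxt] + smoothing) / (sum(c.values()) + smoothing * len(alphabet))

    return prob


def restore(text, prob, alphabet, beam_width=5, mask="?"):
    """Fill every `mask` character in `text`; return up to beam_width
    (restoration, log_probability) pairs, best first."""
    beams = [("", 0.0)]
    for ch in text:
        # A masked position may be any character; a legible one is fixed.
        choices = alphabet if ch == mask else [ch]
        expanded = []
        for prefix, score in beams:
            for c in choices:
                step = math.log(prob(prefix[-1], c)) if prefix else 0.0
                expanded.append((prefix + c, score + step))
        # Keep only the best partial restorations (this pruning makes it approximate).
        expanded.sort(key=lambda b: b[1], reverse=True)
        beams = expanded[:beam_width]
    return beams


corpus = ["εδοξεν τηι βουληι και τωι δημωι"]  # toy corpus: one decree formula
alphabet = sorted(set("".join(corpus)))
prob = train_bigram(corpus, alphabet)
for restoration, score in restore("εδοξεν τηι β?υληι", prob, alphabet, beam_width=3):
    print(restoration, round(score, 2))
```

The ranked output mirrors the workflow described above: a human reads down the list and decides which candidate, if any, is plausible. A real system would score the right-hand context jointly rather than pruning strictly left to right.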

Ithaca was trained on over 63,000 Greek inscriptions containing over three million words from The Packard Humanities Institute's Searchable Greek Inscriptions public dataset. The team masked portions of the text and tasked the model with filling in the blanks. Ithaca analyses other words in a given sentence for context when generating characters. [...] DeepMind is now adapting its model to other ancient writing systems, such as Akkadian from Mesopotamia, Demotic from ancient Egypt, Mayan from Central America, and ancient Hebrew.
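The masking step ("masked portions of the text and tasked the model with filling in the blanks") can be sketched as below. The article does not say how long the hidden spans were or how they were chosen, so one random contiguous span per example, the `max_span` limit, the `?` mask symbol and the function names are assumptions for illustration only.

```python
import random


def make_training_example(text, rng, max_span=10, mask="?"):
    """Hide one random contiguous span of `text`.

    Returns (masked_text, start, hidden): the model sees masked_text and is
    trained to predict `hidden`, the characters that were replaced.
    """
    if not text:
        raise ValueError("text must be non-empty")
    span = rng.randint(1, min(max_span, len(text)))  # span length, inclusive bounds
    start = rng.randint(0, len(text) - span)         # first hidden position
    hidden = text[start:start + span]
    masked = text[:start] + mask * span + text[start + span:]
    return masked, start, hidden


def unmask(masked, start, prediction):
    """Write a predicted span back into the masked text (e.g. to score it)."""
    return masked[:start] + prediction + masked[start + len(prediction):]


rng = random.Random(0)  # fixed seed so the examples are reproducible
corpus = ["εδοξεν τηι βουληι και τωι δημωι"]  # toy stand-in for the real dataset
for text in corpus:
    masked, start, hidden = make_training_example(text, rng)
    print(masked, "->", hidden)
```

Generating many differently masked copies of each inscription turns an unlabelled corpus into (input, target) pairs, which is presumably why no hand-made restorations are needed to train this kind of model.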

  • by fox171171 ( 1425329 ) on Friday March 11, 2022 @05:27AM (#62346813)
    If it's anything like the autocorrect on my phone, this would be a disaster.
    • It's nothing like autocorrect on any phone. It provides probabilities for what the missing text is [nature.com], based on multiple factors.

      • I think phone autocorrect algorithms are proprietary, but I'm pretty sure they use multiple factors as well, including a language model (which in some sense is what Ithaca uses, although it has a lot more memory to throw at the problem than your phone does). The phone also uses keyboard proximity (you swiped an 's', but a 'd' fits the context better and is adjacent to the 's'). Ithaca has no such keyboard proximity, of course. It could, but apparently does not, use partial letters (like the right-down st

        • I think phone autocorrect algorithms are proprietary, but I'm pretty sure they use multiple factors as well, including a language model

          Whatever they use, it wasn't based on messages that people actually send (at least initially), judging by the major gaffes it would make. Autocorrect also doesn't just correct the previous/current word; it assumes your entire sentence may be in total error. I'm sure it's improved in the last 10 years, but it started out being awful.

          which in some sense is what Ithaca uses,

          However, it doesn't modify any text it is given. Rather, it only predicts text that fits where text has been omitted. These are very different objectives.

          It could, but apparently does not, use partial letters (like the right-down stroke of a kappa, or an arc of the circle of a phi or omicron or a couple other letters, depending on the position of the arc), which a human would use, and which a suitably trained machine learned system could use.

          I noticed this too. I think the

  • It occurs to me that we can date clay that has been baked (say, when the building the tablet was in burned down) by the hydrological and OSL methods. Any text that was not used in the training set but has been dated by these newer direct methods (not everyone knows about them, and not all archaeologists who do have access to them) could be tested to verify that the age the AI predicts falls within the error bars of these direct measurements.

    I'm sure they're doing this anyway, but most of the articles I've seen fix

  • ai weaknesses (Score:5, Insightful)

    by antus ( 6211764 ) on Friday March 11, 2022 @06:34AM (#62346935)
    We need to be careful with AI technology. I have noticed recently that in break-ins with security cam footage, people are running AI over the low-quality images to generate a clearer image of what was there. Only it's not what was there. It's a representation that looks good by human standards, based on previously learnt samples. Can't see the eye? Well, here's an eye that looks about right for the picture. You're almost building a deep fake that looks believable. If the real assailant had any notable features, they would likely not be restored. Push the AI too far and the result might have other people's eyes. Fine for art, not fine for evidence or learning. The same goes for this text. Any script containing new and different information worth studying is unlikely to be rebuilt faithfully by the AI, as it has not learnt that pattern before. Instead you'll get a generic representation of texts that are about the same as other texts we already know about. So we must ask: how much value is there in that? Perhaps there is some value, but we need to keep the above in mind before we generate a world of repetitive, boring information and then use it to conclude we know it all, even if we don't.
    • The research paper even states in its abstract [nature.com]:

      While Ithaca alone achieves 62% accuracy when restoring damaged texts, the use of Ithaca by historians improved their accuracy from 25% to 72%, confirming the synergistic effect of this research tool.

      This is only a tool to help humans and we know it.

    • That's the same approach people use when trying to reconstruct ancient texts without it. So it is no worse (and, if you read the article, considerably better) than people working alone.

  • An additional point (Score:4, Interesting)

    by jd ( 1658 ) <imipak@yahoGINSBERGo.com minus poet> on Friday March 11, 2022 @06:50AM (#62346969) Homepage Journal

    The numerous tablets destroyed by ISIS or stolen by unknowns from the Baghdad Museum were neither recorded nor transcribed. True, they weren't written in Greek, but the AI will eventually be extended to other languages.

    And because academia has been starved of any meaningful funds for many decades, especially in the field of archaeology, we've lost a significant fraction of what early recorded history had survived into the 21st century.

    Although many ancient records in Timbuktu were saved from the terrorist destruction of the Ahmed Baba Institute, they are in poor condition because they could not be stored properly (which is understandable), and 2,000 of those records are believed to have been destroyed.

    From what I understand, not all of these manuscripts have been studied and nobody seems to know what they all are. However, the collection dates back to the time when we know manuscripts from the Imperial Library of Constantinople were still circulating in that area. (When first built, the Imperial Library had copies of all the texts in the Great Library of Alexandria, which is how the Archimedes text survived.)

    The surviving texts have now been scanned and put online, but if historians and archaeologists had been better funded, that would have happened BEFORE 2,000 of the texts were burned up.

    There will be many, many situations around the world where ancient documents are being lost or placed at grievous risk because people naively assume anything they don't understand is automatically unimportant, irrelevant, or evil.

    This does rather limit what we can understand about the past and it prevents things like this new tool from shining any new light on how civilizations came to be as they are.

    Ignorance is always an expensive hobby.

  • The Packard Humanities Institute was founded by David Packard, Jr., son of HP co-founder David Packard. David Jr. has a PhD in classical studies and was an early leader in using computers to analyze ancient texts.

    (Side note: There's an old jest that there are more unknown papyri lost in the archives of Europe's libraries and museums than remain buried in the sands of Egypt.)


  • AI restoring texts just means "let's assume this one is the same as the ones I already saw." It's just plausible hallucination.
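One commenter above suggests testing the model's date estimates against objects that were left out of training but have been dated independently (for example by OSL). A minimal sketch of that check might look like this, assuming each object has a model-predicted year and a directly measured year with a ± uncertainty. The `DatedObject` type, its field names and the sample values are hypothetical, not real measurements.

```python
from dataclasses import dataclass


@dataclass
class DatedObject:
    name: str
    predicted_year: int   # model's estimate; negative numbers are BC
    measured_year: int    # independent direct date (e.g. OSL); negative is BC
    measured_error: int   # +/- uncertainty of the direct date, in years


def within_error_bars(obj):
    """True if the predicted year falls inside the direct date's error bars."""
    return abs(obj.predicted_year - obj.measured_year) <= obj.measured_error


def summarize(objects):
    """Return (fraction inside error bars, mean absolute error in years)."""
    if not objects:
        raise ValueError("need at least one dated object")
    hits = sum(within_error_bars(o) for o in objects)
    mae = sum(abs(o.predicted_year - o.measured_year) for o in objects) / len(objects)
    return hits / len(objects), mae


# Hypothetical held-out objects; the numbers are invented for illustration.
held_out = [
    DatedObject("tablet-A", predicted_year=-400, measured_year=-420, measured_error=50),
    DatedObject("tablet-B", predicted_year=-300, measured_year=-150, measured_error=60),
]
fraction, mae = summarize(held_out)
print(f"{fraction:.0%} inside error bars, mean absolute error {mae:.0f} years")
```

Using signed years ignores the missing year 0 between 1 BC and AD 1, so differences that cross that boundary are off by one year; that is negligible next to error bars measured in decades.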
