DeepMind AI Tool Helps Historians Restore Ancient Texts (theregister.com) 15
AI software can help historians interpret and date ancient texts by reconstructing works destroyed over time, according to a new paper published in Nature. The Register reports: A team of computer scientists and experts in classical studies led by DeepMind and Ca' Foscari University of Venice trained a transformer-based neural network to restore inscriptions written in ancient Greek between the 7th century BC and the 5th century AD. The model, named "Ithaca" after the home of legendary Greek king Odysseus, can also estimate when the text was written and where it might have originated. By recovering fragments of text on broken pieces of pottery or blurry scripts, for example, researchers can begin translating them and learn more about ancient civilizations. [...] Why ancient Greek? The researchers said the variable content and available context in the Greek epigraphic record made it an "excellent challenge" for language processing, and a large body of digitized written texts is currently available -- essential for training the model.
First, the text needs to be transcribed by scanning an image of an old object or script. The text is then fed into Ithaca for analysis. It works by predicting lost or blurry characters to restore words as outputs. The software generates and ranks a list of its top predictions; epigraphists can then scroll through them and judge whether the model's guesses seem accurate or not. The best results are reached when human and machine work together. When experts worked alone, they were 25 per cent accurate at piecing together ancient artefacts, but when they collaborated with Ithaca the accuracy level jumped to 72 per cent. Ithaca's performance on its own is about 62 per cent, for comparison. It is also 71 per cent accurate at pinpointing where the text was written, and can date works to within 30 years of their creation between 800BC and 800AD.
Ithaca was trained on over 63,000 Greek inscriptions containing over three million words from The Packard Humanities Institute's Searchable Greek Inscriptions public dataset. The team masked portions of the text and tasked the model with filling in the blanks. Ithaca analyses other words in a given sentence for context when generating characters. [...] DeepMind is now adapting its model to other old writing systems, such as Akkadian, developed in Mesopotamia; Demotic, from ancient Egypt; Mayan, originating from Central America; and ancient Hebrew.
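To make the "mask portions of the text and fill in the blanks" idea concrete, here is a minimal sketch of how one training example of that kind might be built. It is not Ithaca's actual code: the `MASK` placeholder, the function names `mask_span` and `restore`, and the sample string are all assumptions made for illustration.

```python
import random

MASK = "?"  # hypothetical placeholder for one missing character


def mask_span(text, span_len, rng):
    """Hide one contiguous run of characters.

    Returns (masked_text, start_index, hidden_characters). A model
    would be shown masked_text and asked to predict hidden_characters.
    """
    if not 0 < span_len <= len(text):
        raise ValueError("span_len must be between 1 and len(text)")
    start = rng.randrange(len(text) - span_len + 1)
    hidden = text[start:start + span_len]
    masked = text[:start] + MASK * span_len + text[start + span_len:]
    return masked, start, hidden


def restore(masked, start, guess):
    """Write a guessed restoration back into the masked text."""
    return masked[:start] + guess + masked[start + len(guess):]


rng = random.Random(0)          # fixed seed so the example is repeatable
original = "ΑΘΗΝΑΙΩΝ ΔΗΜΟΣ"     # illustrative sample text, not from the dataset
masked, start, hidden = mask_span(original, 3, rng)
print(masked)
```

Because the hidden characters are known, a guess can be scored automatically, which is what lets a model be trained on text that was never actually damaged.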
free Brittney Griner (Score:2)
Autocorrect (Score:3)
Good news! (Score:2)
It's nothing like autocorrect on any phone. It provides probabilities of what missing text is [nature.com] based on multiple factors.
Re: (Score:2)
I think phone autocorrect algorithms are proprietary, but I'm pretty sure they use multiple factors as well, including a language model (which in some sense is what Ithaca uses, although it has a lot more memory to throw at the problem than your phone does). The phone also uses keyboard proximity (you swiped a 's', but a 'd' fits the context better, and is adjacent to the 's'). Ithaca has no such keyboard proximity, of course. It could, but apparently does not, use partial letters (like the right-down st
Re: (Score:2)
I think phone autocorrect algorithms are proprietary, but I'm pretty sure they use multiple factors as well, including a language model
Whatever they use, it wasn't based on messages that people actually send (at least initially), given the major gaffes it would make. Autocorrect also doesn't just correct the previous or current word; it assumes your entire sentence may be in error. I'm sure it's improved in the last 10 years, but it started out being awful.
which in some sense is what Ithaca uses,
However, it doesn't modify any text it is given. Rather, it only predicts the omitted text that fits the gap. These are very different objectives.
It could, but apparently does not, use partial letters (like the right-down stroke of a kappa, or an arc of the circle of a phi or omicron or a couple other letters, depending on the position of the arc), which a human would use, and which a suitably trained machine learned system could use.
I noticed this too. I think the
An interesting approach (Score:2)
It occurs to me that we can date clay that has been baked (say, when the building the tablet was in burned down) by the hydrological and OSL methods. Any text not used in the training set but also dated by these newer direct methods (not everyone knows about them, and not all archaeologists who do have them available) could be tested to verify that the age the AI predicts is within the error bars of these direct measurements.
I'm sure they're doing this anyway, but most of the articles I've seen fix
ai weaknesses (Score:5, Insightful)
They know. (Score:2)
The research paper even says in its abstract [nature.com]:
While Ithaca alone achieves 62% accuracy when restoring damaged texts, the use of Ithaca by historians improved their accuracy from 25% to 72%, confirming the synergistic effect of this research tool.
This is only a tool to help humans and we know it.
Re: (Score:2)
That's the same thing that people use when trying to reconstruct ancient texts on their own. So it is no worse (and, if you read the article, considerably better) than people on their own.
An additional point (Score:4, Interesting)
The numerous tablets destroyed by ISIS or stolen by unknowns from the Baghdad Museum were neither recorded nor transcribed. True, they weren't written in Greek, but the AI will eventually be extended to other languages.
And because academia has been starved of any meaningful funds for many decades, especially in the field of archaeology, we've lost a significant fraction of what early recorded history had survived into the 21st century.
Although many ancient records were saved in Timbuktu from the terrorist destruction of the Ahmed Baba Institute, they are in poor condition because it has not been possible to keep them in suitable conditions (which is understandable). 2,000 of those records are believed to have been destroyed.
From what I understand, not all of these manuscripts have been studied and nobody seems to know what they all are. However, the collection dates back to the time when we know manuscripts from the Imperial Library of Constantinople were still circulating in that area. (When first built, the Imperial Library had copies of all the texts in the Great Library of Alexandria, which is how the Archimedes text survived.)
The surviving texts have now been scanned and put online, but if historians and archaeologists had been better funded, that would have happened BEFORE 2,000 of the texts were burned up.
There will be many, many situations around the world where ancient documents are being lost or placed at grievous risk because people naively assume anything they don't understand is automatically unimportant, irrelevant, or evil.
This does rather limit what we can understand about the past and it prevents things like this new tool from shining any new light on how civilizations came to be as they are.
Ignorance is always an expensive hobby.
Packard Humanities Institute (Score:2)
-
(Side note: There's an old jest that there are more unknown papyri lost in the archives of Europe's libraries and museums than remain buried in the sands of Egypt.)
Hallucination (Score:2)
AI restoring texts just means "let's assume this one is the same as the ones I already saw." It's just plausible hallucination.
Re: (Score:2)
No, it's not. Go read the article.