How AI is Unlocking Ancient Texts (nature.com) 52

Posted by msmash on Wednesday January 01, 2025 @07:00PM from the rewriting-history dept.

AI is unlocking ancient texts previously thought unreadable, potentially revolutionizing historical research, according to a Nature article. Neural networks have successfully decoded burned Roman scrolls from Herculaneum, deciphered ancient Chinese oracle bones, and translated vast Korean royal archives.

In a breakthrough achievement, researchers used AI to reveal 16 columns of Greek philosophical text from a charred Herculaneum scroll that had been unreadable for 2,000 years. The technology could help scholars access hundreds more unopened scrolls from Herculaneum and other historical collections worldwide.

This discussion has been archived. No new comments can be posted.

Why do they trust the results? (Score:5, Interesting)

by Anonymous Coward writes: on Wednesday January 01, 2025 @07:05PM (#65055855)

The one thing everyone knows about current neural net technology is the tendency to "hallucinate".
If people couldn't read what was there, why do they think the AI got it right?

- - Re: (Score:1)
    
    by Anonymous Coward writes:
    
    if they can verify it, then we didn't really need AI to read it at all, did we?
    - Re: (Score:3)
      
      by xevioso ( 598654 ) writes:
      
      That's not how that works. The expert sees what the AI produces and says, "Huh, I never thought of that; lemme check that against what we do know to see if it is consistent, or more importantly, if we could have arrived at the same result if we only had the information AI provided."
      It's not like these folks accept what AI produces and then don't double-check the result or submit it for peer review.
      - Re:Why do they trust the results? (Score:5, Informative)
        
        by Rei ( 128717 ) writes: on Thursday January 02, 2025 @11:01AM (#65056957) Homepage
        
        To be more specific, the TimeSformer-based tool used for deciphering the Herculaneum scrolls isn't even making text; it's making images, a pixel at a time. It wasn't trained on text; it was trained on CT images for the presence and absence of ink. They took the few examples that they have where they know where the ink was and where it wasn't, broke them up into little chunks (like 1/64th the size of a letter), and trained the model on these little chunks, from the CT data, to detect whether the fibres in that location likely had ink or not when they burned.
        If it was just making things up, it would be gibberish. Not even letters. Let alone linguistically perfect for the place and time of its finding.
        These sorts of "BuT iTs JuSt HaLlUcInAtIoN!" comments from internet geniuses who think they're so much smarter than everyone else and can't be bothered to read articles get tiring.
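
To make the "trained on little chunks" idea concrete, here is a minimal sketch. It is not the TimeSformer model and uses no real scan data: the volume is synthetic noise, the brightness offset for ink is an assumption, and the classifier is a plain logistic regression standing in for the real network. All function names are hypothetical. It only shows the shape of the approach: label small 3D patches as ink or no ink where the answer is known, then predict an ink image pixel by pixel on a scan the model has not seen.

```python
import numpy as np

def make_scan(rng, strokes, h=32, w=32, depth=4, signal=2.0):
    """Synthetic stand-in for a CT volume: noise everywhere, plus an
    assumed brightness offset wherever the (known) ink mask is set."""
    ink = np.zeros((h, w), dtype=bool)
    for y0, y1, x0, x1 in strokes:
        ink[y0:y1, x0:x1] = True
    volume = rng.normal(0.0, 1.0, size=(depth, h, w))
    volume[:, ink] += signal
    return volume, ink

def patch_features(volume, radius=1):
    """One feature row per pixel: the small 3D chunk centred on that pixel."""
    depth, h, w = volume.shape
    padded = np.pad(volume, ((0, 0), (radius, radius), (radius, radius)), mode="edge")
    size = 2 * radius + 1
    rows = [padded[:, y:y + size, x:x + size].ravel()
            for y in range(h) for x in range(w)]
    return np.asarray(rows)

def train_logistic(X, y, steps=300, lr=0.1):
    """Plain gradient descent on the logistic loss (a stand-in classifier)."""
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ weights + bias)))
        error = p - y
        weights -= lr * (X.T @ error) / len(y)
        bias -= lr * error.mean()
    return weights, bias

def predict_ink(volume, weights, bias):
    """Return a per-pixel probability image; no text is generated anywhere."""
    logits = patch_features(volume) @ weights + bias
    return (1.0 / (1.0 + np.exp(-logits))).reshape(volume.shape[1:])

rng = np.random.default_rng(0)
train_vol, train_ink = make_scan(rng, [(8, 24, 14, 18), (8, 12, 8, 24)])  # a "T"
test_vol, test_ink = make_scan(rng, [(8, 24, 8, 12), (20, 24, 8, 24)])    # an "L"

weights, bias = train_logistic(patch_features(train_vol),
                               train_ink.ravel().astype(float))
prob = predict_ink(test_vol, weights, bias)
print("mean P(ink) on true ink pixels:  ", prob[test_ink].mean())
print("mean P(ink) on true blank pixels:", prob[~test_ink].mean())
```

The point of the toy is that the output is an image of ink probabilities. Any letters that show up in it come from the scan, because the model has no vocabulary to draw from.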
        
        Parent Share
        twitter facebook
    - Re: Why do they trust the results? (Score:3)
      
      by ZERO1ZERO ( 948669 ) writes:
      
      Don't confuse verification with calculation. It's entirely possible to verify something without knowing how to achieve the answer. Take the most trivial example, 72 - 16 = 56: you can verify the answer by adding 56 to 16, and if you get 72 then it's probably right. You don't need to know how to subtract to verify a subtraction.
    - Re:Why do they trust the results? (Score:5, Insightful)
      
      by ShanghaiBill ( 739463 ) writes: on Wednesday January 01, 2025 @11:43PM (#65056307)
      
      if they can verify it, then we didn't really need AI to read it at all, did we?
      Verification is not the same as creation.
      You can verify the factors of a 200-digit composite number in a microsecond, but finding those factors may take longer than the lifetime of the Universe.
      Verifying a decoded message is easy because the resulting plaintext is grammatically and semantically valid. But finding the key is far harder.
      Reading a charred paper is very similar to cryptanalysis.
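
A small sketch of that asymmetry, with two small primes chosen only so the example runs instantly (a real 200-digit composite is far beyond trial division). The function names are hypothetical. Checking a claimed factorisation is one multiplication, while finding a factor means searching.

```python
def verify_factors(n, factors):
    """Verification: multiply the claimed factors and compare with n."""
    product = 1
    for f in factors:
        if f <= 1:
            return False  # reject trivial "factors" such as 1
        product *= f
    return product == n

def find_smallest_factor(n):
    """Creation: naive trial division, whose cost grows with sqrt(n)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return None  # no divisor found, so n is prime

n = 10007 * 10009
print(verify_factors(n, [10007, 10009]))  # one multiplication
print(find_smallest_factor(n))            # roughly ten thousand trial divisions
```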
      
  - Re: (Score:1)
    
    by dfghjk ( 711126 ) writes:
    
    No, that's not the reason. Because no such verification is done.
    - Re: (Score:1)
      
      by bagofbeans ( 567926 ) writes:
      
      Ai is a statistical model. As such, it comes up with a very good guess as to the meaning by interpolating between the not-understood parts. However, the "guess as to the meaning" involves cultural understanding, which involves analysis risk, and the results will end up being reported as being accurate as opposed to being a guess (good or not).
      For example, if someone in UK says "There's Ian on the dog for you", a Brit will know that the word "dog" means "phone" (Dog and bone: phone). Without that cultural un
      - Re: (Score:3)
        
        by newcastlejon ( 1483695 ) writes:
        
        if someone in UK says "There's Ian on the dog for you", a Brit will know that the word "dog" means "phone"
        Speak for yourself mate.
      - Re: (Score:1)
        
        by Zontar_Thing_From_Ve ( 949321 ) writes:
        
        For example, if someone in UK says "There's Ian on the dog for you", a Brit will know that the word "dog" means "phone" (Dog and bone: phone). Without that cultural understanding, the phrase is likely to be misunderstood.
        God save us all from Cockney Rhyming Slang. That crap means whatever the hell the speaker thinks it means and whatever he teaches other people it means. Here's an example.
        
        What the hell do you think I mean by saying this? I'm not British either, but I can make crap up.
        
        Orange is sinister.
        
        It means: Starmer is the prime minister.
        
        So HTF does it mean that? Farmers grow oranges. Farmer rhymes with Starmer. The John Lennon song "Give Peace A Chance" mentions "....minsters and siniste
        
        Re: (Score:1)
        
        by Anonymous Coward writes:
        
        Orange is sinister.
        
        Yeah, I can see it coming alright, I bet next FA in line for publication title is: "Ancient texts confirm orange man bad"!
        Ancient texts meanings will be adjusted to whatever fits the narrative and we will now have ancient texts on top of "science" (pseudo-science) to push any convenient narrative.
        If you disagree with me, you disagree with "science" and "ancient texts". We have been aware of what I am saying for thousands of years!
      - Re:Why do they trust the results? (Score:4, Informative)
        
        by Rei ( 128717 ) writes: on Thursday January 02, 2025 @08:29AM (#65056649) Homepage
        
        Ai is a statistical model
        You're thinking of Markov Chains.
        And yes, technically everything in the universe can be described by "statistics". But that does not give a sense of how Transformers actually *works*. If you wouldn't call the workings of a CPU "statistics", stop calling Transformers "statistics". The only** random selection comes in deciding - since the final latent state doesn't exist exactly on a token boundary (Transformers is nonlinguistic), which of the nearest tokens to use.
        ** I say "the only", but there is another non-deterministic aspect, which is libraries like xformers, which are widely used for accelerated performance, but are nondeterministic in that the floating point ops don't always yield the exact same results. But in that regard, human brains are *highly* nondeterministic by comparison. Thankfully, these sorts of weighting processes are extremely forgiving of noise.
        To try to sum up, in as few words as possible, how Transformers works:
        * A pinch in a neural network creates a bottleneck; if trained to reproduce the input content on the output side across the bottleneck, this requires the creation of a dense representation that captures only the key essence of the input, with the input side forming an encoder, the output side forming a decoder, and the bottleneck being a latent representation.
        * Latents have interesting properties, in that you can do math on concepts. Cosine distances show how closely two latent concepts relate to each other. You can additively combine concepts ("king - man + woman ~= queen"), and indeed, "direction" forms concepts, not just positions. And since latent spaces are typically so vastly high dimensional (hundreds or thousands of dimensions), the number of distinct directions for encoding concepts becomes basically unlimited.
        * The attention mechanism allows the network to selectively decide which latents from elsewhere in the previous layer to merge into the current latent, and with what weightings / transformation.
        * The FFNs are DNNs. In a DNN, every neuron, and every group of neurons, can be thought of as a detector-generator. Each neuron subdivides its input space by a fuzzy hyperplane, and in effect answers a (superposition of) question(s) formed by its input space with an answer between "yes", "no", and "sort of". Each layer builds off the answers of the previous layer, in effect asking more complex questions and getting more complex answers. It's easiest just to see it [distill.pub].
        * In Transformers, the FFNs *detect concepts* in the latent they're given as inputs, and *encode the consequences of those concepts* into the output latent. For example, if you asked an LLM in what movie did Tom Hanks act alongside a volleyball, that information comes from an FFN: it detects the encoded concepts for Tom Hanks, a volleyball, acting alongside, etc, and encodes the concept for "Cast Away".
        * This does not happen in one step, but rather, across dozens or even hundreds of Transformers layers, each layer being the pairing of an attention block and an FFN, and said FFNs in turn having many layers. Each only does a small portion of the given task, but encodes useful information for the next layer to complete its portion of the task.
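
Two of the bullets above (math on latents, and attention as a weighted merge of other latents) can be sketched in a few lines. The vectors here are hand-made toys with made-up axes, not anything a trained model produced, and the attention function omits the learned query/key/value projections a real Transformers layer has; both are assumptions for illustration.

```python
import numpy as np

# Hypothetical hand-made "latents"; axes are (royal, male, female, fruit).
latents = {
    "king":  np.array([1.0, 1.0, 0.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0, 0.0]),
    "man":   np.array([0.0, 1.0, 0.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0, 0.0]),
    "apple": np.array([0.0, 0.0, 0.0, 1.0]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(vector, exclude=()):
    """Name of the stored latent whose direction is closest to `vector`."""
    candidates = {w: v for w, v in latents.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(vector, candidates[w]))

analogy = latents["king"] - latents["man"] + latents["woman"]
print(nearest(analogy, exclude=("king", "man", "woman")))

def attention(queries, keys, values):
    """Scaled dot-product attention without learned projections."""
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ values, weights

tokens = np.stack([latents["king"], latents["apple"], latents["queen"]])
mixed, weights = attention(tokens, tokens, tokens)
print(weights.round(2))
```

In the toy, the row for "king" puts more weight on "queen" than on "apple", which is the "selectively decide which latents to merge" behaviour in miniature.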
        What they are not: Markov chains. They don't even meet the Markov criteria. Markov chains are not an attempt to model the underlying functionality of a process, but just its results. And this is *very visibly* distinct from mechanisms that model underlying processes. It becomes impossible to extend Markov chains out past a few to a dozen or two observations in most systems, because the amount of statistical data you'd need becomes both impossible to acquire and impossible to store. It's why, for example, Markov Chain sentence completion tends to ramble and talk itself in circles, while Transformers does not.
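
The data problem with long Markov contexts can be seen even on a toy corpus. This sketch (the sentence and function name are made up for illustration) builds order-k transition tables and compares how many contexts were actually observed with how many are possible for the vocabulary.

```python
from collections import Counter, defaultdict

def build_markov(tokens, order):
    """Map each run of `order` tokens to counts of the token that followed it."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        table[context][tokens[i + order]] += 1
    return table

text = "the scroll was burned and the scroll was buried and the ink was lost".split()
vocab = set(text)

for order in (1, 2, 3):
    table = build_markov(text, order)
    possible = len(vocab) ** order
    print(f"order {order}: {len(table)} contexts observed of {possible} possible")
```

The number of possible contexts is the vocabulary size raised to the order, so with a realistic vocabulary of tens of thousands of tokens even a handful of tokens of context outruns any corpus you could collect or store.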
        To go further than these short contexts, you have to actually have an *unde
        
        Re: (Score:2)
        
        by Surak_Prime ( 160061 ) writes:
        
        That was an awful lot of words to explain Transformers and not *once* did you mention that the Autobots wage their battle to destroy the evil forces of the Decepticons.
  - Re: (Score:2)
    
    by onepoint ( 301486 ) writes:
    
    The verification of this type of tool is easy because of some known science that has been discovered.
    History has shown us that once we discover something, we get a good laugh about it and say "duh, I should have thought of that sooner".
    I explain in another answer above this one, how they most likely get the correct image of the inside of the scroll.
- Re: (Score:3)
  
  by FudRucker ( 866063 ) writes:
  
  That's what I'm thinking; confirmation bias will whitewash this into enhancing the established religious delusions.
- Re: (Score:2)
  
  by dfghjk ( 711126 ) writes:
  
  It's not a tendency, it always hallucinates. It's just called a hallucination when you don't like the results.
  The entire term is an excuse, the fact is that AI works the way it works and its "designers" use creative language to suggest that there is merely an occasional problem that will get resolved. No, AI makes shit up always, and sometimes it's particularly bad.
- Re: (Score:2)
  
  by onepoint ( 301486 ) writes:
  
  This is the proper question to ask because it's the fundamental way of the scientific method.
  So let's start with some basics on your question, and develop some trust in the answer using the burned scrolls, I have no interest in the outcome, I just study a lot so I can ask better questions.
  1) so via radiography we can estimate and or determine the x line of the scroll with y being the depth, and different atomic or chemical signatures
  2) via tangent space of a Riemannian manifold ( I think I said that right,
- Check with Translated Texts (Score:3)
  
  by Roger W Moore ( 538166 ) writes:
  
  They probably do the same thing that we do in particle physics with machine learning algorithms: measure the performance using some examples that you already know the answer for but which were not part of the training sample. In physics this is typically simulated data where you know what the true physics was, but for this I'd just hold back some scrolls that humans have already translated, feed them into the algorithm, and see whether the output matches the human translation.
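
A minimal sketch of that hold-back procedure. Everything here is a stand-in: the "scans" are letter-shifted Latin phrases, the "model" is a substitution table learned from the training pairs, and all names are hypothetical. What it illustrates is only the protocol: fit on some known examples, score on known examples the model never saw.

```python
import random

def scramble(text, shift=3):
    """Toy stand-in for an unreadable source: shift each letter by `shift`."""
    return "".join(
        chr((ord(c) - ord("a") + shift) % 26 + ord("a")) if c.isalpha() else c
        for c in text
    )

def split_holdout(examples, holdout_fraction=0.25, seed=0):
    """Shuffle once, then set a fraction aside that training never sees."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_hold = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_hold:], shuffled[:n_hold]

def fit_substitution(train_pairs):
    """'Train' by recording which input character maps to which known output."""
    mapping = {}
    for scrambled, truth in train_pairs:
        for s, t in zip(scrambled, truth):
            mapping[s] = t
    return lambda scrambled: "".join(mapping.get(ch, "?") for ch in scrambled)

def char_accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / max(len(truth), 1)

known = ["arma virumque cano", "gallia est omnis divisa", "veni vidi vici",
         "carpe diem", "alea iacta est", "in vino veritas",
         "festina lente", "ars longa vita brevis"]
examples = [(scramble(t), t) for t in known]

train, heldout = split_holdout(examples)
model = fit_substitution(train)
score = sum(char_accuracy(model(s), t) for s, t in heldout) / len(heldout)
print(f"held-out character accuracy: {score:.2f}")
```

Characters the model never saw in training come out as "?", so the held-out score reflects what it actually generalises to, not what it memorised.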
  
  You do
- Re: (Score:2)
  
  by quenda ( 644621 ) writes:
  
  Why? A question both obvious and loaded.
  Why is parent assuming they "trust the results"? Even though he has heard of AI hallucination, he seems to stupidly assume the experts have not.
  Classic case of Dunning Kruger.
- Re: (Score:3)
  
  by Hoi Polloi ( 522990 ) writes:
  
  In the case of the burned scrolls it is being used to make the text legible from X-rays. Whether the output makes sense is going to be determined quickly, because we can read ancient Greek and Latin.
  - Re:Why do they trust the results? (Score:4, Funny)
    
    by arglebargle_xiv ( 2212710 ) writes: on Thursday January 02, 2025 @07:25AM (#65056615)
    
    So far the AI interpretation seems legit, the Roman seller's description of the iPhone 16 is pretty spot on, and Marcus Cornelius' musings on the Linux kernel scheduling mechanisms also seem to pass muster.
    
- They test on artificially created 'lost' texts (Score:3)
  
  by bjamesv ( 1528503 ) writes:
  
  Per TFA:
  In tests, Ithaca restored artificially produced gaps in ancient texts with 62% accuracy, compared with 25% for human experts. But experts aided by Ithaca’s suggestions had the best results of all, filling gaps with an accuracy of 72%. Ithaca also identified the geographical origins of inscriptions with 71% accuracy, and dated them to within 30 years of accepted estimates.
  So for 2000 years, people could only guess correctly about 25% of what these missing portions would have said. With modern statistics ("AI") they can produce good guesses 47 percentage points more often (for a total of 72% accuracy) or.. just run the system with no human guidance at all and accept a hit rate that is 10 percentage points worse than when expert supervision is there.
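
For anyone checking the arithmetic, this works only from the three gap-filling figures in the quoted passage (25%, 62%, 72%); the variable names are made up.

```python
# Accuracy figures quoted from TFA, in percent.
human_alone = 25
model_alone = 62
human_with_model = 72

gain_over_humans = human_with_model - human_alone       # percentage points
cost_of_no_human = human_with_model - model_alone       # percentage points
relative_improvement = human_with_model / human_alone   # ratio, not points

print(f"experts aided by the model: +{gain_over_humans} points over experts alone")
print(f"model with no expert:       -{cost_of_no_human} points versus the aided experts")
print(f"aided experts are right {relative_improvement:.2f}x as often as unaided ones")
```

Note the difference between percentage points (72 minus 25) and a relative change (72 divided by 25); the two are easy to mix up when both get written as "%".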
- Re: (Score:3)
  
  by Berkyjay ( 1225604 ) writes:
  
  Because you seem to have a misunderstanding about AI and the various ways it's used. They're not taking this stuff and plugging it into ChatGPT. They have researcher developers who build custom neural network software to analyze their data. The reason the pop LLMs hallucinate is that they're fed massive amounts of random language or visual data and tasked with generating a response that matches the highest probability of a match. So if that highest probability is 60%, well that's the response you wi
- Re: (Score:2)
  
  by JasterBobaMereel ( 1102861 ) writes:
  
  LLMs hallucinate - this is a neural net, the other much more reliable and successful AI, the one that is likely to actually be the future of AI
  They also occasionally hallucinate, but only about the same amount as experts in the highly specific field the net was trained in
- Re: (Score:2)
  
  by RobinH ( 124750 ) writes:
  
  Exactly. Any "reasonable" result would be considered correct, and AI's only strength is producing output that looks reasonable.
  I work with a bunch of tool makers. They have this saying, "nobody calibrates their calipers until it gives you an answer you don't like." If it tells you the part you made is within tolerance, you think, "great". If it says it's out of tolerance, you think, "that can't be right," and you go check it on a calibration block.
Religion will have a field day with this (Score:1)

by FudRucker ( 866063 ) writes:

Making up more bullshit to deceive people about their gods/myths
AI vs. Voynich Manuscript? (Score:2)

by sk999 ( 846068 ) writes:

So far, a whole lotta nuttin'.
- Re: (Score:2)
  
  by excelsior_gr ( 969383 ) writes:
  
  How about Linear A?
AI = magic stones in a hat (Score:2)

by Smonster ( 2884001 ) writes:

Okay, but how can we trust the Algorithm is actually translating/deciphering the source material correctly?
- Re: (Score:2)
  
  by dfghjk ( 711126 ) writes:
  
  We don't, and that's never mattered before. Translations are always subjective, and correct is defined by whoever wins an argument.
- Re:AI = magic stones in a hat (Score:4, Insightful)
  
  by quonset ( 4839537 ) writes: on Wednesday January 01, 2025 @08:08PM (#65056013)
  
  If you have a problem with AI doing this, don't look into human scholars translating ancient texts. Two of the three major religions can't even come up with an agreement on if Mary was a virgin because of the words used, and one of those religions contradicts itself on who said what about Mary.
  
  Jewish version [jewishvoice.org].
  
  Christian version [christianity.com] saying Luke was the one to indicate Mary was a virgin.
  
  Christian version number 2 [theconversation.com] saying Matthew was the only one to say Mary was pregnant before she had sex with Joseph.
  
  And then there's the whole homosexual issue which didn't arise until 1946 when someone decided to change the original meaning [imgur.com] of what was (supposedly) written.
  
  - Re: (Score:2)
    
    by JoshuaZ ( 1134087 ) writes:
    
    Mostly a valid point. Ironically though the claim about the verse in Leviticus is wrong if one looks at the Hebrew text of that verse.
  - Re: (Score:2)
    
    by test321 ( 8891681 ) writes:
    
    I would say the Bible can only go wrong, because any translation is still produced with the intent of being used in religious observance, and will therefore be influenced by the views of the organization promoting the translation. It will be hard for anyone doing the job to extract themselves from their own religious background. Here it's the scroll telling the adventures of Bigus Dickus in Herculaneum and that won't bring as much bias.
  - Re: (Score:2)
    
    by Moridineas ( 213502 ) writes:
    
    And then there's the whole homosexual issue which didn't arise until 1946 when someone decided to change the original meaning [imgur.com] of what was (supposedly) written.
    You kind of detract from your point here. Yes, any time you translate from one language or idiom to another, there is going to be information loss or gain in the output. A 100% translation from one language to another, keeping the meaning, connotation, nuance, etc., the exact same, is very rare. For Biblical and other religious translators, there's a huge debate over word-for-word literal translations vs a more literary translation vs a more meaning-thematic translation. There's not really a single correct
  - Re: (Score:2)
    
    by shilly ( 142940 ) writes:
    
    That first link is *not* to a Jewish site. It does *not* represent a Jewish perspective on the meaning of (the ancient Hebrew word under discussion). That site is run by "Believers in Jesus committed to showing His love and sharing the life-changing message of the Messiah with Jewish people". None of that is a Jewish thing to do.
    An actual Jewish perspective can be found here: https://outreachjudaism.org/al... [outreachjudaism.org]
    Although my personal Jewish perspective is: Christians can believe these texts mean different thing
  - Re: AI = magic stones in a hat (Score:2)
    
    by kubajz ( 964091 ) writes:
    
    Correction: regarding the word for homosexuality, you are criticizing human scholars by linking to an Imgur image of a post that complains about how a Greek word was wrongly translated and how that changes the meaning of a verse from Leviticus that was actually written in Hebrew. Also, "arseno-koitai" means literally "male-bed" - the meaning has nothing to do with a child.
    While a discussion may be had about the nuances of translating that particular word, I would not use it to show the stupidity of Bible
- Re:AI = magic stones in a hat (Score:5, Interesting)
  
  by quenda ( 644621 ) writes: on Wednesday January 01, 2025 @09:11PM (#65056123)
  
  Before answering, I confess I cheated and actually read TFA.
  "In tests with artificially produced gaps, the model’s top ten predictions included the correct answer 72% of the time, and in real-world cases it often matched the suggestions of human specialists. To improve the results further, Papavassileiou hopes to add in visual data, such as traces of incomplete letters, rather than just relying on the transliterated text. She is also investigating ‘transfer learning’, in which the model applies lessons learnt from one series of tablets to another."
  So the real answer is simply they do not trust the results, not yet. But progress is being made, and the AI is better than human guesses already.
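
The "top ten predictions included the correct answer" figure in the quote is a top-k accuracy. A minimal sketch of how such a number is computed, with made-up ranked guesses standing in for a model's output:

```python
def top_k_accuracy(ranked_predictions, truths, k=10):
    """Fraction of gaps whose true filling appears among the first k guesses."""
    hits = sum(truth in preds[:k]
               for preds, truth in zip(ranked_predictions, truths))
    return hits / len(truths)

# Hypothetical ranked guesses for four artificial gaps, best guess first.
ranked = [["a", "b", "c"],
          ["x", "y", "z"],
          ["m", "n", "o"],
          ["p", "q", "r"]]
truths = ["a", "z", "n", "s"]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(ranked, truths, k)}")
```

A top-10 score is a much looser bar than top-1: it says the right answer was on the shortlist, which is useful to a specialist choosing among suggestions but is not the same as the model being right 72% of the time.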
  
- Re: (Score:2)
  
  by test321 ( 8891681 ) writes:
  
  If scanned images / 3d reconstructions are available, scientists in different countries and in the coming decades can run different algorithms and compare the results, until we converge to a consensus on which letters are indeed present on the manuscript, and which were extrapolated carelessly by the algorithm and should be excluded.
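
One simple way to compare several algorithms' readings of the same line is a per-character vote, keeping a letter only when enough of them agree. This is a sketch under assumptions: the readings are invented transliterations, they are assumed to be already aligned to the same length, and the function name and threshold are hypothetical.

```python
from collections import Counter

def consensus(readings, min_votes=2):
    """Per-position majority vote; positions without enough agreement become '?'."""
    result = []
    for chars in zip(*readings):
        char, count = Counter(chars).most_common(1)[0]
        result.append(char if count >= min_votes else "?")
    return "".join(result)

# Three hypothetical algorithms reading the same five-letter word.
readings = ["kalos", "kalws", "kelos"]
print(consensus(readings, min_votes=2))
print(consensus(readings, min_votes=3))
```

Raising `min_votes` trades coverage for caution: letters that only one algorithm "saw" drop out as unknown and are not carried into the published text.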
- you can't trust AI (Score:2)
  
  by stooo ( 2202012 ) writes:
  
  >> how can we trust the Algorithm
  "Trust me Bro"
  What could go wrong.
  Real answer: you can't.
The point of the subject... (Score:3)

by ndsurvivor ( 891239 ) writes: on Wednesday January 01, 2025 @09:13PM (#65056125) Journal

The thrust, or the point... I have listened to more than a few podcasts about this. The thing that the article left out is how new scanning technology can distinguish writing at mm scale, and distinguish inks from non-inks. However, we feed these scrolls into scanning equipment and get thousands of scans that look literally like "ink blots", so how do we with our puny human minds "unravel them" and make sense of them? AI is doing that, but I think the article is under-reporting how important new scanning technology is as well.

- Re: (Score:2)
  
  by ndsurvivor ( 891239 ) writes:
  
  https://www.npr.org/2024/02/12... [npr.org] A scroll covered by the eruption of Mount Vesuvius has been read for the first time — with the help of artificial intelligence. This is a really interesting podcast!! I think!
  - Re: (Score:1)
    
    by wakawakka ( 1424101 ) writes:
    
    You will also appreciate the story of Sigurant. The story is in French here: https://www.radiofrance.fr/fra... [radiofrance.fr] and in the following documentary here https://www.youtube.com/watch?... [youtube.com] around 1 hour 14 in, you can see the charred bit of parchment that looks like absolutely nothing, and the same new scanning techs you mentioned going right through, allowing the researcher to reconstitute that lost tale (which tells a story of a lost knight and the fire of a dragon by the way, in a fun twist).
Be sure (Score:2)

by zawarski ( 1381571 ) writes:

To drink your Ovaltine.
