How AI is Unlocking Ancient Texts (nature.com) 52
AI is unlocking ancient texts previously thought unreadable, potentially revolutionizing historical research, according to a Nature article. Neural networks have successfully decoded burned Roman scrolls from Herculaneum, deciphered ancient Chinese oracle bones, and translated vast Korean royal archives.
In a breakthrough achievement, researchers used AI to reveal 16 columns of Greek philosophical text from a charred Herculaneum scroll that had been unreadable for 2,000 years. The technology could help scholars access hundreds more unopened scrolls from Herculaneum and other historical collections worldwide.
Why do they trust the results? (Score:5, Interesting)
The one thing everyone knows about current neural net technology is the tendency to "hallucinate".
If people couldn't read what was there, why do they think the AI got it right?
Re: (Score:1)
if they can verify it, then we didn't really need AI to read it at all, did we?
Re: (Score:3)
That's not how that works. The expert sees what the AI produces and says, "Huh, I never thought of that; lemme check that against what we do know to see if it is consistent, or more importantly, if we could have arrived at the same result if we only had the information AI provided."
It's not like these folks accept what AI produces and then don't double-check the result or submit it for peer review.
Re:Why do they trust the results? (Score:5, Informative)
To be more specific, the TimeSformer-based tool used for deciphering the Herculaneum scrolls isn't even making text; it's making images, a pixel at a time. It wasn't trained on text; it was trained on CT images for the presence and absence of ink. They took the few examples they have where they know where the ink was and where it wasn't, broke them up into little chunks (like 1/64th the size of a letter), and trained the model on these little chunks of CT data to detect whether the fibres in that location likely had ink or not when they burned.
If it was just making things up, it would be gibberish. Not even letters. Let alone linguistically perfect for the place and time of its finding.
These sorts of "BuT iTs JuSt HaLlUcInAtIoN!" comments from internet geniuses who think they're so much smarter than everyone else and can't be bothered to read articles get tiring.
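A minimal sketch of the patch-based setup described above, assuming a toy 2D grid of scan intensities and a matching ink mask. The names `make_patches`, `scan`, and `ink_mask`, the patch size, and the majority-ink labeling rule are hypothetical illustrations, not the actual pipeline used on the scrolls.

```python
def make_patches(scan, ink_mask, size):
    """Cut a 2D scan into size x size patches, each paired with an ink / no-ink label."""
    samples = []
    for top in range(0, len(scan) - size + 1, size):
        for left in range(0, len(scan[0]) - size + 1, size):
            patch = [row[left:left + size] for row in scan[top:top + size]]
            mask = [row[left:left + size] for row in ink_mask[top:top + size]]
            inked = sum(sum(r) for r in mask)
            # Hypothetical labeling rule: "ink" if more than half the pixels are inked.
            label = 1 if inked * 2 > size * size else 0
            samples.append((patch, label))
    return samples


# Toy 4x4 "scan" (made-up intensities) and its known ink mask (1 = ink).
scan = [
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.1, 0.3],
    [0.2, 0.1, 0.2, 0.1],
]
ink_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

samples = make_patches(scan, ink_mask, size=2)
labels = [label for _, label in samples]
```

A classifier trained on pairs like these never sees a letter or a word, only small patches and a yes/no ink label, which is why its output is an image rather than text.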
Re: Why do they trust the results? (Score:3)
Re:Why do they trust the results? (Score:5, Insightful)
if they can verify it, then we didn't really need AI to read it at all, did we?
Verification is not the same as creation.
You can verify the factors of a 200-digit composite number in a microsecond, but finding those factors may take longer than the lifetime of the Universe.
Verifying a decoded message is easy because the resulting plaintext is grammatically and semantically valid. But finding the key is far harder.
Reading a charred paper is very similar to cryptanalysis.
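The asymmetry between checking and finding can be sketched in a few lines. This is only an illustration with two small primes and naive trial division; the function names are made up for the example, and real factoring algorithms are far more sophisticated.

```python
def verify_factors(n, p, q):
    """Verification: a single multiplication, fast even for very large numbers."""
    return p > 1 and q > 1 and p * q == n


def find_factor(n):
    """Search: naive trial division; the work grows with the smallest factor."""
    d = 2
    steps = 0
    while d * d <= n:
        steps += 1
        if n % d == 0:
            return d, steps
        d += 1
    return n, steps  # no divisor found, so n is prime


p, q = 10007, 10009  # two small primes, chosen only for illustration
n = p * q

factor, steps = find_factor(n)   # thousands of trial divisions
ok = verify_factors(n, p, q)     # one multiplication
```

Even at this toy size the search needs thousands of steps while the check needs one; the gap widens rapidly as the numbers grow.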
Re: (Score:1)
No, that's not the reason. Because no such verification is done.
Re: (Score:1)
AI is a statistical model. As such, it comes up with a very good guess as to the meaning by interpolating between the not-understood parts. However, the "guess as to the meaning" involves cultural understanding, which involves analysis risk, and the results will end up being reported as being accurate as opposed to being a guess (good or not).
For example, if someone in UK says "There's Ian on the dog for you", a Brit will know that the word "dog" means "phone" (Dog and bone: phone). Without that cultural un
Re: (Score:3)
if someone in UK says "There's Ian on the dog for you", a Brit will know that the word "dog" means "phone"
Speak for yourself mate.
Re: (Score:1)
For example, if someone in UK says "There's Ian on the dog for you", a Brit will know that the word "dog" means "phone" (Dog and bone: phone). Without that cultural understanding, the phrase is likely to be misunderstood.
God save us all from Cockney Rhyming Slang. That crap means whatever the hell the speaker thinks it means and whatever he teaches other people it means. Here's an example.
What the hell do you think I mean by saying this? I'm not British either, but I can make crap up.
Orange is sinister.
It means: Starmer is the prime minister.
So HTF does it mean that? Farmers grow oranges. Farmer rhymes with Starmer. The John Lennon song "Give Peace A Chance" mentions "....ministers and siniste
Re: (Score:1)
Orange is sinister.
Yeah, I can see it coming alright, I bet next FA in line for publication title is: "Ancient texts confirm orange man bad"!
Ancient texts' meanings will be adjusted to whatever fits the narrative and we will now have ancient texts on top of "science" (pseudo-science) to push any convenient narrative.
If you disagree with me, you disagree with "science" and "ancient texts". We have been aware of what I am saying for thousands of years!
Re:Why do they trust the results? (Score:4, Informative)
You're thinking of Markov Chains.
And yes, technically everything in the universe can be described by "statistics". But that does not give a sense of how Transformers actually *works*. If you wouldn't call the workings of a CPU "statistics", stop calling Transformers "statistics". The only** random selection comes in deciding which of the nearest tokens to use, since the final latent state doesn't fall exactly on a token boundary (Transformers is nonlinguistic).
** I say "the only", but there is another non-deterministic aspect, which is libraries like xformers, which are widely used for accelerated performance, but are nondeterministic in that the floating point ops don't always yield the exact same results. But in that regard, human brains are *highly* nondeterministic by comparison. Thankfully, these sorts of weighting processes are extremely forgiving of noise.
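A rough sketch of that final selection step, assuming a toy three-token vocabulary with hand-picked 2D vectors (all names and numbers here are hypothetical). The final latent is scored against each token vector; greedy decoding is fully deterministic, and a random choice only enters when sampling is switched on.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(latent, token_vectors, temperature=1.0, rng=None):
    tokens = list(token_vectors)
    # Score each token by the dot product of its vector with the final latent.
    scores = [sum(a * b for a, b in zip(latent, token_vectors[t])) for t in tokens]
    probs = softmax(scores, temperature)
    if rng is None:
        # Greedy decoding: always take the best-scoring token; no randomness.
        return tokens[probs.index(max(probs))]
    # Sampled decoding: the one place a random choice is made.
    return rng.choices(tokens, weights=probs, k=1)[0]


# Toy vocabulary with made-up vectors; the latent lies "between" tokens.
token_vectors = {
    "scroll": [1.0, 0.0],
    "papyrus": [0.8, 0.6],
    "volleyball": [-1.0, 0.2],
}
latent = [0.6, 0.8]

greedy = next_token(latent, token_vectors)
sampled = next_token(latent, token_vectors, temperature=0.7, rng=random.Random(0))
```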
To try to sum up, in as few words as possible, how Transformers works:
* A pinch in a neural network creates a bottleneck; if trained to reproduce the input content on the output side across the bottleneck, this requires the creation of a dense representation that captures only the key essence of the input, with the input side forming an encoder, the output side forming a decoder, and the bottleneck being a latent representation.
* Latents have interesting properties, in that you can do math on concepts. Cosine distances show how closely two latent concepts relate to each other. You can additively combine concepts ("king - man + woman ~= queen"), and indeed, "direction" forms concepts, not just positions. And since latent spaces are typically so vastly high dimensional (hundreds or thousands of dimensions), the number of distinct directions for encoding concepts becomes basically unlimited.
* The attention mechanism allows the network to selectively decide which latents from elsewhere in the previous layer to merge into the current latent, and with what weightings / transformation.
* The FFNs are DNNs. In a DNN, every neuron, and every group of neurons, can be thought of as a detector-generator. Each neuron subdivides its input space by a fuzzy hyperplane, and in effect answers a (superposition of) question(s) formed by its input space with an answer between "yes", "no", and "sort of". Each layer builds off the answers of the previous layer, in effect asking more complex questions and getting more complex answers. It's easiest just to see it [distill.pub].
* In Transformers, the FFNs *detect concepts* in the latent they're given as inputs, and *encode the consequences of those concepts* into the output latent. For example, if you asked an LLM in what movie did Tom Hanks act alongside a volleyball, that information comes from a FFN: it detects the encoded concepts for Tom Hanks, a volleyball, acting alongside, etc, and encodes the concept for "Cast Away".
* This does not happen in one step, but rather, across dozens or even hundreds of Transformers layers, each layer being the pairing of an attention block and a FFN, and said FFNs in turn having many layers. Each only does a small portion of the given task, but encodes useful information for the next layer to complete its portion of the task.
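The latent arithmetic from the second bullet above can be sketched with hand-made vectors. Real latents have hundreds or thousands of learned dimensions; the three axes, the words, and every number below are invented for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up 3D "latents"; axes loosely stand for (royalty, male, female).
latents = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "prince": [0.7, 0.9, 0.0],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def analogy(a, b, c):
    """Return the word whose latent points closest to a - b + c."""
    target = [x - y + z for x, y, z in zip(latents[a], latents[b], latents[c])]
    candidates = [w for w in latents if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, latents[w]))


result = analogy("king", "man", "woman")
```

With these toy vectors `result` comes out as `"queen"`, because subtracting "man" and adding "woman" moves the vector along the male/female direction while leaving the royalty component alone.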
What they are not: Markov chains. They don't even meet the Markov criteria. Markov chains are not an attempt to model the underlying functionality of a process, but just its results. And this is *very visibly* distinct from mechanisms that model underlying processes. It becomes impossible to extend Markov chains out past a few to a dozen or two observations in most systems, because the amount of statistical data you'd need becomes both impossible to acquire and impossible to store. It's why, for example, Markov Chain sentence completion tends to ramble and talk itself in circles, while Transformers does not.
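For comparison, a word-level Markov chain really is just a lookup table of observed continuations. The sketch below uses a made-up sentence and hypothetical function names; the last lines show how the number of possible contexts grows as vocabulary size to the power of the context length, which is the storage problem described above.

```python
import random
from collections import defaultdict


def build_chain(words, order):
    """Map each run of `order` words to the words observed right after it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, start, length, rng):
    """Extend `start` by repeatedly picking a recorded follower at random."""
    out = list(start)
    for _ in range(length):
        followers = chain.get(tuple(out[-len(start):]))
        if not followers:
            break  # context never seen in the data: the chain has nothing to say
        out.append(rng.choice(followers))
    return out


text = "the scroll was burned and the scroll was buried and the ink was lost".split()
chain = build_chain(text, order=2)
sample = generate(chain, ("the", "scroll"), 8, random.Random(1))

# Possible contexts grow as vocab ** order, even for this 8-word vocabulary.
vocab = len(set(text))
possible_contexts = {order: vocab ** order for order in (1, 2, 5, 10)}
```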
To go further than these short contexts, you have to actually have an *unde
Re: (Score:2)
That was an awful lot of words to explain Transformers and not *once* did you mention that the Autobots wage their battle to destroy the evil forces of the Decepticons.
Re: (Score:2)
the verification of this type of tool is easy because of some known science that has been discovered.
history has shown us, that once we discover something, we get a good laugh about it and say "duh, I should have thought of that sooner"
I explain in another answer above this one, how they most likely get the correct image of the inside of the scroll.
Re: (Score:3)
Re: (Score:2)
It's not a tendency, it always hallucinates. It's just called a hallucination when you don't like the results.
The entire term is an excuse, the fact is that AI works the way it works and its "designers" use creative language to suggest that there is merely an occasional problem that will get resolved. No, AI makes shit up always, and sometimes it's particularly bad.
Re: (Score:2)
This is the proper question to ask because it's the fundamental way of the scientific method.
So let's start with some basics on your question and develop some trust in the answer, using the burned scrolls. I have no interest in the outcome; I just study a lot so I can ask better questions.
1) so via radiography we can estimate and/or determine the x line of the scroll, with y being the depth, and different atomic or chemical signatures
2) via tangent space of a Riemannian manifold ( I think I said that right,
Check with Translated Texts (Score:3)
You do
Re: (Score:2)
Why? A question both obvious and loaded.
Why is parent assuming they "trust the results"? Even though he has heard of AI hallucination, he seems to stupidly assume the experts have not.
Classic case of Dunning-Kruger.
Re: (Score:3)
In the case of the burned scrolls it is being used to make the text legible from X-rays. It will quickly be clear whether the output makes sense or not, because we can read ancient Greek and Latin.
Re:Why do they trust the results? (Score:4, Funny)
They test on artificially created 'lost' texts (Score:3)
In tests, Ithaca restored artificially produced gaps in ancient texts with 62% accuracy, compared with 25% for human experts. But experts aided by Ithaca’s suggestions had the best results of all, filling gaps with an accuracy of 72%. Ithaca also identified the geographical origins of inscriptions with 71% accuracy, and dated them to within 30 years of accepted estimates.
So for 2000 years, people could only guess correctly about 25% of the time as to what these missing portions would have said. With modern statistics ("AI") they can produce good guesses 47 percentage points better (for a total of 72% accuracy), or just run the system with no human guidance at all and accept an accuracy that is 10 percentage points worse than when expert supervision is there.
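Worked out from the three accuracy figures in the quoted passage (the variable names are just labels for this sketch). Note that these are differences in percentage points, not relative percentages:

```python
# Accuracy figures as quoted from the article (percent of gaps restored correctly).
experts_alone = 25
model_alone = 62
experts_with_model = 72

# Differences in percentage points.
gain_over_experts = experts_with_model - experts_alone      # 47
cost_of_no_supervision = experts_with_model - model_alone   # 10

# Relative improvement of assisted experts over unassisted experts.
relative_gain = experts_with_model / experts_alone          # 2.88
```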
Re: (Score:3)
Because you seem to have a misunderstanding about AI and the various ways it's used. They're not taking this stuff and plugging it into ChatGPT. They have researcher developers who build custom neural network software to analyze their data. The reason the pop LLMs hallucinate is because they're fed massive amounts of random language or visual data and tasked with generating a response that matches the highest probability of a match. So if that highest probability is 60%, well that's the response you wi
Re: (Score:2)
LLMs hallucinate; this is a neural net, the other, much more reliable and successful AI, the one that is likely to actually be the future of AI.
They also occasionally hallucinate, but only about the same amount as experts in the highly specific field the net was trained in.
Re: (Score:2)
Exactly. Any "reasonable" result would be considered correct, and AI's only strength is producing output that looks reasonable.
I work with a bunch of tool makers. They have this saying, "nobody calibrates their calipers until they give you an answer you don't like." If it tells you the part you made is within tolerance, you think, "great". If it says it's out of tolerance, you think, "that can't be right," and you go check it on a calibration block.
Religion will have a field day with this (Score:1)
AI vs. Voynich Manuscript? (Score:2)
So far, a whole lotta nuttin'.
Re: (Score:2)
AI = magic stones in a hat (Score:2)
Re: (Score:2)
We don't, and that's never mattered before. Translations are always subjective, and correct is defined by whoever wins an argument.
Re:AI = magic stones in a hat (Score:4, Insightful)
Jewish version [jewishvoice.org].
Christian version [christianity.com] saying Luke was the one to indicate Mary was a virgin.
Christian version number 2 [theconversation.com] saying Matthew was the only one to say Mary was pregnant before she had sex with Joseph.
And then there's the whole homosexual issue which didn't arise until 1946 when someone decided to change the original meaning [imgur.com] of what was (supposedly) written.
Re: (Score:2)
Re: (Score:2)
I would say the Bible can only go wrong, because any translation is still produced with the intent of being used in religious observance, and will therefore be influenced by the views of the organization promoting the translation. It will be hard for anyone doing the job to extract themselves from their own religious background. Here it's the scroll telling the adventures of Bigus Dickus in Herculaneum and that won't bring as much bias.
Re: (Score:2)
And then there's the whole homosexual issue which didn't arise until 1946 when someone decided to change the original meaning [imgur.com] of what was (supposedly) written.
You kind of detract from your point here. Yes, any time you translate from one language or idiom to another, there is going to be information loss or gain in the output. A 100% translation from one language to another, keeping the meaning, connotation, nuance, etc., the exact same, is very rare. For Biblical and other religious translators, there's a huge debate over word-for-word literal translations vs a more literary translation vs a more meaning-thematic translation. There's not really a single correct
Re: (Score:2)
That first link is *not* to a Jewish site. It does *not* represent a Jewish perspective on the meaning of (the ancient Hebrew word under discussion). That site is run by "Believers in Jesus committed to showing His love and sharing the life-changing message of the Messiah with Jewish people". None of that is a Jewish thing to do.
An actual Jewish perspective can be found here: https://outreachjudaism.org/al... [outreachjudaism.org]
Although my personal Jewish perspective is: Christians can believe these texts mean different thing
Re: AI = magic stones in a hat (Score:2)
While a discussion may be had about the nuances of translating that particular word, I would not use it to show the stupidity of Bible
Re:AI = magic stones in a hat (Score:5, Interesting)
Before answering, I confess I cheated and actually read TFA.
"In tests with artificially produced gaps, the model’s top ten predictions included the correct answer 72% of the time, and in real-world cases it often matched the suggestions of human specialists. To improve the results further, Papavassileiou hopes to add in visual data, such as traces of incomplete letters, rather than just relying on the transliterated text. She is also investigating ‘transfer learning’, in which the model applies lessons learnt from one series of tablets to another."
So the real answer is simply they do not trust the results, not yet. But progress is being made, and the AI is better than human guesses already.
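The "top ten predictions included the correct answer" figure in the quote is a top-k accuracy measured on artificially blanked text. A minimal sketch of that metric, using made-up ranked guesses rather than any real model output (the function name and data are hypothetical):

```python
def top_k_accuracy(predictions, answers, k):
    """Fraction of gaps whose true filler is among the first k ranked guesses."""
    hits = sum(1 for ranked, truth in zip(predictions, answers) if truth in ranked[:k])
    return hits / len(answers)


# Made-up ranked guesses for four artificially blanked words.
predictions = [
    ["king", "lord", "god"],
    ["city", "house", "temple"],
    ["grain", "silver", "oil"],
    ["son", "brother", "servant"],
]
# The words that were actually removed, so every guess can be scored.
answers = ["king", "temple", "barley", "brother"]

top1 = top_k_accuracy(predictions, answers, k=1)  # 0.25
top3 = top_k_accuracy(predictions, answers, k=3)  # 0.75
```

Because the gaps are artificial, the true answer is known for every one of them, which is what makes the score checkable at all.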
Re: (Score:2)
If scanned images / 3D reconstructions are available, scientists in different countries and in the coming decades can run different algorithms and compare the results, until we converge to a consensus on which letters are indeed present on the manuscript, and which were extrapolated carelessly by the algorithm and should be excluded.
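One simple way to picture that kind of comparison is a per-letter vote across independent readings. This is only a sketch under invented assumptions: the readings, the `?` marker for unreadable letters, and the agreement threshold are all hypothetical.

```python
from collections import Counter


def consensus(readings, min_agreement):
    """Keep a letter only where enough readings agree; otherwise mark '?'."""
    result = []
    for letters in zip(*readings):
        letter, votes = Counter(letters).most_common(1)[0]
        agreed = votes / len(letters) >= min_agreement and letter != "?"
        result.append(letter if agreed else "?")
    return "".join(result)


# Hypothetical outputs of three independent algorithms for the same line.
readings = ["PORPHYRAS", "PORPH?RAS", "PORPHYRLS"]

majority = consensus(readings, min_agreement=2 / 3)  # "PORPHYRAS"
unanimous = consensus(readings, min_agreement=1.0)   # "PORPH?R?S"
```

Raising the threshold trades coverage for confidence: the unanimous reading drops every letter that any one algorithm disputed.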
you can't trust AI (Score:2)
>> how can we trust the Algorithm
"Trust me Bro"
What could go wrong.
Real answer: you can't.
The point of the subject... (Score:3)
Re: (Score:2)
Re: (Score:1)
Be sure (Score:2)