
Comment Re:Just bought... (Score 2, Insightful) 165

I read the first Three Body Problem novel, and I thought it was crap. Some of that might have been the translation, although I've read other translations from Chinese without that much of an issue. The plotting was terrible, the characters flat. I finished it more because I kept expecting it to eventually turn around, breaking my rule that if I don't like a book in the first three chapters, I won't finish it. In the end, I couldn't imagine why I would want to read any more of it.

Comment Re: Just bought... (Score 2, Insightful) 165

Does it have the intro: "Imagine Bash, but object-oriented and with function call names so long they would drive a Java developer to madness. From the authors of Microsoft Bob and Clippy, psychopaths that infect your computer with their dead-eyed smiles, comes PowerShell."?

Comment Re:uh bro (Score 1) 165

Speaking as an owner of the complete History of Middle-earth series: these books are not for the casual fan, or probably even the average fan. They really are designed more for Tolkien scholars, and anyone picking up The Nature of Middle-earth expecting ripping yarns filled with Hobbits and wizards is going to be very disappointed.

Comment Re:power (Score 2) 70

Titan's atmosphere is rather calm; not an issue. At the surface, the winds measured by Huygens were 0.3 m/s.

You actually can use solar power in extreme environments - even Venus's surface has been shown to be compatible with certain types of solar, though you certainly get very poor power density. Dragonfly, as noted above, uses an RTG.

Comment Re:Second flying drone to explore another planet (Score 3) 70

Planetary scientists frequently refer to moons that are large enough to be in hydrostatic equilibrium as planets in the literature. Examples, just from a quick search:

"Locally enhanced precipitation organized by planetary-scale waves on Titan"

"3.3. Relevance to Other Planets" (section on Titan)

"Superrotation in Planetary Atmospheres" (article covers Titan alongside three other planets)

"All planets with substantial atmospheres (e.g., Earth, Venus, Mars, and Titan) have ionospheres which expand above the exobase"

"Clouds on Titan result from the condensation of methane and ethane and, as on other planets, are primarily structured by circulation of the atmosphere"

"... of the planet. However, rather than being scarred by volcanic features, Titan's surface is largely shaped..."

"Spectrophotometry of the Jovian Planets and Titan at 300- to 1000-nm Wavelength: The Methane Spectrum" (okay, it's mainly referring to the Jovian satellites as planets, but same point)

"Superrotation indices for Solar System and extrasolar atmospheres" - contains a table whose first column is "Planet", and has Titan in the list, alongside other planets

Etc. This is not to be confused with the phrase "minor planet", which is used for asteroids, etc. In general there's a big distinction in how commonly you see the large moons in hydrostatic equilibrium referred to as "planets" and with "planetary" adjectives, vs. smaller bodies not in hydrostatic equilibrium.

Comment Re:Titan or Bust! (Score 3, Informative) 70


NASA's obsession with Mars is weird, and it consumes the lion's share of their planetary exploration budget. We know vastly more about Mars than we know of everywhere else except Earth.

This news here is bittersweet for me. I *love* Titan - it and Venus are my two favourite worlds for further exploration, and Dragonfly is a superb way to explore Titan. But there's some sadness in the fact that they're launching it to an equatorial site, so we don't get to see the fascinating hydrocarbon seas and the terrain sculpted by them near the poles. I REALLY wish they were going to the north pole instead :( In theory they could eventually get there, but the craft would have to survive far beyond design limits and get a lot of mission extensions. At a max pace of travel it might cover 600 metres or so per Earth day on average. So we're talking like 12 years to get to the first small hydrocarbon lakes and ~18 years to get to Ligeia Mare or Punga Mare (a bit further to Kraken Mare), *assuming* no detours, vs. a 2 1/2 year mission design. And that ignores the fact that they'll be going slower at the start - the nominal mission is only supposed to cover 175 km, just a few percent of the way, at under 200 metres per day. Sigh... Maybe it'll be possible to squeeze more range out of it once they're comfortable with its performance and reliability, but... it's a LONG way to the poles.

At least if it lasts for that long it'll have done a full transition between wet and dry cycles, which should last ~15 years. So maybe surface liquids will be common at certain points, rare in others.
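For anyone who wants to check the arithmetic above, here is a rough back-of-envelope sketch. It uses only the figures from this comment (600 m per Earth day, 12 and 18 years, 175 km over 2.5 years); the implied distances are simply what those numbers multiply out to, not measured route lengths, and the function names are made up for illustration.

```python
DAYS_PER_YEAR = 365.25  # Earth days per Earth year


def distance_km(pace_m_per_day, years):
    """Distance covered at a constant average pace, in kilometres."""
    return pace_m_per_day * DAYS_PER_YEAR * years / 1000.0


def average_pace_m_per_day(total_km, years):
    """Average pace needed to cover total_km in the given number of years."""
    return total_km * 1000.0 / (DAYS_PER_YEAR * years)


# Distances implied by the "max pace" estimate in the comment.
lakes_km = distance_km(600, 12)  # first small hydrocarbon lakes
seas_km = distance_km(600, 18)   # Ligeia Mare / Punga Mare

# Average pace and coverage of the nominal mission (175 km in 2.5 years).
nominal_pace = average_pace_m_per_day(175, 2.5)
nominal_fraction = 175 / seas_km

print(f"Implied distance to first lakes: {lakes_km:.0f} km")
print(f"Implied distance to the seas:    {seas_km:.0f} km")
print(f"Nominal mission pace:            {nominal_pace:.0f} m/day")
print(f"Nominal mission covers:          {nominal_fraction:.1%} of the way")
```

So the nominal mission pace comes out a bit under 200 m/day, and 175 km is only a few percent of the implied distance to the seas, consistent with the estimates above.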

Comment Re:AI is just Wikipedia (Score 1) 26

I've probably done tens of thousands of legit, constructive edits, but even I couldn't resist the temptation to prank it at one point. The article was on the sugar apple (Annona squamosa), and at the time, there was a big long list of the names of the fruit in different languages. I wrote that in Icelandic, the fruit was called "Hva[TH]er[TH]etta" (eth and thorn don't work on Slashdot), which means "What's that?", as in, "I've never seen that fruit before in my life" ;) Though the list disappeared from Wikipedia many years ago (as it shouldn't have been there in the first place), even to this day, I find tons of pages listing that in all seriousness as the Icelandic name for the fruit.

Comment Nonsense (Score 1) 26

The author has no clue what they're talking about:

Meta said the 15 trillion tokens on which its trained came from "publicly available sources." Which sources? Meta told The Verge that it didn't include Meta user data, but didn't give much more in the way of specifics. It did mention that it includes AI-generated data, or synthetic data: "we used Llama 2 to generate the training data for the text-quality classifiers that are powering Llama 3." There are plenty of known issues with synthetic or AI-created data, foremost of which is that it can exacerbate existing issues with AI, because it's liable to spit out a more concentrated version of any garbage it is ingesting.

1) *Quality classifiers* are not themselves training data. Think of it as a second program that you run on your training data before training your model, to look over the data and decide how useful it looks and thus how much to emphasize it in the training, or whether or not to just omit it.

2) Synthetic training data *very much* can be helpful, in a number of different ways.

A) It can diversify existing data. E.g., instead of just a sentence "I was on vacation in Morocco and I got some hummus", maybe you generate different versions of the same sentence ("I was traveling in Rome and ordered some pasta", "I went on a trip to Germany and had some sausage", etc), to deemphasize the specifics (Morocco, hummus, etc) and focus on the generalization. One example can turn into millions, thus rendering rote memorization during training impossible.

B) It allows for programmatic filtration stages. Let's say that you're training a model to extract quotes from text. You task an LLM with creating training examples for your quote-extracting LLM (synthetic data). But you don't just blindly trust the outputs - first you do a text match to see if what it quoted is actually in the text and whether it's word-for-word right. Maybe you do a fuzzy match, and if it just got a word or two off, you correct it to the exact match, or whatnot. But the key is: you can postprocess the outputs to do sanity checks on them, and since those programmatic steps are deterministic, you can guarantee that the training data meets certain characteristics.

C) It allows for the discovery of further interrelationships. Indeed, this is a key thing that we as humans do - learning from things we've already learned by thinking about them iteratively. If a model learned "The blue whale is a mammal" and it learned "All mammals feed their young with milk", a synthetic generation might include "Blue whales are mammals, and like all mammals, feed their young with milk". The new model now directly learns that blue whales feed their young with milk, and might chain new deductions off *that*.

D) It's not only synthetic data that can contain errors, but non-synthetic data as well. The internet is awash in wrong things; a random thing on the internet is competing with a model that's been trained on reams of data and has had high quality / authoritative data boosted and garbage filtered out. "Things being wrong in the training data" is normal, expected, and fine, so long as the overall picture is accurate. If there are 1000 training samples that say that Mars is the fourth planet from the sun, and one that says that the fourth planet from the sun is Joseph Stalin, it's not going to decide that the fourth planet is Stalin - it's going to answer "Mars".

Indeed, the most common examples I see of "AI being wrong" that people share virally on the internet are actually RAG (Retrieval Augmented Generation), where it's tasked with basically googling things and then summing up the results - and the "wrong content" is actually things that humans wrote on the internet.

That's not to say that you should rely only on generated data when building a generalist model (it's fine for a specialist). There may be specific details that the generating model never learned, or got wrong, or new information that's been discovered since then; you always want an influx of fresh data.

3) You don't just randomly guess whether a given training methodology (such as synthetic data, which I'll reiterate, Meta did not say that they used - although they might have) is having a negative impact. Models are run through a whole slew of evaluation metrics to assess how well and how accurately they respond to different queries. And LLaMA 3 scores superbly, relative to model size.

I'm not super-excited about LLaMA 3 simply because I hate the license - but there's zero disputing that it's an impressive series of models.
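To make point B) above a bit more concrete, here's a minimal sketch of that kind of filtration step: check that a generated "quote" really appears in the source text, snap near-misses to the exact wording, and drop anything else. The function name `verify_quote`, the word-window search and the 0.85 threshold are all my own assumptions for illustration (not anyone's actual pipeline); it only uses Python's standard `difflib`.

```python
from difflib import SequenceMatcher


def verify_quote(source, quote, threshold=0.85):
    """Return a span of `source` matching `quote`, or None to reject it.

    Hypothetical helper: exact matches pass through unchanged; near
    misses are corrected to the best-matching run of words in the
    source (whitespace-normalized); everything else is rejected.
    """
    quote = quote.strip()
    if not quote:
        return None
    if quote in source:
        return quote  # word-for-word match, keep as-is

    src_words = source.split()
    n = len(quote.split())
    if n > len(src_words):
        return None

    # Slide a window of the same word count over the source and keep
    # the window most similar to the candidate quote.
    best_span, best_score = None, 0.0
    for i in range(len(src_words) - n + 1):
        window = " ".join(src_words[i:i + n])
        score = SequenceMatcher(None, window.lower(), quote.lower()).ratio()
        if score > best_score:
            best_span, best_score = window, score

    return best_span if best_score >= threshold else None


source = "She said that the results were far better than anyone had expected."
print(verify_quote(source, "the results were far better"))  # exact match
print(verify_quote(source, "the results was far better"))   # one word off
print(verify_quote(source, "the moon is made of cheese"))   # not in the text
```

Since the check is deterministic, whatever survives it is guaranteed to be text that actually occurs in the source, regardless of what the generating model hallucinated.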

Comment Re: Cue all the people acting shocked about this.. (Score 1) 41

Under your theory (which directly contradicts their words), creative endeavour on the front end SHOULD count. If the person writes a veritable short story as the prompt, then that SHOULD count. It does not. Because according to the Copyright Office, while the user controls the general theme, they do not control the specific details.

"Instead, these prompts function more like instructions to a commissioned artist—they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output."

"if a user instructs a text-generating technology to “write a poem about copyright law in the style of William Shakespeare,” she can expect the system to generate text that is recognizable as a poem, mentions copyright, and resembles Shakespeare's style.[29] But the technology will decide the rhyming pattern, the words in each line, and the structure of the text."

It is the fact that the user does not control the specific details, only the overall concept, that (according to them) makes it uncopyrightable.

Comment Re: Cue all the people acting shocked about this.. (Score 1) 41

Based on the Office's understanding of the generative AI technologies currently available, users do not exercise ultimate creative control over how such systems interpret prompts and generate material. Instead, these prompts function more like instructions to a commissioned artist—they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output.[28] For example, if a user instructs a text-generating technology to “write a poem about copyright law in the style of William Shakespeare,” she can expect the system to generate text that is recognizable as a poem, mentions copyright, and resembles Shakespeare's style.[29] But the technology will decide the rhyming pattern, the words in each line, and the structure of the text.[30]

Compare with my summary:

"their argument was that because the person doesn't control the exact details of the composition of the work"

I'll repeat: I accurately summed up their argument. You did not.

Comment Re:AI Incest (Score 2, Interesting) 41

Yes, "you've been told" that by people who have no clue what they're talking about. Meanwhile, models just keep getting better and better. AI images have been out for years now. There's tons on the net.

First off, old datasets don't just disappear. So the *very worst case* is that you just keep developing your new models on pre-AI datasets.

Secondly, there is human selection on things that get posted. If humans don't like the look of something, they don't post it. In many regards, an AI image is replacing what would have been a much crappier alternative choice.

Third, dataset gatherers don't just blindly use a dump of the internet. If there's a place that tends to be a source of crappy images, they'll just exclude or downrate it.

Fourth, images are scored with aesthetic gradients before they're used. That is, humans train models to assess how much they like images, and then those models look at all the images in the dataset and rate them. Once again, crappy images are excluded / downrated.

Fifth, trainers do comparative training and look at image loss rates, and can automatically exclude problematic ones. For example, if you have a thousand images labeled "watermelon" but one is actually a zebra, the zebra will have an anomalous loss spike that warrants more attention (either from humans or in an automated manner). Loss rates can also be compared between data *sources* - whole websites or even whole datasets - and whatever is working best gets used.

Sixth, trainers also do direct blind human comparisons for evaluation.

This notion that AIs are just going to get worse and worse because of training on AI images is just ignorant. And demonstrably false.
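As a rough illustration of the "anomalous loss spike" idea in the fifth point, here's a small sketch that flags samples whose training loss sits far above the rest of their group. The median/MAD rule, the cutoff `k=5.0` and the function name are assumptions I picked for the example; real pipelines will differ in how they define "anomalous".

```python
from statistics import median


def flag_loss_outliers(losses, k=5.0):
    """Return indices of samples whose loss is anomalously high.

    A sample is flagged when its loss exceeds the median by more than
    k times the median absolute deviation (MAD). Illustrative only.
    """
    med = median(losses)
    mad = median([abs(x - med) for x in losses])
    # If more than half the losses are identical, MAD is 0; fall back
    # to a tiny scale so that anything above the median is flagged.
    scale = mad if mad > 0 else 1e-9
    return [i for i, x in enumerate(losses) if x - med > k * scale]


# Made-up per-image losses for a batch labeled "watermelon";
# index 5 plays the part of the mislabeled zebra.
losses = [0.21, 0.19, 0.22, 0.20, 0.18, 3.5, 0.23]
print(flag_loss_outliers(losses))
```

Flagged samples can then be dropped automatically or queued for a human to look at; the same comparison works at the level of whole sources by aggregating losses per website or dataset.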

