penguinoid - Slashdot User

Comment Re:So? (Score 1) 77

by penguinoid on Sunday April 21, 2024 @01:41PM (#64412296) Attached to: Some Astronomers Will Re-Examine a 102-Year-Old Theory About the Universe's Expansion

We already replaced Pythagorean's Theorem, for use in non-Euclidean geometry.

https://en.wikipedia.org/wiki/...

Comment Re:I see nothing wrong with subscriptions (Score 1) 108

by penguinoid on Sunday April 21, 2024 @01:27PM (#64412274) Attached to: EU: Meta Cannot Rely On 'Pay Or Okay'

"Ad-based service without ads" is different from "Customer-based service". Will not the premium customers still get the ad-focused "maximize engagement" algorithm?

Comment Re: 20% survival is pretty good (Score 1) 57

by hey! on Sunday April 21, 2024 @01:21PM (#64412262) Attached to: World's Coral Reefs Hit By a Fourth Mass Bleaching Event, NOAA Says

If I understand your argument properly, you're suggesting that things will be OK with the reefs because "survival of the fittest" will produce a population of corals better adapted to warmer conditions.

Let me first point out is that this isn't really an argument, it's a hypothesis. In fact this is the very question that actual *reef scientists* are raising -- the ability of reefs to survive as an ecosystem under survival pressure. There's no reason to believe reefs will surivive just because fitter organisms will *tend* to reproduce more, populations perish all the time. When it's a keystone species in an ecosystem, that ecosystem collapses. There is no invisible hand here steering things to any preordained conclusion.

So arguing over terminology here is really just an attempt to distract (name calling even more so) from your weak position on whether reefs will survive or not.

However, returning to that irrelevant terminology argument, you are undoubtedly making an evolutionary argument. You may be thinking that natural selection won't produce a new taxonomic *species* for thousands of generations, and you'd be right. However it will produce a new *clade*. When a better-adapted clade emerges due to survival pressures, that is evolution by natural selection. Whether we call that new clade a "species" is purely a human convention adopted and managed to facilitate scientific communication.

You don't have to take my word for any of this. Put it to any working biologist you know.

Comment Re:The limits of science (Score 3, Insightful) 77

by Black Parrot on Sunday April 21, 2024 @08:28AM (#64411804) Attached to: Some Astronomers Will Re-Examine a 102-Year-Old Theory About the Universe's Expansion

Certain topics do not lend themselves very well to the scientific method.

It's kind of hard to set up 100 universes, say, and run them through a few billion years. You can't do the experiment part.

Sometimes a hypothesis has potentially observable implications, even if a mad scientist can't reproduce everything in their lab.

Comment Out of date, all right. (Score 2) 77

by Black Parrot on Sunday April 21, 2024 @04:26AM (#64411666) Attached to: Some Astronomers Will Re-Examine a 102-Year-Old Theory About the Universe's Expansion

I think it has been decades since cosmologists believed the universe is expanding at a constant rate.

Comment Uhm... (Score 1) 258

by Black Parrot on Saturday April 20, 2024 @05:31AM (#64409878) Attached to: NASA Veteran Behind Propellantless Propulsion Drive Announces Major Discovery

IANAPhysicist, but isn't a thrust of 1g specific to the mass you are accelerating? Same device pushing heavier mass gives less acceleration?

Is a claim that you can create a thrust of 1g even meaningful without additional details?

Comment Why exactly... (Score 4, Insightful) 258

by Rei on Saturday April 20, 2024 @04:40AM (#64409836) Attached to: NASA Veteran Behind Propellantless Propulsion Drive Announces Major Discovery

... are we putting blatant pseudoscience on Slashdot? Are we becoming Natural News? Ir InfoWars maybe?

Comment Re:AI is just Wikipedia (Score 1) 25

by Rei on Friday April 19, 2024 @07:27PM (#64409284) Attached to: Meta's Not Telling Where It Got Its AI Training Data

I've probably done tens of thousands of legit, constructive edits, but even I couldn't resist the temptation to prank it at one point. The article was on the sugar apple (Annona squamosa), and at the time, there was a big long list of the name of the fruit in different languages. I wrote that in Icelandic, the fruit was called "Hva[TH]er[TH]etta" (eth and thorn don't work on Slashdot), which means "What's that?", as in, "I've never seen that fruit before in my life" ;) Though the list disappeared from Wikipedia many years ago (as it shouldn't have been there in the first place), even to this day, I find tons of pages listing that in all seriousness as the Icelandic name for the fruit.

Comment Re: spokesweasel (Score 1) 53

by penguinoid on Friday April 19, 2024 @05:49PM (#64409108) Attached to: Apple Removes WhatsApp, Threads and Telegram From China App Store

Nope, my argument is that both companies acted in their best interest.

Comment Re:spokesweasel (Score 1) 53

by penguinoid on Friday April 19, 2024 @03:39PM (#64408712) Attached to: Apple Removes WhatsApp, Threads and Telegram From China App Store

The business case for Google vs Apple leaving China are vastly different:
1) Apple sells hardware which people pay big money for, abandoning their users is a bad look. Google's thing is internet search, which is useless when censored, and available anyways via proxy.
2) Removing a few apps is different than the highly detailed censorship and snooping that would be asked of Google.
3) There's tons of alternate search engines just a click away, Google could vanish near instantly if they are perceived as inferior.
4) Can almost guarantee that Google would have had to do infinite snooping, censorship, and propaganda promotion before being replaced anyways.

Comment Nonsense (Score 1) 25

by Rei on Friday April 19, 2024 @02:11PM (#64408430) Attached to: Meta's Not Telling Where It Got Its AI Training Data

The author has no clue what they're talking about:

Meta said the 15 trillion tokens on which its trained came from "publicly available sources." Which sources? Meta told The Verge that it didn't include Meta user data, but didn't give much more in the way of specifics. It did mention that it includes AI-generated data, or synthetic data: "we used Llama 2 to generate the training data for the text-quality classifiers that are powering Llama 3." There are plenty of known issues with synthetic or AI-created data, foremost of which is that it can exacerbate existing issues with AI, because it's liable to spit out a more concentrated version of any garbage it is ingesting.

1) *Quality classifiers* are not themselves training data. Think of it as a second program that you run on your training data before training your model, to look over the data and decide how useful it looks and thus how much to emphasize it in the training, or whether or not to just omit it.

2): Synthetic training data *very much* can be helpful, in a number of different ways.

A) It can diversify existing data. E.g., instead of just a sentence "I was on vacation in Morocco and I got some hummus", maybe you generate different versions of the same sentence ("I was traveling in Rome and ordered some pasta" ,"I went on a trip to Germany and had some sausage", etc), to deemphasize the specifics (Morocco, hummus, etc) and focus on the generalization. One example can turn into millions, thus rendering rote memorization during training impossible.

B) It allows for programmatic filtration stages. Let's say that you're training a model to extract quotes from text. You task a LLM with creating training examples for your quote-extracting LLM (synthetic data). But you don't just blindly trust the outputs - first you do a text match to see if what it quoted is actually in the text and whether it's word-for-word right. Maybe you do a fuzzy match, and if it just got a word or two off, you correct it to the exact match, or whatnot. But the key is: you can postprocess the outputs to do sanity checks on it, and since those programmatic steps are deterministic, you can guarantee that the training data meets certain characteristics..

C) It allows for the discovery of further interrelationships. Indeed, this is a key thing that we as humans do - learning from things we've already learned by thinking about them iteratively. If a model learned "The blue whale is a mammal" and it learned "All mammals feed their young with milk", a synthetic generation might include "Blue whales are mammals, and like all mammals, feed their young with milk" . The new model now directly learns that blue whales feed their young with milk, and might chain new deductions off *that*.

D) It's not only synthetic data that can contain errors, but non-synthetic data as well. The internet is awash in wrong things; a random thing on the internet is competing with a model that's been trained on reems of data and has high quality / authoritative data boosted and garbage filtered out. "Things being wrong in the training data" in the training data is normal, expected, and fine, so long as the overall picture is accurate. If there's 1000 training samples that say that Mars is the fourth planet from the sun, and one that says says that the fourth planet from the sun is Joseph Stalin, it's not going to decide that the fourth planet is Stalin - it's going to answer "Mars".

Indeed, the most common examples I see of "AI being wrong" that people share virally on the internet are actually RAG (Retrieval Augmented Generation), where it's tasked with basically googling things and then summing up the results - and the "wrong content" is actually things that humans wrote on the internet.

That's not that you should rely only generated data when building a generalist model (it's fine for a specialist). There may be specific details that the generating model never learned, or got wrong, or new information that's been discovered since then; you always want an influx of fresh data.

3): You don't just randomly guess whether a given training methodology (such as synthetic data, which I'll reiterate, Meta did not say that they used - although they might have) is having a negative impact. Models are assessed with a whole slew of evaluation metrics to assess how good and accurately they respond to different queries. And LLaMA 3 scores superbly, relative to model size.

I'm not super-excited about LLaMA 3 simply because I hate the license - but there's zero disputing that it's an impressive series of models.

Comment Re: Cue all the people acting shocked about this.. (Score 1) 41

by Rei on Friday April 19, 2024 @12:28PM (#64408010) Attached to: Author Granted Copyright Over Book With AI-Generated Text - With a Twist

Under your (directly contradicting their words) theory, then creative endeavour on the front end SHOULD count If the person writes a veritable short-story as the prompt, then that SHOULD count. It does not. Because according to the copyright office, while user controls the general theme, they do not control the specific details.

"Instead, these prompts function more like instructions to a commissioned artist—they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output."

if a user instructs a text-generating technology to “write a poem about copyright law in the style of William Shakespeare,” she can expect the system to generate text that is recognizable as a poem, mentions copyright, and resembles Shakespeare's style.[29] But the technology will decide the rhyming pattern, the words in each line, and the structure of the text

It is the fact that the user does not control the specific details, only the overall concept, that (according to them) that makes it uncopyrightable.

Comment Re: Makes sense (Score 1) 85

by penguinoid on Friday April 19, 2024 @12:21AM (#64406762) Attached to: Reddit Is Taking Over Google

There you go, as you say the majority of the posts aren't spam. I imagine well over 1% of your google search results for dating will be ripoffs or even worse.

Comment Makes sense (Score 4, Informative) 85

by penguinoid on Thursday April 18, 2024 @11:38PM (#64406708) Attached to: Reddit Is Taking Over Google

However shitty reddit may be, at least it's real humans talking about things humans care about. Way better than the spammers that have bamboozled google with SEO, which is about to be worsened by AI.

Comment Re:This is old stuff! (Score 2) 146

by penguinoid on Thursday April 18, 2024 @11:33PM (#64406700) Attached to: Cops Can Force Suspect To Unlock Phone With Thumbprint, US Court Rules

The 5th Amendment isn't about public vs private stuff. It's because at the time it was common to torture people until they confess. Passwords are an interesting case because they can't be a false confession; but confessing that you know the password is confessing that you have access to the account, but the stuff protected by the password is physical evidence and not a confession. There's been cases of people being compelled to share their password after admitting they know it. And biometrics are physical evidence, not a confession.

Slashdot Top Deals