Comment Experiment is the source of scientific truth (Score 1) 257

We have an overwhelming set of experiments and observations about momentum being conserved. But if someone can show a replicable exception someday, we'll have a huge overhaul to do.

My questions about this would include whether the inventor has given enough information for other people to build one. Then I'd look around to see if it's, by some unknown mechanism, reacting against something in the environment.

Comment Re:Welcome to the machine (Score 1) 259

The Chinese beg to differ with Ms. Thatcher. I would contrast their performance to Great Britain's.

Let's do that: GDP per capita of the UK: $46k. GDP per capita of China: $12k. But maybe China does better at distributing the wealth? Nope. UK Gini coefficient: 35; China's: 47 (higher means more inequality). Until recently China had phenomenal growth rates, but only because (a) they started from a very depressed level and (b) they mostly abandoned socialism. As Xi reasserts more socialist policies, their growth engine has largely stalled, and their growth rate is currently below that of the UK. It's still positive at the moment, but if Xi continues on his current course, it will likely go negative.

Socialism -- not social democracy, which is a thoroughly capitalist economy that accepts high but strongly progressive taxation to fund a strong safety net -- consistently drives economies into the toilet whenever applied on any scale larger than a kibbutz. Without fail, every time.

Comment Re:AI is just Wikipedia (Score 1) 25

I've probably done tens of thousands of legit, constructive edits, but even I couldn't resist the temptation to prank it at one point. The article was on the sugar apple (Annona squamosa), and at the time, there was a big long list of the name of the fruit in different languages. I wrote that in Icelandic, the fruit was called "Hva[TH]er[TH]etta" (eth and thorn don't work on Slashdot), which means "What's that?", as in, "I've never seen that fruit before in my life" ;) Though the list disappeared from Wikipedia many years ago (as it shouldn't have been there in the first place), even to this day, I find tons of pages listing that in all seriousness as the Icelandic name for the fruit.

Comment Nonsense (Score 1) 25

The author has no clue what they're talking about:

Meta said the 15 trillion tokens on which it's trained came from "publicly available sources." Which sources? Meta told The Verge that it didn't include Meta user data, but didn't give much more in the way of specifics. It did mention that it includes AI-generated data, or synthetic data: "we used Llama 2 to generate the training data for the text-quality classifiers that are powering Llama 3." There are plenty of known issues with synthetic or AI-created data, foremost of which is that it can exacerbate existing issues with AI, because it's liable to spit out a more concentrated version of any garbage it is ingesting.

1) *Quality classifiers* are not themselves training data. Think of it as a second program that you run over your training data before training your model: it looks at each piece of data and decides how useful it appears, and thus how much to emphasize it during training, or whether to omit it entirely.
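A toy sketch of that idea (the scoring heuristic and threshold here are invented for illustration; real quality classifiers are themselves trained models, not hand-written rules):

```python
def quality_score(doc: str) -> float:
    """Toy heuristic: longer documents with sentence punctuation score higher."""
    words = doc.split()
    if not words:
        return 0.0
    length_score = min(1.0, len(words) / 50)
    has_sentences = 1.0 if any(c in doc for c in ".!?") else 0.0
    return 0.7 * length_score + 0.3 * has_sentences

def filter_corpus(docs, threshold=0.3):
    """Run the classifier over the corpus before training; drop low scorers."""
    return [d for d in docs if quality_score(d) >= threshold]

corpus = [
    "lol",
    "A long, well-formed paragraph. It has structure, punctuation, and "
    "enough words that the toy heuristic above considers it worth keeping "
    "in the training mix for the model.",
]
kept = filter_corpus(corpus)  # only the second document survives
```

Note that the classifier's *output* shapes the training set, but the classifier code itself never becomes part of what the model is trained on.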

2) Synthetic training data *very much* can be helpful, in a number of different ways.

A) It can diversify existing data. E.g., instead of just the sentence "I was on vacation in Morocco and I got some hummus", maybe you generate different versions of the same sentence ("I was traveling in Rome and ordered some pasta", "I went on a trip to Germany and had some sausage", etc.), to deemphasize the specifics (Morocco, hummus, etc.) and focus on the generalization. One example can turn into millions, rendering rote memorization during training impossible.
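A trivial sketch of that kind of expansion (template-based here for brevity; an actual pipeline would use an LLM to paraphrase, but the multiplication effect is the same):

```python
import itertools

# One seed sentence expanded into many variants so the trained model
# picks up the pattern ("traveled somewhere, ate something") rather
# than memorizing the original specifics (Morocco, hummus).
verbs = ["was on vacation in", "was traveling in", "went on a trip to"]
trips = [("Morocco", "hummus"), ("Rome", "pasta"), ("Germany", "sausage")]

variants = [
    f"I {verb} {place} and had some {food}."
    for verb, (place, food) in itertools.product(verbs, trips)
]
# 3 verbs x 3 destinations -> 9 sentences from one seed example
```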

B) It allows for programmatic filtration stages. Let's say you're training a model to extract quotes from text. You task an LLM with creating training examples for your quote-extracting LLM (synthetic data). But you don't blindly trust the outputs - first you do a text match to see whether what it quoted is actually in the source and whether it's word-for-word right. Maybe you do a fuzzy match, and if it got just a word or two off, you correct it to the exact match. The key is: you can postprocess the outputs with sanity checks, and since those programmatic steps are deterministic, you can guarantee that the training data meets certain characteristics.
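A minimal sketch of that verification step (the function and thresholds are invented for illustration; a production pipeline would be more robust, but the exact-match-then-fuzzy-correct logic is the point):

```python
import difflib

def verify_quote(quote: str, source: str, min_ratio: float = 0.85):
    """Validate an LLM-extracted quote against the source text.

    Returns the exact source span if the quote checks out (correcting
    near-misses to the verbatim text), or None if no close span exists.
    """
    if quote in source:
        return quote  # word-for-word match: accept as-is
    # Fuzzy fallback: slide a window of the same word count over the
    # source and keep the closest span.
    q_words = quote.split()
    s_words = source.split()
    best_span, best_ratio = None, 0.0
    for i in range(len(s_words) - len(q_words) + 1):
        span = " ".join(s_words[i:i + len(q_words)])
        ratio = difflib.SequenceMatcher(None, quote, span).ratio()
        if ratio > best_ratio:
            best_span, best_ratio = span, ratio
    return best_span if best_ratio >= min_ratio else None

source = "The committee concluded that the merger posed no antitrust risk."
```

Anything that fails the check is dropped or corrected before it ever enters the training set, which is how you get deterministic guarantees out of a nondeterministic generator.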

C) It allows for the discovery of further interrelationships. Indeed, this is a key thing that we as humans do - learning from things we've already learned by thinking about them iteratively. If a model learned "The blue whale is a mammal" and "All mammals feed their young with milk", a synthetic generation might include "Blue whales are mammals, and like all mammals, feed their young with milk." The new model now directly learns that blue whales feed their young with milk, and might chain new deductions off *that*.
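The idea can be illustrated with a toy forward-chaining rule over (subject, relation, object) facts (purely illustrative - LLMs do nothing this symbolic, but the combine-two-facts-into-a-third effect is the same):

```python
# Toy forward-chaining: two learned statements combine into a third
# that never appeared verbatim in the original data.
facts = {
    ("blue whale", "is_a", "mammal"),
    ("mammal", "feeds_young_with", "milk"),
}

def chain(known):
    """If X is_a Y and Y has some property, derive that X has it too."""
    derived = set(known)
    changed = True
    while changed:
        changed = False
        for (x, rel, y) in list(derived):
            if rel != "is_a":
                continue
            for (subj, prop, obj) in list(derived):
                if subj == y and prop != "is_a":
                    new = (x, prop, obj)
                    if new not in derived:
                        derived.add(new)
                        changed = True
    return derived

new_facts = chain(facts) - facts
# -> {("blue whale", "feeds_young_with", "milk")}
```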

D) It's not only synthetic data that can contain errors; non-synthetic data can too. The internet is awash in wrong things; a random claim on the internet is competing against a model that's been trained on reams of data, with high-quality / authoritative data boosted and garbage filtered out. Things being wrong in the training data is normal, expected, and fine, so long as the overall picture is accurate. If there are 1000 training samples that say Mars is the fourth planet from the sun, and one that says the fourth planet from the sun is Joseph Stalin, the model is not going to decide that the fourth planet is Stalin - it's going to answer "Mars".
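A back-of-the-envelope way to see why the outlier washes out (toy counting, not an actual training run - real models average in a softer, gradient-based way, but the proportions dominate just the same):

```python
from collections import Counter

# 1000 samples with the right answer vs. one corrupted sample: the
# learned answer distribution barely registers the outlier.
samples = ["Mars"] * 1000 + ["Joseph Stalin"]
counts = Counter(samples)
answer, freq = counts.most_common(1)[0]
garbage_mass = counts["Joseph Stalin"] / len(samples)  # ~0.001
```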

Indeed, the most common examples I see of "AI being wrong" that people share virally on the internet are actually RAG (Retrieval Augmented Generation), where it's tasked with basically googling things and then summing up the results - and the "wrong content" is actually things that humans wrote on the internet.

That's not to say you should rely only on generated data when building a generalist model (it's fine for a specialist). There may be specific details that the generating model never learned, or got wrong, or new information discovered since it was trained; you always want an influx of fresh data.

3) You don't just randomly guess whether a given training methodology (such as synthetic data - which, I'll reiterate, Meta did not say they used for training, although they might have) is having a negative impact. Models are assessed with a whole slew of evaluation metrics measuring how well and how accurately they respond to different queries. And LLaMA 3 scores superbly relative to model size.

I'm not super-excited about LLaMA 3 simply because I hate the license - but there's zero disputing that it's an impressive series of models.

Comment Re: Cue all the people acting shocked about this.. (Score 1) 41

Under your theory (which directly contradicts their words), creative endeavour on the front end SHOULD count. If the person writes a veritable short story as the prompt, then that SHOULD count. It does not, because according to the copyright office, while the user controls the general theme, they do not control the specific details.

"Instead, these prompts function more like instructions to a commissioned artist—they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output."

"if a user instructs a text-generating technology to “write a poem about copyright law in the style of William Shakespeare,” she can expect the system to generate text that is recognizable as a poem, mentions copyright, and resembles Shakespeare's style.[29] But the technology will decide the rhyming pattern, the words in each line, and the structure of the text."

It is the fact that the user does not control the specific details, only the overall concept, that (according to them) makes it uncopyrightable.

Comment Re:This has been known for ages (Score 1) 145

Press the power button 5 times rapidly to enable "emergency mode" or whatever they call it. Biometric unlock will be disabled and you will have to enter your password/PIN to access the device again.

I don't think this is true. If you enable emergency mode video recording you have to enter your PIN to end the recording, but biometrics will still unlock the lockscreen. While the recording is going, hit the power button to activate the lockscreen, which will be unlockable with biometrics. You can also swipe up from the bottom (assuming gesture navigation) and switch to other apps. The device is not locked and not in lockdown mode while in emergency mode.

What you can do is press power and volume up to bring up the power menu, and then tap the "Lockdown" icon. That will lock the device and disable biometric authentication.

If you really, really want to lock it down, power the device down, or reboot it and don't log in. Android's disk encryption scheme uses your PIN/pattern/password ("lockscreen knowledge factor", or LSKF) along with keys stored in secure hardware to derive the disk encryption keys. It would make for a long post to go into all of the details, but given the hardware-enforced brute force mitigation, if the attacker gets a device in this state it's extremely difficult to decrypt any of the credential-encrypted data on the device without your LSKF. This is particularly true on devices that implement "StrongBox" (all Pixels, some Samsungs, some others). Android StrongBox moves some crucial functionality, including LSKF authentication and LSKF brute force resistance, into a separate hardened, lab-certified[*] security processor with its own internal storage, a "secure element".
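The shape of that derivation can be sketched like this (purely illustrative: the names, parameters, and construction are invented, and Android's real design uses secure-hardware protocols like Gatekeeper/Weaver with enforced throttling, not a simple PBKDF2-plus-HMAC like this):

```python
import hashlib
import hmac
import os

# Illustrative sketch of hardware-bound key derivation: the disk key
# depends on BOTH the user's LSKF and a secret that never leaves the
# secure element, so offline brute force against the LSKF alone fails.
SECURE_ELEMENT_KEY = os.urandom(32)  # stand-in for the SE-internal secret

def derive_disk_key(lskf: str, salt: bytes) -> bytes:
    # Stretch the (low-entropy) LSKF...
    stretched = hashlib.pbkdf2_hmac("sha256", lskf.encode(), salt, 100_000)
    # ...then bind it to the hardware secret. Without the secure
    # element, this step is impossible, so the PIN cannot be attacked
    # off-device; on-device, the SE rate-limits guesses.
    return hmac.new(SECURE_ELEMENT_KEY, stretched, hashlib.sha256).digest()

salt = os.urandom(16)
key1 = derive_disk_key("1234", salt)
key2 = derive_disk_key("1235", salt)
```

The point of the construction is that even a four-digit PIN becomes uncrackable offline, because every guess requires the secret held inside the tamper-resistant chip.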

Of course, note that appellate courts in the US have split on whether or not your LSKF can be compelled. Some have ruled that unless the PIN/pattern/password is itself incriminating, it's no different than compelling the combination to a safe, which has long been held to be constitutional.

[*] For anyone interested in the details, the required certification is Common Criteria EAL 4+ (5+ is recommended, and common; many devices meet 6+), using protection profile 0084 for the hardware and an equivalent "high attack potential" evaluation for the software, plus AVA_VAN.5 penetration testing, all performed in a nationally-accredited security testing lab. While certification isn't a guarantee of security (nothing is), the required certification applies the highest level of scrutiny you can get for commercially-available devices. Apple also uses a similarly-certified SE in their devices, but it's not clear whether they use it for LSKF authentication, or whether they use their (uncertified) Secure Enclave.

Comment Re:Who on SLASHDOT is using biometric data for con (Score 1) 145

Must be quite entertaining to watch you unlock your phone hundreds of times a day.

JFC...why in the world would you need to be accessing your phone "hundreds of times a day"???

Maybe not hundreds, but at least dozens. For most people their phone is their digital assistant in all sorts of ways... not only for communication but for calendaring, looking up random things, reading the news or books, listening to music, getting directions, checking their bank account/brokerage, doing calculations, fitness tracking, managing shopping and to-do lists... the list goes on and on.

Comment Re: Cue all the people acting shocked about this.. (Score 1) 41

Based on the Office's understanding of the generative AI technologies currently available, users do not exercise ultimate creative control over how such systems interpret prompts and generate material. Instead, these prompts function more like instructions to a commissioned artist—they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output.[28] For example, if a user instructs a text-generating technology to “write a poem about copyright law in the style of William Shakespeare,” she can expect the system to generate text that is recognizable as a poem, mentions copyright, and resembles Shakespeare's style.[29] But the technology will decide the rhyming pattern, the words in each line, and the structure of the text.[30]

Compare with my summary:

"their argument was that because the person doesn't control the exact details of the composition of the work"

I'll repeat: I accurately summed up their argument. You did not.
