Comment Nonsense (Score 1) 24

The author has no clue what they're talking about:

Meta said the 15 trillion tokens on which its trained came from "publicly available sources." Which sources? Meta told The Verge that it didn't include Meta user data, but didn't give much more in the way of specifics. It did mention that it includes AI-generated data, or synthetic data: "we used Llama 2 to generate the training data for the text-quality classifiers that are powering Llama 3." There are plenty of known issues with synthetic or AI-created data, foremost of which is that it can exacerbate existing issues with AI, because it's liable to spit out a more concentrated version of any garbage it is ingesting.

1) *Quality classifiers* are not themselves training data. Think of it as a second program that you run on your training data before training your model, to look over the data and decide how useful it looks and thus how much to emphasize it in the training, or whether or not to just omit it.
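To make the "second program" idea concrete, here is a minimal sketch of a filter-and-weight pass. Everything in it is an assumption for illustration: `toy_quality_score` is a crude hand-written heuristic standing in for a learned classifier, and the `drop_below` threshold is arbitrary. It is not a description of what Meta actually ran.

```python
from typing import Callable, List, Tuple


def toy_quality_score(text: str) -> float:
    """Hypothetical stand-in for a learned quality classifier; returns a score in [0, 1]."""
    words = text.split()
    if not words:
        return 0.0
    # Crude heuristics: reward longer documents and a more varied vocabulary.
    length_score = min(len(words) / 20.0, 1.0)
    diversity = len({w.lower() for w in words}) / len(words)
    return 0.5 * length_score + 0.5 * diversity


def filter_and_weight(
    docs: List[str],
    score_fn: Callable[[str], float] = toy_quality_score,
    drop_below: float = 0.3,
) -> List[Tuple[str, float]]:
    """Drop documents scoring under the threshold; keep the rest with a sampling weight."""
    kept = []
    for doc in docs:
        score = score_fn(doc)
        if score >= drop_below:
            # The score doubles as the weight: better-looking text gets emphasized more.
            kept.append((doc, score))
    return kept
```

Note that the classifier only scores and selects text; nothing it outputs is itself fed to the model as training data.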

2) Synthetic training data *very much* can be helpful, in a number of different ways.

A) It can diversify existing data. E.g., instead of just a sentence "I was on vacation in Morocco and I got some hummus", maybe you generate different versions of the same sentence ("I was traveling in Rome and ordered some pasta", "I went on a trip to Germany and had some sausage", etc), to deemphasize the specifics (Morocco, hummus, etc) and focus on the generalization. One example can turn into millions, thus rendering rote memorization during training impossible.
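A rough sketch of that diversification step, under a big simplifying assumption: in practice an LLM would do the rewording, whereas here fixed templates and a tiny hand-picked list of place/food pairs (both made up for this example) stand in for it.

```python
import itertools
import random

# Hypothetical templates and slot values; a real pipeline would have an LLM paraphrase instead.
TEMPLATES = [
    "I was on vacation in {place} and I got some {food}.",
    "I was traveling in {place} and ordered some {food}.",
    "I went on a trip to {place} and had some {food}.",
]
PAIRS = [("Morocco", "hummus"), ("Rome", "pasta"), ("Germany", "sausage")]


def diversify(templates, pairs, n, seed=0):
    """Return up to n distinct variants of the same underlying sentence pattern."""
    rng = random.Random(seed)
    variants = [
        template.format(place=place, food=food)
        for template, (place, food) in itertools.product(templates, pairs)
    ]
    rng.shuffle(variants)
    return variants[:n]
```

Three templates and three pairs only give nine variants, but the count grows multiplicatively with every slot and phrasing you add, which is how one example fans out into a very large number.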

B) It allows for programmatic filtration stages. Let's say that you're training a model to extract quotes from text. You task an LLM with creating training examples for your quote-extracting LLM (synthetic data). But you don't just blindly trust the outputs - first you do a text match to see if what it quoted is actually in the text and whether it's word-for-word right. Maybe you do a fuzzy match, and if it just got a word or two off, you correct it to the exact match, or whatnot. But the key is: you can postprocess the outputs to do sanity checks on them, and since those programmatic steps are deterministic, you can guarantee that the training data meets certain characteristics.
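Here is one way that check could look, as a sketch only. The function name `verify_quote` and the 0.85 similarity cutoff are assumptions made up for this example. The fuzzy step compares the generated quote against source windows of the same word count, so it only catches swapped or misspelled words, not dropped or added ones.

```python
import difflib
from typing import Optional


def verify_quote(source: str, quote: str, min_ratio: float = 0.85) -> Optional[str]:
    """Return a quote guaranteed to come from `source`, or None to reject the sample."""
    if not quote.strip():
        return None
    if quote in source:
        return quote  # exact, word-for-word match

    src_words = source.split()
    n = len(quote.split())
    best, best_ratio = None, 0.0
    # Slide a window of the same word count over the source and keep the closest span.
    for i in range(len(src_words) - n + 1):
        candidate = " ".join(src_words[i:i + n])
        ratio = difflib.SequenceMatcher(None, candidate.lower(), quote.lower()).ratio()
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio

    # Close enough: replace the model's wording with the source's actual words.
    # (Whitespace in the corrected span is normalized to single spaces.)
    return best if best_ratio >= min_ratio else None
```

Whatever the generating model hallucinated, every sample that survives this step carries a quote whose words come straight from the source text, which is the kind of guarantee you don't get from scraped data.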

C) It allows for the discovery of further interrelationships. Indeed, this is a key thing that we as humans do - learning from things we've already learned by thinking about them iteratively. If a model learned "The blue whale is a mammal" and it learned "All mammals feed their young with milk", a synthetic generation might include "Blue whales are mammals, and like all mammals, feed their young with milk". The new model now directly learns that blue whales feed their young with milk, and might chain new deductions off *that*.

D) It's not only synthetic data that can contain errors, but non-synthetic data as well. The internet is awash in wrong things; a random thing on the internet is competing with a model that's been trained on reams of data and has had high quality / authoritative data boosted and garbage filtered out. Things being wrong in the training data is normal, expected, and fine, so long as the overall picture is accurate. If there are 1000 training samples that say that Mars is the fourth planet from the sun, and one that says that the fourth planet from the sun is Joseph Stalin, it's not going to decide that the fourth planet is Stalin - it's going to answer "Mars".

Indeed, the most common examples I see of "AI being wrong" that people share virally on the internet are actually RAG (Retrieval Augmented Generation), where it's tasked with basically googling things and then summing up the results - and the "wrong content" is actually things that humans wrote on the internet.

That's not to say that you should rely only on generated data when building a generalist model (it's fine for a specialist). There may be specific details that the generating model never learned, or got wrong, or new information that's been discovered since then; you always want an influx of fresh data.

3) You don't just randomly guess whether a given training methodology (such as synthetic data, which I'll reiterate, Meta did not say that they used - although they might have) is having a negative impact. Models are assessed with a whole slew of evaluation metrics to measure how well and accurately they respond to different queries. And LLaMA 3 scores superbly, relative to model size.

I'm not super-excited about LLaMA 3 simply because I hate the license - but there's zero disputing that it's an impressive series of models.

Comment Re:Golly (Score 1) 64

New Orleans is 6 feet below sea level, as we found out 20 years ago.

Err....we've known NOLA was below sea level for many MANY years....decades....a century or more even.....

What we found out with Katrina was...that the US Army Corps of Engineers had made some serious mistakes and taken shortcuts in the levee system they had built and overrated.

This isn't something new to man....go ask those nice folks in the Netherlands that fight off the ocean like we do here....they know a thing or two about living below sea level and fighting off the ocean.

Comment Re:How fast are they sinking? (Score 1) 64

I understand from working geologists in soil districts and the like that actions can be taken on a building by building basis, but can you apply that to an area?

Or, you do like areas like the Netherlands or New Orleans, and you build yourself a big ass levee system to keep the ocean at bay (no pun intended).

Overall, in a battle between nature and man....nature will eventually win.

But, man CAN hold out for a long time...I mean New Orleans just recently had its 300th anniversary, you know?

Comment Re:Who you are; Something you know (Score 1) 133

I guess it's different in the US where the cops immediately point a gun at you. So again, it depends on your threat model.

That's not the case at all...unless you are something like a robbery suspect, they don't pull guns on you...geez.

As long as you don't act a fool....and aren't being stopped for being a violent crime suspect, you aren't going to have any problems.

If in a car, just keep your hands on the wheel where they can see them...be polite...and don't talk too much, only answer what you need to....giving driver's license, registration, proof of insurance on demand...etc.

Don't incriminate yourself, know your rights, ask if you are being detained, if not, ask if you are free to leave....

Hell, I've been pulled over and since I have a carry concealed permit, told the cop I had a gun and where it was...they were cool about it, wrote me a warning and sent me on my way...didn't disarm me or have me leave the car or anything....

So, no...normal people acting normal do not get guns drawn on them.

Now, with all that being said, if you are stopped or pulled over...and immediately start trying to grab and fuck with your phone (which they may not be able to see is a phone), that could be seen as suspicious....I'd not really do that...and hence better to just keep a long enough passcode/password on the phone in case things escalate and they take your phone for evidence.

Comment Re:Welcome to the machine (Score 2, Interesting) 218

Without profit driven, you end up with what LA has: 5 Billion dollars missing in various "help the homeless" scam non profits.

Or three bankrupt casinos ("The money I took out of there was incredible."), failing golf courses, a failed airline, a failed "university", and other businesses which never turned a profit. It's almost as if the point was not to generate a profit, but scam people out of their money.

Comment Re:The Good News (Score 1) 126

The only reason I still have a windows machine is for the exceedingly rare game that doesn't work on proton/steam, and more importantly, Fusion 360, although browser-based OnShape apparently is pretty good if you have a computer/GPU that can make it run smoothly (they both use the same licensed "kernel" that almost all CAD software uses)

Comment Re: Cue all the people acting shocked about this.. (Score 1) 41

Under your (directly contradicting their words) theory, then creative endeavour on the front end SHOULD count. If the person writes a veritable short-story as the prompt, then that SHOULD count. It does not. Because according to the copyright office, while the user controls the general theme, they do not control the specific details.

"Instead, these prompts function more like instructions to a commissioned artist—they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output."

if a user instructs a text-generating technology to “write a poem about copyright law in the style of William Shakespeare,” she can expect the system to generate text that is recognizable as a poem, mentions copyright, and resembles Shakespeare's style.[29] But the technology will decide the rhyming pattern, the words in each line, and the structure of the text

It is the fact that the user does not control the specific details, only the overall concept, that (according to them) makes it uncopyrightable.

Submission + - IMF sounds alarm on ballooning US national debt: 'Something will have to give'

schwit1 writes:

Under current policies, public debt in the U.S. is projected to nearly double by 2053. The IMF identified “large fiscal slippages” in the U.S. in 2023, with government spending surpassing revenue by 8.8% of GDP – a 4.1% increase from the previous year, despite strong economic growth.

If this trend continues, the Congressional Budget Office anticipates the national debt will grow to an astonishing $54 trillion in the next decade. Higher interest rates are also compounding the pain of higher debt.

Should that debt materialize, it could risk America’s economic standing in the world.

The IMF is talking down to Washington like we’re a Third World country because that’s the direction Washington is taking us.

Interest payments alone on the current debt are $1.6T/year.

Comment Is there ANY precedent for this? (Score 4, Interesting) 77

Today it is the executives, tomorrow it will be the workers. /s

1. Has ANY company tried this and it saved them?
2. How many TOTAL hours per week is this?? I highly doubt this will be 6 2/3 hours x 6 days but the article doesn't say. What happened to the 6-hour workdays?

Switching to 8 hours x 6 days isn't going to fix the fundamental problem. Someone in management needs to read "Good to Great" and "Built to Last", among other management books. IMHO Samsung needs to:

* look at their core business,
* look at their entire supply chain costs,
* pivot their (core?) business where it makes sense.

Working even more hours is a "Hail Mary" pass pretty much guaranteed to fail, causing more burnout as work-life balance becomes nonexistent. Don't be surprised if Samsung is out of business in 10 years. I'll miss their NVMe drives.

Comment Re:Microsoft already know you as a user (Score 3, Interesting) 126

You are absolutely right: 99% of people are too complacent to switch.

However, you are forgetting that over time Linux users AND open source users are growing, e.g. for me 7-Zip killed the commercial file archivers (pkzip, winrar, etc.)

More and more software works on Windows, macOS, and Linux. Switching from proprietary vendor lock-in to open source alternatives (where it makes sense) is how to get people to switch. OpenOffice / LibreOffice, Blender, Krita already provide great alternatives.

The harder MS shoves their agenda down everyone's throats the easier it is to finally come across "the straw that broke the camel's back." Maybe not today, maybe not tomorrow, but maybe next year, or in 5 years, or in 10. Cue "THIS year is the year of the Linux desktop". /s It starts with us Techies who are tired of supporting MS's adware crap. Eventually we just don't care about MS. I'm already running Linux under a dedicated spare box and under a VM in my daily driver. In time we'll make the switch permanent.

For me, games have been the biggest reason I haven't made my daily driver Linux but with the new games coming out there is less and less "need" to stay on Windows. The more and more they add an in-game MTX store to games the less interested I am in them.

Valve is also doing a great job of having more and more games work under Linux. Support them when you can.

The best way to "proselytize" Linux is NOT to say anything but just to use it. Start small: 1 application here, another application there. Suggest open source alternatives at work. So who cares if MS wins this battle (Win10) or that battle (Win11); eventually they are going to lose the war as more and more people get fed up with SaaS and switch to open-source alternatives. From there it is easier to eventually switch to Linux. The best way to "win the war" is 1 application at a time. Time is on our side. Linux already "won" in the Supercomputer and Mobile (Android) space. Desktop is next.
