Comment popcorn ! (Score 2) 28

Now that is interesting news to follow.

Definitely massive use of dark patterns there. The cancel process is clearly designed to outright frustrate you, and whenever you order something from Amazon you need to take extra care not to accidentally sign up for Prime.

And good that they're going after the execs. It's time the corporate shield comes down a bit.

Comment Re:It is just a bit better google search for me... (Score 1) 176

Text is its own context. Models are trained on somewhere between several thousand and several million tokens at once. That's a bloody lot of context. And trainers can add any additional context they want into the mix.

And the fine tune is a lot less sophisticated than your explanation. It's simply a pile of text where metadata has been applied by the cheapest labor force available.

You're literally talking to someone who creates her own models. *facepalm*

I'll repeat: the finetune is "a curated dataset of sample user requests and appropriate answers". Typically it's a JSON file, not "a pile of text", with each training record containing two or more fields, depending on what format you're training to. This is not in any way, shape or form "metadata"; it's sample questions and appropriate answers, sometimes including corresponding sample chat histories. The answers are all written in a professional format with accurate scientific information, and thus the weight shift between the foundation and instruct models taps into similar sources. We can literally see which parts of the model become enhanced and which become suppressed between the foundation and instruct models - see Anthropic's work on this front. They're not black boxes anymore.

Some finetune data is human provided, but increasing amounts are autogenerated, with human data only filling in for any weaknesses or reviewing autogenerated data. And yes, finetune datasets absolutely have errors (people love sharing e.g. weirdness in the Alpaca dataset or whatnot), but it doesn't matter much, as you're not teaching specific questions and answers, but rather, the type of structured output you want. The actual knowledge comes overwhelmingly from the foundation.

I'll repeat: the finetune (the part where humans are significantly involved in creating the dataset) is NOT where the model learns most of its knowledge. It learns almost nothing from the finetune except "how to respond appropriately". That is the purpose of the finetune: to harness the already-captured information in a desired fashion. Knowledge on expert fields comes from the foundation having been trained on expert data - e.g. research papers, etc. The finetune just gets the model to tap into this already learned information.

If you want to see what goes into a typical finetune dataset for an assistant model, here you go. Note that Alpaca is pretty dated and there's a lot of better stuff out there now, but it's a decent baseline for comparing finetunes.
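For a concrete picture, here's a minimal sketch of what one Alpaca-style record looks like and how it might be rendered into a single training string. The field names match the public Alpaca dataset; the prompt template and the example text are illustrative, not copied from it:

    # One record from an Alpaca-style finetune dataset (illustrative content)
    record = {
        "instruction": "Explain why the sky appears blue.",
        "input": "",
        "output": "Sunlight scatters off air molecules, and shorter (blue) wavelengths scatter the most...",
    }

    # Roughly the kind of prompt template Alpaca-style finetunes wrap each record in
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n" + record["instruction"] + "\n\n### Response:\n"
    )
    training_text = prompt + record["output"]
    print(training_text)

The record teaches the question/answer format; the factual content the model draws on when answering still comes from the foundation training.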

Comment X keeps making me lie (Score 0) 28

When I block advertisers on Twitter (which the site makes reasonably convenient), I get a popup asking me to pay for premium to block all the ads, and I can either upgrade now or "maybe later". But no, there is in fact no chance that I will subscribe later. Also, I am doing them a favor, since seeing their ads on Twitter at this point creates a negative association in my mind. Not that there was any chance I would do business with them anyway, since the vast majority of the ads are for some kind of cryptocurrency service or web tracking.

Comment Re: On the right track. (Score 1) 58

I had the Moto triplet phones too... V300, V500, V600, then the RAZR V3 and V3i. My current phone is a Moto G Power 2021 (2nd gen), which they advertised as having a 3-day battery, but I have gone 5 days without getting into the single digits several times. My lady has an even cheaper Tracfone made by Moto which has fewer dots but otherwise accomplishes all the same stuff; she almost never uses her phone except to read on.

Comment Re:This doesn't mean anything (Score 1) 176

In the immortal words of Bender Bending Rodriguez, "Ah ha ha, ha ha ha! ...Oh wait, you're serious. Let me laugh even harder! HAHAHAHAHAHAHAHAHA!!"

ChatGPT is not GPT-3. Thanks for playing. ChatGPT was indeed launched on 30 November 2022 - nearly in 2023! ChatGPT was built on the base of a further-trained variant of GPT-3, but was the first to employ a finetune rather than just being a foundation model, and was thus the first you could interact with naturally and have it respond reliably and predictably in an interactive fashion.

GPT-3 (not ChatGPT) did indeed get some press (generally of the format "Uh oh, AI is getting scary good, the future is going to be crazy, who knows whether crazy-bad or crazy-good!"), but almost nobody could actually use it, and again, "using" it was awkward and unreliable, because it could only continue text, and you never knew if it was going to continue in the right direction. You might start it off with something like:

Recipe: Linguini Alfredo
Ingredients:

And maybe most of the time it would continue into a recipe of some arbitrary format or another. Or it might just do something like:

Recipe: Linguine Alfredo
Ingredients:
  * One box of....

John squinted his eyes. The text on the handwritten card was blurry and no longer legible. Maybe he should just order takeout?

As he was trying to decide what to do, Susan arrived home. Her hair was disheveled and there was sweat on her brow.

"How's your decision to bike instead of driving going?" John said with a smirk.

"Oh shut up, you jerk," said Susan. She reached into her bag...

Etc.

Comment Re:Tried it a couple of times, not impressed at al (Score 2) 176

use python to split an mp3 into wav files

Show me the website that contains my specific task. Re-read the exact task: code that I could drop directly into a program, not "functions on the same theme as your task". And FYI, that was an off-the-cuff, really trivial task I just made up for the purpose of this post.

I could not find 'Dunce' in Google.

That's what's otherwise known as "a failure".

But when I did your exact same search in ChatGPT it gave me 'boycott' named after Captain Charles Boycott.

ChatGPT offered him as well, with the caveat that:

Although not directly related to being stupid, "boycott" is named after Charles Boycott, a 19th-century English land agent. His tenants in Ireland ostracized him as part of a campaign against unfair rents. While "boycott" itself doesn't imply stupidity, Charles Boycott's handling of the situation was seen as inept and counterproductive.

It also offered Quisling (also caveated as covering only foolish betrayal and poor judgement) and malapropism (after the fictional Mrs. Malaprop), but as the most correct answer it put Duns Scotus at the top.

Talk like a pirate.. why would you want to do that?

Why would the day even exist? Believe it or not, people actually like to have fun on the internet. Also, look up the definition of an example in the dictionary.

GA3.. I googled on it and found all kinds of stuff.

Thank you for demonstrating my point: "all kinds of stuff" is not the same as pulling information out of a specified report, for a specified context within that report, even where there's no exact keyword match (e.g. the report referring to it as "gibberellic acid" instead of GA3, or as "it" or "the hormone" or any other phrase, using various words for "germination", or mentioning germination far from where GA3 is listed while still referring to the two in the same context, etc. etc.).

Comment Re:It is just a bit better google search for me... (Score 2) 176

Yes, factually, LLMs give you the consensus of the input,

This is simply not true.

First off, learning happens overwhelmingly in creating the foundation, not the finetunes. The foundations are unsupervised continuation models. The model learns to predict what it's going to see next in the text. This is entirely context-dependent. In the context of whether vaccines cause autism, if the context is "some rando's post on Twitter", it might be just as likely to think that they cause autism as that they don't, while in the context of an abstract for a journal paper, it would almost universally be against the hypothesis.

From there, you make a finetune (supervised learning). This involves creating a curated dataset of sample user requests and appropriate answers, and training on that, so that the model learns the question / answer format and how to behave - but the knowledge it draws on was learned during the unsupervised foundational training phase. As a general rule, the sort of answers presented in the finetune will be scientific and professional in nature. As a result, the finetune draws from this context in its responses. Now, you certainly COULD finetune with examples that sound like responses from randos on Twitter, and you'd get a model that would behave as such. But why would you?

Comment Re:Tried it a couple of times, not impressed at al (Score 2) 176

Just as a random example, I'll scroll through my ChatGPT history.

Me: "using ffmpeg, encode all pngs in 00035/ to 00035.mp4 at 4fps"

ChatGPT: returned "ffmpeg -framerate 4 -i 00035/%05d.png -c:v libx264 -r 30 -pix_fmt yuv420p 00035.mp4", along with a description

Me: "How to see metadata of a png"

ChatGPT: returned five categories, some with multiple options, and made sure to address the ambiguity of my request by providing e.g. command-line tools, GUI tools, python packages, etc.

Me: "Write a detailed summary of Tolstoy's personal attitudes / philosophies expressed in War and Peace, with respect to the Russian empire."

ChatGPT: Did a great job.

Me: Several variants of "Summarize all of Tolstoy's works into a single really brief, snarky sentence. No more than 15 words. Just list the summary, not the author."

Various ChatGPT answers: "Russian society sucks; seek meaning in simple life.", "Rich Russians ponder life, love, and existential angst, sans brevity or optimism.", "Rich people whine, peasants suffer nobly, and everyone's miserable but profound.", etc.

Me: "In the style of Rishi Sunak, briefly announce a parody policy declaring that they'll make up to any North Ireland youth mandatory-constripted into the military by giving them a stylish black-and-tan uniform. Keep it to under 340 characters." (note my typo - ChatGPT doesn't care, so long as it can understand the request)

ChatGPT: "In a bold new initiative, we are announcing a mandatory conscription for Northern Ireland’s youth. To honor their service, every conscript will receive a stylish black-and-tan uniform. This policy ensures our youth will be both fashion-forward and ready to serve. Together, we’ll march into a future that’s as stylish as it is secure."

(Note: I know ChatGPT might mess up character counts, as it can't see letters, and that's fine, because it's no issue for me to correct if it's too long, just like it's far easier to fix bugs in generated code (where there are any, usually there aren't) than to write it from scratch)

Me: "Term for a distortion, often found in old images, where the corners of the image are darker than the centre" (I forgot the word, it was on the tip of my tongue)

ChatGPT: "The term you're looking for is "vignetting." Vignetting is..."

Me: "for all .jpg files $f in the current directory, use imagemagick to convert them to output in the filename "1920/$f", while downscaling them to a width of 1920."

ChatGPT: Does it.

... and on and on. Yes, in some (though not all) of these cases one could Google, but it'd be significantly slower. Or I could implement things myself, but again, it'd be significantly slower. It's not like it's HARD to use e.g. ffmpeg, imagemagick, etc., but one has to remember or google the flags, think about the command structure, etc., and it's just faster to write out a short description of what you want to do. Seconds instead of minutes.

Comment Re:Tried it a couple of times, not impressed at al (Score 4, Insightful) 176

ChatGPT is not a search engine. It's a task engine. It's funny how people invariably try to test them out with the things they're inherently worst at: obscure trivia, math, and problems that involve physically seeing words (LLMs don't see words, they see tokens).

AI is the field of solving problems that are easy for humans but traditionally hard for computers. If you want a problem that's "easy for computers" (looking up things in a database, doing math, counting characters, etc.), AI is the worst way to handle it, except when the AI just functions as a manager for other tools (e.g. RAG (retrieval-augmented generation, aka incorporating search), running a calculator, writing a program to solve a task, etc.). AI tools exist to fill in the gaps around things that computers normally *can't* do, like resolving ambiguity, applying logic, understanding context and motivations, applying creativity, handling novel situations, using tools, recognizing information across modalities (vision, sound, etc.), and so forth.

When LLM development first started, nobody was expecting them to be usable as a search engine at all; this was an emergent capability, the discovery that they contained vast amounts of knowledge and could incorporate that into their responses. But they don't know an entire search engine's worth of knowledge. Neither does any human, for that matter.

Re: hallucination - while it's improved significantly in commercially-available models, the biggest improvements are in research models. You can practically eliminate it by running the model multiple times under varied conditions and then measuring the consistency of the responses, either via post-processing of the outputs or via more internal methods (such as cosine similarity metrics on the hidden states). This is, however, slow. I fully expect that what we're going to move to is MoEs with cosine similarity metrics for each token, for each layer, between all executed expert models, fed back (matrix mult, add, norm) into the next layer, so that the model itself can learn how to appropriately react to the situation where its different expert models disagree (e.g. low confidence).
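To make the consistency idea concrete, here's a minimal sketch of the output-side variant. Nothing here is a specific published method: embed() is a hypothetical function returning one vector per response (e.g. from a sentence-embedding model or a pooled hidden state), and the 0.8 threshold is an arbitrary illustration:

    import numpy as np
    from itertools import combinations

    def consistency_score(vectors):
        """Mean pairwise cosine similarity across N sampled responses."""
        sims = [np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
                for a, b in combinations(vectors, 2)]
        return float(np.mean(sims))

    # vectors = [embed(r) for r in responses]   # responses sampled at varied temperature
    # if consistency_score(vectors) < 0.8:      # low agreement -> treat as low confidence
    #     flag_possible_hallucination()         # hypothetical handler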

The rate of advancement has been pretty staggering, and there are no signs that it's going to slow down.

The simple fact is that for many people, these tools even in their current state are extremely valuable. If you're a human being, I cannot understand how you can function in the world without understanding and adapting to the concept of fallibility. Because, TL;DR: for many tasks, failure absolutely *is* an option, or success can't even be properly measured (e.g. creativity), while for others, that's what cross-referencing or applying your brain is for (again, you do this in your daily life in interactions with other fallible, unreliable humans), and it's worth it for the capabilities LLMs bring to the picture (see paragraph #2).

I can't search on Google for, say, "I'm hungry for a vegetarian dinner and I'd like to use up some rice, green onions, cucumber and potatoes that I have, and I'd really prefer something that takes under 30 minutes to prepare; give me 15 ideas, and list the ingredients in metric, and oh, if it calls for butter, substitute olive oil." and immediately get back 15 ideas, and it works even if I misspell catastropically or whatnot.

I can't search on Google for "Were there any words for "being stupid" named after an actual person?" and get back Duns Scotus (Dunce) among others. (If you Google you might find it like 10 pages down in a non-top-ranked Ycombinator comments section)

I can't search on Google for, "Write a python function that will take an mp3 filename, load the file, split it up into three equal parts, and save them as part1.wav, part2.wav, and part3.wav", but I absolutely can have ChatGPT do that.

I can't search on Google for, "Here's the abstract to a paper I just wrote, but it's Talk Like a Pirate Day, so rephrase it in pirate talk for me."

I can't search on Google for, "In this long report, extract for me all information related to GA3 impacts on germination." and get that.

On and on and on. They're for tasks. Wherein you're either fine with (like humans) imperfect reliability, or (like when dealing with humans) you can test / double check the outputs and/or apply your own logic and knowledge. The fact that they also are getting increasingly good at trivia (or can be used with RAG) is an entirely separate issue.
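For that mp3-splitting request, here's a hedged sketch of the kind of answer you get back; it assumes the pydub package (plus an ffmpeg install), and the input filename is just a placeholder:

    from pydub import AudioSegment

    def split_mp3_into_three(filename):
        audio = AudioSegment.from_mp3(filename)     # load the mp3
        third = len(audio) // 3                     # pydub lengths are in milliseconds
        parts = [audio[:third], audio[third:2 * third], audio[2 * third:]]
        for i, part in enumerate(parts, start=1):
            part.export("part%d.wav" % i, format="wav")

    # split_mp3_into_three("song.mp3")              # "song.mp3" is a hypothetical file

Whether it reaches for pydub, wave, or raw ffmpeg calls varies run to run; the point is that it hands back drop-in code for the exact task, not a page of loosely related tutorials.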

Comment Re:The Human Target. (Score 2) 176

You read it backwards. 18-24 year olds are the ones *bucking the trend*, aka using AI a lot. It's Gen X and boomers who are most mad at it.

"But the study, from the Reuters Institute and Oxford University, says young people are bucking the trend, with 18 to 24-year-olds the most eager adopters of the tech."

Averaging across all six countries, 56% of 18–24s say they have used ChatGPT at least once, compared to 16% of those aged 55 and over.

To be more specific: 56% at least once or twice, 39% at least monthly, 27% at least weekly, 9% daily.

By contrast with boomers: 16% at least once or twice, 6% at least monthly, 4% at least weekly, 1% daily.

Vs. boomers, zoomers are 3.5x more likely to have tried it, 6.5x more likely to use it at least monthly, 6.8x at least weekly, and 9x daily. It's a massive difference.
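(Quick arithmetic check of those ratios, using the percentages quoted above; the variable names are mine:)

    zoomers = {"tried": 56, "monthly": 39, "weekly": 27, "daily": 9}
    boomers = {"tried": 16, "monthly": 6,  "weekly": 4,  "daily": 1}

    for k in zoomers:
        print(k, round(zoomers[k] / boomers[k], 1))
    # tried 3.5, monthly 6.5, weekly 6.8 (exactly 6.75), daily 9.0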

Furthermore, calling these numbers "low" seems absurd when you multiply them by the global population. Maybe they sound low on a percentage basis, but that translates to e.g. on the order of 100M monthly users and 13M daily users for ChatGPT.
