Comment Re:It is just a bit better google search for me... (Score 1) 268

But where do all of those embeddings go? In a vector database, right?

SIGH.

No.

And not because "embeddings" aren't stored in a database (they're not even stored, they're a transformation of YOUR input text). And not because NO part of the model is stored in a database (the whole thing is loaded into VRAM). It's because neural networks don't have some collection of facts that they just search through.

For simplicity's sake I'll leave out the attention blocks and just focus on the linear networks. How they work is: picture the most insane flow chart your mind could possibly conceive of. Billions or even trillions of nodes and lines. Many thousands of lines leading to and from each node. Each node not being yes/no, but any degree of "maybe", and different input lines having entirely different amounts of significance to the decision. And each node not answering a single question, but a superposition of questions, the individual answers of which only get teased apart later.
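A single "node" in that flow chart is nothing more exotic than a weighted sum followed by an activation. A minimal numpy sketch of one such layer (the sizes are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(512)            # made-up input vector (the incoming "lines")
W = rng.standard_normal((1024, 512))    # one weight per incoming line, per node
b = rng.standard_normal(1024)           # one bias per node
h = np.maximum(0.0, W @ x + b)          # each node: weighted sum, then a graded "maybe"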

It's perhaps easiest to understand with image recognition nets. Here's a useful primer.

But, TL/DR: all the information is the result of stepping through this insane flow chart. There is no "iterating over some database" stage. It's all in the logic. With heavy overlap between concepts. For example, when Anthropic detected and boosted the cluster of neurons whose collective action fires when the topic of the Golden Gate Bridge comes up, it didn't just boost the bridge, but also the colours of the bridge, Alcatraz Island, the Muir Woods, San Francisco, the Pacific Ocean, tule fog, on and on - everything connected to the concept.

Latent spaces are conceptual spaces. Every hidden state vector represents a point in N-dimensional space (where N depends on the model, but is usually hundreds to thousands). The more related two concepts are, the closer together they sit in that space (as measured by cosine distance). LLMs work by many layers of processing of many hidden states. For example, if you had the word "bank" (let's just pretend 1 token = 1 word, though it's not like that), it might mean a financial bank or a river bank. But if there are water-related words in the surrounding context, then the model shifts the position of that "bank" vector in the direction of the water-related vectors. Now it's mapping to some other part of the latent space that no longer maps directly to a word, but contains a much more precise "conceptual position". Atop this, the attention mechanism allows the model to focus on the specific tokens of relevance rather than having the entire context attract everything evenly at once. This all happens again and again and again, allowing ever-more elaborate operations to chain off each other repeatedly. It's Turing-complete, and fully self-assembled.
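To put a number on "closeness" in that space, this is the cosine-similarity calculation, shown here on two made-up hidden-state vectors (the 768 dimensions and the nudge factor are arbitrary):

import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
bank = rng.standard_normal(768)                            # made-up "bank" hidden state
bank_near_water = bank + 0.3 * rng.standard_normal(768)    # same vector nudged toward "water"
print(cosine_similarity(bank, bank_near_water))            # close to 1.0: nearby in latent space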

Otherwise, how would Bing's "AI" results give you links to the websites whose content it mainly used to synthesise its answer?

Bing uses RAG, Retrieval Augmented Generation. It's the combination of an LLM with a search engine. Two separate things being invoked together. The simplest forms of RAG use a very lightweight summarization model that knows nothing on its own and only knows how to summarize other pieces of text. More complex RAG models have large amounts of information of their own, but are also fed the results of queries (or can even invoke queries on their own), so when they process the input, they have the added external context.

The queries are not a fundamental part of the LLM. That's an external add-on. In many cases the results are literally just appended into the chat history. It has nothing to do with the inner workings of the LLM, which is a distinct and self-contained element.
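As a sketch of just how bolt-on that is (search_engine() and llm() here are hypothetical placeholders, not any particular API - the point is that retrieval results are simply pasted into the prompt):

# Hypothetical sketch of the simplest form of RAG: retrieve, paste into the
# prompt, generate. The LLM itself is unchanged; retrieval is bolted on outside it.
def rag_answer(question, search_engine, llm, k=3):
    hits = search_engine(question)[:k]                  # external search, not the LLM
    context = "\n\n".join(hit["text"] for hit in hits)
    prompt = ("Using the following sources, answer the question.\n\n"
              + context + "\n\nQuestion: " + question + "\nAnswer:")
    return llm(prompt)                                  # the model just sees extra text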

My understanding is that this is what you do when you set up your own: you install a vector database like Qdrant,

You never use a database at any point in the training process, neither in generating the foundation nor in doing the finetune.

divide it into overlapping chunks to calculate embeddings for

You do divide it into overlapping chunks (if we're talking about the foundation; you don't do that with the finetune). This is what you store until you're ready to start training. For training, the first step with text is tokenization. You iterate one token at a time. The net result of the training process is that the model tries to predict the next token, and you get an error metric of how far off it was on each dimension. These errors backpropagate through the model, essentially slightly shifting all of the weights and biases in the direction of what would have been closer to the correct answer for that token. These shifts are tiny, veritably homeopathic - ~1e-5 or so. But over time, the model incrementally gets better at predicting what the next token will be.
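In code terms, that whole process is just a next-token loss plus backpropagation. A heavily simplified PyTorch-style sketch (the model, optimizer and learning rate here are stand-ins, not any particular framework's training code):

import torch
import torch.nn.functional as F

def train_step(model, optimizer, token_ids):
    # token_ids: [batch, seq_len]; the model predicts each next token from the previous ones.
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)                               # [batch, seq_len-1, vocab_size]
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    loss.backward()                                      # errors backpropagate through every weight
    optimizer.step()                                     # each weight shifts by a tiny, lr-sized amount
    optimizer.zero_grad()
    return loss.item()

# e.g. optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)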

Databases NEVER come into the picture.
Searching NEVER comes into the picture.

Please understand this fact.

Comment Re:It is just a bit better google search for me... (Score 2) 268

Text is its own context. Models are trained on somewhere between several thousand and several million tokens at once. That's a bloody lot of context. And trainers can add in any additional context they want into the mix.

And the fine tune is a lot less sophisticated than your explanation. It's simply a pile of text where metadata has been applied by the cheapest labor force available.

You're literally talking to someone who creates her own models. *facepalm*

I'll repeat: the finetune is "a curated dataset of sample user requests and appropriate answers". Typically it's a JSON file, not "a pile of text", with each training record containing two or more fields, depending on what format you're training for. This is not in any way, shape or form "metadata"; it's sample questions and appropriate answers, sometimes including corresponding sample chat histories. The answers are all written in a professional format with accurate scientific information, and thus the weight shifts between the foundation and instruct models tap into similar sources. We can literally see the changes in what parts of the model become enhanced and suppressed between the foundation and instruct models - see Anthropic's work on this front. They're not black boxes anymore.
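For a concrete sense of what one of those training records looks like, here's a made-up Alpaca-style example (the content is invented; only the instruction/input/output field layout reflects the actual convention):

import json

# One made-up record; the field names follow the Alpaca convention.
record = {
    "instruction": "Explain why the sky appears blue.",
    "input": "",
    "output": "Sunlight scatters off air molecules, and shorter (blue) wavelengths "
              "scatter more strongly (Rayleigh scattering), so the sky looks blue."
}
print(json.dumps(record, indent=2))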

Some finetune data is human provided, but increasing amounts are autogenerated, with human data only filling in for any weaknesses or reviewing autogenerated data. And yes, finetune datasets absolutely have errors (people love sharing e.g. weirdness in the Alpaca dataset or whatnot), but it doesn't matter much, as you're not teaching specific questions and answers, but rather, the type of structured output you want. The actual knowledge comes overwhelmingly from the foundation.

I'll repeat: the finetune (the part where humans are significantly involved in creating the dataset) is NOT where the model learns most of its knowledge. It learns almost nothing from the finetune except "how to respond appropriately". That is the purpose of the finetune: to harness the already-captured information in a desired fashion. Knowledge on expert fields comes from the foundation having been trained on expert data - e.g. research papers, etc. The finetune just gets the model to tap into this already learned information.

If you want to see what goes into a typical finetune dataset for an assistant model, here you go. Note that Alpaca is pretty dated and there's a lot of better stuff out there now, but it's sort of a base point of comparison for finetunes.

Comment Re:This doesn't mean anything (Score 1) 268

In the immortal words of Bender Bending Rodriguez, "Ah ha ha, ha ha ha! ...Oh wait, you're serious. Let me laugh even harder! HAHAHAHAHAHAHAHAHA!!"

ChatGPT is not GPT-3. Thanks for playing. ChatGPT was indeed launched on 30 November, 2022 - nearly in 2023! ChatGPT was built on top of a further-trained variant of GPT-3, but was the first to employ a finetune rather than just being a foundational model, and was thus the first you could interact with naturally and have it respond reliably and predictably in an interactive fashion.

GPT-3 (not ChatGPT) did indeed get some press (generally of the format "Uh oh, AI is getting scary good, the future is going to be crazy, who knows whether crazy-bad or crazy-good!"), but almost nobody could actually use it, and again, "using" it was awkward and unreliable, because it could only continue text, and you never knew if it was going to continue in the right direction. You may start off with something like:

Recipe: Linguini Alfredo
Ingredients:

And maybe most of the rest of the time it might continue into a recipe of some arbitrary format or another. Or it might just do something like:

Recipe: Linguine Alfredo
Ingredients:
  * One box of....

John squinted his eyes. The text on the handwritten card was blurry and no longer legible. Maybe he should just order takeout?

As he was trying to decide what to do, Susan arrived home. Her hair was disheveled and there was sweat on her brow.

"How's your decision to bike instead of driving going?" John said with a smirk.

"Oh shut up, you jerk," said Susan. She reached into her bag...

Etc.

Comment Re:Tried it a couple of times, not impressed at al (Score 2) 268

use python to split an mp3 into wav files

Show me the website that contains my exact specified task. Re-read the task: code that I could drop directly into a program, not "functions on the same theme as your task". And FYI, that was an off-the-cuff, really trivial task I just made up for the purpose of this post.
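For illustration, this is roughly the kind of drop-in code I mean for that task - a minimal sketch using pydub (the three-part split and the "input.mp3" filename are just assumptions for the example, not ChatGPT's actual output):

from pydub import AudioSegment   # needs ffmpeg installed for mp3 decoding

def split_mp3_to_wavs(mp3_path, parts=3):
    audio = AudioSegment.from_mp3(mp3_path)
    chunk_ms = len(audio) // parts               # AudioSegment length is in milliseconds
    for i in range(parts):
        start = i * chunk_ms
        end = len(audio) if i == parts - 1 else (i + 1) * chunk_ms
        audio[start:end].export("part%d.wav" % (i + 1), format="wav")

split_mp3_to_wavs("input.mp3")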

I could not find 'Dunce' in Google.

That's what's otherwise known as "a failure".

But when I did your exact same search in ChatGPT it gave me 'boycott' named after Captain Charles Boycott.

ChatGPT offered him as well, with the caveat that:

Although not directly related to being stupid, "boycott" is named after Charles Boycott, a 19th-century English land agent. His tenants in Ireland ostracized him as part of a campaign against unfair rents. While "boycott" itself doesn't imply stupidity, Charles Boycott's handling of the situation was seen as inept and counterproductive.

It also offered Quisling (likewise caveated as applying only to foolish betrayal and poor judgement) and Malapropism (after the fictional Mrs. Malaprop), but as the most correct answer it put Duns Scotus at the top.

Talk like a pirate.. why would you want to do that?

Why would the day even exist? Believe it or not, people actually like to have fun on the internet. Also, look up the definition of an example in the dictionary.

GA3.. I googled on it and found all kinds of stuff.

Thank you for demonstrating my point: "all kinds of stuff", not information extracted from a specified report, for a specified context within that report, even where there's no exact keyword match (e.g. the report referring to it as "gibberellic acid" instead of GA3, referring to it as "it" or "the hormone" or any other phrase, using various words for "germination", the word "germination" appearing far from where GA3 is listed while the two are still discussed in the same context, etc etc).

Comment Re:It is just a bit better google search for me... (Score 2) 268

Yes, factually, LLMs give you the consensus of the input,

This is simply not true.

First off, learning happens overwhelmingly in creating the foundation, not the finetunes. The foundations are unsupervised continuation models. The model learns to predict what it's going to see next in the text. This is entirely context-dependent. In the context of whether vaccines cause autism, if the context is "some rando's post on Twitter", it might be just as likely to think that they cause autism as that they don't, while in the context of an abstract for a journal paper, it would almost universally be against the hypothesis.

From there, you make a finetune (supervised learning). This involves creating a curated dataset of sample user requests and appropriate answers, and training on that, so that the model learns the question / answer format and how to behave - but the knowledge it draws on was learned during the unsupervised foundational training phase. As a general rule, the sort of answers presented in the finetune will be scientific and professional in nature. As a result, the finetune draws from this context in its responses. Now, you certainly COULD finetune with examples that sound like responses from randos on Twitter, and you'd get a model that would behave as such. But why would you?

Comment Re:Tried it a couple of times, not impressed at al (Score 2) 268

Just as a random example, I'll scroll through my ChatGPT history.

Me: "using ffmpeg, encode all pngs in 00035/ to 00035.mp4 at 4fps"

ChatGPT: returned "ffmpeg -framerate 4 -i 00035/%05d.png -c:v libx264 -r 30 -pix_fmt yuv420p 00035.mp4", along with a description

Me: "How to see metadata of a png"

ChatGPT: returned five categories, some with multiple options, and made sure to address the ambiguity of my request by providing e.g. command-line tools, GUI tools, python packages, etc.

Me: "Write a detailed summary of Tolstoy's personal attitudes / philosophies expressed in War and Peace, with respect to the Russian empire."

ChatGPT: Did a great job.

Me: Several variants of "Summarize all of Tolstoy's works into a single really brief, snarky sentence. No more than 15 words. Just list the summary, not the author."

Various ChatGPT answers: "Russian society sucks; seek meaning in simple life.", "Rich Russians ponder life, love, and existential angst, sans brevity or optimism.", "Rich people whine, peasants suffer nobly, and everyone's miserable but profound.", etc.

Me: "In the style of Rishi Sunak, briefly announce a parody policy declaring that they'll make up to any North Ireland youth mandatory-constripted into the military by giving them a stylish black-and-tan uniform. Keep it to under 340 characters." (note my typo - ChatGPT doesn't care, so long as it can understand the request)

ChatGPT: "In a bold new initiative, we are announcing a mandatory conscription for Northern Ireland’s youth. To honor their service, every conscript will receive a stylish black-and-tan uniform. This policy ensures our youth will be both fashion-forward and ready to serve. Together, we’ll march into a future that’s as stylish as it is secure."

(Note: I know ChatGPT might mess up character counts, as it can't see letters, and that's fine, because it's no issue for me to correct if it's too long, just like it's far easier to fix bugs in generated code (where there are any, usually there aren't) than to write it from scratch)

Me: "Term for a distortion, often found in old images, where the corners of the image are darker than the centre" (I forgot the word, it was on the tip of my tongue)

ChatGPT: "The term you're looking for is "vignetting." Vignetting is..."

Me: "for all .jpg files $f in the current directory, use imagemagick to convert them to output in the filename "1920/$f", while downscaling them to a width of 1920."

ChatGPT: Does it.

... and on and on. Yes, in some (though not all) of these cases one could Google, but it'd be significantly slower. Or I could implement things myself, but again, it'd be significantly slower. It's not like it's HARD to use e.g. ffmpeg, imagemagick, etc, but one has to remember or google the flags, think about the command structure, etc, and it's just faster to write out a short description of what you want to do. Seconds instead of minutes.

Comment Re:Tried it a couple of times, not impressed at al (Score 5, Insightful) 268

ChatGPT is not a search engine. It's a task engine. It's funny how people invariably try to test them out on the things they're inherently worst at: obscure trivia, math, and problems that involve physically seeing words (LLMs don't see words, they see tokens).

AI is the field of solving problems that are easy for humans but traditionally hard for computers. If you want a problem that's "easy for computers" (looking up things in a database, doing math, counting characters, etc), AI is the worst way to handle that, except when the AI just functions as a manager for other tools (e.g. RAG (retrieval-augmented generation, aka incorporating search), running a calculator or writing a program to solve a task, etc). AI tools exist to fill in the gaps around things that computers normally *can't* do, like resolving ambiguity, applying logic, understanding context and motivations, applying creativity, handling novel situations, using tools, recognizing information across modalities (vision, sound, etc), and so forth.
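To make "manager for other tools" concrete, here's a minimal hypothetical sketch of that pattern (the llm() callable and the TOOL: convention are invented for illustration; real tool-calling APIs differ):

# Hypothetical sketch of the "LLM as manager" pattern: the model either answers
# directly or requests a tool, and ordinary code does the "easy for computers" part.
def answer(question, llm, tools):
    reply = llm("Answer directly, or request a tool as TOOL:<name>:<args>\n" + question)
    if reply.startswith("TOOL:"):
        _, name, args = reply.split(":", 2)
        result = tools[name](args)                      # e.g. a calculator or a search call
        reply = llm(question + "\nTool " + name + " returned: " + result + "\nNow answer.")
    return reply

# Example wiring with a trivial, illustration-only calculator tool:
# tools = {"calc": lambda expr: str(eval(expr, {"__builtins__": {}}, {}))}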

When LLM development first started, nobody was expecting them to be usable as a search engine at all; this was an emergent capability, the discovery that they contained vast amounts of knowledge and could incorporate that into their responses. But they don't know an entire search engine's worth of knowledge. Neither does any human, for that matter.

Re: hallucination - while it's improved significantly in commercially-available models, the biggest improvements are in research models. You can practically eliminate it using various techniques that run the model multiple times under varied conditions and then - either via post-processing of the outputs or via more internal methods (such as cosine similarity metrics on the hidden states) - measure the consistency of the responses. This is, however, slow. I fully expect that what we're going to move to is MoEs with cosine similarity metrics for each token, for each layer, between all executed expert models, fed back (matrix mult, add, norm) into the next layer, so that the model itself can learn how to appropriately react to the situation where its different expert models disagree (e.g. low confidence).
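As a rough illustration of the output-side version of that consistency check (sentence-transformers and the embedding model name are my assumptions; the hidden-state version would look different):

import numpy as np
from sentence_transformers import SentenceTransformer

def consistency_score(responses):
    # Embed each sampled response and measure how much they agree with each other.
    model = SentenceTransformer("all-MiniLM-L6-v2")      # assumed embedding model
    embs = model.encode(responses)
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = embs @ embs.T
    n = len(responses)
    # Mean off-diagonal cosine similarity; low agreement suggests the model is guessing.
    return float((sims.sum() - n) / (n * (n - 1)))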

The rate of advancement has been pretty staggering, and there are no signs that it's going to slow down.

The simple fact is that for many people, these tools even in their current state are extremely valuable. If you're a human being, I cannot understand how you can function in the world without understanding and adapting to the concept of fallibility. Because, TL/DR: for many tasks, failure absolutely *is* an option, or can't even be properly measured (e.g. creativity), while on others, that's what cross-referencing or applying your brain is for (again, you do this in your daily life in interaction with other fallible unreliable humans), and it's worth it for the capabilities LLMs bring to the picture (see paragraph #2).

I can't search on Google for, say, "I'm hungry for a vegetarian dinner and I'd like to use up some rice, green onions, cucumber and potatoes that I have, and I'd really prefer something that takes under 30 minutes to prepare; give me 15 ideas, and list the ingredients in metric, and oh, if it calls for butter, substitute olive oil." and immediately get back 15 ideas, and it works even if I misspell catastropically or whatnot.

I can't search on Google for "Were there any words for "being stupid" named after an actual person?" and get back Duns Scotus (Dunce) among others. (If you Google you might find it like 10 pages down in a non-top-ranked Ycombinator comments section)

I can't search on Google for, "Write a python function that will take an mp3 filename, load the file, split it up into three equal parts, and save them as part1.wav, part2.wav, and part3.wav", but I absolutely can have ChatGPT do that.

I can't search on Google for, "Here's the abstract to a paper I just wrote, but it's Talk Like a Pirate Day, so rephrase it in pirate talk for me."

I can't search on Google for, "In this long report, extract for me all information related to GA3 impacts on germination." and get that.

On and on and on. They're for tasks. Wherein you're either fine with (like humans) imperfect reliability, or (like when dealing with humans) you can test / double check the outputs and/or apply your own logic and knowledge. The fact that they also are getting increasingly good at trivia (or can be used with RAG) is an entirely separate issue.

Comment Re:The Human Target. (Score 2, Insightful) 268

You read it backwards. 18-24 year olds are the ones *bucking the trend*, aka using AI a lot. It's Gen X and boomers who are most mad at it.

"But the study, from the Reuters Institute and Oxford University, says young people are bucking the trend, with 18 to 24-year-olds the most eager adopters of the tech."

Averaging across all six countries, 56% of 18–24s say they have used ChatGPT at least once, compared to 16% of those aged 55 and over.

To be more specific: 56% at least once or twice, 39% at least monthly, 27% at least weekly, 9% daily.

By contrast with boomers: 16% at least once or twice, 6% at least monthly, 4% at least weekly, 1% daily.

Vs. boomers, zoomers are 3.5x more likely to have tried it, 6.5x more likely to use it at least monthly, 6.8x at least weekly, and 9x daily. It's a massive difference.

Furthermore, calling these numbers "low" seems absurd when you multiply them by the global population. Maybe they sound low on a percentage basis, but that translates to e.g. 100M monthly users and 13M daily users for ChatGPT.

Comment Things I'm looking out for: (Score 4, Interesting) 40

Q* (A* Q-learning, aka iterative tackling of possible routes to solve a problem via perceived distance from the solution)?

Massively MoE, with cos distance between hidden states feeding back to provide the model with an assessment of its own confidence?

More "distributed" models where portions are more decoupled from others and don't need as much bandwidth, to make it easier to train and run on cheaper hardware?

Will we finally see a lower O() context memory model? Could we maybe close in on O(log N)?

Non-Transformer-based (Mamba, etc.)? Doubt it, but possible.

PCN-based with neuromorphic hardware? I *really* doubt that, but it would be an utter game changer.

Will surely be highly multimodal from the start. I wonder what the full modality list will be, though...
