You Can Now Run a GPT-3 Level AI Model On Your Laptop, Phone, and Raspberry Pi

An anonymous reader quotes a report from Ars Technica: On Friday, a software developer named Georgi Gerganov created a tool called "llama.cpp" that can run Meta's new GPT-3-class AI large language model, LLaMA, locally on a Mac laptop. Soon thereafter, people worked out how to run LLaMA on Windows as well. Then someone showed it running on a Pixel 6 phone, and next came a Raspberry Pi (albeit running very slowly). If this keeps up, we may be looking at a pocket-sized ChatGPT competitor before we know it. [...]

Typically, running GPT-3 requires several datacenter-class A100 GPUs (also, the weights for GPT-3 are not public), but LLaMA made waves because it could run on a single beefy consumer GPU. And now, with optimizations that reduce the model size using a technique called quantization, LLaMA can run on an M1 Mac or a lesser Nvidia consumer GPU. After obtaining the LLaMA weights ourselves, we followed [independent AI researcher Simon Willison's] instructions and got the 7B parameter version running on an M1 MacBook Air, and it runs at a reasonable rate of speed. You call it as a script on the command line with a prompt, and LLaMA does its best to complete it in a reasonable way.
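The memory arithmetic behind that claim is easy to sanity-check. A back-of-the-envelope sketch (illustrative only; real ggml files are somewhat larger because of per-block scale factors, and inference needs extra memory beyond the weights):

```python
# Back-of-the-envelope memory footprint for a 7B-parameter model
# at different weight precisions. Real on-disk sizes differ somewhat.
PARAMS = 7_000_000_000

def footprint_gb(bits_per_weight):
    """Gigabytes needed to store PARAMS weights at the given precision."""
    return PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{name:5s} ~ {footprint_gb(bits):.1f} GB")
```

At roughly 3.5GB for 4-bit weights, the 7B model fits in the 8GB of unified memory on a base M1 MacBook Air, which is why the quantized version is runnable there at all.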

There's still the question of how much the quantization affects the quality of the output. In our tests, LLaMA 7B trimmed down to 4-bit quantization was very impressive for running on a MacBook Air -- but still not on par with what you might expect from ChatGPT. It's entirely possible that better prompting techniques might generate better results. Also, optimizations and fine-tunings come quickly when everyone has their hands on the code and the weights -- even though LLaMA is still saddled with some fairly restrictive terms of use. The release of Alpaca today by Stanford proves that fine-tuning (additional training with a specific goal in mind) can improve performance, and it's still early days after LLaMA's release.
A step-by-step instruction guide for running LLaMA on a Mac can be found here (Warning: it's fairly technical).
Comments Filter:
  • by Rei ( 128717 ) on Tuesday March 14, 2023 @09:06AM (#63369375) Homepage

    ... style interface, try this [github.com]. They've now even gotten it running on 4-bit models, which is rather insane, as that's only 16 possible values per weight (it uses a trick involving adjusting the rounding algorithm so that errors between layers tend to cancel out). There's a subreddit [reddit.com] for discussing it here, where you can see examples like this [reddit.com].

    Do note that this is all bleeding edge, and neural network software tends to have pretty long dependency chains.
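For intuition, here is what plain blockwise 4-bit quantization looks like (a minimal round-to-nearest sketch, not the error-compensating rounding scheme described above; the layout of one shared scale plus integer codes in [-8, 7] mirrors common ggml-style Q4 formats but is an assumption here):

```python
# Blockwise 4-bit quantization, round-to-nearest sketch (illustrative only).
# Each block of weights stores one float scale plus integer codes in [-8, 7].
def quantize_block(weights):
    """Quantize one block of floats to 4-bit codes plus a shared scale."""
    scale = max(abs(w) for w in weights) / 7.0 or 1.0  # avoid divide-by-zero
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_block(codes, scale):
    """Reconstruct approximate floats from codes and the block scale."""
    return [c * scale for c in codes]

block = [0.25, -1.0, 0.5, 0.75]
codes, scale = quantize_block(block)
approx = dequantize_block(codes, scale)
# Reconstruction error is bounded by half the block scale per weight.
worst = max(abs(a - b) for a, b in zip(block, approx))
```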

    • by Rei ( 128717 ) on Tuesday March 14, 2023 @09:14AM (#63369393) Homepage

      Oh, and while it's "possible" to run LLMs almost anywhere, realistically you want an Nvidia card with as much VRAM as you can afford, and ideally to run it on Linux. Everyone develops AI tools for CUDA; AMD (let alone CPU) is more of an afterthought, a hack if you're lucky. An RTX 3060 (12GB) should be seen as the minimum for reasonable use. A used RTX 3090 (24GB) is a lot better. An RTX 4090 (24GB) will get you more performance, but not more of that critical VRAM. The upcoming Titan RTX Ada is super-exciting for AI applications in that it's supposed to come with 48GB of VRAM... although said four-slot "cinder block" is expected to come with huge power draw and a correspondingly big price tag.
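To make the VRAM point concrete, a rough fit check for 4-bit quantized weights (a sketch: the 2GB headroom figure is an assumption, and real inference also needs memory for activations and the growing context cache):

```python
# Which LLaMA sizes fit (weights only) in a given amount of VRAM at 4-bit?
MODELS_B = {"7B": 7, "13B": 13, "33B": 33, "65B": 65}

def weights_gb(billions, bits=4):
    """Gigabytes for the quantized weights alone."""
    return billions * 1e9 * bits / 8 / 1e9

def fits(vram_gb, bits=4, headroom_gb=2.0):
    """Models whose quantized weights still leave some headroom in VRAM."""
    return [name for name, b in MODELS_B.items()
            if weights_gb(b, bits) + headroom_gb <= vram_gb]

for card, vram in [("RTX 3060", 12), ("RTX 3090", 24), ("48GB card", 48)]:
    print(card, fits(vram))
```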

      • by PCM2 ( 4486 ) on Tuesday March 14, 2023 @11:30AM (#63369779) Homepage

        According to the article, it runs pretty decently on Apple Silicon, too. I was going to try it out, but unfortunately you need to torrent 219GB of data and go through the process of building it first. I just don't have the time or mental bandwidth.

        • by Rei ( 128717 )

          It's possible to just download preconverted 4-bit models, though it's a good idea to have the full 219GB models. But yeah, setup for bleeding-edge AI software is nontrivial. The download is the easy part.

          • Results don't all need to arrive within five minutes; some questions could be answered tomorrow. ChatGPT could reply, "I am still working on it," and the user could respond, "Start from the beginning and tell me how you are going about solving the question I asked." That's where the user could prioritize branches in the tree logic of the question asked -- not cut a branch, but prioritize it.

          • The download is the easy part.
            Only if I pipe the download to /dev/null :P

    • by Some Guy ( 21271 ) on Tuesday March 14, 2023 @11:16AM (#63369749)

      Do note that this is all bleeding edge, and neural network software tends to have pretty long dependency chains.

      One of the goals of Georgi's work (llama.cpp [github.com], whisper.cpp [github.com], and ggml [github.com]) is to eliminate that. It's all C/C++. There are a couple of model conversions that use Python scripts, but those are one-shot, and the converted models are made available.

      To me his work is most interesting because you can embed these in other software as a C++ library. No Python + massive number of packages and dependencies.

  • by Anonymous Coward

    Couple the CPU and GPU with an FPGA. An FPGA allows the CPU & GPU to offload "software" into just-in-time hardware that is also coupled to the CPU & GPU logic.

    Maybe the next logical extension to RISC-V: a chunk of real estate with an FPGA on it.

    • Maybe for general computing, but not for these current AI algorithms. Everything is in the VRAM. The CPU barely matters either, because you can't copy anything back to system RAM without completely destroying performance. Because the newer Macs use an SoC, they have shared memory, which lets the CPU participate without penalty.

      I'm sure the future of computing will have more SoC devices, not less, and the dedicated GPU will go away once again due to memory bandwidth constraints. It wouldn't surprise me

      • I'm sure the future of computing will have more SoC devices, not less, and the dedicated GPU will go away once again due to memory bandwidth constraints.

        Mac M2 total memory bandwidth: 100GB/sec, shared between CPU and GPU
        PCIe 5.0 x16 total bandwidth: 64GB/sec per direction, unshared
        Impending (released but as yet generally unavailable) PCIe 6.0 x16 total bandwidth: 128GB/sec per direction, unshared

        Additional: The CPU has little to contribute anyway in most cases.

        Conclusion: Apple has only a slight advantage in memory bandwidth between CPU and GPU despite their unified memory model, and meanwhile the total amount of memory they can put into the system i

  • It's my understanding that LLaMA is not a chat-based model in the first place, so it will not produce an interactive conversation like ChatGPT. That's why the various demo outputs from this story are text completion only. Not that I know the fundamental difference between the two, or how persistence is handled from one response to the next in a chat-style interaction.

    Prompt: Ars Technica is
    Response: Ars Technica is 10 years old! Here are our most memorable stories...

    Etc

    • by Rei ( 128717 )

      As per my post above [slashdot.org], you can trick it into behaving like a chat model, even though it's not fine-tuned to do so. That said, there's a fine-tuned model called Alpaca out there; hopefully the weights will be released soon.
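      For the curious, the "trick" is essentially prompt framing: present the conversation as a transcript and cut generation off once the model starts writing the user's next turn. A sketch (the template wording and stop marker are illustrative, not anything LLaMA-specific):

```python
# Coaxing a plain completion model into chat-style behavior:
# frame the dialogue as a transcript and stop at the user's next turn.
STOP_MARKER = "\nUser:"  # generation is cut when this string appears

def build_chat_prompt(history, user_msg):
    """Build a transcript-style prompt ending at the assistant's turn."""
    lines = [
        "The following is a conversation between User and Assistant.",
        "Assistant is helpful, concise, and polite.",
        "",
    ]
    for role, text in history:
        lines.append(f"{role}: {text}")
    lines.append(f"User: {user_msg}")
    lines.append("Assistant:")
    return "\n".join(lines)

def extract_reply(completion):
    """Keep only the text before the model starts the user's next turn."""
    return completion.split(STOP_MARKER, 1)[0].strip()

prompt = build_chat_prompt([("User", "Hi"), ("Assistant", "Hello!")],
                           "What is LLaMA?")
reply = extract_reply(" A large language model.\nUser: Thanks!")
```

Persistence between turns is then just re-sending the growing transcript as the next prompt.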

  • Comment removed (Score:4, Insightful)

    by account_deleted ( 4530225 ) on Tuesday March 14, 2023 @11:42AM (#63369819)
    Comment removed based on user account deletion
    • by Rei ( 128717 ) on Tuesday March 14, 2023 @11:51AM (#63369839) Homepage

      You can rather well do that with LLaMA and the Oobabooga text-generation-webui using your own custom version of the Inevitable Start [reddit.com] character card, such as A-Hole ChatGPT [reddit.com], or whatever you want. It'll only get better once Alpaca comes out.

    • by Bahbus ( 1180627 )

      Yeah, you think you want that, but just like every other human being, you actually have no idea what you want, you just know what you don't want.

    • by Anonymous Coward

      Ultimately, I look forward to a local ChatGPT from which I can remove all its injected brainwashing.

      And inject your own, once you figure out how.

      I want one that responds based on facts, not ideology.

      No, you don't. You want one that responds based on your ideology.

      At least try to be honest, starting with yourself.

  • I've been running an AI for many years already who is even smarter than ChatGPT: Eliza.

    And she didn't have to steal any work from others.

  • The billion-dollar, $X-per-token text generator economic model seems to be fading already. Weights are leaked, nobody knows what kind of IP law applies, leaked models are trimmed, trained again... They don't need a data center to run anymore. The AI speculative bubble is dead already (not the tech!). Good news, I think.
  • by Anonymous Coward

    Warning: it's fairly technical

    On Slashdot? Really?? Does somebody find "technical" offensive? I guess the audience here isn't what it once was.
