Ask Slashdot: Where Are the Open-Source Local-Only AI Solutions?

"Why can't we each have our own AI software that runs locally," asks long-time Slashdot reader BrendaEM — and that doesn't steal the work of others.

Imagine a powerful-but-locally-hosted LLM that "doesn't spy... and no one else owns it." We download it, from source code if you like, install it, if we want. And it assists: us... No one gate-keeps it. It's not out to get us...

And this is important: because no one owns it, the AI software is ours and leaks no data anywhere — to no one, no company, for no political nor financial purpose. No one profits — but you!

Their longer original submission also asks a series of related questions — like why can't we have software without AI? (Along with "Why is AMD stamping AI on local-processors?" and "Should AI be crowned the ultimate hype?") But this question seems to be at the heart of their concern. "What future will anyone have if anything they really wanted to do — could be mimicked and sold by the ill-gotten work of others...?"

"Could local, open-source, AI software be the only answer to dishearten billionaire companies from taking and selling back to their customers — everything we have done? Could we not...instead — steal their dream?!"

Share your own thoughts and answers in the comments. Where are the open-source, local-only AI solutions?

  • by Mr. Dollar Ton ( 5495648 ) on Saturday March 15, 2025 @11:43PM (#65237121)

    As for "open-source" open-source, what is called "AI" today isn't about "source", it is about having the ability to collect and store other people's data on a large scale, build pyramids out of expensive hardware and pay the power bills to power these to sift through the data and produce tables of statistical coefficients from it. That needs money and not just free work, so, unsurprisingly, there isn't much of it outside of areas that have public financing.

    Incidentally, some of these areas (mostly the sciences) provide more useful "AI" than the LLMs from elona, sam and whatever.

    • by Mr. Dollar Ton ( 5495648 ) on Saturday March 15, 2025 @11:51PM (#65237137)

      And for more practical advice on the subject, there's https://ollama.com/ [ollama.com]
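
      For the curious, a minimal sketch of querying a locally running Ollama server from Python, assuming the default port and that a model has already been pulled ("llama3" below is just a placeholder name):

          import requests

          # Ask the local Ollama server (default port 11434) for a completion.
          # stream=False returns one JSON object instead of a token stream.
          resp = requests.post(
              "http://localhost:11434/api/generate",
              json={"model": "llama3", "prompt": "Why run an LLM locally?", "stream": False},
              timeout=300,
          )
          print(resp.json()["response"])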

      • by SumDog ( 466607 ) on Sunday March 16, 2025 @03:14AM (#65237361) Homepage Journal
        I'm running Ollama on my one Windows (gaming) box since it has a 3080-Ti in it. The Continue plugin works with IntelliJ, and open-webui works for general stuff. You can't run the really big models with just a 3080, but it's more than enough for a decent bullshit machine.

        For coding, I would only trust AI for simple doc lookups. People who use it for large-scale coding are insane. The code looks god-awful. Stop that.

        If you want to shell out ~$3k, you can get one of the Nvidia mini boards with a lot of VRAM. Some AMD APUs can now split general RAM/VRAM on embedded systems (mini PCs), and that could be another avenue for larger models.
        • If one uses the tools right, one can get useful results. [harper.blog]

        • You're making an assertion based on your ability to run either 1) Really small models, or 2) Large models that have been quantized to stupidity.

          If you want to shell out a little bit more than $3k, you can get yourself a Mac with 128GB of RAM that will run circles around anything costing less than $70k in GPUs.
          Good models are quite good at coding. There are benchmarks available if you're curious.
          Grace Blackwell from NV does look really interesting, but we have yet to see benchmarks from it.
  • https://www.localai.app/ [localai.app]

    A DuckDuckGo search found many more options.

    • Re: Here's One (Score:4, Informative)

      by Z00L00K ( 682162 ) on Sunday March 16, 2025 @12:12AM (#65237159) Homepage Journal

      Even then it needs a huge amount of data to be trained on.

      AI is a case where the tool itself isn't the thing, it's the data behind the tool.

      • by allo ( 1728082 )

        You can just download models. Most are MIT or Apache 2.0 licensed.
        If you have a beefy GPU, download Mistral Small 24B; otherwise maybe Gemma 2 9B, Llama 3.2 8B, or Mistral Nemo 12B, depending on your VRAM.
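
        A minimal sketch of fetching such open-weight files with the huggingface_hub library (the repo id below is only an example; substitute whichever model fits your hardware):

            from huggingface_hub import snapshot_download

            # Download every file in a model repo to the local cache and
            # return the local directory path.
            local_dir = snapshot_download(repo_id="mistralai/Mistral-Nemo-Instruct-2407")
            print("Model files in:", local_dir)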

  • by ihadafivedigituid ( 8391795 ) on Saturday March 15, 2025 @11:52PM (#65237139)
    Check Hugging Face; there are tons of open-weight models. I run four or five different ones locally on my Mac and rarely use any of the online LLM providers.

    ollama (open source), LM Studio (not open source, but free), and other "host" programs are out there too.
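
    As a sketch of the Hugging Face route, assuming the transformers library and a small open-weight model (the model id below is just an illustration):

        from transformers import pipeline

        # Runs entirely on local hardware once the weights are cached.
        generate = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
        result = generate("List three local-only LLM runners:", max_new_tokens=64)
        print(result[0]["generated_text"])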
    • But they were most likely trained on stolen data.

      • But they were most likely trained on stolen data.

        Reading isn't "stealing".

        • But reproducing for (or not for) profit is.

          • But reproducing for (or not for) profit is.

            Reproducing may be a copyright violation, but copying is not "stealing".

            It isn't clear that training an AI is even "copying".

          • by sosume ( 680416 )

            But these LLMs can't reproduce copyrighted works verbatim. You just object to the LLM having specific information about copyrighted works, and being able to iterate on those works and produce something similar. But an LLM is not like a copier; it's more an offline version of the internet.

          • No, it is not.
            Stop perpetuating this fucking myth, asshole.

            That is a copyright violation, which is not theft. The MPAA and RIAA don't need your fucking help brainwashing people.
      • So were you, by that so-called logic.
      • Wrong.

        They were trained on data that has potential copyright fair use problems. That is yet to be determined.
        That is not stealing.
    • by AmiMoJo ( 196126 )

      You are always relying on private companies to do the training, though, rather than doing your own training with their open-source tools, because training requires very expensive hardware. Even the DeepSeek one cost a few million to train.

      To be truly open source you would need a training data set that is fully open, and the resulting trained model, with a way to verify that the latter came from the former. Does anyone offer that?

  • by Utopia ( 149375 ) on Saturday March 15, 2025 @11:57PM (#65237145)

    To address the question: "Where can I find open-source, local-only AI solutions?"

    There is a vibrant online community called LocalLLaMA dedicated to this very topic. It is a great resource for individuals interested in running AI models locally without relying on cloud services. You can explore a variety of models and tools within this community.

    One popular option is 'llama.cpp', a high-performance library for running large language models locally. 'llama.cpp' is designed to be efficient and can run on both CPUs and GPUs. For those who prefer a more comprehensive framework, the Hugging Face Transformers library is another excellent choice. It supports a wide range of models and provides extensive documentation and community support.

    While a GPU is recommended for optimal performance and faster inference times, it is possible to run these models on a CPU. However, be prepared for significantly longer processing times if you choose to use a CPU.
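
    A minimal sketch of the llama.cpp route via the llama-cpp-python bindings, assuming a GGUF file already downloaded (the path is a placeholder): n_gpu_layers=0 keeps everything on the CPU, and raising it offloads layers to the GPU.

        from llama_cpp import Llama

        # n_gpu_layers=0 -> pure CPU inference (slower, but works anywhere);
        # raise it to offload that many transformer layers to the GPU.
        llm = Llama(model_path="./models/example.Q4_K_M.gguf", n_gpu_layers=0)
        out = llm("Q: Why run models locally? A:", max_tokens=128)
        print(out["choices"][0]["text"])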

  • You can, as long as it is relatively small; or, if you have the hundreds of thousands for the hardware, you can also run large models.
  • for the training and then open source the LLM?

    • by martin-boundary ( 547041 ) on Sunday March 16, 2025 @02:04AM (#65237305)
      A trained LLM without the original training data is not "open source"
      • Very well said. I don't think I would be happy qualifying Llama or DeepSeek as open source. Some corporation did the fishing for us and handed us the fish. There is no way for me to go change the process that produced the model.

        • You're right... but there also isn't a realistic way for you to do that anyway, even with the data.

          You don't need the data to fine-tune it, which is realistically the only thing within your power to do, unless you've got millions of dollars to burn on pre-training some weights.
          • If there's no realistic way to make it open source, then it should not be called open source. Otherwise we live on a wall, like Humpty Dumpty.
            • I didn't say there was no realistic way to make it open source; I said there's nothing realistic you could do with the rawest source.
              The pre-trained models (which are available) are close enough to source.
              You can fine-tune them, they haven't had alignment training, etc.
              They simply took care of the expensive part for you.
              • I think you're wrong about the pre-trained models. The biases in the raw data are important, they form the basis for tokenizations, and the meta-data (file names, directories, embedded markup, page numbers) too is not easily recognizable in pre-trained parameter sets.

                There is a great definition in the GPL which gets to the crux:

                The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work.

                In the

                • The biases in the raw data are important, they form the basis for tokenizations

                  I definitely didn't mean to imply that the raw data isn't important. It literally is the model, more or less.

                  and the meta-data (file names, directories, embedded markup, page numbers) too is not easily recognizable in pre-trained parameter sets.

                  Sure, but what realistically do you need it for?

                  There is a great definition in the GPL which gets to the crux:

                  I think the critical argument here, is how we define source.
                  If we're being completely fair, the GPL's concept of "open source" doesn't apply here at all, really. It's too tailored to software, and a released set of weights isn't software. They're absolutely abusing the term, but not too much.
                  You could certainly argue that the parameters aren't source,

      • by Tom ( 822 )

        So when you buy or download software, do you expect all the books the coders used to learn from, as well as everything they looked up on Stack Overflow while writing it, to come along with it?

        I don't see this as being completely different. In all other fields of life, we buy the final product. We understand that there are machines and workers in a factory producing it, but we don't expect to get those sent along.

  • by jamienk ( 62492 ) on Sunday March 16, 2025 @03:06AM (#65237353)

    Are cultural works really "my" or "your" works? It seems that the pendulum has swung so far that most art, writing, film, etc., is controlled by big-monied interests.

    The elimination of copyright, I think, would benefit the arts and sciences, and would benefit the people over the big companies. The companies want more IP laws & limits, because this is their way of gathering rents. International trade has focused largely on IP, with the US demanding that China and others respect our legal restrictions. IP has become a deeply reactionary way of looking at things.

    I think the rip, mix, burn model is the natural way that people express themselves, build culture, and (indirectly) affect the course of humanity. By making it so highly regulated, we strangle the ability for change to come from below. It serves an intensely oppressive impulse, and is a "market" in all the bad ways (unfair monopoly, rents, legal manipulation) and none of the good ways (trade, competition, innovation).

    The big AI companies are going to SUPPORT making it more and more illegal to "train" on "data" -- or in other words, for people to be allowed to freely read and experience culture in its broadest sense. Yes, even China will end up here: it will be very illegal to train your personal AI on unapproved data.

    I think the anti-corporate sentiment and impulse of the political grassroots is wrongly focused. I want my personal AI to train on all the books and music and movies and stuff that I love, that I hold dear. Just like how I want my children to read those books, see those movies, sing those songs. The AI that I train for myself will not respect the "rights" of the "creators" of those things. When I read a book and it deeply influences me, I am not doing a disservice to the author! When someone convinces me about an idea, I am not stealing their "product!" When I make a computer program that appreciates the film techniques that I have grown to love, I am not ruining those traditions!

    • I agree with your general direction about copyright being taken too far, but it doesn't seem to me that removing copyright entirely follows.

      Copyright exists to solve the externality caused by a person creating a work that then goes on to benefit other (external) parties more than it benefits them. With no copyright protections, someone who dedicates their time to creating a work of art, which might cost them a lot of money for materials and expenses (and for them to learn the skills to do so

      • by allo ( 1728082 )

        Automating things gives that a new twist.

        Currently copyright mostly protects a craft. You need some artwork, I get the job. I invest a lot of time, and a one-time sale that covers my costs may be too expensive for you, so I sell licenses to a lot of people instead. Take clipart, for example: once created, it brings in money because everyone needs some little cute icons for their website. It covers the costs and then continues to make money. Not that much, but you don't have to do anything for it afterward.

        With less cop

        • I agree with almost everything you've said. Art wouldn't exist at all if it was all just a case of maximising returns and getting the most efficient allocation of resources.

          The only thing I disagree with is:

          "With less copyright (shorter, less strict, whatever) I can't rely on this constant income stream, so I need to demand the money my work costs from you".

          This isn't an obvious conclusion to me. Would it be more likely that you'll find you need to keep making new works to keep generating new revenue stream

      • by jamienk ( 62492 )

        You are taking a narrow view: that we can/should create and manipulate markets to incentivize certain kinds of expression. But there are other, broader, considerations: artists might need or want more freedom; there might be scientific or technical strangulation; the right to access literature and science is often hobbled; etc. etc. All of the arguments for free speech apply: the moral ones, the ones about a marketplace of ideas, the status of minority views, etc. Theories of art do not all privilege econom

        • Thank you, very kind of you.

          I agree with you. I wasn't really trying to express any kind of moral position, but rather just outlining my understanding of the historical reasons that led to copyright laws coming into force (of course, in reality it was a messy thing done in little bits and pieces over the course of centuries, but you know how these things are best summarised).

          That said, I'm of the opinion that there ought to be a balance. I think that if there were no copyright laws then all art would end up as l

  • by Dripdry ( 1062282 ) on Sunday March 16, 2025 @03:09AM (#65237357) Journal

    What is the best way to gather our own data? I have wanted to gather my own metrics on everything: my movement and location, what apps I use, when I wake up, the statistics of everything I do on a day-to-day basis. I want that stored, and now that AI has come about there is actually a potential use for this data.

    I am sure this is obvious to many, but it seems to me that most data of any use, and the ability to collect my own data, is all locked behind paywalls or outright not available.

    • What is it that you call "my data"?

      Various bodily functions, movement and location - use GadgetBridge with one or more of the supported hardware.

      What apps you use - trivial on Android, or on a PC, as you can see it with any log browser.

      What you eat, how much you weigh, which access points and cell towers you see, etc? There are apps on f-droid for all of these.

      How is your pet doing? You may need some work to adapt some of the above.

      What else do you need?
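
      And if nothing off the shelf fits, a local-first log is a few lines of standard-library Python. A sketch (the table layout and event names are just examples):

          import sqlite3, time

          # Append timestamped events to a local file that never leaves your machine.
          con = sqlite3.connect("my_metrics.db")
          con.execute("CREATE TABLE IF NOT EXISTS events (ts REAL, kind TEXT, value TEXT)")
          con.execute("INSERT INTO events VALUES (?, ?, ?)", (time.time(), "wake_up", "07:15"))
          con.commit()
          con.close()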

  • They do not exist. FOSS creators are a bit more hype-resistant than the average person and currently are not interested in creating artificial morons that guzzle power and make you dumber. Eventually, that may change, but a real, FOSS "business case" is needed and that is missing. So if you want in on the hype, pay up. Or you could just use your time better and stay away.

    • I could see possible use cases for local use. One of the biggest issues I see is that the company that trains the model necessarily puts its finger on the scale. You get what they want you to get. A simple example: what happened at Tiananmen Square? The answer will be curated by the originator. Training your own private model allows you to put your own finger on that scale. I see this as analogous to raising a child.
  • This works: https://github.com/SciSharp/LL... [github.com] Using Ollama to dump clean models, I built a C# app using LlamaSharp in a couple of days that cycles history (.. learns?), saves sessions, runs offline, etc., albeit slowly on my 2080. I'm WAY too cheap to pay for ChatGPT, and I wanted to better understand what DeepSeek was, so I've been using the 14b model as a baseline, and it works great. It can refactor itself (feed forms/documents into it), and writes some pretty mean Godzilla poetry in XML format when pr
  • I don't have any LLMs that run locally, but I do run two extremely useful and practical AI (or at least, machine-learning) tools locally: whisper.cpp [github.com] to automatically transcribe speech in videos to subtitles, and rembg [github.com] to automatically remove the backgrounds from images, leaving only the foreground subjects. Both work extremely well.
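
    For instance, rembg's Python API is only a few lines (the file names here are placeholders):

        from rembg import remove

        # Read an image, strip the background locally, write the cut-out as PNG.
        with open("photo.jpg", "rb") as f:
            cut_out = remove(f.read())
        with open("photo_no_bg.png", "wb") as f:
            f.write(cut_out)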

  • by kamakazi ( 74641 ) on Sunday March 16, 2025 @10:13AM (#65237927)

    There are a few pieces to AI: there is the code that ingests the data and tokenizes it, there is the code that interacts with the user, and then there is the actual data fed into the tokenizer. The first and second are more like what we traditionally call software, and are available in open-source versions. The third is the problem piece. If you managed to textualize everything you know and feed it into an LLM, the LLM would only know what you know, and that would not really be very useful unless you just wanted it to remember your second cousin's birthday and remind you about it. The minute you start feeding it text or images you didn't create, you venture into the ethical and legal morass that is currently churning all over the world around the big LLMs.
    That huge pool of tokens is what makes an LLM useful; it really is the LLM. The code that can be ethically shared just creates or interacts with that LLM. Yes, you can own books, and by historical precedent have every right to physically manipulate that book in any way you like. You do not have the right to copy that book, and this is at the heart of the controversy. Many authors, artists, and creators are making the claim that the act of ingesting a book into an LLM creates a copy of that book, while the people making LLMs (and the corporations who see the potential for $BILLIONS$) say that they are just deriving metadata, or that the ingestion does not constitute a copy because the text is not stored, it is tokenized, and LLMs will not regurgitate verbatim the data on which they are trained (see the tokenizer sketch after this comment).
    Of course creative prompts seem to show that they will indeed regurgitate verbatim.
    The current state of this controversy makes it very difficult to guarantee that the training set of a useful LLM was actually all public domain or legally ingestible, and therefore releasing an LLM under an open-source license might get you sued.
    Of course this legal back and forth is how we discover the need for new law, and eventually will lead to various governments or legislative bodies making laws that define the borders of what can and can't be fed to an LLM without licensing. These laws will vary by location and the perceived values of the bodies making the law, which will probably make "LLM friendly" locations where the AI companies will go to lower ingestion cost, which will then lead to another wave of lawsuits, this time by authors et al., attempting to prevent access to the LLMs from regions with stricter laws, much like we have seen in the audio/video realm.
    Basically AI, at least in the AGI sense, is really not something that an individual of normal means can do, much in the same way that an individual of normal means cannot make a mass production factory, the resources required are just too big.
    AI, in the classic sense not the prompt driven generative sense, is something an individual can play with. It is fundamentally pattern recognition, and is applied invisibly in many parts of life already.
    For me a really fun example is the arc-fault circuit breaker required in much of the US in new electrical installations. It actually "listens" to noise on the electrical line and compares it to a signature library of electrical connections to determine if it is an accidental arc or just the normal operation of a device which arcs, like a brushed motor or a relay.
    The first generation of these devices produced so many false positives they rapidly gained a reputation of uselessness, however as signature libraries improved and pattern matching algorithms evolved they got better and better. This is AI. It has nothing to do with general intelligence or conversation, it is a very specific realm of pattern matching, and it does it better and faster than anything a person could do.
    Because it is an industrial control device it is not recognized as AI, it is just another little black box that controls something. It doesn't even look as impressive as a PID process controller, which, though it can appear to be smarter, is really not AI, it is just a basic calculator with real world inputs.
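
    (The tokenizer sketch mentioned above, using a Hugging Face tokenizer; the model id is only an example. It shows that ingestion turns text into integer ids, and that those ids decode straight back to the text:)

        from transformers import AutoTokenizer

        tok = AutoTokenizer.from_pretrained("gpt2")
        ids = tok.encode("You do not have the right to copy that book")
        print(ids)              # a list of integer token ids
        print(tok.decode(ids))  # ...which round-trips back to the original text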

  • I don't know the full details, and I'm working from memory, but there were scientists working on building local AI instances so that they could be archived along with the data being analyzed for reproducibility.

    Of course, many of them are of the 'data analysis' and 'computer vision' type AI, not of the generative AI types

  • There are of course many small models whose weights you can download and run locally, but typically training data isn't provided (nor training instructions), so whether you consider that "open source" is up to you. Of course most people don't want to (or can't afford to) incur the cost and effort of training a model themselves, so weights are all they really want.

    Now, to be useful an LLM needs to be large, so training it ONLY on your personal data is really a non-starter, unless you somehow have a self-autho

  • I have an idea: stop looking for a handout and build your own.
    Don't have money? Start a company and look for investors.

  • Why use the big corporate models? If you have a computer with a recent-ish GPU, look at GPT4All and LM Studio. What the author is asking for already exists.

    And if you want to run these models on 3rd party hardware, use t3.chat - very reasonable monthly cost, and their business model doesn't include harvesting your data.

    Anyone using chatgpt.com or MS Copilot is kinda running behind the times at this point.
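
    A sketch of the GPT4All route from Python (the model filename is an example; the bindings fetch it on first use, then run fully offline):

        from gpt4all import GPT4All

        model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
        with model.chat_session():
            # Inference happens entirely on local hardware.
            print(model.generate("What data leaves this machine?", max_tokens=128))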

  • by Peterus7 ( 607982 ) on Sunday March 16, 2025 @01:41PM (#65238299) Homepage Journal
    I've been in the FOSS and DIY scene on and off for 20 years, and there's always been some core group of DIY people doing cool stuff, whether it's custom ROMs, FOSS stuff, or just hosting their own servers and mesh nets. I know a few people dabbling with building their own AIs. A buddy of mine built an LLM to handle citywide policy analysis (he works as a civil servant). Kinda surreal to see this AI that literally sits in his basement. But I think the democratization of AI will happen when more people start getting into it and realizing, screw it, I can make my own. It's happening, but slowly.
  • There are smaller models that are perfectly capable and will still run alright on a gaming PC or one of the M* Apples with unified memory. For example, the Qwen 32B distillation of R1 is still quite competitive with the state of the art and doesn't require insanely expensive hardware.

    The nice thing about inference software like llama.cpp is that you can also spill over to system RAM: if the model is too big to fit in the GPU's VRAM, the rest can run from RAM on the CPU instead.

  • ...who was home-schooled with the bible alone?

  • Where are the open-source, local-only AI solutions?

    In front of me, on the harddrive of my machine.

    These things already exist and some of them have been around for quite some time. LM Studio, for example, or Ollama. They're even reasonably easy to use, an interested non-techie could figure it out.

    More of them are coming out constantly. Local-only or local-first (i.e. with an optional choice to also query online sources) AIs are already fairly common. Asker needs to use Google before using /.
