Hey everyone, I’m interested in using a local LLM (Language Model) on a Linux system to create a long story, but I’m not sure where to start. Does anyone have experience with this or know of any resources that could help me get started? I’d love to hear your tips and suggestions. Thanks!
Install ollama and create an alias for
ollama run <model>
for ease of access is what I did.
-
How much time do you have? Because even small models will take a lot of time on that kind of hardware to spit out a long text…
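The alias tip above, as a minimal sketch (the model name is just an example, swap in whatever you pull):

```shell
# Append to ~/.bashrc (or ~/.zshrc); "mistral" is an example model name
alias story='ollama run mistral'
```

After re-sourcing your shell config, typing `story` drops you straight into a chat with that model.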
And the small models aren't that great. I think the current best and most economical models would be Mistral, Mixtral, or Dolphin.
If you got the power, nous-capybara is very good and “only” 34B parameters (loading alone needs like 40GB of memory).
-
you need a gpu or gpu resources to run a GPT level model
-
stop thinking of it as “make me a story” and start thinking of it as “let’s make a story together”
First talk back and forth over the basic idea.
Then get a short outline.
Then work on each chapter of the outline.
Then repeat and refine.
Try to keep overall queries under a certain character limit. Google what some good ranges are.
Make sure to save each conversation, and use the outline and previous chapter as the start of your conversation for making the next chapter.
The AI won’t write it for you, but it can be an amazing accelerator if you’re willing to put in the work.
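A minimal sketch of that chapter-by-chapter loop. `build_chapter_prompt` is a hypothetical helper, not any library's API; the actual model call (ollama, llama.cpp, whatever you run) is left out:

```python
def build_chapter_prompt(outline: str, previous_chapter: str, chapter_number: int) -> str:
    """Combine the saved outline and the previous chapter into the next prompt,
    so each new conversation starts from the same shared context."""
    return (
        f"Story outline:\n{outline}\n\n"
        f"Previous chapter:\n{previous_chapter}\n\n"
        f"Write chapter {chapter_number}, staying consistent with both."
    )

# Each generated chapter gets saved and fed back in as previous_chapter
# when you build the prompt for the chapter after it.
```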
-
Perhaps if you broke your story up into chapters, each of those would be a more digestible chunk for the LLM?
Is this something that a TPU module could accelerate?
I wondered this out loud previously and what I got back is that LLMs need more RAM than they do compute, so not really.
Makes sense, thank you.
Raspberry Pi
This is a famously high-compute problem and you want to chuck it on a Pi? Most LLMs require a good GPU.
Just want to piggyback on this. You will probably need more than 6GB of VRAM to run good enough models with acceptable speed and coherent output, but the more the better.
I think you’ll struggle with the coherent part
Most LLMs can do a few paragraphs and stay on topic, but after that they need better guidance, usually by changing the prompt to stay relevant. 10k+ words can be hard for human authors to keep coherent from a single prompt, let alone a GPT-3-class model.
Reading the articles you attached, OP, this is exactly the technology they are still struggling with. I don't think any open source consumer-level models will have quite what you're looking for… yet!
Using a Raspberry Pi seems very underpowered; best case you will be limited to something like 4-7B models on an 8GB RPi 4. You may need to configure it with very long timeouts and expect it to output something like a token every few minutes.
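To put rough numbers on that: 10,000 words is very roughly 13,000 tokens, and the generation rate below is a made-up placeholder, not a benchmark; measure your own hardware and plug that in:

```python
# Back-of-envelope generation time; both numbers are assumptions
tokens_needed = 13_000        # ~10k words at ~1.3 tokens per word
seconds_per_token = 10        # placeholder rate; measure your own setup
hours = tokens_needed * seconds_per_token / 3600
print(round(hours, 1))        # 36.1 hours at these assumed numbers
```

Even an order of magnitude faster than that placeholder rate still means hours of runtime for a full story.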
I ran a 6B model on an i7 without a GPU and it didn't give good results before I got CUDA up and running. Probably because of timeouts.
I use machine learning/ai pretty much daily and I run stuff at home locally when I do it. What you’re asking is possible, but might require some experimentation on your side, and you might have to really consider what’s important in your project because there will be some serious trade-offs.
If you’re adamant about running locally on a Raspberry Pi, then you’ll want an RPi 4 or 5, preferably an RPi 5. You’ll also want as much RAM as you can get (I think 8GB is the current max). You’re not going to have much VRAM since RPis don’t have a dedicated graphics card, so you’ll have to use the CPU and normal RAM to do the work. This will be a slow process, but if you don’t mind waiting a couple of minutes per paragraph of text, then it may work for your use case.

Because of the limited memory of Pis in general, you’ll want to limit what size LLM models you use. Something specialized like a 7B storytelling LLM, or a really good general-purpose model like Mistral OpenOrca 7B, is a good place to start. You aren’t going to be able to run much larger models than that, however, and that could be a bit creatively limiting. As good as I think Mistral OpenOrca 7B is, it lacks a lot of content that would make it interesting as a storyteller.
Alternatively, you could run your LLM on a desktop and then use an RPi to connect to it over a local network. If you’ve got a decent graphics card with something like 24GB of VRAM, you could run a 30B model locally and get decent results fairly fast.
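As a rough sanity check on why ~24GB of VRAM fits a 30B model: weight memory is roughly parameter count times bytes per weight, plus some overhead. The quantisation and overhead figures below are assumptions, not measurements:

```python
# Rough VRAM estimate for a 30B-parameter model
params = 30e9
bytes_per_weight = 0.5        # assumes 4-bit quantisation
overhead = 1.2                # fudge factor for KV cache etc., an assumption
gib = params * bytes_per_weight * overhead / 2**30
print(round(gib, 1))          # 16.8 GiB at these assumed numbers
```

At full 16-bit precision the same model would need roughly four times that, which is why quantised builds are what people actually run at home.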
As for the 10k-word prompt, that’s going to be tricky. Most LLMs have a limited number of tokens they can work with before they have to start again. I think some of the 30B models I use have a context length of 4096 tokens… so no matter what you do, you’ll have to split the work into multiple jobs for your LLM.
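Splitting the work to respect that context limit can be sketched as a greedy chunker. This approximates tokens by words, which is a rough assumption (real token counts run higher), and `chunk_outline` is a hypothetical helper:

```python
def chunk_outline(sections: list[str], max_words: int = 800) -> list[list[str]]:
    """Greedily group outline sections so each generation job stays
    under a word budget (a crude stand-in for a token budget)."""
    jobs, current, count = [], [], 0
    for section in sections:
        words = len(section.split())
        if current and count + words > max_words:
            jobs.append(current)
            current, count = [], 0
        current.append(section)
        count += words
    if current:
        jobs.append(current)
    return jobs
```

Each group then becomes one prompt, leaving the rest of the context window free for the model's reply.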
Personally, I’d use LM Studio (not open source) to see if the results you get from running locally are acceptable. If you decide that it’s not performing as well as you had hoped, LM Studio can also generate Python code so you could send commands to an LLM over a local network.
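As a sketch of that local-network setup: LM Studio's local server speaks an OpenAI-style chat API, so the request is just a JSON payload; the port and model name below are assumptions based on a default install, so check yours:

```python
import json
import urllib.request

def build_chat_request(prompt: str, model: str = "local-model") -> dict:
    # OpenAI-style chat payload; the model name is whatever your server reports
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

# Sending it (assumes LM Studio's server on its default local port):
# req = urllib.request.Request(
#     "http://localhost:1234/v1/chat/completions",
#     data=json.dumps(build_chat_request("Write chapter 1.")).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

The same payload shape works against ollama's OpenAI-compatible endpoint, so the Pi-as-thin-client idea only needs the URL changed.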
Gpt4all is easy to use
I use koboldAI
This sounds suspiciously like “I want AI to write my school essay” to me.
More generically, this sounds like “Let’s not put any effort into anything: the machine will do it for me.”
That is my opinion and I’m aware it’s a minority point of view these days. That’s why I fully expect to be modded down immediately after posting this. Because arguing with people who have a different opinion is also not in fashion anymore.
Let’s not put any effort into anything: the machine will do it for me
So you are not using a calculator, I presume? Only math done on abacus is not being lazy?
There is a conveniently omitted difference here. When doing learning exercises, the result is almost irrelevant. The way towards said result is what matters. So if you put a function into a calculator and learn that the result is X1/2=3π², you have really gained nothing. What about text creation have you learned when you let an AI spew out some text “from a single prompt”?
What about text creation have you learned
In many cases I don’t want nor need to learn that. I just need volume about the key points
When it makes sense I do. More often than not, for basic arithmetics in daily life, and back of the hand engineering calculations, I use my noggin because it’s quicker.
Why is an LLM any different?
Let’s say I want my RPG players to find a corporate mail that gives them some plot info. Why not ask an LLM to write the boilerplate around the info I want to give them? Just as example
Agreed. All I said was, the way the OP framed their question sounds like they’re trying to weasel out of putting some effort into something that sounds worth putting some effort into.
It’s not like they said “I want to draft boilerplate legalese that I can go back and adapt to each customer.” They said “I want to use an LLM to create full-blown 10,000-word-long, coherent stories.”
trying to weasel out of putting some effort into something that sounds worth putting some effort into
But that depends on what they need it for.
Personally, I don’t see a difference between legalese boilerplate and a 10k-word story. But that discussion might lead us nowhere.
The difference is between spewing out functional text for the purpose of covering your ass in court and human creativity.
I have no problem with people using AI to generate Excel sheets, meeting records, summaries of articles… I really, REALLY have a problem with people who ditch what makes us human to the wayside and delegate what is ours and ours alone to do to the machine out of laziness.
I keep a bookmark to this AI detector on my browser bar https://gptzero.me/
I just do random checks for AI-generated stuff, especially if it’s a news site I’ve not seen before. There are use cases for AI in text creation but I think it’s very limited - producing summaries (with references), for example. Instead I am excited for medical advances, space study, cases where there are huge datasets of good quality and not datasets that are the result of huge dragnets of shite from the internet.
People are quite right to be cynical about AI created stuff; we’re near the zenith of a hype-cycle at the moment.
You’ll be getting no down vote from me.
Thank you for that link. I needed that!
Why host it locally in that case, and why host it on a Pi? Seems rather restrictive for that usecase.
They didn’t mention what the story is for, so I am free to speculate. Maybe a raspberry pi is all they have available for such tasks. It could also be for articles or blog posts.
Well check out gpt4all.
There are a number of different models to download and run.
I think your real challenge is to find the right model.
Do you know any programming languages?
If you want something local and open source, I think your main problem will be the number of parameters (the “B” thing). GPT-3 is (was?) noticeably big and open source models are usually smaller. There is, of course, a debate about how much the size of the model matters versus how much the quality of the training data affects the results. But when I did a non-scientific comparison ~half a year ago, there was a noticeable difference between smaller models and bigger ones.

Having said all of that, check out https://huggingface.co/; it aims to be like GitHub for AI models. Most of the models are more or less open source; you will only need to figure out how to run one and whether you have bottlenecks on the Pi.