Hey everyone, I’m interested in using a local LLM (large language model) on a Linux system to create a long story, but I’m not sure where to start. Does anyone have experience with this, or know of any resources that could help me get started? I’d love to hear your tips and suggestions. Thanks!
You can try setting up Ollama on your machine (it runs even on a Raspberry Pi), then use a highly quantized variant of the Mistral model, or quantize it yourself to GGUF with llama.cpp. Very heavy quantization (down to 2-bit) cuts memory use dramatically but also increases the error rate; if you only plan to use the generated text as a starting point, it can still be useful. Once the server is running, you can drive it from a script — see the sketch at the end of this comment. Also see: https://github.com/ollama/ollama/blob/main/docs/import.md#importing-pytorch--safetensors
Here are some pre-quantized GGUF builds of Mistral 7B at various bit widths: https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF
(All the tools and models mentioned in this comment are free and open-source, and they work fully offline: no network connection is needed during generation.)
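To give you a concrete starting point, here is a minimal Python sketch that talks to Ollama's local REST API (it listens on port 11434 by default) and builds up a long story in chunks. The model name assumes you've already run `ollama pull mistral`; the prompts and the chunking scheme are placeholders you'd adapt to your own story.

```python
import json
import urllib.request

# Ollama serves a local REST API on port 11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"

def generate(prompt: str, model: str = "mistral") -> str:
    """Send a prompt to the local Ollama server and return the generated text."""
    payload = json.dumps({
        "model": model,    # assumes you've already run `ollama pull mistral`
        "prompt": prompt,
        "stream": False,   # return one JSON object instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # For a long story, generate in chunks and feed the tail of the text back
    # in as context, since the model's context window is limited.
    story = generate("Write the opening scene of a mystery novel set in Prague.")
    for _ in range(3):
        story += "\n\n" + generate(
            "Continue this story where it leaves off:\n\n" + story[-2000:]
        )
    print(story)
```

Expect generation to be slow on a Pi with a 2-bit model, but since the script only talks to the local HTTP endpoint, it works unchanged if you later move to a faster machine or swap in a different model.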