Hey everyone, I’m interested in using a local LLM (large language model) on a Linux system to create a long story, but I’m not sure where to start. Does anyone have experience with this, or know of any resources that could help me get started? I’d love to hear your tips and suggestions. Thanks!
You can try setting up Ollama on your machine (it runs even on a Raspberry Pi), then use a highly quantized variant of the Mistral model, or quantize it yourself to GGUF with llama.cpp. Very heavy quantization (down to 2-bit) cuts memory use dramatically but also increases the error rate; if you only plan to use the generated text as a starting point, it can still be useful. Once the server is running, you can drive it from a script — see the sketch at the end of this comment. Also see: https://github.com/ollama/ollama/blob/main/docs/import.md#importing-pytorch--safetensors
Here are some pre-quantized GGUF builds of Mistral 7B at various bit widths: https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF
(All the tools and models mentioned in this comment are free and open-source, and they work fully offline: no network connection is needed during generation.)
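To give you a concrete starting point, here is a minimal Python sketch that talks to Ollama's local REST API (it listens on port 11434 by default) and builds up a long story in chunks. The model name assumes you've already run `ollama pull mistral`; the prompts and the chunking scheme are placeholders you'd adapt to your own story.

```python
import json
import urllib.request

# Ollama serves a local REST API on port 11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"

def generate(prompt: str, model: str = "mistral") -> str:
    """Send a prompt to the local Ollama server and return the generated text."""
    payload = json.dumps({
        "model": model,    # assumes you've already run `ollama pull mistral`
        "prompt": prompt,
        "stream": False,   # return one JSON object instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # For a long story, generate in chunks and feed the tail of the text back
    # in as context, since the model's context window is limited.
    story = generate("Write the opening scene of a mystery novel set in Prague.")
    for _ in range(3):
        story += "\n\n" + generate(
            "Continue this story where it leaves off:\n\n" + story[-2000:]
        )
    print(story)
```

Expect generation to be slow on a Pi with a 2-bit model, but since the script only talks to the local HTTP endpoint, it works unchanged if you later move to a faster machine or swap in a different model.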