Llama 3.1 AI Models Have Been Officially Released

simple@lemm.ee · 4 months ago


brucethemoose@lemmy.world · 4 months ago

Oh, my friend, you have to switch to this: https://huggingface.co/BeaverAI/mistral-doryV2-12b

It’s so much smarter than Llama 13B. And it goes all the way out to 128K context!

just another dev@lemmy.my-box.dev · 4 months ago

Oof, not on my 12GB 3060 it doesn’t :/ Even at 48K context and Q4_K quantization, Ollama is doing a lot of offloading to the CPU. What kind of hardware are you running it on?
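A rough back-of-the-envelope check of why this can spill out of 12GB. This is only an estimate under assumptions not stated in the thread: that doryV2 is a finetune of Mistral NeMo 12B (40 layers, 8 KV heads, head_dim 128, about 12.2B parameters), that Q4_K averages roughly 4.8 bits per weight, and that the KV cache is kept unquantized in FP16.

```python
# Rough VRAM estimate for a ~12B model at 48K context.
# ASSUMPTIONS (not from the thread): Mistral NeMo 12B shape
# (40 layers, 8 KV heads, head_dim 128, ~12.2B params),
# Q4_K at ~4.8 bits/weight, and an unquantized FP16 KV cache.

GIB = 1024 ** 3

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed for the quantized weights."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(context: int, layers: int = 40, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: float = 2.0) -> float:
    """Bytes for the K and V caches: two tensors per layer, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * context

weights = weight_bytes(12.2e9, 4.8)
cache = kv_cache_bytes(48 * 1024)
total = weights + cache
print(f"weights ~{weights / GIB:.1f} GiB, FP16 cache ~{cache / GIB:.1f} GiB, "
      f"total ~{total / GIB:.1f} GiB")
```

Under these assumptions the FP16 cache alone is about 7.5 GiB at 48K, so weights plus cache land well above 12 GiB before counting activations, which would be consistent with layers being pushed to the CPU.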

brucethemoose@lemmy.world · 4 months ago

A 3090.

But it should be fine on a 3060, with zero offloading.

Dump Ollama for long context. Grab a 5-6bpw exl2 quantization and load it with a Q4 or Q6 KV cache, depending on how much context you want. I personally use EXUI, but text-gen-webui and tabbyapi (with some other frontend) will also load them.
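To see why the cache precision decides how much context fits, here is a minimal sketch that solves for the largest context on a 12GB card. The assumptions are mine, not the poster's: a Mistral NeMo 12B shape (40 layers, 8 KV heads, head_dim 128, about 12.2B parameters), cache costs of 16, 6 and 4 bits per element with quantization scale overhead ignored, and a hypothetical 1 GiB reserve for activations and the CUDA context. Real loaders will differ somewhat.

```python
# Sketch: largest context whose KV cache fits next to exl2 weights.
# ASSUMPTIONS (not from the thread): Mistral NeMo 12B shape
# (40 layers, 8 KV heads, head_dim 128, ~12.2B params), cache bits per
# element of 16/6/4 with scale overhead ignored, and a hypothetical
# 1 GiB reserve for activations and the CUDA context.

GIB = 1024 ** 3
N_PARAMS = 12.2e9
LAYERS, KV_HEADS, HEAD_DIM = 40, 8, 128

def max_context(vram_gib: float, weight_bpw: float, cache_bits: float,
                reserve_gib: float = 1.0) -> int:
    """Largest token count whose KV cache fits beside the weights."""
    free = (vram_gib - reserve_gib) * GIB - N_PARAMS * weight_bpw / 8
    # K and V tensors per layer, per token, at the given cache precision
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * cache_bits / 8
    return max(0, int(free // per_token))

for bpw in (5.0, 6.0):
    for name, bits in (("FP16", 16), ("Q6", 6), ("Q4", 4)):
        ctx = max_context(12, bpw, bits)
        print(f"{bpw} bpw weights, {name} cache: ~{ctx // 1024}K tokens")
```

With these numbers, 48K fits at 5bpw with a Q4 or Q6 cache but not with FP16, and lower-bit weights plus a Q4 cache buy the most context; the full 128K still would not fit in 12GB under this model.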