GPU's rival? What is Language Processing Unit (LPU)
www.turingpost.com
Posted by ooli@lemmy.world to Technology@lemmy.world · English · 9 months ago · 13 comments
Scott@sh.itjust.works (9 months ago): I’m just trying to get my hands on some faster hardware. https://groq.com has been able to do some crazy shit, with 500 tokens/sec on their LPUs.
Lmaydev@programming.dev (9 months ago, edited): That is insanely fast! I figured we’d be getting “AI cards” at some point soon.
Lojcs@lemm.ee (9 months ago): What kind of a website is that? Super slow and doesn’t work without WebAssembly. Do you really need that for a simple interface?
Scott@sh.itjust.works (9 months ago): It’s not about their frontend. They are running custom LPUs which can process LLM tokens at 500/sec, which is insanely impressive. For reference, with a max size of 2k tokens, my dual Xeon Silver 4114 CPUs take 2-3 minutes.
Amaltheamannen@lemmy.ml (9 months ago): Aren’t those the ones that cost $2,000 per 250 MB of memory? Meaning you’d need about 350 of them to load any half-decent model.
Scott@sh.itjust.works (9 months ago): Not sure how they are doing it, but it was actually $20k, not $2k, for 250 MB of memory on the card. I suspect the models are probably cached in system memory.
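For anyone who wants to sanity-check the numbers in this thread, here is a rough back-of-envelope sketch. It only uses figures quoted above (500 tokens/sec, 2k tokens, 2-3 minutes on the CPUs, 250 MB and $20k per card, about 350 cards). The function names are made up for illustration, "2k tokens" is assumed to mean 2,000 generated tokens, and the card count assumes the whole model has to sit in on-card memory, which may not be how it is actually deployed.

```python
import math

def generation_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Time to produce `tokens` at a steady rate of `tokens_per_sec`."""
    return tokens / tokens_per_sec

def cards_needed(model_mb: float, card_mb: float = 250) -> int:
    """Cards required if the model must fit entirely in on-card memory (assumption)."""
    return math.ceil(model_mb / card_mb)

def total_cost(cards: int, price_per_card: int = 20_000) -> int:
    """Hardware cost in dollars at the per-card price quoted in the thread."""
    return cards * price_per_card

# Throughput: 2k tokens at the quoted 500 tokens/sec.
lpu_seconds = generation_seconds(2_000, 500)

# The dual-Xeon figure from the thread was 2-3 minutes for the same size.
cpu_seconds_low, cpu_seconds_high = 2 * 60, 3 * 60
speedup_low = cpu_seconds_low / lpu_seconds
speedup_high = cpu_seconds_high / lpu_seconds

# Memory: what "about 350 cards" at 250 MB each adds up to.
pooled_memory_mb = 350 * 250
cluster_cost = total_cost(cards_needed(pooled_memory_mb))

print(f"LPU time for 2k tokens: {lpu_seconds:.0f} s")
print(f"Speedup over the CPUs: {speedup_low:.0f}x to {speedup_high:.0f}x")
print(f"350 cards pool {pooled_memory_mb / 1000:.1f} GB and cost ${cluster_cost:,}")
```

So on the thread's own numbers, 350 cards would hold about 87.5 GB in total and run to $7,000,000 at $20k each, or $700,000 at the $2k figure.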