• Lojcs@lemm.ee
      9 months ago

      What kind of website is that? It's super slow and doesn't work without WebAssembly. Do you really need that for a simple interface?

      • Scott@sh.itjust.works
        9 months ago

        It's not about their frontend. They are running custom LPUs that can process LLM tokens at 500 tokens/sec, which is insanely impressive.

        For reference, with a max size of 2k tokens, my dual Xeon Silver 4114 processors take 2-3 minutes.
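A rough throughput comparison using only the figures quoted above. This is a sketch that assumes "2k tokens in 2-3 minutes" means about 2,000 tokens processed end to end; `tokens_per_second`, `LPU_RATE`, and `CPU_TOKENS` are illustrative names, not anything from the service.

```python
# Rough throughput comparison using the figures quoted in this thread.
# Assumption: "2k tokens in 2-3 minutes" means ~2,000 tokens processed end to end.

def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average throughput in tokens per second."""
    return tokens / seconds

LPU_RATE = 500.0   # tokens/sec, as quoted for the hosted LPUs
CPU_TOKENS = 2000  # "max size of 2k tokens" on the dual Xeon box

for minutes in (2, 3):
    cpu_rate = tokens_per_second(CPU_TOKENS, minutes * 60)
    speedup = LPU_RATE / cpu_rate  # how many times faster 500 tok/s is
    print(f"{minutes} min: {cpu_rate:.1f} tok/s on CPU, LPU ~{speedup:.0f}x faster")
```

Under that assumption the CPUs work out to roughly 11 to 17 tokens/sec, so 500 tokens/sec is about 30 to 45 times faster.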

        • Amaltheamannen@lemmy.ml
          9 months ago

          Aren't those the ones that cost $2,000 per 250 MB of memory? Meaning you'd need about 350 of them to load any half-decent model.

          • Scott@sh.itjust.works
            9 months ago

            Not sure how they are doing it, but it was actually $20k, not $2k, for 250 MB of memory on the card. I suspect the models are probably cached in system memory.
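A back-of-the-envelope check of the memory and cost figures traded above. It is only a sketch built on the thread's own numbers: about 350 cards (Amaltheamannen's estimate), 250 MB of on-card memory each, and both the $2k guess and the corrected $20k price. The helper names are hypothetical.

```python
# Back-of-the-envelope memory/cost check using the numbers quoted in this thread.
# Assumptions: ~350 cards (the estimate above), 250 MB of on-card memory each,
# and a per-card price of either $2k (first guess) or $20k (correction).

CARD_MEMORY_MB = 250
CARDS_NEEDED = 350

def total_memory_gb(cards: int, mb_per_card: int = CARD_MEMORY_MB) -> float:
    """Aggregate on-card memory in GB (decimal units, 1 GB = 1000 MB)."""
    return cards * mb_per_card / 1000

def total_cost_usd(cards: int, price_per_card: int) -> int:
    """Total hardware cost for the given number of cards."""
    return cards * price_per_card

print(f"{total_memory_gb(CARDS_NEEDED):.1f} GB across {CARDS_NEEDED} cards")
for price in (2_000, 20_000):
    print(f"at ${price:,} per card: ${total_cost_usd(CARDS_NEEDED, price):,}")
```

With those inputs, 350 cards give 87.5 GB of on-card memory, which would cost $700,000 at $2k per card or $7,000,000 at $20k per card.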