I’m looking at people running DeepSeek’s 671B R1 locally off NVMe drives at 2 tokens a second. So why not skip the flash entirely: give me a 1TB drive with an NVMe controller and only DRAM behind it? The total transistor count on the silicon side is lower. It would be slower than typical system memory, but the same as any NVMe drive with a DRAM cache. The controller logic for safe page writes in flash, plus the internal boost circuitry for pulsing each page, is about as complex as DRAM’s memory management, constant refresh, and power-stability requirements. Heck, DDR5 and up already puts power regulation on the memory module IIRC. Anyone know why this is not a thing, or where to get this thing?
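
Back-of-envelope, the bandwidth gap looks something like this (a sketch with nominal round numbers, not figures I’ve verified against any datasheet):

```bash
#!/usr/bin/env bash
# Nominal peak bandwidth, assumed round numbers:
# PCIe 4.0 moves ~2 GB/s per lane; a typical NVMe link is x4.
echo "NVMe over PCIe 4.0 x4: ~$(( 2 * 4 )) GB/s"
# DDR5-4800 moves 4800 MT/s x 8 bytes per channel, two channels.
echo "Dual-channel DDR5-4800: ~$(( 4800 * 8 * 2 / 1000 )) GB/s"
```

So a DRAM-behind-NVMe card would still be roughly an order of magnitude slower than the same chips sitting on the memory bus.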

  • 𞋴𝛂𝛋𝛆@lemmy.worldOP · 19 days ago

    I’m talking about volatile memory instead of persistent storage. I think we’re on different pages, no pun intended (okay, that’s a lie).

    • UberKitten@lemmy.blahaj.zone · 19 days ago

      a RAM drive is volatile memory. you get higher performance out of DRAM chips attached to the CPU memory controller than by putting them behind the PCIe bus as NVMe. for applications that only work with file systems, a RAM drive works around that limitation. so, why bother?

      • 𞋴𝛂𝛋𝛆@lemmy.worldOP · 19 days ago

        Enormous model access. It is too slow for real time, but it is possible to load a 671 billion parameter model in 2-bit quantization with 4 active experts at barely over 2 tokens a second. That is the hyperbolic extreme. In practice it means I go from my present limit of around 70B to double that.

        The more interesting aspect is what becomes possible with a larger mixture-of-experts model. The new Llama 4 models are exploring this concept. I already run an 8×7B model most of the time because it is so fast and nearly on par with a 70B. One area really opening up right now is a MoE built from something like 3B experts, run at full model weights on a 16 GB GPU. A lot could be done with a model like that, because fine-tuning individual 3B models is accessible on that same 16 GB GPU. Then it becomes possible to save your own questions and answers, use them for training in niche subjects, and stitch the pieces together into your own FrankenMoE. The main training set used to teach a model to think like DeepSeek R1 is only 600 questions long, and if you actually read most datasets of this kind, half of it is junk. If a person who knows how to prompt well saves their own dataset, much better results will follow.

        A very large, fast secondary storage tier makes it much more reasonable to load and unload 200-400 GB a few dozen times a day, as sketched below. With an extensive agentic toolset, up that by an order of magnitude. If the toolset is automated with several models being swapped out, raise it by another order of magnitude.
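
        To make the load/unload workflow concrete, a minimal sketch, assuming llama.cpp’s llama-cli as the runner; the model filename, paths, and tmpfs size are hypothetical placeholders:

        ```bash
        #!/usr/bin/env bash
        set -euo pipefail

        RAMDISK=/mnt/ramdisk
        MODEL=deepseek-r1-q2_k.gguf   # hypothetical filename
        SRC=/models/$MODEL            # slow persistent copy on disk

        # One-time: back the staging area with DRAM (needs root).
        sudo mkdir -p "$RAMDISK"
        sudo mount -t tmpfs -o size=400G tmpfs "$RAMDISK"

        # Copy once; every subsequent load then reads at memory speed.
        cp "$SRC" "$RAMDISK/"

        # Swap models in and out by pointing the runner at the RAM copy.
        llama-cli -m "$RAMDISK/$MODEL" -p "test prompt" -n 64
        ```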

    • e0qdk@reddthat.com · 19 days ago

      What they mean is you can already get a file system backed by regular RAM with something like `mount -t tmpfs -o size=1G tmpfs /mnt/ramdisk`.
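
      Spelled out a little further (the size and paths are arbitrary examples, and everything in it disappears on unmount or reboot):

      ```bash
      sudo mkdir -p /mnt/ramdisk
      sudo mount -t tmpfs -o size=64G tmpfs /mnt/ramdisk

      # Check it and use it like any other directory.
      df -h /mnt/ramdisk
      cp model.gguf /mnt/ramdisk/

      # Unmounting discards the contents; tmpfs is volatile by design.
      sudo umount /mnt/ramdisk
      ```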

      • solrize@lemmy.world · 19 days ago

        There are few motherboards with enough DRAM channels to handle a TB of DRAM. That’s basically why RAM drives existed back in the day, and they are still potentially relevant for the same reason. No, a TB of RAM wouldn’t fit on an M.2 card, but a box of RAM with a PCIe connector is still an interesting idea. Optane also aimed at this niche, but it never got enough traction to be viable, so it was scrapped.
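
        As a software stand-in for that “box of RAM that looks like a drive”, Linux’s brd module already exposes DRAM as a plain block device; a sketch (size is arbitrary, contents vanish at module unload or reboot):

        ```bash
        # One RAM-backed block device, 16 GiB (rd_size is in KiB).
        sudo modprobe brd rd_nr=1 rd_size=$((16 * 1024 * 1024))

        # Format and mount /dev/ram0 like any ordinary drive.
        sudo mkfs.ext4 /dev/ram0
        sudo mkdir -p /mnt/ramblock
        sudo mount /dev/ram0 /mnt/ramblock
        ```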

        • e0qdk@reddthat.com · 19 days ago

          You can get motherboards with enough slots if you’re willing to pay enterprise prices for them. I have a system with 1TB of RAM at work that I use as a fast data cache. I just mount tmpfs on it, write hundreds of gigs of data into it (overwriting it all every few hours), and it works great. Cost was somewhere in the $10-15K (US) range a few years ago, IIRC. Steep for an individual, sure, but not crazy for an organization.
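
          For anyone replicating that setup, an /etc/fstab line recreates the cache (empty) at every boot; a sketch with an example size and path:

          ```bash
          # /etc/fstab — a RAM-backed scratch area, recreated empty each boot.
          tmpfs  /mnt/datacache  tmpfs  size=768G,mode=1777  0  0
          ```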

          • solrize@lemmy.world · 19 days ago

            Last I saw, you had to use super-premium, ultra-dense memory modules to get 1TB into a motherboard. Maybe that’s less so now. But the hope would be to use commodity RAM, CPUs, etc. $10K for a 1TB system is pretty good; last I looked it was a lot more.

            • e0qdk@reddthat.com · 19 days ago

              Pretty sure you can do much better than that now (plus or minus tariff insanity). A quick check on Amazon, Newegg, etc. suggests a ballpark of $5K for 1TB of RAM (registered DDR4) plus a probably-compatible motherboard and CPU.

      • 4am@lemm.ee · edited · 19 days ago

        I think OP is looking for fast, temporary storage, like a buffer or a ramdisk.

        But the PCIe bus is slower than the memory bus. Just use actual RAM for this.
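
        A crude way to see the gap on your own hardware (dd is not a rigorous benchmark; the paths are examples, with /mnt/ramdisk assumed to be tmpfs):

        ```bash
        # Write 4 GiB to the RAM disk, then to an NVMe-backed path, and compare.
        dd if=/dev/zero of=/mnt/ramdisk/test bs=1M count=4096 status=progress

        # conv=fdatasync makes dd wait until the data actually reaches the drive.
        dd if=/dev/zero of=/home/user/test bs=1M count=4096 conv=fdatasync status=progress

        rm /mnt/ramdisk/test /home/user/test
        ```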

      • Dudewitbow@lemmy.zip · 19 days ago

        OP is basically looking for the hypothetical endgame of Intel Optane: something almost as fast as RAM. he doesn’t know, however, that the Optane project is functionally dead.