I’m seeing people run DeepSeek’s 671B R1 locally off NVMe drives at 2 tokens a second. So why not skip the flash and give me a 1TB drive with an NVMe controller and only DRAM behind it? The total transistor count is lower on the silicon side. It would be slower than typical system memory but the same as any NVMe drive with a DRAM cache. The controller architecture for safe page writes in flash, plus the internal boost circuitry for pulsing each page, is about as complex as DRAM’s memory management, constant refresh, and power stability requirements. Heck, DDR5 and up already puts power regulation on the memory module, IIRC. Anyone know why this is not a thing, or where to get this thing?
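
For a rough sense of the "slower than system memory" part, here is a back-of-the-envelope sketch of the link ceilings. It assumes an M.2 slot is a 4-lane PCIe link, that R1 activates roughly 37B of its 671B parameters per token, and that the weights are quantized to 4 bits; the function names are made up for illustration. It uses theoretical peak rates only and ignores any weights that stay cached in system RAM, so treat the results as upper bounds, not predictions.

```python
# Upper bound on tokens/s when model weights stream over a given link.
# All rates are theoretical peaks, not measurements.

def pcie_x4_gbps(gt_per_s: float) -> float:
    """Peak GB/s of a 4-lane PCIe link using 128b/130b encoding (Gen3+)."""
    return gt_per_s * 4 * (128 / 130) / 8


def ddr5_channel_gbps(mt_per_s: float) -> float:
    """Peak GB/s of one 64-bit (8-byte) DDR5 channel."""
    return mt_per_s * 8 / 1000


def max_tokens_per_s(link_gbps: float, active_params_b: float,
                     bits_per_weight: float) -> float:
    """Ceiling if every active weight crosses the link once per token."""
    gb_per_token = active_params_b * bits_per_weight / 8
    return link_gbps / gb_per_token


ACTIVE_PARAMS_B = 37   # assumption: ~37B active parameters per token
BITS_PER_WEIGHT = 4    # assumption: 4-bit quantization

links = {
    "M.2, PCIe 4.0 x4": pcie_x4_gbps(16),
    "M.2, PCIe 5.0 x4": pcie_x4_gbps(32),
    "DDR5-4800, one channel": ddr5_channel_gbps(4800),
}

for name, gbps in links.items():
    ceiling = max_tokens_per_s(gbps, ACTIVE_PARAMS_B, BITS_PER_WEIGHT)
    print(f"{name}: {gbps:.1f} GB/s peak, at most {ceiling:.2f} tokens/s")
```

Under these assumptions the M.2 link itself is the limit no matter what sits behind the controller, so an all-DRAM drive would mostly gain latency and write endurance, not streaming throughput.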

  • CitricBase@lemmy.world
    19 days ago

    You are asking for 1TB of RAM. Keying it to M.2 wouldn’t make it any cheaper or better than keying it to regular DDR5. I don’t think that even just a tenth of that would physically fit onto an NVMe drive, even if someone wanted it to.

    Put in that context, do you begin to see now why that isn’t a thing that exists?

    • 𞋴𝛂𝛋𝛆@lemmy.worldOP
      19 days ago

      DRAM is 1 transistor per bit. Flash is 3 in a larger overall footprint, but these are deposited in multiple stacked layers. The actual dies are not much different once the packaging is removed. Most decent-quality NVMe drives already have a large amount of DRAM cache onboard. It is trivial to make the entire drive DRAM. This is a product that should exist for the niche of AI models, and it probably does, but I’m unable to find it on the modern dystopian internet.