• ylai@lemmy.mlOP
    link
    fedilink
    arrow-up
    0
    ·
    1 year ago

    The situation is somewhat different and nuanced. With weights there are tools for fine-tuning, LoRA/LoHa, PEFT, etc., which presents a different situation as with binaries for programs. You can see that despite e.g. LLaMA being “compiled”, others can significantly use it to make models that surpass the previous iteration (see e.g. recently WizardLM 2 in relation to LLaMA 2). Weights are also to a much larger degree architecturally independent than binaries (you can usually cross train/inference on GPU, Google TPU, Cerebras WSE, etc. with the same weights).

    • sweng@programming.dev
      link
      fedilink
      arrow-up
      0
      ·
      1 year ago

      How is that different then e.g. patching a closed-sourced binary? There are plenty of community patches to old games to e.g. make them work on newer hardware. Architectural independence seems irrelevant, it’s no different than e.g Java bytecode.

      • ylai@lemmy.mlOP
        link
        fedilink
        arrow-up
        0
        ·
        edit-2
        1 year ago

        This is a very shallow analogy. Fine-tuning is rather the standard technical approach to reduce compute, even if you have access to the code and all training data. Hence there has always been a rich and established ecosystem for fine-tuning, regardless of “source.” Patching closed-source binaries is not the standard approach, since compilation is far less computational intensive than today’s large scale training.

        Java byte codes are a far fetched example. JVM does assume a specific architecture that is particular to the CPU-dominant world when it was developed, and Java byte codes cannot be trivially executed (efficiently) on a GPU or FPGA, for instance.

        And by the way, the issue of weight portability is far more relevant than the forced comparison to (simple) code can accomplish. Usually today’s large scale training code is very unique to a particular cluster (or TPU, WSE), as opposed to the resulting weight. Even if you got hold of somebody’s training code, you often have to reinvent the wheel to scale it to your own particular compute hardware, interconnect, I/O pipeline, etc… This is not commodity open source on your home PC or workstation.

        • sweng@programming.dev
          link
          fedilink
          arrow-up
          0
          ·
          1 year ago

          The analogy works perfectly well. It does not matter how common it is. Pstching binaries is very hard compared to e.g. LoRA. But it is still essentially the same thing, making a derivative work by modifying parts of the original.