If somebody wants to use my online content to train their AI without my consent, I want to at least make it difficult for them. Can I somehow “poison” the comments and images and stuff I upload to harm the training process?

  • affenlehrer@feddit.org · 17 hours ago

    LLMs learn to predict the next token following a set of other tokens they pay attention to. You could try to sabotage that by associating unrelated things with each other. One of the earlier ChatGPT versions had a famous glitch token like this: a Reddit username, SolidGoldMagikarp, showed up so often in the tokenizer's training data that it got its own token, but that text was largely filtered out of the model's actual training data, so the token was never properly learned. Whenever ChatGPT encountered it, it pretty much lost its focus and went wild.
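
    If you want to play with the idea at the text level, here's a minimal Python sketch (my own illustration, not a proven countermeasure): it hides zero-width Unicode characters inside normal-looking text, so humans read it fine while a byte-pair tokenizer sees rare, broken-up token sequences instead of common words. Fair warning: any serious scraping pipeline can strip or normalize these characters, so treat it as a speed bump at best.

    ```python
    import random

    # Invisible when rendered, but they change how a BPE tokenizer
    # segments the surrounding word.
    ZERO_WIDTH = ["\u200b", "\u200c", "\u200d"]  # ZWSP, ZWNJ, ZWJ

    def poison_text(text, rate=0.15, seed=None):
        """Randomly insert zero-width characters after letters.

        Human readers see the original text; a tokenizer sees
        low-frequency fragments instead of whole-word tokens.
        """
        rng = random.Random(seed)
        out = []
        for ch in text:
            out.append(ch)
            if ch.isalpha() and rng.random() < rate:
                out.append(rng.choice(ZERO_WIDTH))
        return "".join(out)

    if __name__ == "__main__":
        original = "This comment was written for humans, not for training runs."
        poisoned = poison_text(original, seed=42)
        print(poisoned)                      # looks identical on screen
        print(len(original), len(poisoned))  # lengths reveal the difference
    ```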