Reddit says Microsoft’s Bing, Anthropic, and Perplexity have scraped its data without permission. “It has been a real pain in the ass to block these companies.”

  • cygnus@lemmy.ca
    3 months ago

    Sort of, but not really. From the Reddit ToS (emphasis mine):


    By submitting Your Content to the Services, you represent and warrant that you have all rights, power, and authority necessary to grant the rights to Your Content contained within these Terms. Because you alone are responsible for Your Content, you may expose yourself to liability if you post or share Content without all necessary rights.

    You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:

    When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

    • bionicjoey@lemmy.ca
      3 months ago

      Beyond that, if you are serving webpages with data on them, you don’t get to decide what people do with those pages. They can’t stop search engines from scraping.

      • bassomitron@lemmy.world
        3 months ago

        Just to nitpick, they can stop scraping; anyone can. However, doing so would require implementing barriers that also tend to negatively affect sites that depend on being discovered and browsed.

    • chakan2@lemmy.world
      3 months ago

      Lol…really? So they can reuse, modify, and remove all association with your content, but somehow you think you still own it?

      I’ve got a bridge to sell you.

      • cygnus@lemmy.ca
        3 months ago

        In essence, it means that you reserve the right to also use the content for your own purposes, without Reddit having any recourse to prevent you from doing that.

        • chakan2@lemmy.world
          3 months ago

          Except they published your work, all variants of said work, and completely eliminated you as the author of said work.

          I don’t know how else to explain to you that you don’t own that work anymore. You have rights to it. But you don’t own it.

          • cygnus@lemmy.ca
            3 months ago

            It’s the opposite; you own it, but Reddit also have rights to it.

    • ReallyActuallyFrankenstein@lemmynsfw.com
      3 months ago

      It’s actually a fascinating bind Steve/Reddit has put themselves in. Because it is a non-exclusive license, you can affirmatively declare your content is free for anyone to scrape or use.

      After that, if Reddit ever asserts rights over your content by, say, suing Microsoft for improperly using your content in training data, you now have a legal claim against Reddit for interference with either your ownership rights or with a contract via whatever license you have made your content available under.

      Now, maybe Reddit has a claim release in their TOS, but it wouldn’t prevent you from getting an injunction enjoining Reddit from restricting your data from being used by Microsoft.

      It’s kind of academic, because… it’s not really a victory that Microsoft is also training its AI on your data. But, hey, they’re probably doing it anyway and at least this way we get to screw over Huffman for being an ass.

      • Bookmeat@lemmy.world
        3 months ago

        The only issue I see with this is that it can be argued that this license doesn’t grant third parties access to data on Reddit’s platform.

      • cygnus@lemmy.ca
        3 months ago

        MS couldn’t access that content without scraping the page itself, though, which of course belongs to Reddit. From a legal standpoint, it’s like a paywall.

    • Bookmeat@lemmy.world
      3 months ago

      It’s right there in the ToS: NON-EXCLUSIVE license. If they go to court, I would guess they lose.