• thawed_caveman@lemmy.world
    9 months ago

    I feel like AI companies have been scraping Reddit for their datasets already since the beginning and without permission. In fact, unless there’s been a regulation change that i’m not aware of, i’m not sure why they would have Reddit “sign away” the data when they can just scrape it.

    Also dubious if the current form of AI has a future. They seem like they should revolutionize every sector when you look at their capacities, but in practice their applications might be more limited than we thought?

    Anyway, if Reddit does go public i will be deleting my account within the hour. The only reason i haven’t yet is that i’ve been a moderator of the same subreddit for eight years and it’s the only thing that’s been consistent in my life in that time, i’m kind of attached. The reason i will is i didn’t sign up to create value for shareholders, i signed up to create value for a community.

    • RunningInRVA@lemmy.world
      9 months ago

      You need to go ahead and delete your account and give up the ghost on modding whatever sub you are referring to. I’m tired of these types of posts where you are both beholden to Reddit and also not. Pick a dang side.

      • thawed_caveman@lemmy.world
        9 months ago

        Well no, because the old sub will continue to exist and will therefore always be where everyone goes until Reddit itself dies. I really doubt admins would let me delete the sub.

    • ChunkMcHorkle@lemmy.world
      9 months ago

      I feel like AI companies have been scraping Reddit for their datasets already since the beginning and without permission.

      Well yeah, Sam Altman (OpenAI) was even on the board of Reddit for a while. It’s a safe bet that they’ve been doing it for years.

  • Verserk@lemmy.dbzer0.com
    9 months ago

    Considering some of the very wrong and upvoted domain specific knowledge I’ve seen on Reddit over the years I’m not sure the training data is going to be useful for much beyond what every other model can do.

    • Lvxferre@mander.xyz
      9 months ago

      An LLM that behaves like a typical Redditor? // What possible use is that?

      • [You] “Chatbot, please tell me which pokemon types are strong against Fairy.”
      • [Le Lebbit Moronbot] “I’m not sure if I understand, you calling me a chatbot? I’m so confused lol”
      • [You] “Moronbot, please tell me which pokemon types are strong against Fairy.”
      • [LLM] “Actually, you should be spelling it “Pokémon” lol”
      • [You] “Moronbot, which types are strong against Fairy?”
      • [LLM] “I assume you talking about fairies. Fairies are from mythology lmao”
      • [You] “Did people really waste water and electricity for this trash?”
      • [LLM] “Waaah, you’re toxic!!111one”
    • honey_im_meat_grinding@lemmy.blahaj.zone
      9 months ago

      What possible use is that?

      I’ve noticed “has this sub gotten more right wing recently?” posts reaching the top post of the day in the last 6 months or so. r/norge and r/unitedkingdom being examples. You can automate bots that change a subreddit’s consensus on certain topics by bot-spamming threads pertaining to those topics, especially in the first hour of a thread going up. I don’t know if that’s happening, or if it has more to do with the Reddit protest that saw mods abdicate their positions last June and new mods being responsible for the change… but it could also be a bit of both.

      • IchNichtenLichten@lemmy.world
        9 months ago

        You’ll get your refund eventually but first it will try and gaslight you that Air Canada is a woke mind virus before calling you an asshole and then stalking you.

        • pdxfed@lemmy.world
          9 months ago

          “instead of the $3.50 refund, I’m also authorized to offer you some June 2025 $350 GME calls.”

    • leaky_shower_thought@feddit.nl
      9 months ago

      A redditor bot is a viable example of a forum member bot.

      IMO, I don’t think it can drive topics, but it could make things controversial.

    • FaceDeer@kbin.social
      9 months ago

      Negative examples are often just as useful for training an AI as positive ones. And it all depends on what you want to use the AI for. A moderator bot, for example, needs familiarity with the whole range of user responses it might see.

      • aidan@lemmy.world
        9 months ago

        That actually gives me a fun idea for a Lemmy instance: one with an automated review process that bans posts/comments that are too similar in style to Reddit posts/comments.

  • Voyajer@lemmy.world
    9 months ago

    This is why I don’t blame anyone for editing/deleting their post history on reddit.

    • FaceDeer@kbin.social
      9 months ago

      I do. It’s frankly selfish. Having an AI get training on my old comments costs me nothing and it results in the development of useful AI tools. Trying to sabotage that is petty and pointless. It’s not like you could somehow collect the fraction of a pittance that you think you’re owed retroactively. I never commented on Reddit thinking “awesome, I’m going to make bank on the content I’m generating here.”

      People complain about the capitalist mindset of the world and then they do this. Sigh.

      • R00bot@lemmy.blahaj.zone
        9 months ago

        How is not wanting capitalist companies to profit off of your content not aligned with complaining about the capitalist mindset of the world? Wtf lol.

        • FaceDeer@kbin.social
          9 months ago

          It’s the insistence that everything that people do must be compensated with money. People have spent years posting on Reddit for fun, without any thought to being paid for it, and now all of a sudden someone else is making some money so they’re demanding that they should get their slice. And doing what they can to wreck their earlier efforts when they don’t.

          How does Reddit making some money licensing this stuff harm those of us who contributed to it? Is there any problem aside from “I wanna get paid!”?

          • R00bot@lemmy.blahaj.zone
            9 months ago

            Why do you think it’s about wanting a slice? They posted on Reddit with no expectation of profit. But they don’t want others to profit off it either. It’s not that complicated.

            • FaceDeer@kbin.social
              9 months ago

              But they don’t want others to profit off it either.

              And that’s why I call them selfish. It doesn’t harm them in the slightest if someone else profits off of it.

              • R00bot@lemmy.blahaj.zone
                9 months ago

                They wouldn’t have posted if they knew this was going to happen. They posted because it was fun, not for this.

                They may be morally opposed to AI (as there are many valid reasons to be opposed to it), or they may just have wanted to have been able to make an informed decision before posting, but by retroactively training the AI on their posts they’ve robbed them of the agency to make that decision.

                That’s why they’re upset.

                • FaceDeer@kbin.social
                  9 months ago

                  They posted content on a website whose user agreement says “we can do whatever we like with the content you post here” and then go surprised-pikachu when the website goes ahead and does whatever they like with the content they posted. Frankly, I’m not tremendously sympathetic. This should have been easy to predict.

      • Nurse_Robot@lemmy.world
        9 months ago

        Defending giant corporations profiting off of uncompensated individuals, while criticizing anyone who doesn’t want to provide free labor to said corporations, is a disgusting take. Are you a CEO?

        • FaceDeer@kbin.social
          9 months ago

          The more accessible training data there is, the easier it is for new AI projects to enter the field and the less dominant those “giant corporations” become.

          The free labour was already freely given. If someone doesn’t want to have shitposted on Reddit for free then maybe they shouldn’t have shitposted on Reddit for free.

          • Nurse_Robot@lemmy.world
            9 months ago

            “if you didn’t want me to steal your intellectual property, you shouldn’t have thought of it in the first place”

            • Fungah@lemmy.world
              9 months ago

              So, for an example of what the other user was talking about: I’m just some guy, and for my first foray into programming / machine learning (I kind of just threw myself into the deep end) I modified StyleGAN3 and trained it on about 500g of reddit porn that I scraped off reddit.

              Now, I stopped the training after about a week (it was going to take about a solid month on my RTX 2080 Ti) when I found out Stable Diffusion existed, but I learned a LOT from that experience.

              I couldn’t do that now. Arguably none of that was how any of that should be done but whatever.

            • FaceDeer@kbin.social
              9 months ago

              I’m not sure what you mean here. Nothing’s being stolen. Even if you think there needs to be permission for training an AI off of data, Reddit has that permission.

              • Nurse_Robot@lemmy.world
                9 months ago

                I assume you’re more of a moron than a troll, which is disappointing. Regardless, you’re not worth my time, as I don’t think any argument could convince you to have an open mind and be willing to change. Good luck out there!

            • QuaternionsRock@lemmy.world
              9 months ago

              No, you shouldn’t have posted it to Reddit, where you were required to give them a perpetual license to use your IP in any way they see fit.

              For the record, I’m here because Reddit pissed me off when they axed the free API, and I’m pissed at myself for not expecting it. That’s what I get for accepting their terms and conditions, I guess.

              Edit: I also don’t accept the idea that using my content for training data is “fair use” when it is used to train proprietary models, especially ones in which the end user is allowed to prompt it to plagiarize or otherwise imitate my content.

      • TORFdot0@lemmy.world
        9 months ago

        I had an 11 year old account that I deleted all my old comments and posts from because of the API debacle. Does that make me selfish that I felt like Reddit wasn’t holding up its end of the unwritten agreement?

        Reddit doesn’t deserve my content any more than I deserve access to the third-party API.

        • FaceDeer@kbin.social
          9 months ago

          If you did it over the API debacle then you’re not one of the people I’m talking about here. This is about people deleting their content to prevent it from being used to train AIs.

          • Voyajer@lemmy.world
            9 months ago

            Do you not remember that the whole reason the API debacle happened in the first place was to prepare for this moment?

      • Voyajer@lemmy.world
        9 months ago

        It’s their comment to do with as they see fit. I can’t get mad at them for wanting to erase their presence on a site they don’t use anymore.

      • gedaliyah@lemmy.world
        9 months ago

        For me it’s a privacy matter. Going through old posts (whether by a human or by machine learning) cannot be used for anything good.

      • Hackerman_uwu@lemmy.world
        9 months ago

        What about people who just think “A.I.” is dog shit and chat bots are a dumb obsession steering the industry in the wrong direction due to hype and money?

        • FaceDeer@kbin.social
          9 months ago

          What about them? I don’t see why they’d care what AI companies are doing in that case. They’d assume they were just wasting money on this stuff.

  • FrostyTrichs@lemmy.world
    9 months ago

    Enjoy training on my -checks notes- DELETED POST HISTORY YOU FUCKING CLOWNS.

    Stay ForeverFucked™ spez.

  • HuddaBudda@kbin.social
    9 months ago

    Oh no! My outdated political takes and league of legends rants are going to be used to train AI!?

    We’re all doomed!

  • etrotta@kbin.social
    9 months ago

    Out of all things to hate Reddit for, giving data to AI isn’t something fediverse users can really criticize it for, though making money from it perhaps.
    Remember: All data in federated platforms is available for free and likely already being compiled into datasets. Don’t be surprised if this post and its comments end up in GPT5 or 6 training data.

    • ColeSloth@discuss.tchncs.de
      9 months ago

      No. I can. Reddit was bought out, uses volunteers to control all the subs but forcefully removes you from the sub you created and were supposed to have control over if you don’t play by their ever-changing rules, ruined/eliminated third-party apps by demanding WAY more than their ad revenue for API access on very short notice, and shadow banned anyone and everyone in a position to do anything about any of it. It’s a corporation that gutted an entire platform in order to push the agendas they want and milk as much money out of it as possible. Hell, it’s the entire reason all of Lemmy gets more than 30 posts a day. So many people switched to Lemmy over the past year. They ruined a website I enjoyed and I’d rather they not make more money from the thousands of posts I made over a decade of being there.

    • FaceDeer@kbin.social
      9 months ago

      After all the hue and cry I have seen over stuff like Threads and Bluesky federation I don’t imagine most people using the Fediverse have a particularly coherent philosophy on the matter.

    • BrianTheeBiscuiteer@lemmy.world
      9 months ago

      If they already, essentially, cut off API access then it’s not a big leap to limit access on the web to logged in users only and rate limit or ban accounts that behave like scrapers.
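
To make "rate limit accounts that behave like scrapers" concrete, here is a minimal sketch of a per-account sliding-window limiter. It is only an illustration under assumptions: the class name `SlidingWindowLimiter`, the limits, and the account ids are all made up, and nothing here reflects how Reddit actually does it.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `max_requests` per account per window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # account id -> timestamps of that account's recent accepted requests
        self._hits = defaultdict(deque)

    def allow(self, account: str, now: float) -> bool:
        """Return True if the request is accepted, False if it is throttled."""
        hits = self._hits[account]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60.0)
    for t in (0.0, 1.0, 2.0, 3.0):
        print(t, limiter.allow("suspected_scraper", now=t))
```

The limit is tracked per account, so a scraper that spreads its requests across many cheap accounts stays under it.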

      • Verserk@lemmy.dbzer0.com
        9 months ago

        That would matter more if it wasn’t trivial to make new accounts and very cheap to buy established ones.

    • treadful@lemmy.zip
      9 months ago

      The problem isn’t that AI is being trained on the data. The problem is that they locked down all third party data access so they could monetize our content. On a federated platform, everyone gets equal access and can do whatever they want with it.

      We sure can criticize them for that.

  • SVcrossDO@lemmy.world
    9 months ago

    Damn it. I haven’t deleted my account because of how many people I’ve supported and helped, even though I stopped using it a while ago. It seems I’ll have to.

    • FaceDeer@kbin.social
      9 months ago

      I’m kind of puzzled by this mindset. You were pleased with supporting and helping people before, but now supporting and helping is bad?

      • SVcrossDO@lemmy.world
        9 months ago

        I’m happy that everyone has the support, but not that some specific AI can monetize that same support. I left on my Reddit account ways to contact me (including Lemmy). I helped others so good vibes could reach them, not for making the rich richer.

  • stevedidwhat_infosec@infosec.pub
    9 months ago

    Signed over its content.

    Just like that? No thought or anything put into what makes good vs bad training data?

    Good luck lmfao.

    Makes you wonder how hard it would be to clog up the training data with outputs from other AI models to really bake in that echo defect that they all seem to have to some extent as fast as possible. Wouldn’t that suck!

  • ozoned@lemmy.world
    9 months ago

    “Reddit has given access to YOUR conversations and posts to AI companies.” FTFY

    These were created by people, for people, and I will ALWAYS disagree that this data is Reddit’s or any other platform’s.

    Don’t forget your direct messages aren’t end-to-end encrypted on Reddit, so now AI will be trained on your craziest “private” conversations.

    • butterflyattack@lemmy.world
      9 months ago

      now AI will be trained on your craziest “private” conversations

      I have no idea what horrible thing this will do to an LLM but I’m kind of curious.

    • DocMcStuffin@lemmy.world
      9 months ago

      There’s one piece of good news. Reddit didn’t want to pay to move all the old DMs to the new chat infrastructure. So they deleted them.

      • hdnsmbt@lemmy.world
        9 months ago

        Pretty sure they just didn’t migrate to the new data structure and didn’t actually delete the raw data. They’re effectively deleted for users but not for Reddit.

    • atrielienz@lemmy.world
      9 months ago

      Oh no, all the times I sent or received dodo codes from randos so we could trade animal crossing items. Whatever shall I do?

  • NutWrench@lemmy.world
    9 months ago

    Reddit is all bots, porn, ads and political shit posts. Good luck getting any useful training content out of that.

    • Queen HawlSera@lemm.ee
      9 months ago

      I wish it would die, because honestly some of the porn was great and Lemmy seems to be the one place on the net that doesn’t specifically ban porn, yet has none of it anyway.

      I miss bodyswap and part tf captions…

    • ladicius@lemmy.world
      9 months ago

      Maybe that’s the point? Training the AI to produce the blabbering bullshit that’s preferred in social media?

    • PoliticalAgitator@lemmy.world
      9 months ago

      They don’t care if the AI produced is useful, they just want to milk as much money from their content as they can.

      The API changes were almost certainly just the groundwork for this and I called it at the time. The ridiculous pricing model for API access is because it’s aimed at the hottest tech companies, not third party app developers.

      The enshittification continues because it’s what neoliberalism demands. They’ll sell your content and the data they have about you and still show you ads, because that’s the most profitable. Ethics and product quality don’t even enter into it.

      • Ilgaz@lemm.ee
        9 months ago

        Liberal market gives end users choice. If they don’t choose, they get the consequences.

        This is more like people choosing Trump-like types and then complaining. Alternatives exist; choose them.

        • PoliticalAgitator@lemmy.world
          9 months ago

          “The free market can fix it” is just another neoliberal lie, pushed precisely because it doesn’t work. Rather than holding corporations accountable, it blames the population instead.

          The reality is that boycotting businesses isn’t always an option and when it is, it’s usually a luxury. Very few products are domestically and/or ethically produced and when they are, they’re extremely expensive, especially for people being fucked out of every cent by their bosses, landlords and utilities.

          It’s why the most hated companies in the world continue to bring in record profits.

          Regulations are the real answer, which is why neoliberals oppose them.

          • Ilgaz@lemm.ee
            9 months ago

            I really don’t care about people who behave like they are living in North Korea or who want a North Korean world to live in.

            Even Digg people could say “No, F you” to Digg superstar owners. It is just a damn URL to type.

  • General_Effort@lemmy.world
    9 months ago

    They say it’s $60 million on an annualized basis. I wonder who’d pay that, given that you can probably scrape it for free.

    Maybe it’s the AI act in the EU. That might cause trouble in that regard. The US is seeing a lot of rent-seeker PR, too, of course. That might cause some to hedge their bets.

    Maybe some people had not realized that yet, but limiting fair use does not just benefit the traditional media corporations but also the likes of Reddit, Facebook, Apple, etc. Making “robots.txt” legally binding would only benefit the tech companies.
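
For context on what “robots.txt” does today: it is a voluntary convention that a well-behaved crawler checks before fetching a page. A rough sketch with Python's standard `urllib.robotparser` follows. The rules, the bot name `ExampleAIBot`, and the example.com URLs are invented for illustration and are not any real site's policy.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: block one named AI crawler everywhere,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("ExampleAIBot", "https://example.com/r/some_sub/"))
    print(is_allowed("OtherBot", "https://example.com/r/some_sub/"))
    print(is_allowed("OtherBot", "https://example.com/private/page"))
```

Nothing in the file is enforced: the crawler runs this check on itself, and a scraper that skips it is not stopped by anything technical.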

    • FaceDeer@kbin.social
      9 months ago

      This is the most frustrating thing, so many people are arguing against their own interests with their efforts to “lock down” their content to prevent AIs from training on it. In this very thread I’ve been accused of being pro-giant-company when I’m quite the opposite. The harder we make it to train AI, the stronger the advantage that the existing giant companies have in this field.