• Rooty@lemmy.world
    1 month ago

    IDGAF about LLM bots scraping public forums; they are public and available to anyone. I do mind them scraping shadow libraries and training on copyrighted material, which they should not do.

      • acosmichippo@lemmy.world
        29 days ago

        also “public for actual people who support my forum business model” is not the same as “public for AI scrapers who detract from my business model.”

    • mushroomman_toad@lemmy.dbzer0.com
      1 month ago

      This discussion is a creative work and the copyright is collectively owned by the text contributors.

      Please reach out to the authors individually for a license before using it to train your AI sex bot.

      • BeeegScaaawyCripple@lemmy.world
        1 month ago

        I hereby and in perpetuity grant an exclusive, non-geographically-limited license to my comments to F.I.S.T.O. and only F.I.S.T.O.

        not the makers of F.I.S.T.O., let’s be clear

        • mushroommunk@lemmy.today
          30 days ago

          That’s currently being argued in the courts. There’s a lot that goes into it, from the right to distribution to proving that the AI bot can reproduce a work even though it normally doesn’t. [A very real example of reproducibility](https://arstechnica.com/features/2025/06/study-metas-llama-3-1-can-recall-42-percent-of-the-first-harry-potter-book/)

          There are also arguments about how they accessed large amounts of content. The law doesn’t just recognize whether you can access something or not, but what you access it for. There are laws about accessing things with the sole purpose of using them to develop a commercial product. All of it is a tangled mess with no clear answer at present (legally, that is; morally I think there is one, but that’s very opinionated).

    • Wawe@lemmy.world
      1 month ago

      LLM bots are scraping so much that it increases the cost of maintaining forums, and sometimes they even DDoS them, Codeberg for example.

  • tree_frog_and_rain@lemmy.world
    30 days ago

    The forum I call home tolerates a lot of hate speech.

    I think I’m out, but it’s less about the AI scraping and more about moderation.

    • TruthMcGee@lemmynsfw.com
      30 days ago

      I don’t even oppose hate speech at this point as long as it’s directed towards people who believe in the project 2025 agenda instead of the other way around (which it almost exclusively always is) 🤷 we need a kiwi farms but for targeting delusional conservatives. The enemy got to where they are today partly due to mass internet trolling and letting them trample the internet unopposed leaves weak-minded normies to adopt and fight for their views. “being nice” about it ain’t getting anybody anywhere and it’s time for these pieces of shit to actually experience bullying for themselves.

      Too bad no such communities exist on the internet.

  • argarath@lemmy.world
    1 month ago

    What if we made it so that the text any user posts on the forum website has a bunch of nonsense letters mixed in between the letters they posted, but they’re all set to a REALLY small font or even take up no space thanks to those special characters, and are colored in such a way as to make them disappear into the background? That way when a person reads it, it makes sense, but when a scraper gets it, it’ll just be a jumbled useless mess!
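A rough sketch of that idea in Python, under loud assumptions: the function names, the `HIDDEN_STYLE` string, and the use of `aria-hidden="true"` (so assistive technology skips the noise) are all hypothetical choices for illustration, not something any forum software is known to do. It also assumes the scraper only strips tags; one that honors CSS or `aria-hidden` would see through it.

```python
import html
import random
import re
import string

# Assumption: an inline style that makes the noise invisible to sighted readers.
HIDDEN_STYLE = "font-size:0;color:transparent"


def obfuscate(text: str, rng: random.Random, noise_len: int = 2) -> str:
    """Follow every visible character with a hidden span of random letters."""
    parts = []
    for ch in text:
        parts.append(html.escape(ch))
        noise = "".join(rng.choice(string.ascii_lowercase) for _ in range(noise_len))
        parts.append(f'<span style="{HIDDEN_STYLE}" aria-hidden="true">{noise}</span>')
    return "".join(parts)


def naive_scrape(markup: str) -> str:
    """What a tag-stripping scraper extracts: every character, hidden or not."""
    return html.unescape(re.sub(r"<[^>]+>", "", markup))


def rendered_text(markup: str) -> str:
    """Approximate what a reader sees by dropping the hidden spans entirely."""
    visible = re.sub(r'<span style="[^"]*" aria-hidden="true">[^<]*</span>', "", markup)
    return html.unescape(visible)


markup = obfuscate("hi there", random.Random(0))
print(rendered_text(markup))  # the original post
print(naive_scrape(markup))   # the post with noise letters interleaved
```

The trade-off is page weight: every visible character drags a whole span along with it, so this is more of a thought experiment than something to ship.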

      • argarath@lemmy.world
        30 days ago

        True… But what if the forum had its own search engine that could ignore the anti-scraping stuff? The issue would be making a good search engine lol

    • Ugh@sh.itjust.works
      30 days ago

      People who use screen readers (like blind people) would be screwed. I like where your head is at, though!

      • argarath@lemmy.world
        30 days ago

        Built-in screen reader with a page designed for those who are hard of sight (is that the name?) with stuff like tab navigation that is actually useful, unlike most websites today! But yeah, this “solution” is very hard to implement because it’ll cause issues for a lot of things that will each need individual fixes. Thanks, AI and scrapers!! Making the internet a hellhole sure is great!

      • argarath@lemmy.world
        29 days ago

        Someone else pointed out the same flaw, and I replied with the text below. But this is all just trying to block the sun with a sieve, theorizing for the sake of theorizing as a fun thing. If something good comes out of it, nice bonus tbh.

        Built-in screen reader with a page designed for those who are hard of sight (is that the name?) with stuff like tab navigation that is actually useful, unlike most websites today! But yeah, this “solution” is very hard to implement because it’ll cause issues for a lot of things that will each need individual fixes. Thanks, AI and scrapers!! Making the internet a hellhole sure is great!

  • chunes@lemmy.world
    1 month ago

    Am I the only person who doesn’t care if people ‘scrape’ my ‘knowledge?’

    That’s the whole point of putting something online. So anyone can look at it. I’m not about to get petty about who has access and who doesn’t.

    • skisnow@lemmy.ca
      1 month ago

      Let’s say I scraped a guide you wrote about something you spent a lot of time researching, and then republished it as a Kindle eBook for $5 with my name listed as the author, whilst at the same time the site you posted it to went bust due to losing all its traffic to Google’s AI summaries. Would you consider it petty to object? After all, I’m increasing its audience for you.

      • chunes@lemmy.world
        1 month ago

        I would. If you wanted to make money from it, you should have sold it as an eBook instead of posting it to some forum.

    • InFerNo@lemmy.ml
      1 month ago

      It’s not that it should be hidden, it’s that someone is getting a lot of money from my posts and I get nothing.

    • Limonene@lemmy.world
      1 month ago

      Even if Discord wasn’t doing it, public Discord guilds are known to be scraped by a number of different bots. Previously, it was for spies, cops, and private investigators who wanted to search for messages by username. If those bots could do it before, AI bots will be doing it aggressively today.

      • Swedneck@discuss.tchncs.de
        28 days ago

        hilariously there’s one bot that you add specifically so that stuff on your discord community isn’t lost to time: it scrapes the messages and mirrors them to a forum-like website that can show up in search engines.

    • InputZero@lemmy.world
      1 month ago

      Discord’s complete lack of indexing. Although it’s definitely not impossible to scrape data from Discord, it would take more resources than, say, Reddit.

      • plyth@feddit.org
        30 days ago

        If an AI company pays Discord they won’t scrape but get the data directly.

      • RepleteLocum@lemmy.blahaj.zone
        30 days ago

        But they index everything. Just request your data and you’ll get a neat package of all your messages with timestamps and all.

          • kadu@lemmy.world
            30 days ago

            So what? You can still sell it to AI companies without assigning a user to each message. They don’t care about who wrote it when stealing the content.

    • Hnery@feddit.org
      1 month ago

      May I interest you in some juicy markov babble? LLM bots seem to hate it

      sluurp

      for us try the art professor at first to go is to house a

      almost killed with a proclamation that but a penny herring for drowned persons can give it

      rosier and once we never shall praise me above the statue i should study the

      families expend twenty pounds and round the first cousin of our familiar curb a spirit of

      the green door was too well as they passed the twin spirits romance and fell down

      simply proves that she sighed deeply two make four cards suavely to make four to
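Babble like the lines above is the sort of output a word-level Markov chain produces: record which words follow which in a source text, then walk those transitions at random. The sketch below is a minimal illustration in Python, not the tool the commenter used; `build_chain`, `babble`, and the tiny corpus are hypothetical names and data chosen for the example.

```python
import random
from collections import defaultdict


def build_chain(text: str, order: int = 1) -> dict:
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return dict(chain)


def babble(chain: dict, length: int, rng: random.Random) -> str:
    """Random-walk the chain until `length` words have been produced."""
    key = rng.choice(sorted(chain))
    out = list(key)
    while len(out) < length:
        followers = chain.get(tuple(out[-len(key):]))
        if not followers:
            # Dead end (the corpus ended here): jump to a fresh random state.
            out.extend(rng.choice(sorted(chain)))
            continue
        out.append(rng.choice(followers))
    return " ".join(out[:length])


# Assumption: a toy corpus; a real generator would be fed whole books.
corpus = "the green door was too well as they passed the twin spirits and the green door fell down"
print(babble(build_chain(corpus), 12, random.Random(1)))
```

Raising `order` makes the output more locally grammatical but also closer to the source text, which matters if the goal is to serve crawlers something worthless.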

      • expatriado@lemmy.world
        1 month ago

        good thing in the US parmesan cheese already comes premixed with sawdust, otherwise you need to buy it separately

    • hansolo@lemmy.today
      1 month ago

      I don’t have children. My legacy is running 37 Quora accounts that each answer niche questions very incorrectly, over and over.

          • kelpie_is_trying@lemmy.world
            1 month ago

            Not that I’m aware.

            She comments on Quora and is, generally speaking, unnecessarily rude and mean, unhelpful, and inadvertently hilarious in this cross and overconfident way. All this while primarily commenting on stuff about farm animals, making this interestingly uncommon contrast of humble origins and unreasonable bitchiness…idk how to explain the phenomenon that is Nel better than that. She is just a wild and interesting enough internet being that I can’t help but wonder if someone might be having a laugh with her account.

            • hansolo@lemmy.today
              1 month ago

              No, it’s a joke. Natalie was the whiny one in Facts of Life. As is having 37 Quora accounts. Quora is weird to me and I’ve never had an account. But I’ve also never had a search lead to an answer for anything on Quora that was actually correct.

    • renzev@lemmy.worldOP
      1 month ago

      Yes indeed. It seems so far that the best defence is to join utterly unhinged communities and participate in degeneracy so severe that no publicly traded company would want to scrape you. Something something become ungovernable.

    • explodicle@sh.itjust.works
      1 month ago

      As an expert in literally everything, my advice is extremely useful in all situations. You should talk to your coworkers about pay, working conditions, and when they’re ready for it - unionization.

    • IninewCrow@lemmy.ca
      1 month ago

      Poison the well? … this is like saying we should poison a vat of rotting chicken blood.

      Human knowledge and human interaction is already shit to start with … AI learning from us will only produce an authoritarian psychotic intelligence that will see us humans as an enemy to fear and destroy … just like how we think of anything or anyone different from us.

    • renzev@lemmy.worldOP
      1 month ago

      Cloudflare is harmful. Sure, maybe they’re doing a Good Thing™ today, but who stops them from turning around and selling all of the data they proxy to AI companies tomorrow? There is rarely a good reason to use cloudflare. If you care about blocking bots, there are self-hostable tools like Anubis. If you care about hiding your server’s IP, you can use a VPN that allows port forwarding or rent a VPS. Do not use cloudflare. Cloudflare should not be used. By using cloudflare, you surrender your digital sovereignty for a mirage of convenience and safety.

      (Yes, I understand the irony of posting this from an instance that uses cloudflare)

        • hash@slrpnk.net
          1 month ago

          Holding your own certs and constantly reviewing your own and your users’ threat models. Cloudflare’s excessive control comes from them being a proxy.

          • Vanilla_PuddinFudge@infosec.pub
            1 month ago

            Right, the middleware is the issue. You can bake all of what Cloudflare does yourself as far as hardening goes and utilities like Anubis and Pangolin, buuut you’re not getting that DDOS protection.

            To Lemmy’s benefit, DDOSing one of us isn’t DDOSing all of us, buuut there’s a bit to be said about Lemmy mostly centralizing around .world.

            If one had a botfarm and a grudge…

            There are proxies and selfhosted middleware out there that can be set up across arrays of VPSes that will then redirect based on health and load, but once an attacker knows all of them, I guess you’re done running.

      • vodka@feddit.org
        1 month ago

        Cloudflare announced their paid AI scraping service at the same time as they blocked AI scrapers.

        Though at least they revenue share with content owners… Assuming said content owners are on paid cloudflare plans, and opt in.

      • NaibofTabr@infosec.pub
        1 month ago

        There is rarely a good reason to use cloudflare […] By using cloudflare, you surrender your digital sovereignty for a mirage of convenience and safety.

        Heh, man you have no idea how bad the DDoS attacks are without some form of protection. It doesn’t necessarily have to be Cloudflare, but if you’re putting up a public-facing website that you want people to be able to access, you absolutely need some DDoS protection service. You need someone to detect large-scale malicious traffic and offload it before it hits your system. It’s no mirage. Arch has been under attack for days. DDoS-for-hire is a profitable criminal enterprise. It is really really bad out there on the open Internet.

        Self-hosting a bot-interference tool like Anubis does nothing to help with DDoS attacks. You need a high-bandwidth shield that can absorb the incoming connection requests, filter out the legitimate users and dump the rest before it touches your server (preferably before it touches your edge devices), and that means a CDN.

  • Una@europe.pub
    1 month ago

    Yeah, basically the only reason I am using discord is because I am trans and I don’t have much support or many friends, or stuff to do in real life where I live where I can meet other trans or other LGBTQ+ people, and discord is kinda the only place for me. I will maybe try finding similar alternatives, but.

    That’s also the reason why I still use reddit: they just offer me faster communication with people due to the bigger number of users, which I sometimes need because sometimes I do feel real bad. But I do try to get more active on Lemmy and the fediverse, and I get that none of what I am saying is the fault of the fediverse.

    • renzev@lemmy.worldOP
      1 month ago

      which I sometimes need because sometimes I do feel real bad

      I kinda get what you mean. Sense of belonging and communication with people who understand you is important, never sacrifice your community in favor of some ideological quirk! Sure, there are good reasons to avoid reddit and discord, but they’re not good enough to cut yourself loose from people who make you feel like you belong!!

      • Una@europe.pub
        1 month ago

        Yeah, if I have good support irl it would make it easier for my privacy. But I will try being active here to make community bigger, honestly fediverse seems chill to me I like it.

      • Una@europe.pub
        1 month ago

        Yeah, thanks. I am already on Lemmy and Mastodon (lgbtqia.space) and try to be active on both. I just don’t get replies as quickly as on discord and reddit. I am not blaming fediverse users for that; it’s just that in certain situations I need an immediate reply. But yeah, I do enjoy being on the fediverse and love it here :3 really chill and fun place