Almost every website and service is getting scraped at an alarming rate. Are Lemmy servers facing this issue?

Please share mitigations you’ve seen applied to this.

  • ramble81@lemmy.zip
    22 hours ago

    They don’t really need to scrape. They just have to set up their own federated instance and the ActivityPub protocol will willingly hand it all to them in a nicely parsable format.
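
    As a rough illustration of the "nicely parsable format": federated content travels as ActivityStreams JSON, and public objects can usually be requested in that form with the `application/activity+json` media type. The sketch below only builds such a request and reads a few fields from a made-up object; the `lemmy.example` URLs, the helper names and the sample data are assumptions for illustration, not taken from any real instance.

```python
import json
import urllib.request

# Standard media type for ActivityPub objects (ActivityStreams 2.0 JSON).
ACTIVITY_JSON = "application/activity+json"


def build_request(object_url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for the ActivityPub JSON form."""
    return urllib.request.Request(object_url, headers={"Accept": ACTIVITY_JSON})


def summarize(raw: str) -> dict:
    """Pull a few common ActivityStreams fields out of a serialized object."""
    obj = json.loads(raw)
    return {key: obj.get(key) for key in ("type", "attributedTo", "content")}


# Hypothetical object, shaped like a typical ActivityStreams "Note".
SAMPLE = json.dumps({
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": "https://lemmy.example/comment/1",
    "type": "Note",
    "attributedTo": "https://lemmy.example/u/alice",
    "content": "<p>Hello fediverse</p>",
})

if __name__ == "__main__":
    req = build_request("https://lemmy.example/comment/1")
    print(req.get_header("Accept"))
    print(summarize(SAMPLE))
```

    No HTML parsing is involved, which is why a federated instance (or anything that speaks the protocol) gets the content far more cheaply than a classic scraper would.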

  • CaptainBasculin@lemmy.ml
    1 day ago

    It’s very easy for any ActivityPub content to be scraped; all servers practically serve the content on a silver platter to any federated server.

  • safesyrup@feddit.org
    1 day ago

    I think Lemmy content is scraped too, just like the whole web is being scraped. I do not have any proof of it, though.

    I have seen a user add something like an anti-commercial-AI license as a footer to every comment he writes lol

    • SSUPII@sopuli.xyz
      24 hours ago

      Those are truly useless against bad actors and are instead only annoying for the humans who read them. And good actors with proper licenses won’t be scraping Lemmy, Reddit or Twitter.

      You just cannot prevent it on Lemmy, because if one instance places filters like Anubis, another will not. And it is not feasible to mandate that every instance do so. Also, this is an open platform by nature and there is no group or company that can mandate rules of access. By limiting non-humans, you might also be limiting real users with peculiar configurations or behind heavy privacy middleware.

      • Captain Beyond@linkage.ds8.zone
        20 hours ago

        The point (as I see it) is not so much to stop scraping as it is to prevent bots from effectively DDoS-ing web services. As others have said, ActivityPub content is public and there are ways to get it without slamming instances with scraper bots.
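
        For context on filters like Anubis mentioned above: they are generally built around a small proof-of-work challenge, which costs a single visitor almost nothing but adds up for a bot making many requests. The sketch below shows that general idea only; it is not Anubis's actual code, and the function names, the challenge string and the difficulty value are assumptions for illustration.

```python
import hashlib
from itertools import count


def is_valid(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server-side check: one hash, cheap to verify."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Client-side work: try nonces until the hash has enough leading zeros."""
    for nonce in count():
        if is_valid(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    # Hypothetical challenge; a real server would issue a random one per visitor.
    nonce = solve("example-challenge", difficulty=3)
    print(nonce, is_valid("example-challenge", nonce, 3))
```

        Each extra hex digit of difficulty multiplies the expected client work by 16 while verification stays a single hash, which is what makes the scheme about load rather than about hiding content.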

    • potatoguy@potato-guy.space
      1 day ago

      It is. I saw ClaudeBot and GPTBot scraping my instance and made a post about it on fuckai, but I have blocked all these bots now and my instance is a lot faster.
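
      The comment does not say how the blocking was done. One common approach, sketched here as an assumption rather than as this admin's setup, is to reject requests whose `User-Agent` header contains a known crawler token. The token list, the function name and the sample header strings are illustrative.

```python
from typing import Optional

# Illustrative list; extend it with whatever crawlers show up in your logs.
BLOCKED_AGENT_TOKENS = ("claudebot", "gptbot")


def is_blocked(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a blocked crawler token."""
    if not user_agent:
        return False
    lowered = user_agent.lower()
    return any(token in lowered for token in BLOCKED_AGENT_TOKENS)


if __name__ == "__main__":
    # Hypothetical header values for demonstration.
    print(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(is_blocked("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"))
```

      In practice a check like this usually lives in the reverse proxy in front of the instance rather than in application code. It only stops crawlers that identify themselves honestly, since the header is trivial to spoof.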

      • Forester@pawb.social
        1 day ago

        Out of curiosity, since I am not at all familiar with the stack that runs behind the scenes for Lemmy: are you blocking IP ranges or something else?