• lazynooblet@lazysoci.al
    4 months ago

    How is blocking scrapers easy?

    This instance gets hit by 500+ IPs with differing user agents, all connecting at once, but each one stays within the rate limits because the load is spread across the bots.

    The only way I know it’s a scraper is if it does something dumb, like sending “google.com” as the referrer on every request, or if I eyeball the logs and notice multiple entries from the same /12.
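For what it's worth, the two checks above could be roughed out like this. It's only a sketch: it assumes the IPs and referrers have already been pulled out of the access log, and the function names and the `min_hosts` threshold are made up for illustration, not taken from any real tool.

```python
import ipaddress
from collections import Counter


def group_by_prefix(ips, prefix_len=12):
    """Count distinct IPv4 addresses per /prefix_len network."""
    counts = Counter()
    for ip in set(ips):  # de-duplicate so repeat hits from one IP count once
        # strict=False masks off the host bits instead of raising an error
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(net)] += 1
    return counts


def suspicious_networks(ips, prefix_len=12, min_hosts=3):
    """Networks holding at least min_hosts distinct client IPs (arbitrary threshold)."""
    grouped = group_by_prefix(ips, prefix_len)
    return {net: n for net, n in grouped.items() if n >= min_hosts}


def dominant_referrer_share(referrers):
    """Return (most common referrer, fraction of requests using it)."""
    if not referrers:
        return (None, 0.0)
    ref, count = Counter(referrers).most_common(1)[0]
    return (ref, count / len(referrers))
```

A referrer share near 1.0 across hundreds of unrelated IPs, or one /12 holding a pile of distinct clients, is a hint and nothing more. A scraper that randomizes referrers and rents addresses across many networks slips past both.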

    • rumba@lemmy.zip
      4 months ago

      Exactly this. You can only stop scrapers that play by the rules.

      Each one of those books powering GPT already had, like, protection on it.