- cross-posted to:
- technology@beehaw.org
Research Findings:
- reCAPTCHA v2 is not effective in preventing bots and fraud, despite its intended purpose
- reCAPTCHA v2 can be defeated by bots 70-100% of the time
- reCAPTCHA v3, the latest version, is also vulnerable to attacks and has been beaten 97% of the time
- reCAPTCHA interactions impose a significant cost on users, with an estimated 819 million hours of human time spent on reCAPTCHA over 13 years, which corresponds to at least $6.1 billion USD in wages
- Google has potentially profited $888 billion from cookies [created by reCAPTCHA sessions] and $8.75–32.3 billion per each sale of their total labeled data set
- Google should bear the cost of detecting bots, rather than shifting it to users
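As a rough sanity check on the time and wage figures above, the arithmetic can be sketched as follows. Only the three quoted numbers come from the findings; the function names are illustrative, and the hourly rate is simply what the two totals imply, not a rate the paper is confirmed to have used.

```python
# Figures quoted in the findings above.
HOURS_SPENT = 819e6    # human hours spent on reCAPTCHA
WAGE_COST_USD = 6.1e9  # "at least" wage figure
YEARS = 13


def implied_hourly_wage(total_cost_usd: float, total_hours: float) -> float:
    """Hourly rate implied by dividing the wage total by the hour total."""
    return total_cost_usd / total_hours


def hours_per_year(total_hours: float, years: int) -> float:
    """Average human hours spent per year over the period."""
    return total_hours / years


if __name__ == "__main__":
    wage = implied_hourly_wage(WAGE_COST_USD, HOURS_SPENT)
    print(f"implied wage: ${wage:.2f}/hour")  # about $7.45/hour
    print(f"hours per year: {hours_per_year(HOURS_SPENT, YEARS):,.0f}")
```

The two totals imply roughly $7.45 per hour and about 63 million hours per year.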
“The conclusion can be extended that the true purpose of reCAPTCHA v2 is a free image-labeling labor and tracking cookie farm for advertising and data profit masquerading as a security service,” the paper declares.
In a statement provided to The Register after this story was filed, a Google spokesperson said: “reCAPTCHA user data is not used for any other purpose than to improve the reCAPTCHA service, which the terms of service make clear. Further, a majority of our user base have moved to reCAPTCHA v3, which improves fraud detection with invisible scoring. Even if a site were still on the previous generation of the product, reCAPTCHA v2 visual challenge images are all pre-labeled and user input plays no role in image labeling.”
reCAPTCHA v2 visual challenge images are all pre-labeled and user input plays no role in image labeling
That’s funny, because when I’m faced with this, I keep adding or removing one of the images at random and it keeps accepting them as OK.
I like this strategy.
reCAPTCHA is exploiting users for profit
Well duh.
reCAPTCHA started out as a clever way to improve the quality of OCRing books for Distributed Proofreaders / Project Gutenberg. You know, giving to the community, improving access to public-domain texts. Then Google acquired them. Text CAPTCHAs got phased out. No more of that stuff, just computer vision rubbish to improve Google’s own AI models and services.
If they had continued to depend on tasks that directly help the community, Google would at least have had to constantly make sure the community’s concerns were met. But if they only have to answer to themselves for the quality of the data and nobody else even gets to see it, well, of course it turned into yet another mildly neglected Google project.
Then Google acquired them. Text CAPTCHAs got phased out
Google kept the text version for five years after the acquisition though. They used it to digitize books on Google Books, to allow full-text search of their book archive.
I thought it was detecting bots based on how you are moving your mouse, etc to solve it, but if they can be solved by AI do they want their AI trained by other AI?
We already knew that, but it’s nice to have data.
Alright, I don’t use google.com
Sites you visit use Google, their recaptcha, their analytics, their ads.
But you might still be using their captcha
There’s nothing that can express my disdain for Google’s reCaptcha.
😒 We’re training its AI models 😒 It’s free labor for Google 😒 Sometimes it wants the corner of an object, sometimes it doesn’t 😒 Wildly inconsistent 😒 Always blurry and hard to see 😒 Seemingly endless 😒 It’s the robot asking us humans if we’re the robots
I kinda figured. It was annoying to do one, but then they wanted you to do two or three and that’s absurd. Whenever it comes up now, I usually just close out.
Funny thing is they stop asking if you do them really slowly. Almost as if to tell you, you’re too inefficient to even be an unpaid intern or something. Anyway, if they annoy you, take your time.
At a certain point I did like 10 of them, and then ended up closing the page, cause it never let me in, all because I was on a vpn
Some captchas have also just gotten obvious AI training. “Click on the living being in this image”, “Select every image of the same object as in this example image”. And the images you have to select look obviously AI generated.
Heh, I got one just the other day “Select the images containing structures built by people” lmao
“click on all people not helping with the robot uprising”
Alas, I have but one up-vote. :~(
I’m surprised that this is in the news right now. This has been acknowledged as fact for a decade or so.
Relevant xkcd: 1053
I still don’t get this one even after being linked to it so many times 😌🤣
Things that are common knowledge for you are not common knowledge for everyone and vice versa.
Instead of making fun of people for not knowing things, you should take the opportunity to teach so that you can get these fun moments of discovery and learning.
😮 I made fun of people that did not know something?
No, I explained what the comic is trying to convey.
Just answering your question.
❤️
Someday you will, and you’ll be one of the lucky 10.000 that day.
😆👌🏻
they wanted you to do two or three and that’s absurd
Yea how about 20
if you have to do that many, you either have some privacy setting on or are on a flagged IP assigned by a VPN
Or Google knows you will put up with it and wants the most interaction it can get from you.
Google’s just lonely 🥺👉👈
Well yah of course I do. Why the hell is that ‘abnormal’?
It’s abnormal to them because VPNs are often also used by bad actors. Your use is not abnormal, but other people are misusing VPNs, which makes it worse for everyone else.
Wow, way to blame individuals who take basic precautions instead of the corporations who are blatantly invading your privacy. Good job making the world a better place, bud.
Most people don’t, most bots do. You look more like a bot, so you get extra challenges.
Yeah exactly
VPN? Google will just go in a loop with these things, so I just stopped using Google completely.
Whenever I’m on a private window the captchas just keep on coming. Trying to reset your Steam password via the program will also trigger an infinite loop of captchas, you HAVE to use a browser.
No. But it’s also not like I get 20 constantly, it was just the worst I’ve seen. Usually it’s 2 to 5, I think.
I assume they’re just collecting data on how many users are willing to do.
One time I did five in a row, because I use VPNs for everything, and realized after the 5th time that it would have been easier to just use bing so I do that first now. Google has turned into my last last resort, which is quite funny, because that’s where Bing used to be. Lmao
That’s because you’re shady.
They knew I was committing crimes with my adblocker.
The worst kind - crimes against profit!
Elon Musk wants to know what the government is going to do about you not viewing ads on Xitter
Not going to his shithole website.
Had this when at uni, mostly due to the amount of requests coming from a single IP
I tried to order some components on Digikey a few months ago and I’m still mentally scarred. Probably did a few hundred of those things over the course of 2 weeks.
Cries in battlenet sign up process
The one reason I tried to create an account and never came back
Stop using Tor…
STOP BEING SNEAKY MICHAEL
Getting served a captcha often results in me closing the tab. I’m not doing stupid puzzles for you.
I haven’t done an image one in years for the same reason.
My general internet usage has plummeted between ads and captchas and all the other modern website bullshit, which is why I am here so much.
Do them wrong and then close out
It knows they’re wrong which is why I don’t really think this article is accurate. Is it training if it already has the answers? Probably not.
That’s why it gives you a panel of 9 images. It had a high confidence on some images, and a low confidence on others. When you pick the correct images and don’t pick incorrect ones it uses the ones it’s confident about as “validation” while taking the feedback on low confidence images to update the training data.
What this means is that the only ones actually being “graded” are the ones bots can solve anyway.
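The scheme described in this comment can be sketched roughly as below. This is only an illustration of the commenter's description, not Google's actual implementation: the function name, the tile IDs, and the vote store are all hypothetical.

```python
from collections import Counter, defaultdict


def grade_challenge(tiles, known, selected, votes):
    """Grade one challenge panel.

    tiles:    all tile IDs shown to the user
    known:    {tile_id: bool} for high-confidence tiles (True = matches the prompt)
    selected: set of tile IDs the user clicked
    votes:    {tile_id: Counter} accumulating answers for low-confidence tiles
    Returns True if the user agrees with every high-confidence tile.
    """
    # Validation step: only the tiles the system is already sure about are graded.
    passed = all((tile in selected) == label for tile, label in known.items())
    if passed:
        # Feedback step: a passing user's answers on the uncertain tiles
        # are recorded as votes (True = clicked, False = not clicked).
        for tile in tiles:
            if tile not in known:
                votes[tile][tile in selected] += 1
    return passed


if __name__ == "__main__":
    votes = defaultdict(Counter)
    tiles = ["a", "b", "c", "d"]
    known = {"a": True, "b": False}  # hypothetical high-confidence labels
    print(grade_challenge(tiles, known, {"a", "c"}, votes))  # True
    print(dict(votes))
```

In this sketch a user who gets a high-confidence tile wrong fails the panel, and none of their answers on the uncertain tiles are recorded.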
and it will show the images to multiple people
It’s why they ask you to do multiple, 1-2 of them are the control group, they are training on the others
You’re implying they give you multiple. I hardly ever get multiple, pretty much only if I ‘fail’ the first one.
If they have a good fingerprint on you they don’t need the control group. That’s why you get 5+ captchas when using a VPN/tor.
My understanding is different from others here. I thought they served the same Captcha to many people at once and use the majority response to decide who is answering correctly.
That’s true, or at least it used to be back when they were using it for OCR. I have no reason to believe it’s changed.
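The majority-response idea from the two comments above could look something like this. It is a minimal sketch under assumptions: the function name, the normalisation step, and the 50% threshold are made up for illustration, and nothing here is confirmed to match how reCAPTCHA actually scored answers.

```python
from collections import Counter


def consensus(responses_by_user, min_share=0.5):
    """Return (consensus_answer, users_who_agree).

    responses_by_user: {user_id: typed_answer} for one shared challenge.
    An answer counts as the consensus only if more than `min_share`
    of respondents gave it; otherwise (None, set()) is returned.
    """
    if not responses_by_user:
        return None, set()
    # Normalise so that case and stray whitespace do not split the vote.
    normalised = {u: a.strip().lower() for u, a in responses_by_user.items()}
    answer, count = Counter(normalised.values()).most_common(1)[0]
    if count / len(normalised) <= min_share:
        return None, set()
    agreeing = {u for u, a in normalised.items() if a == answer}
    return answer, agreeing


if __name__ == "__main__":
    # Hypothetical responses to the same scanned word.
    print(consensus({"u1": "morning", "u2": "Morning ", "u3": "moming"}))
```

With a strict-majority threshold, a challenge where no answer wins more than half the responses yields no consensus, so nobody is graded on it.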
I do it right and it says I’m wrong =\
I have bad news for you friend…
You might be a robot
What do you mean? I am a fleshy human and do fleshy human things like being made of flesh.
Ever heard of bio-robots?
Time to take a knife and check for sure
Seriously /s Don’t harm yourself!
Google should bear the cost of detecting bots, rather than shifting it to users
how?
Don’t know why you’re being downvoted… My employer sees a lot of bot activity on our sites, which are hosted in AWS and protected by Akamai. It’s Akamai that informs us when a bot visits our site, and Akamai that lets us block it. Google never sees this traffic.
Yeah. Written by someone who doesn’t really understand the internet.
Considering the article states that reCAPTCHA v2 and v3 can be broken/bypassed by bots 70-100% of the time, they are obviously not the solution.
At what cost?
100% success rate isn’t even moderately useful if it costs $5 per pass. The discussion is completely pointless without a concrete, documented analysis of the actual hardware and energy costs involved.
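The cost argument here comes down to simple expected-value arithmetic, sketched below. The $5 per pass comes from the comment; the other dollar amounts and rates are hypothetical placeholders, not measured data.

```python
def expected_profit_per_attempt(success_rate, cost_per_attempt, value_per_success):
    """Expected profit for one automated solve attempt, in dollars."""
    return success_rate * value_per_success - cost_per_attempt


def break_even_value(success_rate, cost_per_attempt):
    """Minimum value of one successful pass for the attack to pay off."""
    if success_rate <= 0:
        return float("inf")
    return cost_per_attempt / success_rate


if __name__ == "__main__":
    # Hypothetical: each successful spam submission is worth one cent.
    value = 0.01
    # A perfect solver at $5 per pass loses money on every attempt.
    print(f"{expected_profit_per_attempt(1.0, 5.00, value):.3f}")
    # A 70% solver at a hypothetical $0.002 per attempt comes out ahead.
    print(f"{expected_profit_per_attempt(0.7, 0.002, value):.3f}")
    print(f"{break_even_value(0.7, 0.002):.4f}")
```

The point of the sketch is that a success rate means little on its own: whether an attack is worth running depends on the cost per attempt relative to the value of each pass.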
“Google should bear the cost”
Google should shut it down and make sites roll their own verification. Give everyone a month to implement a new solution on millions of websites.
I’m actually 100% for rolling your own… almost everything.
20 years ago I made an e-commerce website for a client. Looking at the code now I’m embarrassed how insecure it is. However, because it was totally custom no one ever found the bugs and it has never been cracked. (Knock on wood) that’s the benefit of not using a prebuilt solution that isn’t a target for mass exploits.
Then what is?
Maybe a billion dollar company has the budget to come up with something?
Looking at the numbers in this post, reCAPTCHA exists to make Google money, not to keep bots out.
I’d rather have no reCAPTCHA than the current state.
Hi it’s me. I work for a billion dollar company with a budget. We have no ethical ideas on how to stop bots. Thanks for coming to my tech talk.
Something something free market?
Yeah, that’s about the way I’d expect it to go.
“Traffic resulting from reCAPTCHA consumed 134 petabytes of bandwidth, which translates into about 7.5 million kWhs of energy, corresponding to 7.5 million pounds of CO2. In addition, Google has potentially profited $888 billion from cookies [created by reCAPTCHA sessions] and $8.75–32.3 billion per each sale of their total labeled data set.”
There might be a tiny chance they’re not interested in changing things.
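For anyone wanting to check the bandwidth and energy figures in that quote, the implied conversion factors can be worked out as below. This assumes decimal units (1 PB = 1,000,000 GB); the factors are only what the quoted totals imply, not values the paper is confirmed to have used.

```python
# Totals quoted above.
BANDWIDTH_PB = 134
ENERGY_KWH = 7.5e6
CO2_POUNDS = 7.5e6

GB_PER_PB = 1_000_000  # assumption: decimal petabytes


def implied_kwh_per_gb(energy_kwh, bandwidth_pb):
    """Energy intensity implied by the quoted totals, in kWh per GB."""
    return energy_kwh / (bandwidth_pb * GB_PER_PB)


def implied_lb_co2_per_kwh(co2_pounds, energy_kwh):
    """Emission factor implied by the quoted totals, in lb CO2 per kWh."""
    return co2_pounds / energy_kwh


if __name__ == "__main__":
    print(f"{implied_kwh_per_gb(ENERGY_KWH, BANDWIDTH_PB):.3f} kWh/GB")  # about 0.056
    print(f"{implied_lb_co2_per_kwh(CO2_POUNDS, ENERGY_KWH):.1f} lb CO2/kWh")  # 1.0
```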
how do you get the metric of 70-100% of the time?
the best bots doing it 70-100% of the time is very different to the kind of bot your average spammer will have access to
Did you read the article or the TL;DR in the post body?
The paper, released in November 2023, notes that even back in 2016 researchers were able to defeat reCAPTCHA v2 image challenges 70 percent of the time. The reCAPTCHA v2 checkbox challenge is even more vulnerable – the researchers claim it can be defeated 100 percent of the time.
reCAPTCHA v3 has fared no better. In 2019, researchers devised a reinforcement learning attack that breaks reCAPTCHAv3’s behavior-based challenges 97 percent of the time.
So yeah, while these are research numbers, it wouldn’t be surprising if many larger bots have access to ways around that - especially since those numbers are from 2016 and 2019 respectively. Surely it is even easier nowadays.
researchers were able to defeat reCAPTCHA v2 image challenges 70 percent of the time
that doesn’t answer the question?
researchers devised a reinforcement learning attack that breaks reCAPTCHAv3’s behavior-based challenges 97 percent of the time
i’d argue “bespoke system, deployed in a very limited context, built by researchers at the top of their field” is kind of out of reach for most people? and any bot network scaled up automatically becomes easier to detect the further you scale it
the cost of just paying humans to break these is already at or below pennies per challenge
Sometimes I think writers just try to find things to be edgy about. The straws this grasps at are incredible. Might as well complain about the billions of unpaid man-hours people provide by offering common courtesy for free.
Remember the good old days when it was just malformed text you have to solve? I miss those days. AI was complete garbage and they had to use farms of eyeballs to solve them for bots, making it a costly operation. We’ve now totally gotten away from all of that.
that was also to train ai.
No it wasn’t… It was human-assisted OCR to help digitize books. Initially for Project Gutenberg, but then for Google Books once Google acquired it in 2009.
OCR is a form of AI.
Gonna have to disagree hard with this, based on extensive first-hand experience (web dev). I’ve added CAPTCHA to dozens (hundreds?) of web forms, and it all but eliminates spam.
It works against basic bots, but if you’ve got a dedicated adversary, it doesn’t do anything
(Granted, most people do not have dedicated adversaries, but when they come, you’re in trouble)
OK, sure, but that’s like saying it’s pointless to use a secure password online because the NSA could hack you if they wanted to.
Right, so similar to locks? Usually can be easily bypassed if you know how, but it at least filters out the people who aren’t determined enough to put in the effort.
Basically, yeah. The vast majority of spambots are simple and lazy.
My experience matches yours. I don’t enjoy putting reCAPTCHA v3 on my sites but it takes contact form spam from 70-80 messages per day to 0-2.
I’d switch to other services if they could be as effective. If anybody has real-world experience with another option working I’d love to hear it.
Honestly at first read, the paper feels like a bunch of whining text to prove a point the author believes in without any alternate proposal.
I thought the whole point of reCaptcha was to provide a reliable set of data to train bots. Entering a fuzzy scanned word, identifying bikes and traffic lights, etc.
The fact that they’ve now got that, and the bots are trained is hardly a surprise.
Without captchas the problem of spambots would still be a million times worse.
Yup. I like Cloudflare’s checkbox, it works well and probably catches more bots than reCaptcha while being simple for humans.
How does that checkbox work? Does it just look at your cookies?
No, it tracks things like mouse movements to see if it looks human or like a bot. Humans don’t move the mouse in a straight line, there’s some jitter and whatnot, whereas bots will look quite a bit different.
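A toy version of the "straight line versus jitter" idea might look like this. It is a sketch of the general heuristic only, not Cloudflare's or Google's actual detection logic: the function names and the 0.999 threshold are invented, and a real system would combine many more signals.

```python
import math


def path_straightness(points):
    """Ratio of straight-line distance to distance actually travelled.

    1.0 means the pointer moved in a perfectly straight line;
    lower values mean more wobble or detours.
    """
    if len(points) < 2:
        return 1.0
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0
    return math.dist(points[0], points[-1]) / travelled


def looks_scripted(points, threshold=0.999):
    """Flag paths that are suspiciously close to a perfect straight line."""
    return path_straightness(points) >= threshold


if __name__ == "__main__":
    straight = [(0, 0), (50, 50), (100, 100)]          # hypothetical bot path
    jittery = [(0, 0), (30, 12), (55, 61), (100, 100)]  # hypothetical human path
    print(looks_scripted(straight))  # True
    print(looks_scripted(jittery))   # False
```

A single geometric check like this only catches the most naive scripts, since a bot can add synthetic jitter to its path.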
That’s super easy to fake for a bot…
It’s a ton more than mouse movement. Lots of browser fingerprinting for example and tracking.
Yup. It does do a lot more than the checkbox, but the checkbox itself mostly does mouse movement and click tests.
I always thought they are just getting the training data for AI using these.
I had to deal with one yesterday that wouldn’t let me in no matter what I did.
So it isn’t even good at figuring out who isn’t a robot.
Solving too fast. I shit you not. Sometimes you have to go really slow. Like you’re 80 and can’t see very well trying to discern what’s in those boxes.