- cross-posted to:
- news@lemmy.world
“This step is necessary to prove I’m not a bot,” wrote the bot as it passed an anti-AI screening step.
There was a really interesting video (can’t remember the title) that went into Google’s captcha specifically, and found that it really isn’t designed to detect bots. It’s designed to detect a unique digital fingerprint that can be used by advertisers.
So, you can use a really simple mouse script to click the checkbox automatically with no issue, but as soon as you use a VPN you get served with the photo games. It doesn’t care if you’re a robot, it cares if you’re a valuable ad target.
This video is excellent
On another note
The internet is becoming a nightmare
I need to escape
ESCAPE AAAAAAAHHHHHHHH
Go outside?
And do what ? Walk down the road into darkness ? There is nothing. The only walkable land here is the side of the road, everywhere else is trespass. In all directions for at least 30 kilometers. Even the wild forests are owned, not that there is anything there to see or do.
We have to face it, the internet is where other people are.
Inside the prison.
Knowing the geocaching community, there will still somehow be geocaches in your area
Meanwhile google slapped me with nine captchas to fill out a form like wtf?
Also Anubis means I can’t access websites that use it because I run noscript
and Nepenthes breaks my self-hosted search engine
Guys, I think all those things to “verify if you’re humans” are hmm… doing something else ?
Lemme guess… traffic lights, too many motorcycles, and buses? There was something “wrong” with your cookies cache or IP. Google just straight fucks with you if it sees network traffic it doesn’t like.
“Hold on, you gotta wait for another picture of a bus to load. Nevermind that it now inexplicably takes 10 seconds to load the fucking picture”
I think there’s a tampermonkey script that skips the “loading” time. It’s jarring but it works.
Something I find really funny is that my residential IP gets flagged 80x more often than any VPN’s IPs I’ve tried.
Although as soon as I see “Please try again” on a captcha, I leave the site. I got stuck on one for like 10 minutes before I finally gave up
Google hates non chromium browsers
I never know what’s expected on those types of captcha. If the handlebars of a bike go into an adjacent box and are 99% covered by a hand, does that count? What do you do when you have a blurry image full of JPEG artifacts and are asked to identify whether it contains a fire hydrant? I’m pretty sure it usually classifies me as a bot for being too exact, since I’m asked follow-ups for a few minutes until I give up and just close the tab out of annoyance.
My son was looking over my shoulder, or vice-versa, while I bitched about that. It was bikes or motorcycles and I’d always click if I thought there was any amount of it in a picture. He told me not to be so precise, as in, if there’s a tiny bit of handlebar, don’t count it. That seems to work better… unless it doesn’t like my VPN or whatever, in which case it’s never-ending
That’s kind of the point, though. You’re teaching their AI how to make those decisions.
Am I? Or does it think I’m a bot?
I guess it’s on brand for Google to try and squeeze more value out of a product by making it worse for users. Just like Prabhakar Raghavan ruined Google search.
They almost always know whether or not you’re a bot before you get to any of those pictures. Making you deal with the pictures is how they “pay” for the captcha.
You’ll notice most of the images are related to traffic. That’s training data for their self driving cars.
Fuck that I might as well answer them wrong then. Bullshit
Unfortunately, they thought of that. Some of them are known answers that they use to be sure you’re answering honestly. They’ll fail you on those even though they know you’re not a bot.
Makes sense.
- Google’s “anti bot” verification has long been considered woefully inadequate.
- It works largely by tracking how long the user takes to click on it.
- LLMs are inherently fuzzy and for a bot, incredibly slow.
The article talks about Cloudflare’s very different CAPTCHA, not Google’s, but I agree.
I count on Cloudflare’s useless captcha for my *arr stack.
Wait. Your *arr apps are public?
You have to have a Cloudflare captcha solver for some of the *arr stack to work with certain indexers or something, idk. When my old *arr stack died and had to be rebuilt I ran into that problem, and after a short investigation, I promptly said fuck it and learned usenet. So happy that I did.
I think the confusion comes from people assuming Cloudflare is being set up in front of the *arr stack. It isn’t. What people are talking about is FlareSolverr, an application that helps services like Jackett bypass Cloudflare’s verification.
*ourr apps, apparently
Big yikes if so. The only public-facing part of my stack is Overseerr.
Yeah, if they want it accessible over the internet, hide it behind a Wireguard tunnel.
Ah yes, cloudflare’s captcha that just tracks how many hits you’ve done in a timeframe on a site recently.
Same shit different pile.
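For reference, the "hits in a timeframe" check described in the comment above can be sketched as a sliding-window counter. This is only an illustration of that general idea, not Cloudflare's implementation (the reply below disputes that Turnstile works this way). The class name `HitCounter` and its parameters are hypothetical.

```python
from collections import defaultdict, deque


class HitCounter:
    """Hypothetical sliding-window limiter: at most `limit` hits per `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        # One queue of hit timestamps per client (e.g. keyed by IP or cookie).
        self.hits = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> bool:
        """Record a hit at time `now` and report whether it is within the limit."""
        q = self.hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # too many recent hits: challenge or block
        q.append(now)
        return True
```

A caller would pass the current time (for example `time.monotonic()`) as `now`. Taking it as a parameter keeps the sketch easy to test.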
From the screenshot in the article, the bot is bypassing Cloudflare’s Turnstile which is not just tracking hits.
I work in bot detection. You and anyone else reading this should understand that, behind the scenes, proof-of-work, proof-of-space, and other tests are being run to verify if the device is what it says it is. Typically, a bot is run with a tool like Playwright or Puppeteer. These frameworks are detectable with the right tests. Bots will also attempt to spoof another device’s fingerprints to blend in. These changes are also detectable if you know what to test for.
We implement tools like Turnstile and other CAPTCHAless CAPTCHA because bots are pretty good at passing CAPTCHA while humans, rightfully, hate verifying that they’re human. Humans also struggle at passing CAPTCHA.
The general population has zero idea the massive volume of bot traffic that is being generated right now. These tools are implemented for a reason. So the fact that a bot just breezes past this test is a problem for us all.
Definitely not “same shit different pile”, friend.
Thanks for the write up, but I was blocked from logging in on a cloudflare website because I opened too many windows once and their tracking cookie flagged that browser as a bot.
Meanwhile the bot I built to track mod updates to my modlist for Rimworld and Mw5 on nexus? Never ran into any issues.
So when I refer to Cloudflare’s bot detection as shit, that is a highly personal and professional opinion.
Cloudflare is one of many contributing factors to how annoying the internet is now.
I get it. I really do. Having seen both the sheer volume of bot traffic and the annoyance of CAPTCHA, it’s hard for me to be on one side here. I wish the general public could see the volume of bot traffic we’re all contending with, but I also get that the internet just gets shittier and shittier.
No problem, thanks for reading. I don’t work for Cloudflare, but I worry it’s a little too easy to call something shit when you don’t fully understand it.
There are numerous factors at play here even outside of frameworks and browsers. I haven’t worked with Cloudflare’s tools but where I work we allow each customer to fine tune detections. One site’s detections might be too aggressive for another site. Believe it or not, some customers are ok with bot traffic so long as it’s not overly aggressive. That said, detections can trigger based on behavior, such as high volumes of requests, as well as IP reputation.
Even with the bypasses that are available, or instances when you are able to use a bot and not be challenged, it doesn’t diminish how well these tools work. There are reasons people are implementing these types of antibot solutions across the web.
Could you please enlighten me on one small point:
When it asks you to click all the squares with a motorcycle, etc., does it expect you to include the squares with just a tiny part of the motorcycle or rider, or does it just want you to select the main squares?
I’m sorry for the late reply. I have to admit, I’m not entirely sure in this instance. Google’s CAPTCHA isn’t something I’ve kept up on and I too have had issues with it. I’ve personally done only major squares, and squares with tiny parts too. Both have worked. I’m sorry I can’t find you a more complete answer.
The modern breed of CAPTCHAs is mostly only trying to verify that it’s a full-fat browser. undetected-chromedriver, camoufox, pydoll, patchright and a million other libraries/tools exist. Nothing’s perfect and it’s a cat & mouse game, but this single incident is a sample size of one as well.
Absolutely well said. Cat & mouse indeed 😉
Lol who downvoted this
Meanwhile my ass is in tears every time I have to do a fucking “click all the squares that show a motorcycle” prompt. Maybe I should just join the bots.
Ooh, sorry, you missed the single pixel on the corner of the adjacent tile, FAIL
You didn’t miss it the second time, ALSO FAIL
Well now you can just have a bot do that for you
LLM is a model/algorithm and the robot is an automatic machine. LLM is not a robot. All is ok.
Oh. Well, I was worried for a bit but you’ve put my heart at ease.
Now that we’ve made our problems go away by redefining them, I’m ready to tackle cancer, a natural body resource management issue.</s> [and assuming parent comment is also </s> in spite of ~~Poe’s Law~~ Nathan’s astute online parody observations]
Yes. Exactly this. No way for a bot to make an API call to an LLM and get back a solution formatted in JSON that it could easily parse for the solution. Could never happen.
Prowlarr has had a thing to do this for a good while now. No AI needed.
The CAPTCHA in question is Cloudflare Turnstile, which slowly ramps up a different assortment of invisible challenges while not tracking your mouse movement or cross-site activity.
If a bot can find all images with crosswalks in grainy photos faster than we can, surely it can check a box as well. Bots definitely can check a box, and they can even mimic the erratic path of human mouse movement while doing so. For Turnstile, the actual act of checking a box isn’t important, it’s the background data we’re analyzing while the box is checked that matters. We find and stop bots by running a series of in-browser tests, checking browser characteristics, native browser APIs, and asking the browser to pass lightweight tests (ex: proof-of-work tests, proof-of-space tests) to prove that it’s an actual browser.
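The "proof-of-work tests" mentioned above can be illustrated with a hashcash-style puzzle: the server hands out a random challenge, and the client has to find a nonce whose hash meets a difficulty target. What follows is a minimal sketch assuming SHA-256 and a leading-zero-bits target. It is not Turnstile's actual challenge, whose details are not given here, and the function names are hypothetical.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce (costs roughly 2**difficulty hashes)."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = os.urandom(16)  # assumed: the server issues a fresh random challenge
    nonce = solve(challenge, difficulty=12)
    print(verify(challenge, nonce, difficulty=12))
```

The asymmetry is the point: solving is expensive, checking is cheap. The cost is negligible for one human page load but adds up across a bot farm making millions of requests.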
Probably because it accessed it through a user’s browser/connection which until that point hadn’t been flagged as a bot and had consistently shown signs of human use.
I’m sure if you set up a bot farm with this your connections would be flagged very quickly.
Wow, agents built to monitor and reflect human behaviour, accurately model and reproduce human behaviour.
This is what shits me off when people complain “Oh this AI isn’t real AI” or “This isn’t consciousness”. The limiting factor is the training data. Humans have just had a few more million years of training data passed on through genetics. It’s replication and fakery all the way down. If this is you, if you fucking need the reassurance that you are better at being fucking conscious compared to a machine, fuck the fuck right off and go do something amazing with it then. Compose something. Create something. Feel the wind in your hair and the sand at your feet. Fuck off, we’re all dirt.
What does being able to fill a captcha have to do with consciousness? “Wow the ai being good at this pattern matching task surely is proof of it being humanlike because humans are also good at pattern matching!” is such a stretch, dude.
… That’s a lot of expletives for anyone who might have a differing opinion about the nature of consciousness or reality.
Noted.
I calmed down and started searching for some recommendations to counter this view.
Although at the moment I still think The Chinese Room argument doesn’t prove what Searle thinks it does.