This type of large-scale crawling should be considered a DDoS and the people behind it should be charged with cyber crimes and sent to prison.
Applying the Computer Fraud and Abuse Act to corporations? Sign me up! Hey, they’re also people, aren’t they?
Business idea: AWS, but hosted entirely within the computing power of AI web crawlers.
I like the idea but couldn’t you just go the more direct route and mine crypto?
Yes, but hosting provides value to society.
Seems like such a massive waste of bandwidth since it’s the same work being repeated by many different actors to piece together the same dataset bit by bit.
Ah Capitalism! Truly the king of efficiency /s
I blocked almost all the big hosting players, plus China, Russia, and Vietnam, and now they’re bombarding my site with residential IP addresses from all over the world. They must be using compromised smart home devices or phones with malware.
Soon everything on the internet will be behind a wall.
This isn’t sustainable for the AI companies; when the bubble pops, it will stop.
In the meantime, sites are getting DDoSed by scrapers. One way to stop your site from getting scraped is having it be inaccessible… which is exactly what the scrapers are causing.
Normally I would assume DDoSing is performed in order to take a site offline. But AI scrapers require the opposite: they need their targets online and willing. One would think they’d be a bit more careful about the damage they cause.
But they aren’t, because capitalism.
If they had the slightest bit of survival instinct they’d share an archive.org / Google-style scraper and web cache infrastructure, pull from those caches, and everything would be scraped once and re-scraped only occasionally.
Instead they’re building maximally dumb (as in literally counterproductive and self-harming) scrapers that don’t know what they’re interacting with.
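As a rough illustration of the "scrape once, pull from a shared cache" idea: HTTP conditional requests already let a crawler hit the origin only when content has actually changed. A minimal sketch in Python with the requests library; the shared_cache store here is a hypothetical stand-in for common infrastructure, not any existing system:

```python
import requests

# Hypothetical shared store: url -> (etag, body). In real shared
# infrastructure this would be a cache every crawler reads from.
shared_cache: dict[str, tuple[str, bytes]] = {}

def fetch_once(url: str) -> bytes:
    """Fetch a page, touching the origin only when the cached copy is stale."""
    headers = {}
    cached = shared_cache.get(url)
    if cached:
        # Conditional GET: the server answers 304 with no body if unchanged.
        headers["If-None-Match"] = cached[0]
    resp = requests.get(url, headers=headers, timeout=10)
    if resp.status_code == 304 and cached:
        return cached[1]  # unchanged; reuse the shared copy
    etag = resp.headers.get("ETag")
    if etag:
        shared_cache[url] = (etag, resp.content)
    return resp.content
```

With a store like this in front, N crawlers cost the origin roughly one full response per content change instead of N full crawls.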
At what point will people start to track down and sabotage AI datacenters IRL?
I run my own gitea instance on my own server, and within the past week or so I’ve noticed it getting absolutely nailed. One repo in particular, a Wayland WM I built, just keeps getting hammered over and over by IPs in China.
Simple solution: Block Chinese IPs!
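For what it’s worth, country-level blocking is straightforward to sketch. A minimal example in Python using the geoip2 library with a MaxMind GeoLite2 database; the database path and the blocked-country set are illustrative assumptions:

```python
import geoip2.database
import geoip2.errors

# Assumes a GeoLite2 country database downloaded from MaxMind.
READER = geoip2.database.Reader("GeoLite2-Country.mmdb")
BLOCKED = {"CN"}  # ISO country codes to refuse (illustrative)

def is_blocked(ip: str) -> bool:
    """Return True if the client IP geolocates to a blocked country."""
    try:
        country = READER.country(ip).country.iso_code
    except geoip2.errors.AddressNotFoundError:
        return False  # unknown addresses pass through
    return country in BLOCKED
```

Wire is_blocked into whatever middleware your server uses. As the earlier comment points out, though, this just pushes the scrapers onto residential proxies.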
I had thought that a client-side proof-of-work (or even just a delay) bound to the IP might push the AI companies to behave instead, because single-visit-per-IP crawlers get too expensive or too slow, and you can just block the ordinarily abusive ones. But they already have mind-blowing computing and money resources and they only want your data.
But what if there were a simple-to-use, integrated solution and every single webpage used this approach?
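For anyone curious what a proof-of-work bound to the IP could look like, here is a minimal hashcash-style sketch in Python. The secret, the difficulty, and the challenge format are illustrative assumptions, not any specific product’s scheme: the server derives a per-IP challenge, the client burns CPU finding a nonce, and the server verifies with a single hash.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(32)  # server-side secret (illustrative)
DIFFICULTY = 20          # required leading zero bits; tunes the client's cost

def make_challenge(client_ip: str) -> str:
    """Derive a per-IP challenge so solved work can't be shared across IPs."""
    return hmac.new(SECRET, client_ip.encode(), hashlib.sha256).hexdigest()

def leading_zero_bits(digest: bytes) -> int:
    """Count leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits

def solve(challenge: str) -> int:
    """Client side: brute-force a nonce that clears the difficulty target."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= DIFFICULTY:
            return nonce
        nonce += 1

def verify(client_ip: str, nonce: int) -> bool:
    """Server side: one cheap hash to check the client's work."""
    digest = hashlib.sha256(f"{make_challenge(client_ip)}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= DIFFICULTY
```

The asymmetry is the point: each page view costs the client about 2^20 hashes on average while the server verifies with one. But as noted above, companies with enormous compute budgets may simply pay that price.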
The solution was invented long ago. It’s called a captcha.
It’s a little bother for legitimate users, but a good captcha is still hard to bypass, even with AI.
And from the end user’s standpoint, I think I’d rather lose 5 seconds on a captcha than have my browser run an unsolicited heavy crypto challenge on my end.
For years, we’ve written that CAPTCHAs drive us crazy. Humans give up on CAPTCHA puzzles approximately 15% of the time and, maddeningly, CAPTCHAs are significantly easier for bots to solve than they are for humans.
https://blog.cloudflare.com/turnstile-ga/
I hate captchas.