Anubis - Weighs the soul of incoming HTTP requests using proof-of-work to stop AI crawlers

zutto@lemmy.fedi.zutto.fi · edit-2 8 months ago

merthyr1831@lemmy.ml · 8 months ago

It’s a clever solution, but I recently saw one that IMO was more elegant for noscript users. I can’t remember the name, but it creates a dummy link that human users won’t touch but web crawlers will naturally follow. It then generates an infinitely deep tree of super basic HTML, forcing bots to endlessly trawl a cheap-to-serve portion of your webserver instead of something heavier. It might even have integrated with fail2ban to pick out obvious bots and keep them off your network for good.
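A minimal sketch of the "infinitely deep tree" idea, assuming nothing about how any real tarpit is built: the function name `maze_page` and the `/maze` prefix are made up for illustration. Child links are derived from a hash of the parent path, so every URL yields the same cheap page without any stored state.

```python
import hashlib
from html import escape


def maze_page(path: str, links: int = 5) -> str:
    """Return a tiny HTML page whose links all lead one level deeper.

    Hypothetical helper: a real tarpit would also hide the entry link
    from humans and rate-limit its responses.
    """
    base = path.rstrip("/")
    items = []
    for i in range(links):
        # Hash the parent path plus an index so the same URL always
        # produces the same children, with nothing kept on disk.
        token = hashlib.sha256(f"{base}/{i}".encode()).hexdigest()[:8]
        child = f"{base}/{token}"
        items.append(f'<li><a href="{escape(child)}">{token}</a></li>')
    return f"<html><body><ul>{''.join(items)}</ul></body></html>"


# Every generated page links to five more generated pages, and so on.
page = maze_page("/maze")
print(page)
```

Serving a page like this costs one short hash per link, which is the "cheap-to-serve" property described above.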

paperd@lemmy.zip · 8 months ago

That’s a tarpit that you’re describing, like iocaine or nepenthes. Those feed the crawler junk data to try to make its eventual output bad.

Anubis tries to not let the AI crawlers in at all.

NeoNachtwaechter@lemmy.world · 8 months ago

“generates an infinitely deep tree”

Wouldn’t the bot simply limit the depth of its seek?

Cethin@lemmy.zip · 8 months ago

It could be infinitely wide too, if they desired. I wouldn’t think that’s hard to do. I suspect the crawlers limit the time a chain can take so that they eventually escape, but this still protects data because it obfuscates the legitimate data that they want. The goal isn’t to trap them forever. It’s to keep them from getting anything useful.

nickwitha_k (he/him)@lemmy.sdf.org · 8 months ago

That would be reasonable. The people running these things aren’t reasonable. They ignore every established mechanism to communicate a lack of consent to their activity because they don’t respect others’ agency and want everything.

randomblock1@lemmy.world · 8 months ago

Why SHA-256? Literally every processor has a crypto accelerator and will easily pass. And datacenter servers have beefy CPUs. This is only effective against no-JS scrapers.

poVoq@slrpnk.net · 8 months ago

It requires a bunch of browser features that non-user browsers don’t have, and the proof-of-work part is the least relevant piece of this: it only gets invoked once a week or so to generate a unique cookie.

I sometimes have the feeling that as soon as some crypto-currency related features are mentioned people shut off part of their brain. Either because they hate crypto-currencies or because crypto-currency scammers have trained them to only look at some technical implementation details and fail to see the larger picture that they are being scammed.

swelter_spark@reddthat.com · 8 months ago

So if you try to access a website using this technology via terminal, what happens? The connection fails?

Drew@sopuli.xyz · 8 months ago

If your client doesn’t send a Mozilla user agent (i.e. the kind Chrome or Firefox send), it will pass straight through. Most AI crawlers use these user agents to pretend to be human users.
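As a rough illustration of the rule described here, the sketch below challenges only clients whose User-Agent contains "Mozilla". The function name `needs_challenge` is hypothetical and this is not Anubis's actual policy format, only the matching idea; the version numbers in the sample strings are arbitrary.

```python
def needs_challenge(user_agent: str) -> bool:
    """Hypothetical rule: challenge anything that claims to be a browser."""
    return "Mozilla" in user_agent


# Sample User-Agent strings; the version numbers are arbitrary.
samples = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "curl/8.5.0",
]
for ua in samples:
    print(ua, "->", "challenge" if needs_challenge(ua) else "pass")
```

Under this assumed rule, a terminal tool that sends its own User-Agent is let through, while anything presenting a browser-style string has to solve the challenge.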

swelter_spark@reddthat.com · 8 months ago

What I’m thinking about is more that in Linux, it’s common to access URLs directly from the terminal for various purposes, instead of using a browser.

Drew@sopuli.xyz · 8 months ago

If you’re talking about something like curl, that also uses its own User agent unless asked to impersonate some other UA. If not, then maybe I can’t help.

enemenemu@lemm.ee · 8 months ago

Meaning it wastes time and power such that it gets expensive on a large scale? Or does it mine crypto?

zutto@lemmy.fedi.zutto.fi · edit-2 8 months ago

Yes, Anubis uses proof of work, as some cryptocurrencies do, to slow down or mitigate mass-scale crawling by making the crawlers do expensive computation.

https://lemmy.world/post/27101209 has a great article attached to it about this.

–

Edit: Just to be clear, this doesn’t mine any crypto; it just uses the same idea to slow down the requests.
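For readers unfamiliar with the idea, here is a minimal sketch of a hash-based proof of work. The leading-zero-hex-digit rule, the function names `solve` and `verify`, and the challenge format are assumptions for illustration, not a description of Anubis's exact scheme.

```python
import hashlib
import secrets


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash checks whether the client did the work."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Client side: try nonces until the digest has enough leading zeros."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


# The server hands out a random challenge; the client searches for a nonce.
challenge = secrets.token_hex(16)
nonce = solve(challenge, difficulty=3)
print("nonce:", nonce, "valid:", verify(challenge, nonce, 3))
```

Nothing is mined and the nonce has no value beyond this one check; the search only makes each new visitor pay a small, one-off compute cost.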

𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social · 8 months ago

And, yet, the same people here lauding this for intentionally burning energy will turn around and spew vitriol at cryptocurrencies which are reviled for doing exactly the same thing.

Proof of work contributes to global warming. The only functional, IRL, difference between this and crypto mining is that this doesn’t generate digital currency.

There are a very few POW systems that do good, like BOINC, which is a POW system that awards points for work done; the work is science, protein analysis, SETI searches, that sort of thing. The work itself is valuable and needs doing; they found a way to make the POW constructive. But just causing a visitor to use more electricity to “stick it” to crawlers is not ethically better than crypto mining.

Just be aware of the hypocrisy.

lime!@feddit.nu · 8 months ago

the functional difference is that this does it once. you could just as well accuse git of being a major contributor to global warming.

hash algorithms are useful. running billions of them to make monopoly money is not.

𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social · 8 months ago

Which part of git performs proof-of-work? Specifically, intentionally inefficient algorithms whose output is thrown away?

lime!@feddit.nu · 8 months ago

the hashing part? it’s the same algo as here.

𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social · 8 months ago

That’s not proof of work, though.

git is performing hashes to generate identifiers for versions of files so it can tell when they changed. It’s like moving rocks to build a house.

Proof of work is moving rocks from one pile to another and back again, for the only purpose of taking up your time all day.
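To make the "hashes as identifiers" point concrete, this sketch computes the ID git assigns to a file's contents (a blob). Note that git's default object format uses SHA-1, with SHA-256 as an opt-in format, so "the same algo" holds only loosely. The function name `git_blob_id` is made up here.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute a blob ID the way git's default SHA-1 object format does.

    Git hashes a short header ("blob <size>" plus a NUL byte) followed
    by the raw file contents.
    """
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# Should match `git hash-object` for a file with these exact contents.
print(git_blob_id(b"hello world\n"))
```

Here the hash is computed once per distinct content and the result is kept as the object's name, whereas proof of work deliberately repeats hashing until an arbitrary target is hit.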

lime!@feddit.nu · 8 months ago

okay, git using the same algorithm may have been a bad example. let’s go with video games then. the energy usage for the fraction of a second it takes for the anubis challenge-response dance to complete, even on phones, is literally nothing compared to playing minecraft for a minute.

if you’re mining, you do billions of cycles of sha256 calculations a second for hours every day. anubis does maybe 1000, once, if you’re unlucky. the method of “verification” is the wrong thing to be upset at, especially since it can be changed
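A back-of-the-envelope version of this comparison, under stated assumptions: the challenge is modelled as "find a digest with d leading zero hex digits", and the miner rate is set to one billion hashes per second to match the "billions ... a second" figure above. Neither number is measured from Anubis or from real mining hardware.

```python
def expected_attempts(hex_zeros: int) -> int:
    """Mean number of hashes needed for a digest with that many leading zero hex digits."""
    # Each attempt succeeds with probability 16**-d, so the mean is 16**d.
    return 16 ** hex_zeros


# Assumed figures, taken loosely from the comment above.
MINER_RATE = 1_000_000_000   # hashes per second (assumption)
MINING_SECONDS = 60 * 60     # one hour of mining

challenge_hashes = expected_attempts(3)
mining_hashes = MINER_RATE * MINING_SECONDS
ratio = mining_hashes // challenge_hashes

print(f"one challenge: ~{challenge_hashes} hashes")
print(f"one hour of mining: {mining_hashes} hashes")
print(f"ratio: about {ratio} challenges per mining hour")
```

The exact figures depend entirely on the configured difficulty, but under these assumptions the gap spans many orders of magnitude.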

Cethin@lemmy.zip · 8 months ago

Proof of work is just that, proof that it did work. What work it’s doing isn’t defined by that definition. Git doesn’t ask for proof, but it does do work. Presumably the proof part isn’t the thing you have an issue with. I agree it sucks that this isn’t being used to do something constructive, but as long as it’s kept to a minimum in user time scales, it shouldn’t be a big deal.

Crypto currencies are an issue because they do the work continuously, 24/7. This is a one-time operation per view (I assume per view and not once ever), which with human input times isn’t going to be much. AI garbage does consume massive amounts of power though, so damaging those is beneficial.

dpflug@kbin.earth · 8 months ago

This is a stopgap while we try to find a new way to stop the DDOS happening right now. It might even be adapted to do useful work, if need be.

𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social · 8 months ago

Hook into BOINC, or something? That’s an idea.

Sucks for people who have scripts disabled, or are using browsers without JS support, though.

dpflug@kbin.earth · 8 months ago

It does, and I’m sure everyone will welcome a solution that lets them open things back up for those users without the abusers crippling them. It’s a matter of finding one.

CodeHead@lemmy.world · 8 months ago

This isn’t hypocrisy. The git repo said this was “a bit like a nuclear response”, and like any nuclear response, I believe they expect everyone to suffer.

𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social · 8 months ago

Not hypocrisy by the author, but by every reader who cheers this while hating on cryptocurrency.

IME most of these people can’t tell the difference between a cryptocurrency, a blockchain, and a public ledger, but have very strong opinions about them anyway.

F04118F@feddit.nl · 8 months ago

Goretantath@lemm.ee · 8 months ago

I think the maze approach is better; this seems like it hurts valid users of the web more than it would a company.

N0x0n@lemmy.ml · edit-2 8 months ago

For those not aware, nepenthes is an example of the above-mentioned approach!

BackgrndNoize@lemmy.world · 8 months ago

This looks like it can actually fuck up some models, but the unnecessary CPU load it will generate means most websites won’t use it, unfortunately.

lemonuri@lemmy.ml · edit-2 8 months ago

I did not find any instruction on the source page on how to actually deploy this. That would be a nice touch imho.

d0ntpan1c@lemmy.blahaj.zone · 8 months ago

There are some detailed instructions on the docs site, tho I agree it’d be nice to have in the readme, too.

Sounds like the dev was not expecting this much interest for the project out of nowhere so there will def be gaps.

SanndyTheManndy@lemmy.kya.moe · 8 months ago

The docker image page has it

JustAnotherKay@lemmy.world · 8 months ago

Or even a quick link to the relevant portion of the docs at least would be cool

Daniel Quinn@lemmy.ca · 8 months ago

It’s a rather brilliant idea really, but when you consider the environmental implications of requiring proof of work for web requests to function, this effectively burns more coal for every site that implements it.

marauding_gibberish142@lemmy.dbzer0.com · 8 months ago

I don’t think AI companies care, and I wholeheartedly support any and all FOSS projects using PoW when serving their websites. I’d rather have that than have them go down

computergeek125@lemmy.world · 8 months ago

Found the FF14 fan lol
The release names are hilarious

Couldbealeotard@lemmy.world · 8 months ago

What’s the ffxiv reference here?

Anubis is from Egyptian mythology.

PumaStoleMyBluff@lemmy.world · 8 months ago

The names of release versions are famous FFXIV Garleans

mannycalavera@feddit.uk · 8 months ago

Upvote for the name and tag line alone!

e0qdk@reddthat.com · 8 months ago

Giant middle finger from me – and probably everyone else who uses NoScript – for trying to enshittify what’s left of the good parts of the web.

Seriously, FUCK THAT.

LiveLM@lemmy.zip · 8 months ago

You should blame the big tech giants and their callous disregard for everyone else for the Enshittification, not the folks just trying to keep their servers up.

zutto@lemmy.fedi.zutto.fi · 8 months ago

They’re working on no-js support too, but this just had to be put out without it due to the amount of AI crawler bots causing denial of service to normal users.

nrab@sh.itjust.works · 8 months ago

You should fuck capitalism and corporations instead because they are the reason we can’t have nice things. They took the web from us

drkt@lemmy.dbzer0.com · 8 months ago

Anubis is provided to the public for free in order to help advance the common good. In return, we ask (but not demand, these are words on the internet, not word of law) that you not remove the Anubis character from your deployment.
If you want to run an unbranded or white-label version of Anubis, please contact Xe to arrange a contract.

This is icky to me. Cool idea, but this is weird.

LiveLM@lemmy.zip · 8 months ago

…Why? It’s just telling companies they can get support + white-labeling for a fee, and asking you to keep their silly little character in a tongue-in-cheek manner.
Just like they say, you can modify the code and remove it for free if you really want; they’re not forbidding you from doing so or anything.

bitcrafter@programming.dev · 8 months ago

“Just like they say, you can modify the code and remove it for free if you really want; they’re not forbidding you from doing so or anything.”

True, but I think you are discounting the risk that the actual god Anubis will take displeasure at such an act, potentially dooming one’s real life soul.

TheOakTree@lemm.ee · 8 months ago

Yeah, it seems entirely optional. It’s not like manually removing the Anubis character will revoke your access to the code. However, I still do find it a bit weird that they’re asking for that.

I just can’t imagine most companies implementing Anubis and keeping the character or paying for the service, given that it’s open source. It’s just unprofessional for the first impression of a company’s website being the Anubis devs’ manga OC…

F04118F@feddit.nl · 8 months ago

It is very different from the usual flat corporate style yes, but this is just their branding. Their blog is full of anime characters like that.

And it’s not like you’re looking at a literal ad for their company or with their name on it. In that sense it is subtle, though a bit unusual.

TheOakTree@lemm.ee · 8 months ago

I don’t think it’s necessarily a bad thing. Subtle but unusual is a good way to describe it.

However, I would like to point out that if it is their branding, then the character appearing is an advertisement for the service. It’s just not very conventional or effective advertising, but they’re not making money from a vast majority of implementations, so it’s not very egregious anyway.

Possibly linux@lemmy.zip · edit-2 8 months ago

It is not great on many levels.

It only runs against the Firefox user agent. This is not great, as the user agent can easily be changed. It may work now, but tomorrow that could all change.
It doesn’t measure load, so even if your website has only a few people accessing it, they will still have to do the proof of work.
The POW algorithm is not well designed and requires a lot of compute on the server which means that it could be used as a denial of service attack vector. It also uses sha256 which isn’t optimized for a proof of work type calculation and can be brute forced pretty easily with hardware.
I don’t really care for the animé cat girl thing. This is more of a personal thing but I don’t think it is appropriate.

In summary the Tor implementation is a lot better. I would love to see someone port it to the clearnet. I think this project was created by someone lacking experience which I find a bit concerning.

zutto@lemmy.fedi.zutto.fi · 8 months ago

It doesn’t run against Firefox only; it runs against whatever you configure it to. Also, from personal experience, I can tell you that the majority of the AI crawlers have the keyword “Mozilla” in their user agent.
Yes, this isn’t Cloudflare, but I’m pretty sure that’s on the to-do list. If not, please file an issue with the project.
The computational requirements on the server side are a small fraction of what the bots have to spend, literally. A non-issue. This tool is meant to combat the denial of service that these bots cause by accessing high-cost services, such as git blame on GitLab. My phone can do 100k sha256 sums per second (on a single thread), and you can safely assume any server outperforms this ARM chip, so you’d need so many resources to cause a denial of service that you might as well overload the server with plain traffic instead of sha256 calculations.

And this isn’t really comparable to Tor. This is a self hostable service to sit between your web server/cdn and service that is being attacked by mass crawling.
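The hash-rate claim is easy to check on your own hardware. This is a rough single-threaded measurement in Python, so interpreter overhead will make it an underestimate of what native code can do; the function name `hashes_per_second` is made up for the sketch.

```python
import hashlib
import time


def hashes_per_second(duration: float = 0.2) -> float:
    """Count how many SHA-256 digests one thread computes in `duration` seconds."""
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        hashlib.sha256(str(count).encode()).digest()
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed


rate = hashes_per_second()
print(f"about {rate:,.0f} SHA-256 hashes per second on one thread")
```

In a scheme where the server only checks a submitted answer, that check is a single hash, so the number above is roughly how many verifications per second one core could handle.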

marauding_gibberish142@lemmy.dbzer0.com · 8 months ago

Xe is insanely talented. If she is who I think she is, then I’ve watched her speak and her depth of knowledge across computer science topics is insane.

nrab@sh.itjust.works · 8 months ago

…you do realize that brute forcing it is the work you use to prove yourself, right? That’s the whole point of PoW

Possibly linux@lemmy.zip · 8 months ago

True, I should have phrased that better.

The issue is that sha256 is fairly easy to do at scale. Modern high-performance hardware is well optimized for it, so you could still perform an attack with a bunch of GPUs. AI scrapers tend to have a lot of those.

David From Space@orbiting.observer · 8 months ago

I use https://sx.catgirl.cloud/ so I’m already primed to have anime catgirls protecting my webs.

Bilb!@lem.monster · 8 months ago

Catgirls, jackalgirls, all embarrassing. Go full-on furry.

marauding_gibberish142@lemmy.dbzer0.com · 8 months ago

I look forward to TOR’s PoW coming out for FOSS WAFs

_cryptagion [he/him]@lemmy.dbzer0.com · 8 months ago

Nice. Crypto miners disguised as anti-AI.

drkt@lemmy.dbzer0.com · 8 months ago

what about this is crypto mining?

GitHub - TecharoHQ/anubis: Weighs the soul of incoming HTTP requests using proof-of-work to stop AI crawlers