Admiral Patrick

I’m surprisingly level-headed for being a walking knot of anxiety.

Ask me anything.

Special skills include: Knowing all the “na na na nah nah nah na” parts of the Three’s Company theme.

I also develop Tesseract UI for Lemmy/Sublinks

  • 146 Posts
  • 1.73K Comments
Joined 3 years ago
cake
Cake day: June 6th, 2023

help-circle
  • Admiral Patrick@dubvee.orgtoLemmy Shitpost@lemmy.worldShout out to...
    link
    fedilink
    English
    arrow-up
    2
    arrow-down
    1
    ·
    5 days ago

    I’d have to look, but I’m going with ignorance of its toxicity like with other things such as radium being used frivolously .

    We painted our kids’ rooms with it, painted pencils yellow with it, used it for water lines, put it in gasoline where we then breathed the leaded fumes for decades, and more.

    As for finally getting around to replacing lead water lines, well, infrastructure isn’t sexy and no one wants to pay for it.








  • Speaking from recent experience, the smaller thing can, and often does, turn into a larger thing all on its own and always at the worst time.

    For the last 3 years, I knew my water heater was on its last legs. I kept putting it off until two Saturdays ago I had a wet basement and no hot water. The kick in the ass was that it wasn’t that hard to replace the unit: 2 hours of labor to install and 2 hours to drain, remove, and clean around the old one. Cost me just under $650 including same day delivery which was awesome because I would have had to rent a truck and drafted someone to help me load/unload it otherwise.

    So my advice is when you allocate time to address the small problem, give yourself double that in case it turns into a bigger project. It’s always easier to deal with big stuff when it’s not a surprise.











  • Just about anything as long as you don’t need to serve it to hundreds of people simultaneously. Hell, I once hosted Jellyfin over a 3G hotpot and it managed.

    Pretty much any web-based app will work fine. Streaming servers (Emby, Plex, Jellyfin, etc) work fine for a few simultaneous people as long as you’re not trying to push 4K or something. 1080p can work fine at 4 Mbps or less (transcoding is your friend here). Chat servers (Matrix, XMPP, etc) are also a good candidate.

    I hosted everything I wanted with 30 Mbps upload before I got symmetric fiber.



  • Thanks!

    Mostly there’s three steps involved:

    1. Setup Nepenthes to receive the traffic
    2. Perform bot detection on inbound requests (I use a regex list and one is provided below)
    3. Configure traffic rules in your load balancer / reverse proxy to send the detected bot traffic to Nepenthes instead of the actual backend for the service(s) you run.

    Here’s a rough guide I commented a while back: https://dubvee.org/comment/5198738

    Here’s the post link at lemmy.world which should have that comment visible: https://lemmy.world/post/40374746

    You’ll have to resolve my comment link on your instance since my instance is set to private now, but in case that doesn’t work, here’s the text of it:

    So, I set this up recently and agree with all of your points about the actual integration being glossed over.

    I already had bot detection setup in my Nginx config, so adding Nepenthes was just changing the behavior of that. Previously, I had just returned either 404 or 444 to those requests but now it redirects them to Nepenthes.

    Rather than trying to do rewrites and pretend the Nepenthes content is under my app’s URL namespace, I just do a redirect which the bot crawlers tend to follow just fine.

    There’s several parts to this to keep my config sane. Each of those are in include files.

    • An include file that looks at the user agent, compares it to a list of bot UA regexes, and sets a variable to either 0 or 1. By itself, that include file doesn’t do anything more than set that variable. This allows me to have it as a global config without having it apply to every virtual host.

    • An include file that performs the action if a variable is set to true. This has to be included in the server portion of each virtual host where I want the bot traffic to go to Nepenthes. If this isn’t included in a virtual host’s server block, then bot traffic is allowed.

    • A virtual host where the Nepenthes content is presented. I run a subdomain (content.mydomain.xyz). You could also do this as a path off of your protected domain, but this works for me and keeps my already complex config from getting any worse. Plus, it was easier to integrate into my existing bot config. Had I not already had that, I would have run it off of a path (and may go back and do that when I have time to mess with it again).

    The map-bot-user-agents.conf is included in the http section of Nginx and applies to all virtual hosts. You can either include this in the main nginx.conf or at the top (above the server section) in your individual virtual host config file(s).

    The deny-disallowed.conf is included individually in each virtual hosts’s server section. Even though the bot detection is global, if the virtual host’s server section does not include the action file, then nothing is done.

    Files

    map-bot-user-agents.conf

    Note that I’m treating Google’s crawler the same as an AI bot because…well, it is. They’re abusing their search position by double-dipping on the crawler so you can’t opt out of being crawled for AI training without also preventing it from crawling you for search engine indexing. Depending on your needs, you may need to comment that out. I’ve also commented out the Python requests user agent. And forgive the mess at the bottom of the file. I inherited the seed list of user agents and haven’t cleaned up that massive regex one-liner.

    # Map bot user agents
    ## Sets the $ua_disallowed variable to 0 or 1 depending on the user agent. Non-bot UAs are 0, bots are 1
    
    map $http_user_agent $ua_disallowed {
        default 		0;
        "~PerplexityBot"	1;
        "~PetalBot"		1;
        "~applebot"		1;
        "~compatible; zot"	1;
        "~Meta"		1;
        "~SurdotlyBot"	1;
        "~zgrab"		1;
        "~OAI-SearchBot"	1;
        "~Protopage"	1;
        "~Google-Test"	1;
        "~BacklinksExtendedBot" 1;
        "~microsoft-for-startups" 1;
        "~CCBot"		1;
        "~ClaudeBot"	1;
        "~VelenPublicWebCrawler"	1;
        "~WellKnownBot"	1;
        #"~python-requests"	1;
        "~bitdiscovery"	1;
        "~bingbot"		1;
        "~SemrushBot" 	1;
        "~Bytespider" 	1;
        "~AhrefsBot" 	1;
        "~AwarioBot"	1;
    #    "~Poduptime" 	1;
        "~GPTBot" 		1;
        "~DotBot"	 	1;
        "~ImagesiftBot"	1;
        "~Amazonbot"	1;
        "~GuzzleHttp" 	1;
        "~DataForSeoBot" 	1;
        "~StractBot"	1;
        "~Googlebot"	1;
        "~Barkrowler"	1;
        "~SeznamBot"	1;
        "~FriendlyCrawler"	1;
        "~facebookexternalhit" 1;
        "~*(?i)(80legs|360Spider|Aboundex|Abonti|Acunetix|^AIBOT|^Alexibot|Alligator|AllSubmitter|Apexoo|^asterias|^attach|^BackDoorBot|^BackStreet|^BackWeb|Badass|Bandit|Baid|Baiduspider|^BatchFTP|^Bigfoot|^Black.Hole|^BlackWidow|BlackWidow|^BlowFish|Blow|^BotALot|Buddy|^BuiltBotTough|
    ^Bullseye|^BunnySlippers|BBBike|^Cegbfeieh|^CheeseBot|^CherryPicker|^ChinaClaw|^Cogentbot|CPython|Collector|cognitiveseo|Copier|^CopyRightCheck|^cosmos|^Crescent|CSHttp|^Custo|^Demon|^Devil|^DISCo|^DIIbot|discobot|^DittoSpyder|Download.Demon|Download.Devil|Download.Wonder|^dragonfl
    y|^Drip|^eCatch|^EasyDL|^ebingbong|^EirGrabber|^EmailCollector|^EmailSiphon|^EmailWolf|^EroCrawler|^Exabot|^Express|Extractor|^EyeNetIE|FHscan|^FHscan|^flunky|^Foobot|^FrontPage|GalaxyBot|^gotit|Grabber|^GrabNet|^Grafula|^Harvest|^HEADMasterSEO|^hloader|^HMView|^HTTrack|httrack|HTT
    rack|htmlparser|^humanlinks|^IlseBot|Image.Stripper|Image.Sucker|imagefetch|^InfoNaviRobot|^InfoTekies|^Intelliseek|^InterGET|^Iria|^Jakarta|^JennyBot|^JetCar|JikeSpider|^JOC|^JustView|^Jyxobot|^Kenjin.Spider|^Keyword.Density|libwww|^larbin|LeechFTP|LeechGet|^LexiBot|^lftp|^libWeb|
    ^likse|^LinkextractorPro|^LinkScan|^LNSpiderguy|^LinkWalker|msnbot|MSIECrawler|MJ12bot|MegaIndex|^Magnet|^Mag-Net|^MarkWatch|Mass.Downloader|masscan|^Mata.Hari|^Memo|^MIIxpc|^NAMEPROTECT|^Navroad|^NearSite|^NetAnts|^Netcraft|^NetMechanic|^NetSpider|^NetZIP|^NextGenSearchBot|^NICErs
    PRO|^niki-bot|^NimbleCrawler|^Nimbostratus-Bot|^Ninja|^Nmap|nmap|^NPbot|Offline.Explorer|Offline.Navigator|OpenLinkProfiler|^Octopus|^Openfind|^OutfoxBot|Pixray|probethenet|proximic|^PageGrabber|^pavuk|^pcBrowser|^Pockey|^ProPowerBot|^ProWebWalker|^psbot|^Pump|python-requests\/|^Qu
    eryN.Metasearch|^RealDownload|Reaper|^Reaper|^Ripper|Ripper|Recorder|^ReGet|^RepoMonkey|^RMA|scanbot|SEOkicks-Robot|seoscanners|^Stripper|^Sucker|Siphon|Siteimprove|^SiteSnagger|SiteSucker|^SlySearch|^SmartDownload|^Snake|^Snapbot|^Snoopy|Sosospider|^sogou|spbot|^SpaceBison|^spanne
    r|^SpankBot|Spinn4r|^Sqworm|Sqworm|Stripper|Sucker|^SuperBot|SuperHTTP|^SuperHTTP|^Surfbot|^suzuran|^Szukacz|^tAkeOut|^Teleport|^Telesoft|^TurnitinBot|^The.Intraformant|^TheNomad|^TightTwatBot|^Titan|^True_Robot|^turingos|^TurnitinBot|^URLy.Warning|^Vacuum|^VCI|VidibleScraper|^Void
    EYE|^WebAuto|^WebBandit|^WebCopier|^WebEnhancer|^WebFetch|^Web.Image.Collector|^WebLeacher|^WebmasterWorldForumBot|WebPix|^WebReaper|^WebSauger|Website.eXtractor|^Webster|WebShag|^WebStripper|WebSucker|^WebWhacker|^WebZIP|Whack|Whacker|^Widow|Widow|WinHTTrack|^WISENutbot|WWWOFFLE|^
    WWWOFFLE|^WWW-Collector-E|^Xaldon|^Xenu|^Zade|^Zeus|ZmEu|^Zyborg|SemrushBot|^WebFuck|^MJ12bot|^majestic12|^WallpapersHD)" 1;
    
    }
    
    
    deny-disallowed.conf
    # Deny disallowed user agents
    if ($ua_disallowed) { 
        # This redirects them to the Nepenthes domain. So far, pretty much all the bot crawlers have been happy to accept the redirect and crawl the tarpit continuously 
    	return 301 https://content.mydomain.xyz/;
    }