Hi,

I know this is nearly impossible to diagnose from afar, but I came across the post from the lemmy.world admins about the attacks they are facing, where the database gets overwhelmed and the server stops responding. Something similar seems to have happened to my own servers.

Now, I’m running my own self-hosted Lemmy and Mastodon instances (on 2 separate VPSes) and had them become completely unresponsive yesterday. Mastodon and Lemmy both showed the “there is an internal/database error” message, and my other services (Nextcloud and Synapse) didn’t load or respond.

Logging into my VPS console showed me that both servers had been running at 100% CPU load for a couple of hours. I can’t currently SSH into these servers, as I’m away for a couple of days and forgot to bring my private SSH key on my laptop. So, for now, I just switched the servers off.

Anyway, the main question is: what should I look at in troubleshooting when I’m back home? I’m a beginner in selfhosting and I run these instances just for myself and don’t mind if I’d have to roll them back a couple days (I have backups). But I would like to learn from this and get better at running my own services.

For reference: I run everything in docker containers behind Nginx Proxy Manager as my reverse proxy. I have only ports 80, 443 and 22 open to the outside. I have fail2ban set up. The Mastodon and Lemmy instances are not open for registration and just have 2 users each (admin + my account).

  • @Anafroj@sh.itjust.works
    11 months ago

    The best you can do to find out whether it was an attack is to inspect the logs when you have time. There are a lot of things that can cause a process to go wild without it being an attack. Sometimes, even filling the RAM can make the CPU appear overloaded (and will freeze the system anyway). One simple way to figure out if it’s an attack: reboot. If it’s a bug, everything will get back to normal. If it’s a DDoS, the problem will reappear within a few minutes of the reboot. If it’s a simple DoS (someone exploiting a bug in some software to overload it), it may or may not reappear, depending on whether the exploit was automated and recurring or just a one-shot.
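
As a rough illustration of the log inspection mentioned above, here is a minimal Python sketch that counts requests per client IP in an access log. It assumes the first IPv4 address on each line is the client's address, which depends on the proxy's log format, so check a few lines by hand first. The function name `top_clients` and the script name in the usage comment are made up for this example.

```python
import re
import sys
from collections import Counter

# Assumption: the first IPv4 address on each log line is the client address.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def top_clients(lines, n=10):
    """Return the n most frequent client IPs as (ip, count) pairs."""
    counts = Counter()
    for line in lines:
        match = IPV4.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts.most_common(n)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python top_clients.py /path/to/access.log
    with open(sys.argv[1], errors="replace") as log:
        for ip, hits in top_clients(log):
            print(f"{hits:8d}  {ip}")
```

A handful of addresses with vastly more hits than the rest, clustered around the time the servers froze, would be a hint toward an attack or an aggressive crawler. A flat distribution would point more toward a bug or a resource shortage.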

    The fact that both your machines went down at the same time would tend to suggest it’s an attack. On the other hand, it may just be a surge of activity on the network hitting VPSes that don’t have nearly enough resources to handle it. Or it may even be a noisy neighbor problem (other people sharing the physical hardware your VPSes run on and overloading it).
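
One way to check the noisy neighbor theory on a Linux VPS is to look at CPU "steal" time, which is the time the hypervisor gave to other guests while this machine wanted to run. The sketch below is only an illustration: it takes two samples of the aggregate `cpu` line in `/proc/stat` and reports the steal share in between. The function names and the 2-second interval are arbitrary choices for this example.

```python
import os
import time


def cpu_times(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Percentage of CPU time reported as steal between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Columns: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so the
    # total only sums the first eight columns.
    if len(deltas) < 8:
        return 0.0
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as stat:
        first = cpu_times(stat.read())
    time.sleep(2)  # arbitrary sampling interval
    with open("/proc/stat") as stat:
        second = cpu_times(stat.read())
    print(f"steal: {steal_percent(first, second):.1f}%")
```

A steal value that stays high while your own containers are mostly idle would point toward the host being oversubscribed rather than toward your services. A value near zero during a CPU spike suggests the load comes from inside the VPS.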