
How to Fight DDoS Attacks from the Command Line

BSides CT · 2025 · 45:01 · 233 views · Published 2026-01 · Watch on YouTube ↗
About this talk
Michael McMahon shares tools and techniques for defending websites against DDoS attacks, AI scrapers, and vulnerability scanners using free, self-hosted command-line utilities. Drawing from his experience protecting the Free Software Foundation's infrastructure under sustained attack, he demonstrates log analysis, firewall rules, rate limiting, and automated response systems including fail2ban, ModSecurity, and ipset blocking.
Original YouTube description
How to fight DDoS attacks from the command line Michael McMahon (reuploaded due to previous video being cut off at the end) The modern Internet is a hostile environment to run a website or service in 2025 full of aggressive scrapers, vulnerability scanners, and CI/CD services. Many sites have chosen to keep their sites running by hiding behind Cloudflare and other shields. Some of our sites have been under attack for more than a year now and I will share several tools and techniques that I use as a system administrator at the Free Software Foundation to keep the sites up. I will share some of the tools that I use including monitoring tools, analysis tools such as custom bash scripts to analyze logs and local ASN look-ups, automated protection, and firewall tools.
Transcript [en]

All right, guys. Next up, we have How to Fight DDoS Attacks from the Command Line with Michael McMahon.

Thank you. First, I want to ask some questions. Who's actually running web servers or services publicly on the internet? Yeah, I know there were a lot of hands this morning for people on the defensive side, so this is going to be a defense talk; that's where my bias is. I'm going to run through some quick licensing things. This is a copyleft talk, copyright Michael McMahon 2025, released under the Creative Commons Attribution-ShareAlike 4.0 International license. The opinions expressed here are my own and not those of my employer. I've been working at the FSF since 2019. Is anybody familiar with the FSF? All right. We're kind of a unique

nonprofit organization. We only run free software, from the bottom up. None of our stuff is non-free or proprietary. I found that very interesting, because I got to run everything; I got to put my fingers in all the pots. We run 70 sites and services. When I started there were four people on the tech team; now there are two of us. We have a couple of volunteers, and the full-time employees and volunteers together make up the FSF sysops team. I'm very appreciative of that team. Before this I had experience in manufacturing, education, game development, and music — a pretty wide background. Why I'm talking to you today is because

for the last year we've been under DDoS attacks. Before that we'd been under attack off and on, but for more than a year now it has been constant. I have a Munin graph right here that I think is crazy. Bob, one of the FSF sysops members, identified a unique pattern on Savannah — Savannah is our software forge — and we were able to filter on that pattern and put the matches in an ipset. At one point we had 5.1 million IPs matching that pattern. That's a lot of IPs attacking one site. The scope of this talk is very specific to the FSF. I'm only going to be talking about free

tools. What that means, if you're not familiar with the concept, is that there are four freedoms of free software: to run, edit, contribute, and share. If you can do all four of those things with the code — run, edit, contribute, share — it's free software. I don't care about price; all of the tools I'm going to be talking about happen to be free of charge, but that's not the point. The point is freedom. If they don't come with those four freedoms, I'm going to call them non-free, or unknown if I can't tell what software is behind them. Even if you're not interested in this from a philosophical standpoint, I believe that in the security field you cannot practice the trust-but-verify concept

unless your tools are free. Now, I'm specifically talking about small to medium-sized attacks. I've got a friend who works at a larger nonprofit organization and they cannot even log the data that they are getting. I'm calling that a large attack, or a large site. We can still log, so I'm calling ours small to medium. I'm going to be talking specifically about web servers. This might apply to other things, but I'm usually talking about nginx and Apache, which can log. It might apply to others like Caddy — I know that's a popular one — but I'm rooting for nginx at the moment. I'm not going to be talking about AI or blockchain. And all of these tools can be

self-hosted. So, what's going on here? Some of this is speculation, because the abuse that we're seeing doesn't come with a note. In some attacks in the past there has been an anonymous email that comes in and says: this is who we are, this is what we're doing, these are our demands. We don't have that this time. And there are multiple attacks — it's not one thing — but most people are assuming that a lot of these are AI scrapers. This is AI tech bros with startups using botnets to scrape the internet as much as they can, to get the most recent, freshest capture of the web as fast as they

possibly can. AI is kind of controversial. Some people hate it and want to block it all as much as they can. But it's interesting, because sometimes if you have a message to spread, you want to get scraped so that your message becomes part of the model, and whenever somebody asks an agent a question, they'll get back an answer that includes your organization. So you have to decide for yourself where you draw that line. I draw that line at bots that identify themselves and respect the robots.txt file. The second one is vulnerability scanners. I'm going to skip that; besides, you know what that's about. So, where are AI crawlers and vulnerability scanners coming from?

Traditional VPNs — that's a big one. I like those for privacy, so I'm not blocking them outright. I do block bad ones as I see them, but those tend to be temporary. Botnets are a big one. These look like they're coming from everywhere: cloud providers, residential addresses, companies, everywhere. A particularly interesting one that kind of matches the scale that I'm seeing is Vo1d, where unpatched Android smart TVs got compromised in the millions — that would fall under the consumer residential IP address ranges. An interesting one that's come up is people selling their bandwidth to make some extra cash. An example of this is magneticproxy.com, but there are a whole bunch of different

organizations that will purchase your bandwidth. What that essentially does is you get paid a little bit to be part of a botnet. That doesn't make your IP address look good. I can't imagine it follows the terms of service of your internet service provider, but do they care? I don't know. I don't like it, though. I would suggest not doing that. One that really grinds my gears is crawling as a service. One of the tools that took down one of our sites was Scrapy, and I looked into the Scrapy issue tracker, and the devs don't care. They don't build it in a way that

respects websites. The company behind it is called Zyte, and they don't care either — if you read their site, it will get your blood pressure boiling. I'll show the link at the end. Another one that's pretty frustrating is poorly configured CI/CD services. This is mostly companies running server farms of automation to make sure that their builds still work, that there aren't any new updates, that nothing's changed. Companies don't need to do that. They don't need to hit external sites every time one of their devs submits a patch to their infrastructure.

If you're building spiders and CI automation and you're hitting external servers, please identify yourselves with a user agent and a link with contact information. When I see something like that, I can reach out. Otherwise, I just block it. So, all of that together — it seems the health of the web has some serious problems right now. When I talk about DDoS, I typically get three suggestions, and I don't use any of them. I'm going to tell you why. Traditional DDoS protection is routing your traffic through a third-party proxy, and on the third-party proxy they have some magic that looks for DDoS patterns and blocks them. That's bad for privacy, because your

whole set of connections goes through this third party. Can you trust that third party? Let's not add one more party that doesn't need to know. Cloudflare does a lot of things — when I talk about Cloudflare, it's not one thing, it's many services. It's also bad for privacy, potentially very bad. Sometimes you have to share your TLS certificates with it, so if Cloudflare is compromised, that could be a man-in-the-middle attack against all the clients. I don't want to participate in that. It's non-free JavaScript, so that's right out for us. And the CAPTCHAs they use, in addition to being non-free, are often accessibility problems.

Who failed a CAPTCHA this week? You know, I did. I failed to identify a duck. A lot of the CAPTCHAs require that you have the mental faculties to identify things, that you have strong eyesight, and the alternative usually requires that you can hear. So there is a better way. The third suggestion is Anubis and go-away. These are very similar. As far as I know, they are 100% free software. The default functionality uses JavaScript, and my organization has a complicated history with that, so we're not going to put a JavaScript gateway in front of our sites. There are alternatives to JavaScript that they allow,

and that's very good. Another reason why we're not using this JavaScript challenge is because the difficulty is dynamic based on how much abuse your site is getting. A difficulty-2 challenge takes my old laptop a couple of seconds. Difficulty 4 took my laptop 64 seconds to get into the GNOME GitLab. 64 seconds is a lot of time.
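For a sense of why that difficulty scaling bites old hardware so hard, here is a toy proof-of-work loop in the same spirit (illustrative only, not Anubis's actual algorithm; the seed and difficulty are made up). Each extra leading hex zero the hash must carry multiplies the expected work by 16.

```bash
# Toy proof-of-work: find a nonce so that sha256(seed + nonce) starts with
# `difficulty` hex zeros. Values are hypothetical.
seed="example-challenge"
difficulty=3                                    # difficulty 4 takes ~16x longer
prefix=$(printf '0%.0s' $(seq 1 "$difficulty"))
nonce=0
until printf '%s%s' "$seed" "$nonce" | sha256sum | grep -q "^$prefix"; do
  nonce=$((nonce + 1))
done
echo "solved challenge with nonce $nonce"
```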

I've read an article where researchers figured out a way to automate those checks as well. So if a scraper has a botnet, they can have one expensive node that just solves these challenges on the command line, and then an army of cheap VPSes and botnet nodes and TVs and whatever that are able to scrape, somewhat slower, after those checks are done. With all that said, I think Anubis and go-away might be the best ethical option for you and your organization if the rest of my talk is overwhelming. I'm going to be talking about a lot of things. Now, I've got to put a disclaimer here: you've got to be mindful of your rules.

If you're blocking search engines, that's going to hurt your page rank and search engine optimization. If you're blocking AI crawlers, the models might not know who you are afterwards. And a really complicated one is carrier-grade NAT. This is where many different computers share the same address space. Big examples are China, Brazil, Peru, mobile phones, and VPNs. If you're blocking by IP address — and you pretty much have to — it's going to be unfair to those users. Only a few of my tools are compatible with that in mind. So let's talk web servers. How can we configure these? Whenever you can, reduce the complexity of your sites. If you're running a WordPress, think about it: could it be a static site?

If you have expensive things that can be accessed from the internet, consider putting them behind an authentication wall. Distribute as much as you can to other data centers, other states, other countries, other continents if you have the ability to do so. Now, from a privacy standpoint, keep in mind that programs like PRISM mean intercontinental traffic has fewer privacy protections. Maybe that matters, maybe it doesn't. If you're running a server farm, reach out about in-kind donations you could help us out with. Something that I really like, which may or may not be effective, is robots.txt files. To me, this is stating what you as a web administrator consent to bots hitting your site with.

I have a good example at fsf.org/robots.txt. Feel free to copy from it. An important part of it is Crawl-delay. Now, that's not an official part of the RFC 9309 robots.txt standard, so it only works for those that listen. Bing listens. A lot of big ones do listen, but Google doesn't, and a lot of people just follow what Google does. It's really frustrating.
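A minimal robots.txt in that spirit might look like the sketch below (the path and values are illustrative, not copied from fsf.org; Crawl-delay is only honored by crawlers that choose to respect it):

```bash
# Write a simple robots.txt with a crawl delay (document root is hypothetical)
cat > /var/www/html/robots.txt <<'EOF'
User-agent: *
Crawl-delay: 5
Disallow: /cgi-bin/
EOF
```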

And we pretty much have to do whatever Google wants, because they've got a monopoly on search. I hate it. One thing about the robots.txt file is that it's just a sign. It doesn't have anything to do with enforcement; it's on them to follow the sign. The rest of the talk is going to be about enforcement. Rate limiting is an important part of a web server config. nginx has it built in; Apache 2 does not, but you can add it with mod_qos (quality of service). mod_qos also helps with Slowloris attacks.
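As a hedged sketch of nginx's built-in rate limiting (not the FSF's actual config; the zone name, rate, and file paths are assumptions):

```bash
# limit_req_zone belongs in the http { } context; conf.d is included there by
# default on many distributions. The limit_req line goes in a server/location block.
cat > /etc/nginx/conf.d/ratelimit.conf <<'EOF'
# track clients by IP in a 10 MB shared zone, allowing 10 requests per second
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;
EOF
# inside the server { } or location { } you want to protect, add:
#     limit_req zone=perip burst=20 nodelay;
# then test and reload:
nginx -t && systemctl reload nginx
```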

Two more things about web servers before we move on. The first is shedding load. If you know that you're getting attacked by a certain user agent, you can configure the server to just return a 403 instead of doing the full request — just a quick little 403 page. That helps quite a bit. The problem is that the attacker then knows that you know they're attacking. A really strong example of that is blocking empty user agent strings. Most of the empty user agent strings are vulnerability scanners, so if you drop those, you're already a little safer. The second is ModSecurity, a web application firewall. The new version only works with nginx — this is another reason why I'm saying nginx is good. The old version, ModSecurity 2, works

with Apache 2. I am using it. It is good. I'm using it until I can switch everything over to nginx. It can do a lot, so I don't have time to go too far into it, but it really helps with the China traffic. If I identify a bad pattern coming from China, I can sculpt that traffic, block some of the requests that are bad, and allow normal people that are viewing the site. I'm not going to go too far into monitoring, but I do want to highlight these four tools. Uptime Kuma is one of my new favorites. It's pretty simple: it tells you whether a site is answering requests in a reasonable time. Prometheus — everybody should know

Prometheus. Munin — you saw the Munin graph in that second slide. htop is a good one when you're actually on the machine. I'm going to show two previews of what Uptime Kuma looks like. A lot of people have seen this one. It's very simple: green means the site is answering requests, and red — this is kind of a bad example because there's no red on it — but red means there's more than 48 seconds of response time. We'll show the next one. On the back end, there's a more detailed view, with specific response times graphed. I love this view. It tells me how a site is actually doing. This big long red bar was a kernel panic.

Before that, a DDoS attack had response times all over the place, and afterwards is after I fixed it with some of these other tools.

Most of the analysis depends on logs. Some people are against logs because of privacy. I think you need logs, if you can handle them. I made a tool to help me look at logs called log review. I'm going to give you a little demo of it. It isn't as flashy as some of the others, like apachetop and GoAccess, but it helps me find very obvious abnormalities in web server traffic. And then once you have some IP addresses, you look them up with one of my favorite tools, iptoasn-webservice. This is a local ASN lookup tool. It's free and I've got it on my machine here. Before I show you a live demo, let's

talk about standard log formats. To understand log review, you need to understand logs. Here is one line of a web server log. It looks like a lot. This is from lists.gnu.org. The first part is the IP address, then the date, the actual request, the response code — 200 in this case — the size of the response, and then the user agent. The user agent and the request are typically the most important ones to look at when analyzing for abuse, and the IP address of course. The date is helpful to see how many requests came in over a certain time window, and size — my tool doesn't do size, but apachetop and GoAccess also analyze that.
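Here is a made-up line in that same combined log format, with the fields called out (all values are hypothetical, not taken from the demo):

```bash
# 203.0.113.7 - - [28/Aug/2025:10:15:42 +0000] "GET /archive/html/index.html HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (X11; Linux x86_64) Chrome/112.0.0.0 Safari/537.36"
# reading left to right: client IP, identd and user (usually "-"), timestamp,
# request line, response code, response size in bytes, referer, user agent
```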

I don't know how familiar you are with the command line, but pipes are very important, and a lot of my tools are piping things from here to there, everywhere. I want to show you this series that I use all the time: sort, to uniq -c, to sort numerically. In the left column, we've got an unsorted list of cats, dogs, and squirrels. After you run it through sort, it puts it in alphabetical order: cat, cat, cat, cat, cat, dog, dog, dog, squirrel. After you run it through uniq -c, it counts the duplicates, which is much more abbreviated than the whole list, and then sorting by number puts them in order. Log review is doing that, but on a larger scale. Right.
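Applied to an access log, that same sort | uniq -c | sort -n chain ranks the noisiest clients. A sketch, assuming the combined log format shown above and a hypothetical log path:

```bash
# top 20 client IPs by request count
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20

# top 20 user agents (the sixth field when the line is split on double quotes)
awk -F'"' '{print $6}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20
```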

So, this is real traffic. I've redacted a few things that might not be of use.

Here's a sample of the log review config. It's running on this log from August 28th, and specifically only on Chrome 112. I was seeing abuse from Chrome version 112, which is several years old at this point.

If you have a lot of logs, it might take a while. There are, I don't know, 800,000 lines in this one. It kind of goes fast once it gets going. It shows a few things. It looks for the most frequent IPs, and for each one it looks at what their user agents are, what some of the last three requests they made were, and what requests they make the most. I think with those little bits of data you can kind of get the gist of something: you can quickly identify whether it's a vulnerability scanner, or a scraper, or something broken and repeating the same thing over and over.

And then at the end there are total stats on user agents. On this one it's not very interesting, because it's specifically Chrome 112. That last one is an outlier — it's got almost 800,000 hits from a very old browser. That's not right. In a moment I'll go back to these IPs.

Sorry, my view is a little different from your view. All right. Here's one block for an address starting with 136.226. This was all one user agent. They are scraping our mailing list archives. We'll find out where they come from.
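Once one address stands out, a few greps give the same per-IP picture that log review prints. A sketch, using a documentation address rather than the real one from the demo:

```bash
ip=203.0.113.7                     # hypothetical suspicious client
log=/var/log/nginx/access.log      # adjust to your server's log path

grep "^$ip " "$log" | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn   # its user agents
grep "^$ip " "$log" | awk '{print $7}' | sort | uniq -c | sort -rn | head  # paths it hits most
grep "^$ip " "$log" | tail -3                                              # its last three requests
```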

All right. So once you have some data to work with, there are some manual actions you can take. You can configure firewalls and build new rules. That gets kind of tedious if you're under a big attack, so I've automated it; I'll show you the scripts in a second. At a certain point, if you're using traditional firewall tools, you will run out of rules. A lot of them have a limit of 65,000 rules, somewhere around there. If you're dealing with a 5 million IP botnet, that's not enough. ipset is a way to expand that.
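The ipset idea in miniature, as a hedged sketch (the set name, size, and addresses are made up): one iptables rule matches against a set that can hold millions of entries, and adding to the set stays cheap.

```bash
ipset create ddos-block hash:net maxelem 8000000 -exist     # create the set once
iptables -I INPUT -m set --match-set ddos-block src -j DROP  # one rule covers the whole set

# individual addresses or whole networks can then be added as they are identified
ipset add ddos-block 198.51.100.0/24 -exist
ipset add ddos-block 203.0.113.45 -exist
```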

You can also block by country — that's called geofencing. I feel kind of gross doing that. Some people don't even need other countries. I've got a friend who works at a US-based nonprofit; anything that's not in the US isn't his clientele, so he doesn't care. He can drop all of that traffic. I don't have that luxury. You can also write manual abuse reports. Sometimes those work, sometimes they're a waste of time. So, here's a collection of scripts. I published these somewhat recently, and there's a QR code that links to the repo. I'm calling it firewall-block-gen. ip-asn-info is a wrapper for iptoasn-webservice; it makes a comma-separated values list of the IP, its ASN, the description of the ASN, the country, and the date. And then I have several other scripts

that are variations of that and go a step further and build firewall rules. I've got one for iptables, and ones for UFW, Shorewall, and ipset. I've got a little script that will fetch every CIDR for an ASN, so you can block that if you see a pattern. And I've got one for country codes as well, for geofencing. I'll demo some of those.

I like to set up my talks to be completely offline, so I've got a web server running on this laptop and it's got the data. Normally it fetches the data from iptoasn every hour, so it's up to date. I'm going to start iptoasn. It doesn't have any output, so I'm just going to assume that it works. I'll show you ip-asn-info first. The IPs that we saw in the log review are these three, so I can run ip-asn-info on them. And here we have the IP address, the ASN, the description of the ASN, the country, and the date.

Two and three don't really give you a big insight into what's going on. But if you look up thousands that match one specific abuse pattern, maybe Zscaler is the ASN that an attacker is using, and you block Zscaler.
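A local lookup against a self-hosted iptoasn-webservice instance can be as simple as the sketch below; the listen address, the example IPs, and the exact JSON field names are assumptions to verify against your own instance.

```bash
IPTOASN=http://127.0.0.1:53661            # wherever your iptoasn-webservice listens
for ip in 198.51.100.23 203.0.113.45; do  # hypothetical addresses pulled from a log
  curl -s "$IPTOASN/v1/as/ip/$ip" \
    | jq -r '[.as_number, .as_country_code, .as_description] | @tsv'
done
```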

The ASN-to-ipset script requires the internet, so I recorded a terminal session of using it. This is just going to look like I'm typing, but I'm not actually typing. So, one time I saw Tencent, the company behind TikTok, aggressively scraping our site. This is one of their ASNs, 132203 or somewhere around there. The script uses an external site that you can request an ASN's prefixes from; it fetches that list first and turns it into an ipset.
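A sketch of the same ASN-to-ipset idea, using RIPEstat's public announced-prefixes endpoint as the external source (the speaker's script may use a different one; the ASN and set name are illustrative):

```bash
asn=132203                                  # example ASN from the demo
ipset create "as$asn" hash:net -exist
curl -s "https://stat.ripe.net/data/announced-prefixes/data.json?resource=AS$asn" \
  | jq -r '.data.prefixes[].prefix' \
  | grep -v ':' \
  | while read -r prefix; do                # IPv6 prefixes are skipped; they need a family inet6 set
      ipset add "as$asn" "$prefix" -exist
    done
iptables -I INPUT -m set --match-set "as$asn" src -j DROP
```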

This is very, very helpful. Once you have the ipset script output, you can run it. If you have ipset installed along with iptables, it will block all that traffic. And if you block this further up the chain — say you're self-hosting and running a website on a VM — if you go up the chain and run it on the host, that's more effective, and if you run it further up on a router, it's even more effective. I'll show you the same thing for countries. I don't like to block countries outright, but sometimes it's the best thing to get your site back up. This example is blocking China, Hong Kong, and Brazil.
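A hedged sketch of that geofencing step, using per-country prefix lists; ipdeny.com is one commonly used source of these zone files (the country codes and set names are illustrative, and the sleep is there to be polite to the download server):

```bash
for cc in cn hk br; do
  ipset create "geo-$cc" hash:net -exist
  curl -s "https://www.ipdeny.com/ipblocks/data/countries/$cc.zone" \
    | while read -r net; do ipset add "geo-$cc" "$net" -exist; done
  iptables -I INPUT -m set --match-set "geo-$cc" src -j DROP
  sleep 5
done
```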

I try to practice what I preach, so between each one there is a sleep; I'm not hammering their servers.

I think I have Brazil blocked on a couple of sites right now.

I don't think I have China blocked at the moment, which is good.

So, it doesn't take too long. If you're talking about a whole long range, it does take a lot longer.

All right. So, for a long time I was using fail2ban. Fail2ban is a program where you can write regular expressions to look for a pattern in logs and then take action on that pattern. Last week I migrated to a different alternative to fail2ban called reaction. Reaction is written in Rust; fail2ban is written in Python. Reaction is fast and lower on resources. It's very cool. It's kind of a build-it-yourself kind of deal — fail2ban has configs you can kind of rework; with reaction you've got to build it yourself. But a lot of it carries over from what I've learned from fail2ban. It did not have an example for ipset,

so I wasn't able to use it, but last week I figured it out. It's not on their wiki yet, but if you look at this issue, it's got an ipset example that does work. It is defending one of our sites right now. You can also trigger various things. Blocking is the most obvious one. You can also report to blocklist.de, which will send abuse reports automatically on your behalf once a day, or to AbuseIPDB. This next one is kind of interesting. What I've talked about so far is defense. There's an interesting concept called offensive defense, which you guys might like. It's a little spicy. Tarpits are an effective way to ethically trap bots that don't listen.

The next slide is going to be a specific example of a simple and effective tarpit. Markov engines are a way to fill a crawler with junk data at the expense of your CPU time and bandwidth. I think that's a bad trade-off with my limited resources, but not everybody agrees. Iocaine — a Princess Bride reference — is the new one. And zip bombs are a new way to fight back. I don't know if this is effective. If you're doing this, please make absolutely sure there are no false positives. This blog post has an example of how to set up the config for it, and that GitHub link has a zip that goes from one megabyte to one terabyte.
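A hedged sketch of the gzip-bomb idea, much smaller than the file the talk mentions (the output path is hypothetical; be certain no legitimate client or crawler can ever reach whatever you serve it from):

```bash
# a gigabyte of zeros compresses to roughly a megabyte; served with
# "Content-Encoding: gzip", it expands in the client's memory when decoded
dd if=/dev/zero bs=1M count=1024 | gzip -9 > /var/www/traps/boom.gz
```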

This is a tarpit. I got it from Martin — he lives in New England, and he's the main dev on Inkscape. Great guy. So this is all of it, right here on the screen. It's a little small, sorry. First off, you've got to have two lines in your robots.txt file that say: don't go to the tarpit. That way every bot that reads the robots.txt file and listens won't go there. In the HTML you've got to have a link to it, but Martin did a clever thing where he puts it outside of the view of the browser,

and it's also hidden from screen readers, so no human should see it if they're browsing the page normally. They'll only know about it from the robots.txt file and the source code of the page. There's just one link, and if one bot goes there — well, a lot of these are botnets, so if you drop one leg of the octopus, it doesn't matter. You can keep changing that link and you can keep getting more legs. And then once they go to that link, you can set up a fail2ban or reaction filter to block them. If you're feeling a little spicier, you can zip bomb. It's up to you.
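As a toy stand-in for that fail2ban/reaction step, the sketch below watches the access log and drops anyone who requests the hidden tarpit link; the tarpit URL, log path, and set name are all made up.

```bash
ipset create tarpit-hits hash:ip -exist
iptables -I INPUT -m set --match-set tarpit-hits src -j DROP

# any client that asks for the hidden tarpit path gets added to the blocked set
tail -F /var/log/nginx/access.log | while read -r line; do
  case "$line" in
    *'GET /secret-tarpit/'*)
      ip=${line%% *}                   # first whitespace-separated field: client IP
      ipset add tarpit-hits "$ip" -exist
      ;;
  esac
done
```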

I want to do some shout-outs. Thank you, BSides Connecticut. Thank you, Sacred Heart University. Thanks for nothing, Scrapy and Zyte. And thank you to reaction, iptoasn-webservice, ipset, Uptime Kuma, Munin, Anubis, go-away, Prometheus, ModSecurity, Apache 2, nginx, mod_qos, fail2ban, apachetop, GoAccess, coreutils, sed, awk, curl, iptables, Inkscape, the operating systems Trisquel, Debian, and GrapheneOS, LibreOffice, Firefox, Abrowser, Vim, Neovim, GIMP — I made the first slide with GIMP — and all their dependencies. Thank you to the volunteers that make up the FSF sysops team, and all of you. Thank you.

I've still got a few thoughts. How can you help us? I love my job. I think what I do is really cool. If you like what I do, there are ways to donate: become a member, donate a one-off, or your organization can become a patron. You can volunteer with the sysops team. And what would really help is to actually use the tools I'm talking about — contribute patches, submit issues, write blog posts about using and configuring them. All of this helps. And I really want you to stop giving money to the proprietary alternatives. If your organization does give money to the proprietary alternatives, that

side has a tendency to rug-pull you once they get you hooked. How many prices went up this year? You know it's a non-trivial amount. And adding Crawl-delay to the RFC 9309 standard would help out the internet. There are a couple of references I didn't have space on the slides to include. Man, I hate Zyte. Don't go there if you have blood pressure problems. Here's my contact information. There's a QR code with the slide download link. I made some changes to the slides, so this link is the slides from this morning; if you visit it later tonight or tomorrow, it will be exactly what you saw today. I'm on Mastodon,

here's my IRC nick, and I've got my permanent and temporary emails. I've only got about a minute left, so I'll take a question now, but I'm going to be hanging out in the lobby if you want to talk. Any questions? I know I threw a lot at you. [laughter] Questions. [Inaudible audience question.] >> Yeah, probably. To the best of my ability I block Shodan — what I know of Shodan; I need to update that — and there are a few other Shodan-like scanners.

>> I don't think Shodan shows up in the user agent, but Censys and some of the others do, so you can block them with a reaction event.