Continuous Security: Monitoring & Active Defense in the Cloud

Name: Continuous Security: Monitoring & Active Defense in the Cloud
Uploaded: 2018-04-15
Duration: 58 min 4 s

BSides Iowa · 2018 · 58:04 · 143 views · Published 2018-04

Speakers

Eric Johnson

Tags

Category: Technical

Topic: Cloud, IAM, Detection Engineering, Threat Intel

Team: Blue, Purple

Research: Case Studies and Incidents Analysis

Style: Talk

Mentioned in this talk

Tools used

AWS CloudFormation, AWS Web Application Firewall, CloudFront, netcat, Nmap

Platforms

AWS Lambda, HipChat

Service

Slack

About this talk

Eric Johnson demonstrates cloud-native security monitoring and active defense techniques using AWS infrastructure. The talk covers logging, alerting, and automated response mechanisms deployed in a purple-team exercise, including WAF configuration, honeypots, and real-time detection of scanning and exploitation attempts against intentionally vulnerable cloud targets.

Original YouTube description

BSides Iowa 2018 - Track 1. Speaker: Eric Johnson. Monitoring and feedback loops from production are a critical tenet in DevOps for measuring performance, runtime errors, statistics, and changes. In the SecDevOps world, security teams can take advantage of DevOps monitoring tools to increase security visibility, identify anomalies, and respond swiftly to real-time attacks.

Transcript [en]

Let's do it. We'll try to pick up some time here and make Greg feel a little bit at ease as we head into lunch. So welcome to the first session of the morning; hopefully you all have your coffee and all that fun stuff. We're going to talk about monitoring today. Does that sound like everyone's favorite subject? Yes, absolutely, we need to look at all the things. So the short story here is, let me introduce myself so this context makes sense. My name is Eric Johnson. I have a problem of having too many jobs all at once; I can't say no to anything. I spend about 30 to 40 percent of my year working with the SANS Institute. I

author and teach courses all over the globe, which conveniently helps me get out and see things outside of the great state of Iowa, mostly in the application security world. I've authored secure coding classes on the .NET Framework and taught secure coding classes on various Java frameworks and on mobile application security. Lately, over the last few years, this whole DevOps explosion has come into play, and we've spent the last two years writing a class on how to integrate security into a DevOps lifecycle model. As part of that, we have this whole cloud thing that doesn't seem like it's going to go away any time soon, so we started wiring a lot of

security automation into the cloud, and one of the big pieces of that is monitoring: monitoring what's going on in production and getting feedback into the beginning of the development pipeline. Those feedback loops from production are what's landing us in this monitoring discussion today. I've written some static analysis tools and lots of security tools. I do security consulting: mobile app pen tests and various sorts of secure development lifecycle consulting engagements. That keeps me fairly busy in my free time, along with two young children that keep me running around quite a bit. So that's myself in a nutshell. Contact info's up on the slide at the bottom. I will put this back up towards

the end so you can take your pictures and all that fun stuff. So, getting to the topic of today: continuous monitoring. It's called active defense in the cloud. This idea popped up when I was having a conversation with some folks about the SANS Cloud Security Summit that ran at the end of February in San Diego. I had signed up to present something at it, and I had nothing to really talk about yet. So I was on vacation in Mexico in early February, and Nick Stark (has anyone seen Nick Stark speak before? Last year he blew up a computer here, which is pretty awesome; I'm not sure what he's got in store for us this year), he sent me a message

and asked me if I wanted to speak about something at the SecDSM group in February, or late January; I think I'm getting my dates mixed up. But as we were talking about what we could talk about, I said, hey, what do you think about actually talking about monitoring and audit logging with a cloud focus? What if I show up with this AWS environment just sitting there nice and shiny with some vulnerable targets inside of it, letting the whole SecDSM group hack the environment for an hour or so, and all I'm going to show on the screen is a lot of charts and pretty graphs and alerting and monitoring thresholds, to see what I

can see from a blue team perspective, and we'll see what the red team gets into from a red team perspective. And he said, well, that sounds awesome, let's do that. So I spent the rest of my week in Mexico just sitting by the pool building out this AWS infrastructure, which is actually kind of a feat in itself. As we were talking about in the back, think about how we had to do that ten years ago, when you had to actually get on-prem servers to build this infrastructure out; here it was a couple of clicks of a button in AWS to launch some stacks and build some stuff. I'll go through everything I set up

for them over the course of the next 45 minutes or so. The first topic we'll talk about is why we care about this whole logging and monitoring thing. Hopefully most of you appreciate the importance of this; if not, I will give you some examples as to why you should care. Then we'll talk about the monitoring and active defense techniques that I put in place inside of the cloud environment, some traps I laid for the SecDSM crew before they walked in the door. Then we'll talk about the engagement, the purple team event as we'll call it, and what happened. And then the last section is our post-mortem, so we'll kind of go

through this whole incident response process over the course of the talk. So that's the layout. Sound exciting? All right, let's rock it. Obligatory "here's the section that we're in" slide, number one of four. Let's talk about logging and monitoring real quick. This is largely not new to the world of information security, yet I would say in most organizations it is sometimes neglected, as evidenced by its recent inclusion (does anyone follow the OWASP Top 10 list and things like that?) in the 2017 edition of that list. The reason is that monitoring of your audit log files

provides you with incredible feedback from production and gives you insight into what's happening in real time. The idea is that a lot of us have logs, but are we actually monitoring them? Are we actually aggregating the log data to build charts and graphs, define whatever normal is, and then automatically alert folks when something abnormal is going on? That's the whole reason this was included. In some cases you can take appropriate actions to defend your environment as something is going on in real time. From a cloud focus, I thought to myself, well, what sort of breaches have happened in the cloud lately that could

have been detected with just very basic monitoring and logging techniques? There's a smattering of these on the slide here. Has anyone seen that whole "your S3 bucket is showing" problem going on in the world? Yeah, so there are lots of these. A few examples: we've got some geospatial satellite data that was in a public-facing bucket; I'm sure that was classified as probably top-secret information that you wouldn't want the world to know about. We've also got Deep Root Analytics, which gave away all of our voter records. We've also got Verizon dumping a bunch of bucket data out. From a personal perspective, we ran into a situation.
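One common way a bucket ends up public-facing is an ACL grant to the "everyone" group. As a minimal sketch, assuming you already have the bucket ACL as a dictionary shaped like an S3 GetBucketAcl response, the check below lists any such grants. The function name and the sample ACL are hypothetical and only for illustration; bucket policies, which can also make a bucket public, are not covered here.

```python
# Group URIs that S3 uses for "everyone" and "any authenticated AWS user".
PUBLIC_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_grants(acl):
    """Return (grantee URI, permission) pairs that expose the bucket publicly.

    `acl` is assumed to be a dict shaped like an S3 GetBucketAcl response.
    """
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUPS:
            findings.append((grantee["URI"], grant.get("Permission")))
    return findings


# Hypothetical ACL: the owner has full control and everyone can read.
sample_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group",
                     "URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
         "Permission": "READ"},
    ]
}

for uri, permission in public_grants(sample_acl):
    print(f"public grant: {permission} to {uri}")
```

Run on a schedule across every bucket, a check like this turns a silent misconfiguration into an alert.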

Has anybody written a Java filter before? No Java devs in here? Yeah, we've got one. What does a Java filter do? You do whatever you want; it's kind of like a mini WAF out in front of your app from a security perspective, right? It's going to run some code on every request and every response.

Absolutely, yeah, it's very, very flexible in terms of what you can do. I've seen these used quite a bit for security features in the past. Maybe I've got an authentication or an authorization check I want to make on some application, and I'll do that before the request actually reaches the underlying core app code. Now, in one of our applications at one point we actually had a Java filter running, and inside of it there was this debug switch. This was never supposed to be turned on, and inside of that debug switch it logged the entire request and response data into an S3 bucket. It was helpful for debugging and development and maybe

in your testing environments and things like that. What do you think happens to this debug flag at some point down the line, when it's sitting there in production in a financial system that's got lots and lots of financial records and PII data? What do you think happened? Somebody turned it on, and the person that turned it on was the engineer who knew it was part of the filter, because they wanted to troubleshoot something. OK, what do you think they forgot to do before they left the company a couple of weeks later? Turn it back off. So this filter collected at one point about 1.5 million records over the course of, you know,

several months, we'll say, just logging data (regulated data, for that matter) into this bucket. What do you think the permissions on the bucket were? "--acl public-read": that's the command-line switch in AWS to make this mistake. Does that sound like a problem? It was for us; I kind of freaked out a little bit. We've got this whole realization that this is going on, and yes, you turn the filter off right away, and let's blow the filter away, first of all, so that never happens again. But you've got this whole problem of: do you know if anybody actually read any of the information out of the bucket? Do we have to make a report to, you know,

the OCC and the SEC and the FFIEC, all of these acronyms out there that regulate the financial world, and say, oops, we lost 1.5 million records? That's the question. What feature on that bucket do you think saved us from actually having to do that? Right, we actually had logging available. We knew from an auditing perspective that there was not one single read operation against a single item inside of that bucket, which saved us a pretty big headache. So that gets down to the underlying topic that we're talking about today, which is auditing and monitoring of all of our things. That's just one example, and yes, we deleted the bucket pretty

quickly afterwards as well. Let's talk about another example. Now, this would never happen in real life, but let's say that hypothetically there was some sort of Struts vulnerability that allowed you to get remote code execution on some sort of server on the Internet, and let's just say that that server potentially had access to 140 million Social Security numbers and credit data for, most likely, all of the people in this room. Again, just pretend that this maybe happened in real life. "That's not the same group, is it?" Yeah, you're onto it there. So let's talk about monitoring in this scenario. What did they do? They popped the app server, the web box sitting in the

DMZ. Big deal; that's going to happen to lots of web servers out on the internet. So think about this from an auditing and monitoring perspective: when they used that web box as a proxy and pivoted back into the internal server, whether it was a database or a web service or whatever, and queried 140 million records out of that back-end system, do you think that activity maybe would have been flagged as abnormal, taking on that many requests in a very short period of time? It seems like that would be an anomaly to me. If you average 200,000 requests a day and suddenly you're getting 10 million requests over, let's just say, that 14-day

time span, how does no one notice? That gets back to monitoring, logging, alerting, looking for anomalies, taking this concept that we're talking about here. And we'll keep going here. Is that a person, or is it an animal? So, back to the not-Equifax situation. There's another aspect that I looked into: when this Struts vulnerability is actually exploited, do you think that ends up in an Apache log file? Has anybody looked at what exception that throws? An invalid content type exception is what gets logged in your Apache log files. So again, from a monitoring and logging perspective, we had two chances to detect this. One was the abnormal 140 million requests to the

back-end server. The other one is more of a needle in the haystack: did you realize that you had this invalid content type exception drop into the Apache log file, and did you have alerting on that to be notified that, yes, somebody just popped your vulnerable Struts installation? So there are a couple of different things to look at from a monitoring perspective that could have stopped this. Maybe they get a hundred thousand or two hundred thousand records out, but it's not 140 million. The blast radius could be much smaller just based on simple auditing, monitoring, and logging. All right, what do they have in common? We've talked about a lot of this already.
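The two detection chances described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the 3x alert factor, and the sample log lines are hypothetical, and the signature pattern is a guess at how the exception text might appear in a server log. For scale, 10 million requests over 14 days is roughly 714,000 a day, about 3.6 times the 200,000-a-day baseline from the example.

```python
import re

# Assumed signature: matches "Invalid content type" or "InvalidContentTypeException".
SIGNATURE = re.compile(r"invalid\s*content\s*type", re.IGNORECASE)


def volume_anomalies(daily_counts, baseline_per_day, factor=3.0):
    """Return (day index, count) for days whose requests exceed factor x baseline."""
    limit = baseline_per_day * factor
    return [(day, count) for day, count in enumerate(daily_counts) if count > limit]


def signature_hits(log_lines):
    """Return (line number, line) for lines matching the signature, numbered from 1."""
    return [(n, line) for n, line in enumerate(log_lines, start=1)
            if SIGNATURE.search(line)]


# Made-up data for illustration only.
daily_counts = [190_000, 205_000, 714_000, 720_000]
sample_log = [
    '10.0.0.5 - - "GET /index.action HTTP/1.1" 200',
    "ERROR upload failed: InvalidContentTypeException: bad Content-Type header",
    '10.0.0.7 - - "GET /health HTTP/1.1" 200',
]

print(volume_anomalies(daily_counts, baseline_per_day=200_000))
print(signature_hits(sample_log))
```

Either result being non-empty is the point at which a notification to the security team would fire.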

We've got missing monitoring and logging facilities that could have minimized the actual problem. What do you think? Now someone is climbing through the rafters up there. The other problem is that if you are logging, the security team may not actually know where the log files are. We were just talking about SIEMs a little bit in our context, and it's like, yeah, you might have this very big, fancy Splunk installation that's costing you tons and tons of money, but the data that you needed to detect these things is not actually in the SIEM. So that's another problem. Another one is that we've got no automated way to aggregate the information and make it visible to folks

so you can see that anomaly happen in real time. So let's talk about active defense a little bit. Who's seen some of the talks out there where it's like, active defense means that I get to go hack back against all of my adversaries out there in the world? Has anybody seen anything related to that? It turns out you're not actually supposed to do that. Is everyone aware of that? Now, some countries don't have those laws; ours does. So you can't just go hack back against them and say, well, they attacked me first. So when looking for a definition of what active defense is actually supposed to be, I stumbled onto Robert M. Lee. Has anybody seen Robert

Lee speak? He's at Dragos; he's a big ICS security guru, and he's got a white paper in the Reading Room at SANS that basically says, here's what it is: it's your analysts monitoring for, responding to, learning from, and applying their knowledge to threats internal to your network. "Internal" is the big key word there. You can take responses, but you can only change things inside of your own running network. That's the big takeaway. So what does that mean? You have to know where all of your things are in order to actually defend them, and you have to know where all of the audit log data associated with them is as well. So in

the cloud world this is actually kind of difficult. Does anybody work in AWS and/or Azure on a regular basis? Does anybody really think that all of their auditing and logging facilities are easy to understand, and know where everything is going? In this environment that I'll introduce you to, I ended up with four primary places that I had to go look for log data. Most of my flow log data is in this VPC Flow Logs area within CloudWatch. CloudFront has its own set of distribution logs that are piped out into an S3 bucket. Then we've got our ALB: your load balancer information also ends up in, oh, that's a different bucket, you have to go look in

that other place to find that thing. And then you've got your app server logs. We've got potentially an Apache and an nginx server with log files there, and then we've got a bunch of Docker containers running, and those have log files inside of them that you have to go peel out of the Docker container and get back out into some other place where you can look at them. And now you start to think, well, it's no wonder that people can't monitor and log any of their information, find it, respond, attack, defend, et cetera, because we don't know where to go look for these things. So I'll take you through a quick journey here of what

we dug into. VPC flow log data is similar to NetFlow on your on-prem networks from a logging perspective. It's not as in-depth, though; we've asked AWS over and over to give us more insight and more visibility here. But you get some aggregated data. There are four examples on the slide: you've got your source and destination IPs, your source and destination ports, some aggregates such as how many bytes were in the packets, and what the action was: was it accept, was it reject? Those are the kinds of bits you're going to get out of that log file. You've got to go into CloudWatch to find that, and that's where you can dig that information out

of. Moving down the stack: we set up CloudFront distributions in the environment, so we've got CloudFront log files that are stored in an S3 bucket as compressed text. You've got to go pull the .gz file out of the S3 bucket and then extract it to find the actual log data. So that's another place you've got to go look. Here you've got some TLS protocol info, what endpoint the individual tried to get to, what the response code was, what the user agent was: some good diagnostic information that you can use to identify anomalies. Digging into your load balancers: oh yeah, we've got to go look

in a different S3 bucket here. More compressed text, more unzipping. We can go in and grab URLs, you can grab user agents, you can get some of your TLS cipher info, the response codes again. Similar information, yet it's in a different place. Then you've got your app server logs, where the big bold word on the slide is really important: if you don't do anything, these are ephemeral log files. How many times do you think Netflix rolls over their instances in a one-day period? Has anybody ever thought about that before? The average Netflix instance lives for about an hour. So they've got ephemeral log files sitting there, maybe Apache or whatever their server is,

and if you roll the instance over, guess what happens to your log file? Just wave bye to it; it's gone. So your forensics data has to be extracted out into some other area. You can install agents on your instance to get that data into CloudWatch. This becomes even more complicated when you put it inside of a Docker container, because now there are other extractions that you have to do to get that working. But the idea is the same sort of log information: you need the response codes, you need the endpoints, you need the IP addresses that are making these requests in order to turn them into actionable intelligence.
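As a rough sketch of the "pull the .gz file, extract it, find the log data" step described above, the function below decompresses one CloudFront-style log file held in memory and reduces each request to the fields just mentioned: IP address, endpoint, response code, and user agent. It assumes the file carries a "#Fields:" header naming its tab-separated columns. The function names are hypothetical, the sample log is made up, and its header is abbreviated (real files have many more columns). In practice the bytes would come from the S3 object rather than from a string.

```python
import gzip
from urllib.parse import unquote


def parse_cloudfront_log(gz_bytes):
    """Decompress one gzip-compressed log file and yield a dict per request.

    Column names come from the file's own '#Fields:' header line, so the
    parser does not hard-code field positions.
    """
    text = gzip.decompress(gz_bytes).decode("utf-8")
    fields = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split("\t")))


def summarize(record):
    """Keep only the fields needed for security metrics."""
    return {
        "ip": record.get("c-ip"),
        "endpoint": record.get("cs-uri-stem"),
        "status": record.get("sc-status"),
        "user_agent": unquote(record.get("cs(User-Agent)", "")),
    }


# Made-up, abbreviated sample for illustration only.
sample = (
    "#Version: 1.0\n"
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(User-Agent)\n"
    "2018-02-20\t17:01:02\t203.0.113.9\tGET\t/wp-admin\t404\tMozilla/5.0%20(Nikto)\n"
)
gz_bytes = gzip.compress(sample.encode("utf-8"))
records = [summarize(r) for r in parse_cloudfront_log(gz_bytes)]
print(records)
```

Normalizing every source into the same small record shape is what makes it practical to chart load balancer, CloudFront, and app server traffic side by side.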

So that's kind of where we're heading here. How many of you are just excited to start this process and build all of your metrics up? One person says yes. So there's a project out there, if you're wondering, OK, you've given me a couple of examples, but how do I really get started? The project is very appropriately named: it's called Monitoring Sucks, and it's a GitHub organization. You go to GitHub, go to the monitoringsucks organization, and they've got seven or eight different repositories geared towards helping you build your active defense techniques just by gathering log data and monitoring for certain events. Some examples: there's one repository called

tools: here are all of the different auditing, logging, and monitoring tools that you can use to help you track this stuff. It'll talk about using StatsD, which is open source from Etsy, and Grafana and Graphite, which are open source metrics gathering tools. It'll go into using CloudWatch, maybe using your ELK stack and your Kibana dashboards. It gives you all the different tool chains out there to help you figure out where to get started. There are also a bunch of blog posts. There are tons of references to Etsy's Code as Craft blog, where they've gone through how they built their entire DevSecOps monitoring around this concept, so they've got lots of examples out

there. And then there's a metrics catalog. Here's an example; there are tons of metrics. Which port, which service are you trying to monitor? Here's an example for port 80: OK, if you're on port 80, make sure that you actually monitor your connection duration (how long it took), the request bytes, and the response code. It tells you all the data that you should be logging, and you can go through that for your network infrastructure, your databases, your web servers, your message queues. There are lots and lots of services in that repo that can help you see if you're tracking at least everything that you should be tracking. So that's step one. Now you've got the data and you can

start turning it into meaningful security metrics, and this is where charts and graphs enter the picture and it gets really, really exciting. So let's play a game here. If you get a spike of 404 errors, put your security hat on and think about it. Yeah, somebody might just be trying to go to a page that doesn't exist. So if you average a thousand of those a day on a system and then suddenly you see 10,000 404 errors in ten minutes, what's going on on your system right now? Scanning, yes. Somebody's probably running DirBuster against your site looking for evil admin interfaces like phpMyAdmin or wp-admin, looking for common exposed admin

interfaces, things like that. So you could detect that in real time just by monitoring normal versus anomaly. How about 500 errors? Same game: what's going on if you're seeing a bunch of 500 errors?
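The "normal versus anomaly" game can be sketched as a windowed count compared against a daily baseline. This is a minimal illustration, not a production detector: the ten-minute window mirrors the example in the talk, while the 10x alert factor, the function name, and the synthetic events are assumptions. The same function works for 404s, 500s, or any other status code.

```python
from collections import Counter

WINDOW_SECONDS = 600  # ten-minute buckets, matching the example in the talk


def spike_windows(events, status, baseline_per_day, factor=10.0):
    """Return {window start: count} for windows where responses with `status`
    exceed factor x the count expected from the daily baseline.

    `events` is an iterable of (unix_timestamp, http_status) pairs.
    """
    expected = baseline_per_day * WINDOW_SECONDS / 86_400
    counts = Counter(
        ts - ts % WINDOW_SECONDS for ts, code in events if code == status
    )
    return {start: n for start, n in sorted(counts.items()) if n > expected * factor}


# Synthetic traffic: a quiet window, then 10,000 404s inside one ten-minute window.
events = (
    [(i, 404) for i in range(5)]
    + [(600 + i % 600, 404) for i in range(10_000)]
    + [(700, 200)]
)

print(spike_windows(events, status=404, baseline_per_day=1000))
```

With a baseline of 1,000 a day, about seven 404s are expected per ten-minute window, so the quiet window passes and the burst is flagged.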

Yeah, somebody could be running injection-style payloads against your site and actually causing SQL exceptions or some sort of command injection style error to occur, which is going to show up as just a normal 500 error. So you can start to detect those in real time. If you get a bunch of User-Agent headers in your logs from Nikto or w3af or sqlmap or some other very commonly known scanner, some script kiddie is probably just bored and running w3af against your system. These are all signatures that you should be picking up on pretty much within the one-to-five-minute window after it starts; you should be alerted about it. That's the

active defense side. Some challenges here: I just showed you a whole bunch of different log files with a whole bunch of different information. It is hard to actually process this, especially in a large-scale enterprise. Usually you've got to aggregate this into one centralized system. Splunk is, I would say, by far and away the most commonly used enterprise-level tool for this. Has anyone just used the ELK stack to ingest all this information into something like a Kibana dashboard? Phil Hagen is a SANS instructor in the forensics curriculum. If you're looking for something completely free and open source, kind of similar to Splunk, you can check out his virtual machine. It's called SOF-ELK,

S-O-F-E-L-K. It's an open source virtual machine with all of the data ingestion tools you could ever possibly imagine pre-built, pre-configured, and ready to roll. All you have to do is download the VM, turn it on, and start putting data inside of it. It's got the dashboards pre-configured, and you can use those to customize and build out your own dashboards. That's a great distribution for a virtual machine to just get started here, to make up for the fact that we've got this information overload that we can't really process from a human perspective. Now, I mentioned the needle in the haystack. Let's talk about that invalid content type exception from

the Struts breach. How many of you are going to dig through your Apache log files and notice that one line out of 10 million in the log file? Unless you are incredibly good at reading log files (and I don't know a single human that loves to do that and is good at it), let's just say that we probably aren't going to see it, because humans don't have the time or the attention span to locate that needle in a haystack. So that's another problem. What's the solution to make up for us not being able to process these? Here's your answer: take the data and convert it into meaningful dashboards. If we had a dashboard in our hypothetical

organization that got beat up by the Struts vulnerability, some sort of dashboard that was looking for the number of invalid content type exception errors, of which there should be very few on the endpoint that was vulnerable, and you saw one thing pop up, that may be worth looking into. And if something shows up on the dashboard and you automatically fire a Slack message off to the security team that says you need to look at this right now and see what's going on on that box, would that maybe be an appropriate way to respond to that metric? None of that happens if you don't have visibility into it. There are tons of graphing

utilities out there. On the slide are some examples; these are some of the charts that I snapshotted during our purple team exercise, where I said, wow, look, there's a tremendous spike in requests going in through CloudFront. That pretty much told me that I went from my normal activity to having lots of people attacking the environment right now. The WAF that I set up had been blocking only ten requests a day; suddenly it was blocking hundreds and hundreds every five minutes. That's actionable intelligence that, yes, something is going on inside of the infrastructure, absolutely. So we can set a lot of those things up and then use them to drive some of the automated

responses to what we might want to do if some of these events actually happen. One of the students in my class told me about what they do at their company. It's a ticket exchange, and I'm not exactly sure which one, so just pretend it's like Ticketmaster, for example. Pretend it's Monday morning, it's 10:00 a.m., and we're just releasing a batch of tickets to this concert. What do all of the scalpers do at that moment, right at 10:00 a.m.? Right, they want automated purchasing to go snatch all of the tickets out of there so they can go resell them. This company has very good automation detection built into their system. And yes, the very first, you know, the

most simple step-one thing they could have done is just blacklist the IP address and block them out of the system temporarily. But is that going to stop them? No, because they can easily just hop onto a different network, let's say, change their IP address, hop on Tor or whatever, rotate their IP, and now they're back into the system, right? So instead of doing that, they're using more of a honey trap style methodology, where they actually feed them over into an environment that's not the real environment. It's almost like a playground that has no actual bearing on what's going on in production, and they let them try to purchase tickets and

they send them to that screen, you know the one, that says "searching for available tickets," and it just sits there and lets that wheel spin for 20 minutes before they're like, man, this thing's broken, something's going on here. That's active defense, right? They're having them go waste their time on something worthless. Meanwhile, the rest of us normal users are using the real system like nothing else is happening. So that's just a very good example of where it gets exciting; use your imagination. When I looked at automation for the blue team exercise, I said, what can I do internal to this fake network I'll set up? And I said, well, we

can block IP addresses; that's really easy to do with the AWS WAF. I didn't get to the point where I had functions in the environment to redirect them to an alternate location, so that could be something we could do down the road, following the Ticketmaster example. Have any of you played with the AWS WAF before? This is inside of the environment; it's not something that you have to go grab from F5. It's actually incredibly cheap at a lower scale, and I'll get into some of the costs and what it does, but it automatically protects a lot of your cloud content and also injects a lot of automation for you to react to events

that are going on in the environment. It sounds pretty awesome, right? So let's take a closer look at this. Here's what we've got: AWS Labs has a CloudFormation automation environment on GitHub. You download it, you launch the stack, you turn it on, and out of the box you're blocking SQL injection attacks and cross-site scripting attacks. You've got managed IP lists coming out of reputation lists like Spamhaus, Emerging Threats, and Tor exit nodes. Those IP lists are being parsed every hour and backfilled into your firewall for you, so you automatically know all of the different bad actors that are out there and you're blocking them from

getting to your resources just by turning this thing on. There's flood protection involved, for when somebody goes and runs an Nmap scan against your environment. Jarrod McClaren, I don't think he's in here, he's probably back in the CTF. When I set this environment up, I had him play around in it for a little bit just to see if things were working. Within five minutes he texts me back and says, your WAF hates me; all I did was run Nmap and now I'm blacklisted from the entire environment, I can't get into anything. That's built into it just by turning it on. We've got a honeypot URL built into it, which I will get into here in a

moment. The GitHub repo's on the slide; you can go pull this down and play with it whenever you have a moment. It takes about 10 minutes to launch into an environment and protect anything that you've got running behind either a load balancer or a CloudFront distribution. This is specific to AWS. Technically, you could protect a back-end web server with it if you proxied through an AWS CloudFront distribution. CloudFront has the ability to have any server in the world as its origin behind it, so you could have CloudFront point to an on-prem web server if you wanted to. You would just need to make sure that I

can't skirt around the distribution to your web server. Does that make sense? It is possible in theory; I haven't seen it used in that context a whole lot, and it's mostly geared towards AWS resources. So here's the diagram on the slide. It might be kind of hard to read, there's a lot going on there, but that diagram explains it: we've got your load balancer and your distribution, and all of the log data from the load balancer and the distribution is fed into an S3 bucket, which we'll then parse a little bit later. We've also got all of the WAF rules being auto-applied, so if you get a request parameter and it's got some injection-style payload in it that hits one of the

cross-site scripting or SQL injection rules, you just get blocked right out of the gate. We've also got these scanners: Lambda functions that will auto-install, and they will parse the log files and ask, did somebody just hit this site a hundred thousand times in the last minute, for example? If you did, it'll automatically take the IP address from the log files and blacklist it so you can't play anymore. It'll pretty much kick you out within a minute, we'll say. We've also got a Lambda function that is pretty much just a honeypot endpoint. Let's talk about the honeypot endpoint for a minute, because this is my favorite feature of

the entire thing it creates an API Gateway endpoint just some random URL and if you go to it it auto blacklists your IP address it sounds kind of fun right and then what you can do is you can embed that out in your site so it has you add maybe in the footer of your site an anchor tag with a nofollow directive on it so none of the automated crawlers will hit it that just goes to that endpoint what bots are going to hit that a lot of the app spiders that are out there if they're trying to enumerate all of your resources will just auto request that address and

guess what happens to your system the second that you request it you're out you've got to go rotate your IP address what I usually do is alias this as something a little more enticing so I'll have domain.com slash super admin interface as the URL to make it look like it's something really important that I'm trying to hide from the world and that's what will blacklist the IP address you can also take that endpoint and drop it off into your robots.txt file and put a disallow tag on it because what do attackers do pretty much as the very first step in their recon they're gonna look in the robots.txt file and say hey what does Eric not want

me to see in this website oh there's super admin interface let me browse that and see what's going on and then they're just automatically kicked out they can't play anymore so now we're getting into some of these automated techniques here's the mandatory AWS diagram that explains what's going on here but essentially you hit the endpoint it passes it to the gateway which hits the Lambda function and all the Lambda function does is just say what's your IP address and it adds you to the blacklist table inside of the WAF so you can't get back into the system anymore which is pretty fun to do this how many of you love to write code a couple hands you'll

love this stuff because it's really easy if you don't if you're on the security team and you've been thrown into this operational kind of monitoring role and you don't have a lot of dev experience just go snag somebody from the software engineering team and say hey I have this cool idea can you help me and they'll say why yes I can I'm great at whatever language you've got Python Node C# Java Go et cetera and they'll write up this function or something very similar to it to help you automate defenses in about 200 lines of code this is inside of the Lambda function that rolls out along with the WAF it just grabs the IP address it says am I

looking at a WAF regional which is your ALB or a WAF global which is CloudFront and it says okay let's go ahead and just stand up that AWS API and we'll just add the IP address to the appropriate blacklist rule that's how simple this can be in the world of serverless automation anyway should we turn it on and see what it looks like yeah why not right that's what we're here for so here's the environment so I sat down there in Cabo and I said let's just stand this up what do I want to give the SecDSM group to play with here I wanted to use something fairly real-world because I know there are probably

millions of people that have launched the PCI DSS starter template into their AWS account and said PCI is definitely secure so I don't have to worry about anything going on inside of it okay so we'll test this out I then launched a very popular WordPress stack because you know that's powering a very large percentage of the internet I would say at this point and just say we'll see what happens to this thing and then I took one more step and I said there's this vulnerable target has anyone used the more modern single page vulnerable OWASP app it's called the Juice Shop it just sells like weren't juice on the market on it yet so it's a it's actually

a great application if you want to practice vulnerabilities and exploitation and all that fun stuff and I said we'll throw this out there on the internet what could go wrong so I launched all of those I did modify some of the quick starter templates to kind of scale down and turn some things off but you know it's launching like t2.xlarge instances out of the gate I scaled those back to smalls and mediums so I could at least have pen test permission on them but not get charged a boatload of money in the couple of weeks that they were there so my updates are on my GitHub repo that was on the last slide what we got out of the

box here's what PCI does for you here's a management subnet or management VPC sorry the management VPC is intended to have some of your aggregates let's just say your Jenkins servers your Kibana dashboards all of the kind of management non-application-related resources inside of your account sitting in there you get a jump box a bastion host set up for you out of the box so you can go to the box and then that's how you can pivot back into all of the things behind it so that jump box is a very common target right out of the gate and I didn't touch anything all I did was launch it with a key and I just let it sit there and then

I gave the key to Nick Stark and a couple other people so they could go in and try to add a couple of interesting things to find in the environment and that's it so we've got that sitting there logging and monitoring is included if you enable it in the PCI starter set so that's good so we've got a lot of metrics that you can then graph then we've got our WordPress stack so I launched the WordPress stack I peered the management subnet to the app subnet that contains WordPress so we can actually connect to those instances and administer them if needed and I did that so if somebody popped the jump box they could pivot back into some more

important information and this actually launched a pretty redundant environment so you can see we've got a load balancer that scales across a couple of availability zones we've got the app server that's in a different subnet that's also scaled we've also got the databases scaled as redundant reader writers and failovers are set up so we've got a pretty redundant WordPress environment all going through an ALB as far as the Juice Shop goes we've also got this is me just dropping things in there a couple new instances we'll scale those with a CloudFront distribution and just spin them up in a Docker container real quick so we can actually target those and see what goes on

inside of here so I launched it into the world I had to get AWS permission to actually run the simulated event here so if you do this back in your own organization I learned a valuable lesson here is this is not technically a pen test pen test is two days lead time so of course at the 11th hour on Monday night I said oh hey I've got this thing coming up on Thursday can you approve this and they said oh that's a simulated event Eric we need seven days lead time for that and I kind of said well I've got a problem here it's gonna happen on Thursday most likely regardless of your response so it

would be nice if you could just push this through for me which they did so just FYI if you decide to take this back and do this at your organization and then we show up Thursday night so the SecDSM crew as it turns out has anybody participated in any of these CTFs with them you know over the course of the last couple years you all are pretty good at this so I put some of your accolades on the slide here and I probably missed some this was just a small subset of all of the different CTFs that have been won by that group so I knew that I had some nice attackers targeting the environment and we just

set you free on it for an hour and this is kind of where the SecDSM talk ended we didn't write the last piece of the story yet so let's take a look at what actually happened during the event this is where it gets kind of entertaining here's the environment that started I made this I spent a lot of time writing up this really pretty WordPress page that had a logo and everything and I said hey this is the monitoring sucks environment here's what I've got set up here's your IP addresses here's the web servers and all of these fun things and then I put some dashboards up on the screen now we can see as far as the blue team wins

okay what did I see that actually helped me identify attacks that are in progress you'll notice the massive spike in the reject packets in the flow log data so we've got this dashboard it averages out now I was getting about 30 rejects a minute just with it sitting on the internet for two weeks and then suddenly that spikes to 77304 during the exercise so I'd say that's some pretty good kind of indicators of attack we'll call it that something bad is going on there so alerting yes it's telling me I'm getting spammed because I set it up to an email address in this inbox saying there's bad stuff going on here something is happening so that's

blue team win number one then we'll look at our WordPress dashboard so I set up a couple of really quick dashboards on the WordPress instance and said okay we've got requests coming in we're averaging 422 a minute which is actually kind of a lot of traffic if you think about just some website that the world wasn't supposed to know about that someone obviously was finding it through just regular scanning and that goes up to six thousand eight hundred and five requests a minute number of 500 errors those are kind of important right number of 500 errors goes from two a minute to a hundred and seven a minute that's a pretty sharp incline in 500 errors so

again I'm seeing scanners as you mentioned injection attempts or things causing errors entering the system very easily and very quickly through just the monitoring techniques so blue team win right we're in pretty good shape here I'm starting to feel good I'm getting a false sense of confidence at this point let's take a look at the Juice Shop application all right so we've got request count going from 50 to 3500 a minute so people obviously scanning them 400 and 500 error rates are just sharply spiking on there which is good because there's lots of vulnerabilities in that and yet I see the WAF metric graph sitting right next to it the WAF is blocking five requests per

minute and that spikes to nine hundred and twenty blocks per minute pretty quickly and this is within probably two minutes of the exercise starting the very next thing that happens is I start getting complaints Eric the webpage is unavailable my WAF actually kicked in saw the bad IP address that was originating scans from it and triggered the Lambda function which blacklisted the IP and it blocked them and says hey the request is now being blocked by CloudFront that's a blue team win the bad part is that all of us sitting in the same building were actually using the same IP address so actually the entire room could no longer get to any of the resources it's still a

win right at this point I'm like yep my work here is done I'm out you guys have to go to the bar across the street and start attacking it from over there so we opened it up I opened up the IP address I put it in there's a whitelist filter that's in there so you can actually test things from your own machine or from your management box or wherever another blue team win this could have been all Slack notifications or HipChat messages or whatever this is just an inbox that I set up that was just spammed with alerts indicating all of these evil things going on that's what should have been happening in the Struts

example we are violating all of these thresholds something bad is going on someone needs to go look at this and that's kind of where the blue team part stops as soon as I opened up the IP address we had some problems come into play here now it turns out that WordPress is not very secure when I launched the stack I purposely launched the oldest version of WordPress that I could launch that was supported by the stack so it was probably minus 6 on the version scale for WordPress one interesting thing that I learned is that the WAF and I still don't know why does not block WPScan scans against the WordPress instance so very quickly we could get diagnostic data

and enumerate the WordPress plugins and you can see had I installed some vulnerable plugins the instance itself is completely toast at that point so that's a problem on the screen you can see a netcat command that ran off of one of Tom pols servers out there he probably has these scattered across the world I would guess but this is just one of them so Tom finds a shell that I buried into the WordPress app called happiness.php is what I named it and it's just sitting there in the root it's got a PHP shell console sitting there and very quickly has shell on the actual WordPress instance which I think we all

have to just assume that this will happen to your WordPress box if you have one of these at some point during the lifetime of it so the question is can we detect this and can we block it and it also turns out that WordPress out of the box does a terrible job of doing secrets management the database credentials are stored in the wp-config.php file right there in the web root with the clear text database creds just sitting there so what do you think Tom did first when he got access to this well let's connect to the database and see if I can connect to it so Tom connects to the database very quickly dumps the tables and then just runs a

very simple command called DROP DATABASE wordpress semicolon and now you're looking at the homepage of the site so database down legit right that's fair game we learned very quickly here that that database account launched by the WordPress stack that's running probably millions of WordPress instances around the world has a really terrible database permission policy set up on it by default is there any reason the web instance needs permissions to drop the database I'm just gonna go out on a limb and say no so that's something we learned pretty quick Tom continues to pillage this is the home page about five minutes after that where he just moves the shell up to the root of the website so now you go to it

and the home page is back up except it gives you an actual command shell box and he put his mugshot on there so meet Tom he's back in the CTF room if you want to go shake his hand and give him congrats on a job well done with just completely breaking this in a very short period of time it's quite impressive so I would say WordPress itself we've proven that you know there's a reason that these get popped all of the time opportunities so opportunities WordPress is a special beast here you have to monitor these a little bit closer than let's just say the three hours of dashboards I spent putting together to stand up monitoring on that instance I

would highly recommend figuring out why WPScan didn't trip the WAF because you have to be able to detect and block those if you're running WordPress out in the world so that would be my number one takeaway to stop this from happening obviously secrets management might be something that you should also address because if you pop the web box you should not be able to drop the database so shifting those creds off into some sort of secrets management vault whether it's the AWS vault Parameter Store HashiCorp's Vault who cares which one you use getting them out of the clear text file at rest on the disk is also a very big takeaway that would need to

happen let's see I learned very quickly I didn't have any dashboards on my database connections because I never really thought that far down the line that the database would just mysteriously disappear somehow so that would almost be like a reverse dashboard where if you're averaging a hundred database requests per minute and then suddenly it drops to zero that might indicate a problem so that was a dashboard that just wasn't even in the equation when I started we've also got the Juice Shop application so on that side nobody told me or at least sent me screenshots of anything bad that happened to that app and it's probably because everyone was banging against WordPress and we didn't have that much time to really

crack into that one but the audit log data from inside the Docker container now I launched it with the log driver for Docker that was supposed to pipe the information out to CloudWatch for me and it never arrived so getting the log files out of the container is a challenge something that you're going to have to address especially if you're in a containerized environment let's see what else did I put on here from the PCI side I was missing some information just in the log data for the PCI stack alone we had by default no VPC flow data inside of the management stack the app stack had it but that management subnet you know the

one that is responsible for handling all administrative items and connections inbound to your environment had no flow log information available that is a big whiff on the template side maybe I didn't turn it on I haven't gone back to look at it yet but something that would need to be addressed there's also no logging on the SSH side on the bastion or the jump box host so you should be able to see if you're on boarding a bunch of failed SSH connections that should be a chart and an event that gets triggered also monitoring successful logins to that instance Tom's egress connection on port 2222 may have also been something you could have blocked with a standard NACL and or

security group rule on the kind of instance level WordPress side so outbound allow star kind of hurt me there and that's the default in that template I should have been able to maybe block that connection out of the gate we've also got the CloudFront distribution that was forwarding traffic into the vulnerable OWASP application was missing an origin access identity which means I could skirt around CloudFront and the WAF and connect directly to the instance which bypasses all of the security features so that's the postmortem side that's what I learned just by digging around in the dashboard looking through the log files summarizing this very quickly here and we'll wrap it up I know we've got a

couple of minutes left doing the exercise as a whole does anybody do this internally at their organization uh I would highly recommend it on a Friday afternoon get some devs get some ops folks get some engineers get some security folks all in the room and just play this game and do the post-mortem together because we identified just in an hour a whole bunch of different places that were missing monitoring and security that you would not want to really be finding for the first time as you're being attacked so playing the game can help you build your defenses out that'd be my number one takeaway code reviews obviously someone reviewing that WordPress app would have

easily seen those creds in a config file so basic linting and scanning and git hooks to block secrets from being committed to your repos probably would be a good thing that we could do in that situation the honeypot endpoints are awesome I will say that was my one big win from the whole thing is that I did block the room out so that's kind of the conclusion side I will do we have a couple minutes or are we pretty much at that point contact info I'll open it up for questions at this point for anybody that any other thoughts you can think of on the defense or monitoring side that would be kind of fun there
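
The git hook idea mentioned above can be sketched in a few lines of Python. This is a minimal illustration and not a tool from the talk: the pattern list and the function name `find_secrets` are assumptions made up for this example, and a real hook would run the check over staged files and exit non-zero on any hit.

```python
import re

# Hypothetical patterns for illustration; real scanners ship far larger rule sets.
SECRET_PATTERNS = [
    # Shape of an AWS access key ID: "AKIA" followed by 16 uppercase letters or digits.
    re.compile(r"AKIA[0-9A-Z]{16}"),
    # Clear-text database password in the style of wp-config.php.
    re.compile(r"define\(\s*'DB_PASSWORD'\s*,\s*'[^']+'\s*\)"),
    # Generic key = "value" assignments for names that look sensitive.
    re.compile(r"(?i)(password|secret|api_key)\s*[:=]\s*['\"][^'\"]{4,}['\"]"),
]


def find_secrets(text):
    """Return (line_number, line) pairs for lines matching any secret pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

Called on the contents of a WordPress-style config file, `find_secrets` would flag the `DB_PASSWORD` line, which is the kind of finding a pre-commit hook could use to reject the commit.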

absolutely it's a good point so yes the comment is this is going to prevent a lot of the automated kind of brute force smash-and-grab style attacks those are just going to be eliminated right out of the gate a real attacker that really wants to compromise your stuff they're probably going to go the manual route and not use a lot of automation and hopefully we've got other logging and event triggers if they do get into something that catch that but yeah absolutely this is the script kiddie detection as I always say hopefully the goal honestly is to frustrate the attacker that they can't run automations and they just go attack Equifax instead of your company that's the end goal is another
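
The log-parsing scanner described earlier in the talk, which blacklists an IP that floods the site within a minute, can be approximated as below. This is a sketch under stated assumptions, not the code from the speaker's repo: `find_flooders`, its threshold parameter, and the `(epoch_seconds, client_ip)` record shape are all hypothetical simplifications of parsed load balancer or CloudFront access logs.

```python
from collections import Counter


def find_flooders(log_records, threshold):
    """Return sorted IPs with more than `threshold` requests in any single minute.

    log_records is an iterable of (epoch_seconds, client_ip) pairs, a
    simplified stand-in for parsed access log lines.
    """
    per_minute = Counter()
    for timestamp, client_ip in log_records:
        # Bucket each request by (minute, ip) so bursts are counted per minute.
        per_minute[(int(timestamp) // 60, client_ip)] += 1
    return sorted({ip for (_, ip), count in per_minute.items() if count > threshold})
```

The returned addresses are what a scheduled function would then hand to the blacklist update. Counting per fixed minute bucket is the simplest choice; a sliding window would catch bursts that straddle a minute boundary.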

that's awesome so yeah OkCupid for those listening later has a scenario where they detect a bot and they just move them off into a different environment and let the bots talk to the other bots that's pretty that's pretty awesome it's kind of like the Ticketmaster thing you can get very creative with these techniques and that's the fun stuff that's where this gets really exciting any other questions yeah
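
The honeypot endpoint from earlier in the talk (endpoint, gateway, Lambda function, blacklist) can be sketched as a small handler. This is an illustrative outline rather than the roughly 200-line function the speaker showed: `make_honeypot_handler` and the `block_ip` callback are hypothetical names, the event shape assumes an API Gateway Lambda proxy integration, and the 403 response is an arbitrary choice. In a real deployment `block_ip` would call the AWS WAF API to insert the address into the regional or global blacklist IP set.

```python
import ipaddress


def make_honeypot_handler(block_ip):
    """Build a Lambda-style handler; block_ip adds a CIDR string to the blacklist."""

    def handler(event, context=None):
        # Assumed event shape: API Gateway Lambda proxy integration.
        source_ip = event["requestContext"]["identity"]["sourceIp"]
        # Validate before blocking; raises ValueError on malformed input.
        address = ipaddress.ip_address(source_ip)
        prefix = 32 if address.version == 4 else 128
        block_ip(f"{address}/{prefix}")
        # Response code is an assumption; the talk does not say what is returned.
        return {"statusCode": 403, "body": "Forbidden"}

    return handler
```

Passing the blocking action in as a callback keeps the handler testable without AWS credentials: a test can pass `list.append` and check which addresses were blocked.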

right good that's a great question what are the options when you've got a SIEM on-prem and then you've got all this cloud data it's difficult a lot of people end up in a hybrid mode depending on your cloud contract because the cloud providers love to charge you tons and tons of money to extract all of that log data back into your on-prem system another problem is if you have an on-prem system that you know maybe charges you based on the volume of information that you drop into it the cloud data it can be very noisy so a lot of folks have trouble filtering that down to relevant events and putting it in the on-premise SIEM just from a

costing perspective so if you've got deep pockets you could put it all in your on-premise SIEM and just continually be syncing that information up that's completely doable but it can be costly some folks choose to just keep them separate and keep the data within the cloud to avoid those charges from an architecture pattern I'd be a fan of having it in one centralized system if you could do it but not always an option for everybody all right I think we're about out of time so sorry guys I didn't catch you up any on our schedule here I'll have to pass that burden on to the next speaker all right thanks all
