Logging, Monitoring, and Alerting in AWS (The TL;DR)

BSidesSF · 2018 · 24:59 · 459 views · Published 2018-04
Speakers: Jonathon Poling
Category: Technical
Style: Talk
About this talk
Jonathon Poling - Logging, Monitoring, and Alerting in AWS (The TL;DR)

With AWS’ ever-increasing number of services and ever-growing complexity, individuals and organizations are desperately seeking the “TL;DR” of what services are available to protect them from and respond to attacks, and how best to configure them for effective and efficient monitoring, alerting, and incident response. The first part of this presentation will walk the audience through the core services and capabilities that are critical to logging, monitoring, alerting, and responding to threats. The second part will walk the audience through specific monitoring and alerting configurations that they can immediately apply to begin or improve their path toward securing their AWS infrastructure. Whether you’re just starting out in AWS or have been using it for years, there is something for everyone to learn or brush up on in ensuring your org is best prepared to monitor for and respond to a compromise.
Transcript

[Music]

Thank you, that was everything I could have expected and much, much, much more. I'm sorry to the video guy; I'm gonna make your life really hard, because I'm probably gonna walk around like this and look like this, since I usually look at the screen. So, I'm Jonathon Poling. I made the trek, completely underwater, here from the East Bay, which was a bit of a journey. I'll be delivering Logging, Monitoring, and Alerting in AWS. The really difficult part is delivering something useful that you can take away in a 25-minute talk, given the amount of documentation that exists for AWS. What do you pay attention to, what are you
not? What do you take home? What do you do? I've been through a lot of talks that are an hour long, and you leave with, like, a year's worth of homework. This is not gonna be that. There should be things you take away from this; everyone leaves with something. And if you leave with nothing, you still take away the fact that you will never get these 25 minutes of your time back, that I can assure you.

So, a little bit about me. Really, who cares about what I've done in the past? The point is: why me? Why am I giving this talk? Why
should you pay attention to me? Why am I someone who can deliver this information? I've spent way too much time over the last several years poring through AWS documentation, and there is a metric ton of it. I've spent a lot of time building cheat sheets and hoarding tools, and I built the incident response service line that our company, Secureworks, offers. Doing all that, I've come up with the too-long-didn't-read version of what to pay attention to in these three main areas, because this is the stuff that matters. This is the stuff we're gonna have to look at when we come in for incident response, stuff
you're gonna have to deal with when an incident happens, not if.

So, the agenda. I always start off with the AWS security model. It is very, very important, and it is often misunderstood where AWS leaves off and where customers pick up. Then we'll run through logging, monitoring, and alerting, and I'm gonna leave a little bit of time for a few softball questions, so please queue those up. Anyone want to volunteer one of those right now? All right, great. Nothing like grounding yourself this early in the talk.

So, anyone here: this is why I moved to AWS, right? AWS does it all for you. You sit back, you say, well, you just
enable it, and you're like, yeah, we moved to AWS, we work on other projects now, hooray, and all our stuff just sits in AWS. Right? No. That's not how it works. So here's the shared responsibility model; that's very, very critical. There are two operative words here, "of" and "in", that are very, very important. AWS is responsible for security of the cloud: the physical infrastructure, the things they provide you, the bare metal, the hardware. But you are responsible for security in the cloud. Everything you do in there, you are responsible for. There may be security tools and things that help you secure it, but that is not AWS's responsibility, and
if you email them telling them that, you're not going to get a very favorable response. So, everyone, I'm literally not going to advance until everyone says this. I've built time into this presentation to sit here in silence. Repeat after me: security is my responsibility in AWS. That was the best reception I've ever had of that; usually people don't do that. And that was a bluff: I have zero time built into this presentation. Suckers.

So, there's tons of logging in AWS, but here are the main ones we're going to be focusing on. We have CloudTrail; CloudWatch and its events, there's a
couple of levels to that; VPC flow logs, basically NetFlow; S3, where you store stuff; and we're also going to throw Config in there as well, because I think that's pretty important. This slide is borrowed from Scott Piper, if you know him; he also does a lot of AWS talks. I thought it was really appropriate and a nice, succinct way of presenting these main logging sources: what do they contain, what's their delay, where are they stored, things like that. So that's pretty useful; I thought I'd throw it in here.

So, logging in AWS. CloudTrail, for those who are unaware, is your syslog on steroids. These are
enabled by default, with a retention of ninety days. But by default, and there are however many regions and they keep growing, each region's logs are only stored within that region. So say you have your first incident: great, we had an incident in Northern California. And we say, okay, great, let's look through the CloudTrail logs and we'll see what account was compromised, and blah blah, we have an access key, and great. So you search all the logs, but none of those logs in your Northern California S3 bucket give you any insight into any other regions by default. That's a big problem, and it's a problem we see a lot of times when we go
to companies, and they say, hey, we've had an incident, come in, here are all of our logs, and we have to spend a ton of time aggregating them. Fortunately for you, you are in this talk, and now you know that you can configure this from the get-go with a nice option called "Apply trail to all regions." This is something you absolutely want to select when you're configuring CloudTrail, 100 percent. What it does is consolidate every region's logs into one place, and you get to pick that region. You'll probably want to make it close to the region where you house most of your resources, where you're going to be doing
stuff with that data. This is a nice, easy click option, and it's only a relatively recent development. What you also want to do is disable global service event logging on all your other trails; otherwise you're gonna be dealing with dupes. Global events are things like account sign-ins that aren't really region-specific. If you leave here with nothing else, this is gonna save you tons of time, and it's gonna save responders tons of time. You will thank yourself. Take this back to your boss: look what I learned.

So, CloudWatch. There are a couple of different streams for CloudWatch logs. First, system performance metrics: these are enabled by default and sent every five
minutes, or you can enable detailed monitoring and it'll send them every minute. Okay, great. You can also install the Systems Manager and CloudWatch agents to help collect logs from your hosts. If you instrument CloudWatch with agents on all your instances, they'll collect your host logs: if you want to collect all your syslogs from Linux servers or your Windows event logs, the agent will collect them and throw them right into CloudWatch for you to log and alert on.

Config: this is essentially your compliance tracking. It's easily set up through the console or CLI, as most of these things are, and by default Amazon gives you a set of
Config rules that monitor for certain compliance things, which is great. So you're going to want to enable Config, use the baseline rule set, and also develop your own custom Config rules based on what you want to monitor for compliance. By compliance we're talking about things like: is an instance launched in a region it shouldn't be launched into? Does someone not have a password on their account? Have they not enabled MFA? This is a great place to manage and monitor that sort of compliance-related activity. Only recently has AWS implemented multi-account, multi-region data aggregation, which is basically the corollary of CloudTrail's "apply to all regions." Before, I think, a month
or two ago, these also used to live in separate regions. You'd have a Config issue in one region, but you'd have no idea whether it applied to another region, and you couldn't scan the rest of your logs in one single bucket for this sort of stuff. So that's been super useful; thank you, AWS. Bonus: you can also use this for monitoring software inventory changes. This requires the instances to be configured as managed instances, which is a different selection when you create them, but it's baked right in; it's simply a configuration option. When you do that, it's an easy tracking and monitoring solution, native to AWS, that allows you to
track certain things.

So, S3. This is where your data lives; this is your long-term storage. The first thing you want to do is enable MFA Delete, so no one can delete anything without multi-factor authentication. As an attacker, I'm going to come in and delete everything I possibly can, but if you have this configured so that nothing can be deleted without multi-factor authentication, as an attacker I'm not likely to have that. This is a data preservation thing. Bucket-level logging is enabled by default: create a bucket, remove a bucket, delete a bucket, blah blah, which is great. So that's enabled by default, and everyone's like,
"Oh, I have S3 logging enabled." And then they go, "Oh my god, our stuff was compromised, we don't know what they took, can you come in and look at these logs?" And we're like, yeah: a bucket was created twenty-eight months ago and one was deleted six months ago. We have no idea what data they touched, what they took, or what they even looked at. That's where object-level, aka data event, logging comes into play. It's configured through what are called data events. It's not very straightforward, but essentially data events cover the objects in the bucket; objects are data. Some of this stuff is really confusing, but regardless, you don't have
to fudge through it; this is what it means. You want to enable it, and it will give you visibility into what actions were taken against the objects: was an object put in there, copied from there, created, deleted, blah blah blah. That's going to be very, very critical in your investigations as well. Server access logs can also be enabled, if you like to sort through Apache-ish logs. It's convenient, but it's also extra data to store, so you've got to weigh the ROI of that; it is an extra source of analysis if you want it. When you have that enabled, you can also see things like
what was the size of the transfer, what was the size of this file, what user agent might have been used, things like that, which can be pretty useful.

VPC flow logs: these are essentially your NetFlow logs. They can be enabled for a VPC, which is essentially a VLAN or a subnet or however you want to title it, or for an elastic network interface, which is basically a network interface. I would suggest you enable this for anything you even remotely care about. Storage is cheap; you're gonna store these somewhere in AWS, but figure that out later, because this is a super cheap way of getting
immediate traffic analysis access to whatever's going on in your systems. Maybe just start with certain VPCs where your servers are located, or start with a few test ones, and just enable it. You'll see the flow logs coming in, and you'll say, okay, great, they're coming in, now what do I do? Well, they're logged to CloudWatch Logs as a separate log group with streams, and what can you do in CloudWatch? You can create metrics, and you can create alerts and alarms for certain things. So what does this mean? If I simply enable VPC flow logs for a
subnet or something I care about, it's constantly getting data going across the network. Great, what do I do with that? Well, in CloudWatch I say: if this is enabled for all of our servers, let me set up an alarm for brute-force RDP or SSH attempts, or for when we see inbound or outbound traffic to known bad IPs. If you're keeping track of indicators of compromise, apply them here. This is easy: enabling VPC flow logs is literally just a click or one or two CLI commands. They're already going to CloudWatch; you just go in there, create an alarm, and say, if there are this many connection attempts from a single IP in this many
minutes, flag it. Or store indicators of known malicious IPs, like a blacklist, a poor man's blacklist. This is super easy. What about data exfil? Well, this is a great way to monitor for data exfil. A lot of times when we're called in, the critical thing for the C-level executives is: great, but was anything taken? Did anything go out? I don't know unless you have VPC flow logs enabled. I can't tell you that, because nothing's measuring the traffic unless you have these enabled, and if you do, they're gonna be an absolute boon. This is the network
corollary to the CloudTrail logs. These are absolutely awesome.

So, the TL;DR of logging. We have logging, monitoring, and alerting; this is the first, the logging aspect, as a quick rundown. The basic ones are enabled by default; the others should be configured by you. Go poke around, play around; no one's gonna do this for you. How can I centralize them? Sometimes it's easy, like with CloudTrail, where you just click a button, and sometimes it's not. For instance, if you have VPC flow logs in a certain region and they're going to CloudWatch, CloudWatch is notorious in that it is very region-specific, so if you want that in another region, you're gonna have to go
through a very convoluted process of sending it to a Kinesis stream and then using Lambda to fan out to some other bucket. It's not necessarily straightforward, but these are the keywords: take a picture and research what they mean, because this is your easiest way to essentially replicate data to other regions. How should I configure them? Well, we can only go so far in 25 minutes, but these are the baseline of what to configure and how to configure it. This is the top 80 to 90 percent that's gonna give you the most information. So, baseline recommendations; we'll go through them on the
next slides, for monitoring and alerting.

So what should I monitor in general? This is a very 30,000-foot view of monitoring. If you're looking for environment enumeration or recon, you're going to be looking for API requests, typically Get and List. And believe me, this is not that straightforward. The amount of training and education needed to really delineate the good Gets from the bad Gets, what I should expect, what should be listing stuff and what shouldn't: those are very deep talks that take hours and days. But this is the high-level gist of it, if you're new to this or just want to get started.
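
To make the Get/List idea concrete, here is a minimal sketch that counts read-only, enumeration-style API calls per principal in a single CloudTrail log file. It assumes the standard CloudTrail log-file shape (a top-level `Records` list whose entries carry `eventName` and `userIdentity.arn`); the helper names, the threshold, and the sample data are hypothetical, and a threshold that means anything has to come from a baseline of your own environment.

```python
import json
from collections import Counter

# Read-only API prefixes typical of enumeration/recon (Get*, List*, Describe*).
RECON_PREFIXES = ("Get", "List", "Describe")


def recon_counts(log_file_text: str) -> Counter:
    """Count recon-style calls per principal ARN in one CloudTrail log file (hypothetical helper)."""
    records = json.loads(log_file_text).get("Records", [])
    counts: Counter = Counter()
    for record in records:
        event_name = record.get("eventName", "")
        if event_name.startswith(RECON_PREFIXES):
            arn = record.get("userIdentity", {}).get("arn", "<unknown>")
            counts[arn] += 1
    return counts


def noisy_principals(counts: Counter, threshold: int) -> list:
    """Principals at or above a (hypothetical) per-file threshold, sorted by ARN."""
    return sorted(arn for arn, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical sample: one principal enumerating, one doing a single write.
    alice = "arn:aws:iam::111122223333:user/alice"
    bob = "arn:aws:iam::111122223333:user/bob"
    sample = json.dumps({"Records": [
        {"eventName": "ListBuckets", "userIdentity": {"arn": alice}},
        {"eventName": "DescribeInstances", "userIdentity": {"arn": alice}},
        {"eventName": "GetCallerIdentity", "userIdentity": {"arn": alice}},
        {"eventName": "PutObject", "userIdentity": {"arn": bob}},
    ]})
    counts = recon_counts(sample)
    print(noisy_principals(counts, threshold=3))  # ['arn:aws:iam::111122223333:user/alice']
```

Counting per file keeps the sketch simple; in practice you would aggregate over a time window and compare against what each principal normally does, since plenty of legitimate automation also calls Get and List constantly.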

Resource and data event collection: Gets, Describes, Lists, Lookups. Those are going to be your core prefixes for the commands performed to do that sort of thing. Resource creation, modification, deletion: the API is really awesome in being very uniform in how it presents information to you and what certain things mean, so a Delete-something call is obviously going to have deleted something. That's really useful. And then log tampering and modifications: if you see StopLogging on a CloudTrail trail, that needs to be an alarm, through CloudTrail logs, through Lambda functions, something. That needs to be an alarm: why should someone be stopping CloudTrail
logging? Why should someone be updating logging, or updating something? Who is doing the updates, and why? These are things we need to be thinking about as we go through this. Or a Set-something: what are we setting? This is not a Set-something configuration on the host OS; this is an account-level Set-something. So these are things we need to be paying attention to in general.

So what services and logs should I monitor specifically? This is a bit of a list, but these would be the focuses. It's easy to get lost in the soup of here's a service, here's a name, it's Athena this week, and
then it's Glue, but Glue is really behind it, and yes, we know that, because it does the ETL for it. Don't get lost in all that sort of stuff. This is the list; these are the things you should pay attention to first. We talked about CloudTrail, CloudWatch, and Config. EC2 is where all your instances, your resources, your operating systems, and your hosts live. IAM is where your resource policies are and where your users live; Identity and Access Management is what it stands for. GuardDuty is a topic we'll hit on a little bit; it's a service Amazon came out with for monitoring your account, or assisting in
monitoring your account. S3 is where your stuff lives. STS is where things like assuming roles live, where access to certain things, role-based stuff, is performed. And then Trusted Advisor is an interesting thing, enabled by default, that has some interesting features; it's essentially doing some high-level checks in your account, like: do users exist who haven't rotated their access keys, or who haven't logged in in 300 days? Take that stuff, use it as a baseline, and build on it depending on your business.

So what native tools exist for monitoring and alerting? There's some great stuff. Essentially, the quadfecta of monitoring and alerting is
Config, for resource changes and compliance-type things, plus CloudTrail, all your syslogs, plus CloudWatch, which, remember, houses your VPC flow logs. So you basically have the full stack here. And then you can use Lambda, which is your ad hoc running of code and responses to things. That gets you to a great place natively: if you have these log sources and you're using Lambda for alerting and response, you're in a great place. This is a great place to start. Trusted Advisor, of course, on top of that. CloudWatch agents again, which can be installed onto the host systems to provide additional movement of logs and monitoring of certain things
and sending them to CloudWatch for alerting. And then GuardDuty automatically monitors VPC flow logs, CloudTrail, and DNS logs, which is kind of awesome. They came out with this service sometime late last year, I think at re:Invent, and it's Amazon's foray into, here, we're gonna help protect you a little bit. But they're not in the business of protection; remember, you're responsible for doing this stuff. They provide you a tool, and it does some very useful stuff: it will automatically monitor all of this. This is a great place to start, but it's region-specific, because of course; this is
how Amazon typically does stuff: it's region-limited. We hold out hope that it will have an enable-globally feature very soon. Another caveat is that it only analyzes CloudTrail management events, not data events, so that's something you're gonna want to keep in mind.

This is just a sample infrastructure of how using native tools might look, with the quadfecta-esque setup. There's a link at the bottom that you absolutely can't see because of the interrogation light, but the slides will be available and you'll have access to all this stuff. This is
just an overview of how that might look in your account: these are the things, how they work together, what feeds into what, and how things get populated.

So what about third-party tools? There are a ton of great tools from a bunch of different companies: Cloud Custodian is great, Cloud Inquisitor, CloudTracker. I can't go through all of these; I could probably spend 30 minutes on each one. I would just say take a picture or look at these slides later and investigate. Most of these are on GitHub and readily available. You have a lot of great tools from both
local Silicon Valley companies and companies across the globe, so definitely look into those; they're great force multipliers. These are companies that have been in the position you're in, and they decided, I'm going to harness the beast and make this available to other people. So I would highly, highly suggest standing on the shoulders of those giants rather than reinventing the wheel. There's a lot of great stuff in just these tools, which are complete, comprehensive solutions for monitoring, alerting, and logging.

So what's the takeaway? What should I be monitoring and alerting on? Well, unfortunately, kind of everything you can: from basic changes, from root logins at a high level,
all the way down to specific API calls being made across your infrastructure. Unfortunately, I can't distill that down further in a 25-minute slot, but the set of tools, both native and third-party, should give you insight into how to wrangle all of those things together and build a comprehensive monitoring and alerting solution.

So in what order should I set up monitoring and alerting? This one hits home with me. I started thinking about how you set up a program, and then about how people try to set up threat intel programs, and a lot of people
unfortunately miss the mark, because they try to do all this really complicated stuff from the beginning: they start with $50,000 third-party feeds and tools, and then try to work their way backwards to whether they can actually do anything with them. Don't do that. This is the way to do it. One: clearly define your monitoring goals. What do you need to monitor as a company? You may not need to monitor what the person next to you needs to monitor. Two: understand your existing data and information; that's critical. What data are you getting? What are the log sources? What do they look like? Is that the stuff we need, or do we need to augment it
with other third-party sources? Three: exhaust native tools and capabilities before you move on to third-party. These third-party tools exist because their makers exhausted the native capabilities and wanted to build something that adds on to them, and they're making it useful for everyone. You need to do that as well. Once you've done number three and you're fairly sure about it, don't proceed to four; go back to three, because you've likely missed something. It's very easy to move on to the next thing. Make sure you absolutely exhaust all of your native tools' capabilities, and there are a lot of great ones in AWS. That's going to build you a solid
foundation of understanding, access, and all the things you need. And then, four: augment with third-party tools.

So with that, that was a record, blowing through 23 slides in under 25 minutes, which leaves us exactly three minutes for questions.

[Audience question about RDS] Yes. So, that wasn't considered a major log source in terms of native sources and utilities. RDS is great, and it's actually behind a lot of great things. There's another talk I do on analysis tools and capabilities; if you know RDS, maybe you're familiar with Athena, which uses RDS as kind of a back end, and SQL to do native ETL-ing and searching of S3 logs, which I highly recommend as well.
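
Since Athena came up as a way to search logs sitting in S3, here is a hedged sketch of the kind of query you might run against CloudTrail logs with it, hunting for the log-tampering calls mentioned earlier in the talk (StopLogging and friends). It only builds the request; the function name, the table `cloudtrail_logs`, the database, and the output bucket are hypothetical, and the column names assume a table created from AWS's documented CloudTrail schema for Athena.

```python
# CloudTrail management calls that stop, delete, or reconfigure logging.
TAMPERING_EVENTS = ("StopLogging", "DeleteTrail", "UpdateTrail", "PutEventSelectors")


def build_tampering_query(table: str, database: str, output_location: str) -> dict:
    """Keyword arguments for boto3's Athena start_query_execution call.

    table, database, and output_location are hypothetical names you would
    replace with your own; the event names are fixed constants, not user input.
    """
    event_list = ", ".join(f"'{name}'" for name in TAMPERING_EVENTS)
    query = (
        "SELECT eventtime, awsregion, useridentity.arn AS principal, "
        "sourceipaddress, eventname "
        f"FROM {table} "
        "WHERE eventsource = 'cloudtrail.amazonaws.com' "
        f"AND eventname IN ({event_list}) "
        "ORDER BY eventtime DESC"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


if __name__ == "__main__":
    request = build_tampering_query(
        table="cloudtrail_logs",                          # hypothetical table
        database="security_logs",                         # hypothetical database
        output_location="s3://example-athena-results/",  # hypothetical bucket
    )
    print(request["QueryString"])
    # With credentials configured, you could then run:
    # boto3.client("athena").start_query_execution(**request)
```

Keeping the request as a plain dictionary makes it easy to review or test before anything touches your account; the same query could also back a scheduled hunt, though for real-time alerting on these calls the CloudWatch or Lambda route described in the talk is the faster path.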

So again, not everyone's gonna be using that. Everyone is gonna be using these: if you're in AWS, you need to be using all of these. You don't need to be using RDS, though you might. So for that, "what about it?" is kind of "what about it for you?" If it's important to you, you need to be monitoring and alerting on it: I need to monitor those API calls. Is something getting set up? Is someone being added, is access being deleted, that sort of thing.

[Audience question] Yes, it's easier to just enable sending everything and do the filtering on the back end. With Lambda, the key function there is gonna be the replication. So the
question was, can you do filtering with Lambda? You can, but essentially what we're using it for is a replication function. You want to replicate everything and then worry about filtering on the back end, wherever it's stored at the end.

Any more questions? Yes, sir. Are there cost concerns with flow logs? Maybe; probably not. Cost comes in everywhere in AWS, because everything useful you do costs something. But it's hard to offer that advice without knowing the size of your company and things like that. Again, these are all judgment calls. This is the introduction to: at least I know what I
need to be looking at, and I know that, as a best practice, I want to enable these to get started. Then use the AWS billing center, as terrible as it can be sometimes, and see: what am I using? How much traffic? How much is being stored, and what's the cost associated with it? And then that's an ROI decision for you and your business. Absolutely.

Anyone else? I can't see. I think that's it. All right, and I think that's right on time-ish. Yeah. All right, thank you.