Building an Auto-Remediation Platform for the Cloud

BSides SLC · 2022 · 30:53 · 73 views · Published 2023-01
Style: Talk

Welcome, hello, thank you for coming today. My name is Taylor Wilson, and I'll be talking a little bit about building an auto-remediation platform for the cloud, subtitled "Take advantage of all your CSPMs."

Real quick, just a minute about who I am. Again, my name is Taylor Wilson. I got a degree from UVU in technology management. I think it's sort of a sleeper degree, right? It's kind of a hidden degree. It's a great fit for a lot of us in information security: it's a cross between business and IT, with a little bit more project management than, like, an information systems degree. Career-wise, I did sysadmin work for a few years, security focused. I was the new guy on the team at the time, and they were like, "Hey, there's this new thing, it's called security. Why don't you go ahead and learn that for us so we don't have to?" And that's how I ended up on the security side of things. For the last six years I've been doing security full-time, and cloud security as well, as a cloud security engineer or architect, and now as director of engineering and architecture at NuSkin, just down the street here.

Disclaimer: I'm definitely AWS biased, just because that's what I use the most and it's what lives in my brain. All of the terms I'll use, the specific examples, and the specific applications are AWS focused, but the principles are the same everywhere, including any other cloud that you use that isn't listed here.

Let's see, thesis. I say thesis because I think it makes me sound smarter than I am, but it's this: get value from all your CSPM tools by creating your own automated, central remediation system. It's easier than it sounds, and that's what we're talking about today. It's pretty easy to extract value from all these sources of cloud security posture management that you probably already have today.

By show of hands, real quick: who has cloud workloads deployed at their company or personally? Anyone who
doesn't is fooling themselves, because you do. Everyone does.

Real quick, infosec control types. I categorize a lot of these into just two simple buckets: preventative controls and detective controls. Preventative, of course, means you prevent (in this case) a bad cloud configuration from being pushed out: through IAM, identity and access management, where people only have permission to do exactly what you want them to, the way you want them to do it; through service control policies, which are sort of settings you can set per cloud account to say, "We want things done this way"; or through pre-deployment infrastructure-as-code scans, where in a CI job you scan the template, it looks good, and it's good to go on. And then detective controls: you deploy it, then you assess it and ask, "Does this meet my policy or not?" Those assessments come in as alerts or risks, which you then remediate, either manually or, hopefully, automatically.

Which would I rather have? Of course, preventative. In a perfect world we would only need preventative controls. Every workload would be deployed with zero misconfigurations, and there'd be zero vulnerabilities as of when it's deployed or ever in the future. Of course, we all know that's not what really happens.

But real quick, before we get into preventative versus detective: why do companies use cloud? It's like the thing everyone's doing these days. I think it's important that we, as information security professionals, understand the business value the cloud brings. You're able to scale your costs with revenue. You say, here's a service, it's pay-per-use; as more people use it, you're making more money. Your costs go up with your revenue, and they go down if revenue drops. You can quickly onboard new technology. You can participate in more rapid innovation. For example, recently we had some devs say, hey, this Kafka thing sounds great, let's check it out
for a little bit. In AWS (again, that's just my default mindset) you go in, there's a managed service for it, you deploy it. You don't have to worry about the operating system or the hardware. The access pattern into any new technology that's provided by a cloud provider is a known access pattern: it's security groups or IAM. It just really lowers the barrier to entry. Two or three days later they're done figuring out what it's all about, they turn it off, and it costs like two or three dollars. That's real business value.

And then developer efficiency. I think we often underestimate the value of having devs close to their deployments. Their deployment is a simple infrastructure-as-code template: you push it out, and it's there and deployed a few minutes later. I think DevOps is a lot like zero trust, which in my opinion is a journey, not a final destination, at least not usually. There's always a give and take, a push and pull, a little bit of balance that goes into what DevOps means for you.

So with that in mind, let's circle back to our two control types, preventative and detective. In the real world there will always need to be a balanced approach. You prevent the worst things from happening, but in order to get some of that value from the cloud for your company, you also need detective controls. There are some things you cannot prevent, and some things that you'll just have to detect and respond to. So: prevent the worst things, respond to everything else.

CSPM: I've mentioned it a few times already, so let's dig a little bit deeper into what that actually is. In infosec we love acronyms, right? Everything's abbreviated everywhere. What it is: cloud security posture management. It's a way to identify risk in your cloud configuration. It's the configuration of your cloud control plane. It's the configuration of resources as seen by your cloud provider. It's anything that you could get to
through the AWS web console, for example. What it isn't is vulnerability scanning of, like, your operating systems deployed in the cloud. It's not EDR, endpoint detection and response. It's not application security, although there's a case to be made that application security and cloud security are extremely related, and I've got another talk about that. But CSPM, as a definition, is cloud security that identifies misconfigurations and lets you know about them.

I'm a very hands-on, practical learner, so this is what helps me. Here are some example findings that you might get out of a CSPM tool: AMIs that are shared with an account that isn't yours (an AMI being like a golden image of a virtual machine in the cloud); an IAM role with an overprivileged policy attached, meaning this role looks like it can do more than it should; or an unencrypted RDS instance, which is just a database in the cloud. So let's look at how we would prevent or detect these, and the level of effort required to prevent versus detect them.

Now, I did handpick these examples to prove my point, of course. But an AMI shared with an account that isn't yours is hard to prevent. In order to prevent that, you would have to maintain a list of all the approved accounts you can share with, in every single policy, in every single cloud account you have. Maybe it's doable if you have just a couple of accounts and just a few policies and roles, but when you start operating at any kind of scale, that won't work. I mean, we operate at a relatively small scale in AWS and have 50-plus accounts that constantly churn a little bit.

In the example of a role with too many permissions, you could say: centralize all IAM roles with one single team who knows exactly what they're doing and will always honor the principle of least privilege. Then you're slowing down your deployments and not getting value from the cloud. You could try service or permission boundaries in AWS, for example. That would
require a change to every single deployment that goes into the cloud, and we have thousands and thousands of workloads running. It would slow things down. Hard to prevent, easy to detect.

Unencrypted RDS instances: same thing. Very hard to prevent through, like, IAM conditions or service control policies; easy to just see that it's been done wrong, go in, and fix it. So CSPM is the source of information that we can use to respond to security misconfigurations.

Now, back in the olden days, when I first started doing cloud, which was just six years ago, we decided: hey, there's a new space, it's called cloud security posture management; we're in the cloud, let's see what's out there. You used to go out and look for CSPM solutions, CSPM tools, do a vendor selection, a proof of concept, et cetera, buy something, implement it, and be good to go. Today, every single security-adjacent, cloud-adjacent tool that's out there will throw in some CSPM-type findings for free. They're all over the place. I was really surprised when my APM, my application performance monitoring tool, was like, "Hey, plus here's all the stuff about your cloud configurations." And, well, there's some good data there, there are some good findings. EDR tools especially have those these days. API security, too. Even some of my network stuff is surprising: a network tool, like a NIDS tool, is like, "Hey, since we're here in your cloud looking at network stuff, we'll look around a little bit more and give you some good CSPM findings."

So CSPMs really are a commodity. They're something that everything has. There's all this valuable data out there, and my thesis today is that we can take advantage of it and extract value from all of these solutions more easily than you might think.

So how do we get value from all these things? Real quick, here's what a standard CSPM workflow might be. This is basically incident response in a lot of ways. You get an alert,
you determine whether that resource is allowed to be an exception to the policy. There will always be exceptions, as much as we might wish there weren't, but it does make sense that there are. For example, we have some sandbox and lab AWS accounts. Back to the RDS encryption alert from earlier: it costs more money to have encrypted RDS instances, and in our lab accounts we have pretty strong guarantees that there's no personal or sensitive data in there. So let's just save a few bucks and allow our sandbox and lab accounts to not adhere to that rule. So you'll always have an allow-list check. Then you'll do your response, which is the "fix the problem" step, whatever you define that to be. And then, I'm a huge advocate for training: identify who made the configuration that was flagged as being insecure, and let them know how they can avoid that problem in the future. I'll talk more about this in a moment.

Diving a little deeper into the respond box there, there are kind of four basic things that you do.

You correct the resource's configuration, hopefully automatically. If I have an S3 bucket policy that allows unencrypted objects, I'll go in and change it to only allow encrypted objects. You can very often do that in the cloud without any outage, downtime, or incident.

You could terminate the resource. Some resources in the cloud are difficult to adjust once live and kind of do require a delete and redeploy. But terminate the resource only if you have good training and alerts to tell the person who deployed an S3 bucket that the reason it deleted itself after 10 seconds is that it wasn't secure, and that they need to change this. That one only works if you have some good training involved, or an automatic response to the end user who deployed it.

Another response type: add something to the backlog. If a human
needs to make a decision, if it's more of a strategic direction, a "what do we do here?" where there's no established pattern to follow, send it into a backlog. We'll have a GRC team or something prioritize it, send it out to the right person, and have that conversation with them.

And always log these alerts for the SOC to correlate with other threat intel and other alerts. It's always good for them to know, when they're looking at this resource, what other alerts have been associated with it in the past.

Getting a little bit more into the training side of a CSPM response workflow: I'm a huge fan of just-in-time training, JIT. It's kind of a term from the manufacturing space that I think applies really well to a lot of infosec applications. Identify the violator. Tell them what the problem was and what we did: this is your old bucket policy, we changed it, this is the new one. Tell them how they can avoid this problem in the future. We send out infrastructure as code (Terraform, Serverless Framework, CDK, CloudFormation, whatever). We send a snippet and say: if you use this configuration in your template, you can deploy as many buckets as you want and they'll always be compliant. And then lastly, we always link to a security standard where they can find out more, like why it's even important that we require encrypted objects in S3. Also, if they feel the need to petition for an exception from this rule or policy for that resource, give them instructions on how to do that. We send these out just via email, because it's the simplest way, but you could easily do it through your chat app or whatever.

So we've talked a little bit about what a standard workflow is for CSPM. Let's talk a little bit about that centralized remediation service. This is, in my experience, the best way to get value from all of those tools which have included, and are starting to include, increasingly more
cloud posture management and cloud configuration findings in their alerts. Collect all the sources, have them send events into one single place, and do your response process from there. As a bonus, you can often handle a lot of the default findings from your cloud provider. In AWS, for example, that's Config or CloudTrail or Macie or whatever. The same system will often apply to those; you can handle them from the same centralized remediation system. Do your exception policy management there, your response, and your JIT training.

Now you might ask, why would I not do that from the CSPM itself? Some CSPM tools (not all, not even most, in my experience) will have the ability to click a button, and it will go into your cloud provider and make the change on your behalf. If you do that, you're missing out on a couple of key steps in what I believe a remediation workflow should be. You would need to maintain a list of exceptions in every single tool that has CSPM capabilities. Instead of one central place to say, "This resource is allowed to be exempt from this policy," you'd be stuck maintaining that allow list in every single tool. That's expensive from an operational standpoint. Also, CSPM tools don't often (I've yet to see it) identify the user that made the change and send them a nice training email at the time their resource is remediated or changed. In general, it feels like you have a lot less control if you're trying to leverage the built-in CSPM response functionality. There's also no centralized circuit breaker, no on/off switch or kill switch. I imagine a large machine running, and something's gone wrong: where's that big red button? You just slap it and it turns off. Maybe some resource that is production critical is being deleted right after it's deployed, every time, and it's causing an issue. Do you want to look through all five, 10, 20 of your CSPM sources to figure out which one's doing it,
or just have a centralized "let's just turn that off for now," figure it out, and go from there? So that's the advantage of a centralized service.

You might ask, how is what I'm proposing different from standard SOAR? It isn't. It's the same thing. So we'll talk a little bit about doing it yourself versus doing it in a low-code platform.

If you're doing it yourself, you of course have the ability to tailor it to exactly what you want. You can customize it to your heart's content. I will say, the people writing the code if you do it yourself often only need a low level of coding experience, and they don't need infosec experience as well. We as professionals can dictate what we want to happen, and they can be enabled to make it happen in code the way they want, which lowers the barrier to entry. It makes it easier to find people who are capable of writing these simple scripts for a centralized remediation platform. Around here especially, we have a lot of code boot camps or development boot camps. You can get junior devs out of there very affordably with excellent Node.js experience. At all the colleges and universities in the area, I feel like Python is on every curriculum for a lot of different majors, and Python is a great language for interacting with cloud APIs. Everyone has an SDK for it; it's quite simple. I myself run an intern program where we have a revolving door of two, sometimes three, interns come through, and what they get to work on is adding more rules and code to our centralized remediation service.

If you're doing it yourself, it's all serverless, so it's extremely cheap, and event based, so you get very short response times between when the alert comes in from the CSPM tool and when you've acted on it. That's less than a couple of pennies, and it usually happens within a few seconds.

Doing it in a low-code platform is absolutely good enough. You can extract a lot of value from your
CSPM by doing the same general workflow in your low-code platform. It could even be your SOAR, which is tied to your SIEM, your security information and event management tool. Lower barrier to entry in most cases. I will say, some SOAR platforms I've seen out there don't integrate very easily, at least in a secure manner, with your cloud providers. Most of them do. Some of them have a little bit of a hard time with that, just in credential management; you'd have to fall back to some sort of legacy authentication method. But yes, it certainly could be your SOAR. If you want to do it there, you could do it in a lot of those low-code platforms, which are gaining popularity, and there are certain advantages to those. And I will summarize again: it's easier than it seems. We'll get into that in just a minute here.

I will say, learn from my mistakes and avoid this pitfall. I was that poor lady falling into the hole there. Six years ago a CSPM tool was a dedicated thing, and there wasn't budget for it, so I was like, "All right, I'll just do what I can with what I've got." I'm not a programmer, but I hacked together a few scripts and made this system, and it worked. But don't write your own evaluation logic. Use the evaluation logic built into all of your CSPMs and built into all of your cloud providers; they'll give you security alerts. Don't try to say, "I'm going to look at this S3 bucket and check all these things to make sure that it meets my standards." That's commodity. It's out there, everyone's got it. Don't write your own, because then you're stuck maintaining your own. The caveat there, of course, is if you have a very specific threat vector, or you're using a cloud service in a very unconventional manner, which I've seen many times. Then sure, it is easy to just add your own evaluation logic to say: anytime whatever it is you're looking at is created, let me assess it and go through my own checklist to make sure it's compliant or
not and like for example in AW