
Yeah, so as mentioned, this talk is about security automation, and there are two things I have to say first. First, because this is about automation, this is very much an engineering talk, because it's about implementing security tasks, or automating them. And second, I have about half an hour for this, which is not enough time for such a topic. So what I've done is split this talk into two parts. Part one is a very specific example of a security automation and its implementation that I'm going to show here, and it should give you an idea of what is possible and what can be done, because I think what I'm
presenting and then in the second part I'm basically taking a step back looking at it from a broader perspective and what do you need to build security automations and what kind of tooling can I use for that and what's the overall platform for that but first things first why security automation that's my personal perspective I think the problem with security resers you have a problem scaling right I've always worked in let's say Tech organizations or software driven organizations the problem there is always the security team is much smaller than the engineering organization and you might not only be dealing with engineering but you also have to cover corporate security so how do you deal with that so the idea that I
had about two years ago where I started working on these kind of things is to basically offload things which were done manually overd do automation for obviously freeing up time to do other things instead of these manual or repetitive tasks that you might not even enjoy doing so having said that as mentioned I will start with a very specific example first and that example is external attack service monitoring or external attack service Management in short years and how many people here are familiar with this termal maybe already using such a solution okay there maybe five roughly five people okay so I'm not aware of any dictionary level definition for what esm is so this is my attempt at defining
it. First, as an organization you have assets. A laptop would be one asset, but that's not really relevant here; the kind of assets we're interested in here is everything that's publicly facing the internet. An example is a web application; that's the first thing that will come to mind, but it's not the only thing. Look at this more from an infrastructure level: what's exposed to the internet is everything that has a public IP or DNS name. That is, for example, the load balancer the applications are sitting behind, but it can also be a VPN gateway. This is called the external attack surface, because it's reachable by everyone on the internet. Now, the problem with that, or what might be a problem: these assets, or the applications on them, might have certain issues. On the most basic level it's a CVE, for example a vulnerability that allows remote code execution and can be exploited over the network; for VPN gateways, or for appliances in general, that happens more often than we would probably want. But it can be more basic things: for example, somebody installs some off-the-shelf web application, and this application has a default credential for an administrative user, but whoever installed the software never changed that credential. So that application is now reachable by everyone
on the internet, and it still has the default username and password. The idea of external attack surface monitoring is that we want to be able to detect these kinds of things automatically. Now, how can I do that, or specifically in this talk, how can I implement that, and what do I need? Let's first look at this on a conceptual level. This has multiple stages. The very first stage is that I have to know my attack surface, in this case: what are the publicly facing or publicly routable IP addresses or DNS names of my assets? I have to somehow collect this information. The output of this stage is a large list of addresses, for example an IP address that is associated with a virtual machine, plus some additional metadata, like: this machine is on the AWS cloud provider, or this machine is on an on-premise network, like the Munich office network, and so on. Now that I have this large list of addresses, the next question is: is this truly exposed to the public? Because just having a public IP doesn't mean it's really reachable; it could still be firewalled. That means the next step is to perform a port scan to see if it's truly reachable from the outside world. The output of that stage is a subset of addresses: instead of the very large list, I now have a smaller one. Obviously this is kind of a black-box approach: I'm not making any assumptions, I'm just collecting these addresses and then trying to find out what's hiding behind them. After I've finished the port scan, I know which systems are really reachable. The next step I can use here, for example, is a vulnerability scanner; this is nothing new, you can use something like Nessus. The output of this stage is a list of findings: for each address I have a particular finding, like this IP address has this CVE, or a particular web application vulnerability. Now that I have a large list of addresses with findings, the next problem is: okay, I have a finding for IP address 1.2.3.4, but what system is that? I don't know. So I have to somehow correlate this back to the information I had in the very beginning. That's what I call enriching the results: I have a finding for a particular address, and I correlate it with the original meta information from the very first step, so I know this finding is for this address, and this is a virtual machine sitting in this particular cloud account, or a machine somewhere on an on-premise network in Munich or Frankfurt or wherever. Now that I have a large list of findings with metadata, I can publish the results somehow. Some people might want to push this into a SIEM system or some other system; the way it's done here is by sending out alerts that notify me of any new findings. If there's no finding, there's no alert; if there's a finding, then obviously I get an alert.
Now, that's very abstract, so how can I actually implement something like this? The way I've drawn it here, the vulnerability scan can only happen after the port scan, the port scan can only happen after I've finished the address collection, and the address collection, which collects data from different systems, might actually run in parallel. In short, this is a workflow, and the best approach to implement a workflow is to use a workflow orchestration engine. The implementation I'm going to show here is built with Argo Workflows. Argo is a Kubernetes-native orchestration engine, meaning it runs on top of a Kubernetes cluster, and basically every automation task runs in a container. It's one way of doing it, but not the only way.
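To make that concrete before the demo: this is a minimal sketch of what such a workflow definition looks like in Argo Workflows. The task and template names here are hypothetical, and a real definition carries more configuration:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: easm-scan-            # hypothetical name
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          # the collectors have no dependencies, so Argo runs them in parallel
          - name: collect-cloud
            template: collect-cloud
          - name: collect-dns
            template: collect-dns
          - name: merge-addresses
            template: merge-addresses
            dependencies: [collect-cloud, collect-dns]
          - name: port-scan
            template: port-scan
            dependencies: [merge-addresses]
          - name: vuln-scan
            template: vuln-scan
            dependencies: [port-scan]
    # ...each referenced template runs one container with a short script
```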
So let's look directly at a specific example. I can't do a live demo, so what I've brought is a short video that shows what's actually happening from the user's perspective, so let me switch to the video. I'm manually launching an EASM scan here with the argo submit command, so I'm sending it to Argo; the browser window in the background shows the web interface of Argo Workflows. So what we see: I run argo submit, and this will soon start a workflow. Yep, there it is, so we're going into the workflow now. What we see here, top-down, is the workflow that has been launched. There are circles; each circle is basically one task in the workflow. Blue means this is currently being executed, yellow means it's queued for execution. Now, I've redacted it a little bit, so don't be surprised. I'm collecting data here from four different environments: two cloud providers, one DNS provider, and there's also a spreadsheet that contains address ranges for all on-premise systems, with metadata. The address collection is pretty fast here because I'm using an asset database to condense this information. When all these asset data collections have completed, the next step is merging the results into one big file with all the addresses, which is being built right now; it was scheduled, and now it has been executed. Now it's branching off into two things. The branch on the left basically just takes all these addresses and publishes them to a spreadsheet, for reference, so we know what kind of external attack surface we have. On the right, the blue one that has already started is the actual port scan that's happening now. The interesting thing is, as I mentioned, this is Kubernetes-native, so each task here is actually a container being executed. That means I can look into the container and see the output of what's actually happening in there. We'll do that for the port scan, and what we see is that we're scanning almost 500 addresses. What you also see here is the output of the port scanner, which is Naabu in this case.
Now, I don't really have time to show everything, and the workflow as a whole takes hours; I think the port scan takes maybe 20 to 30 minutes, and the actual vulnerability scan takes quite a few hours. So instead of waiting for that, or skipping to the end of the video, I'll go back to the presentation and show you a screenshot of the finalized workflow, and that's what you see here. The video was a top-down representation; what we see here is a left-to-right representation, stages one to four. Stage one was the address data collection, which we saw in the video: as mentioned, data is collected from an asset database for two cloud providers, from the DNS provider with all the DNS names, and from, basically, a Google spreadsheet that holds a large list of networks and IP ranges for all the on-premise systems. The data is merged, and then in stage two comes the port scan, which we also saw in the video. Afterwards we launch three different scans: one is a Nessus-based scan, the second is a Nuclei-based scan, and the third is actually an SSH brute force, because I have seen situations where people just launch a virtual machine and permit username and password authentication on that machine, with a weak password; so I'm trying to find instances of those as well. Each of these scan stages also publishes its results, and at the end, in stage four, I'm merging all the results and publishing a report somewhere. Now, I would like to get into all of these details and show how this looks from an implementation perspective, but I don't have the time for that. So instead, I'm going to show you one of these green circles, one of the tasks: specifically, the Nuclei-based scan stage.
As a reminder, this is running on top of Kubernetes, so everything is a container, and this is the entire script that performs the Nuclei scan stage. Remember, we have an input and an output for the scan stage: the input is the list of addresses we want to scan, the output is the list of findings. That's also what's specified here. First the input at the top, which is the host-port list; it's mounted inside the container, in a temporary directory. In the same way there's an output: when this task completes, Argo will automatically take this output file, the Nuclei results in the temporary directory, and make it available to the next stage as an input. Below that we have the script, which is just a bash script executed inside the container, and there are basically just two important commands. The first one updates the templates: Nuclei is a scanner that basically works with scan templates, so I'm downloading the latest set of scan templates with all the scan signatures. Then, at the bottom, I'm launching the scan. The scan takes the input, the host-port list, which is a long list of addresses, and runs over it; and I'm also saying I only want templates with medium, high, critical, and unknown severity levels, because I'm not interested in the rest. The scanner is launched, it runs, and it writes its output, the findings, to the result file in the temporary directory, the Nuclei results. When the scanner is finished, this task is completed, and Nuclei, sorry, Argo will automatically grab the result file and make it available to the next stage. And that's it: it's basically like ten lines of code, and that's an entire scan stage with Nuclei. The same approach applies to everything else we've seen here before; most of the things in the workflow are just a small collection of bash scripts, with, I think, at most maybe 50 lines of bash script code. And that's it: you just have to stick it together the way you need it, based on the input and output information.
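The exact template isn't reproduced here, but based on that description, a sketch of the stage would look roughly like this; the image and paths are assumptions, while the two nuclei commands use the scanner's real CLI flags:

```yaml
- name: nuclei-scan
  inputs:
    artifacts:
      - name: host-port-list
        path: /tmp/host-port-list.txt      # Argo mounts the input artifact here
  outputs:
    artifacts:
      - name: nuclei-results
        path: /tmp/nuclei-results.txt      # Argo grabs this file when the task completes
  script:
    image: projectdiscovery/nuclei:latest  # assumption: an image containing nuclei
    command: [sh]
    source: |
      # download the latest set of scan templates with all the signatures
      nuclei -update-templates
      # scan every host:port from the input, keeping only the relevant severities
      nuclei -list /tmp/host-port-list.txt \
             -severity medium,high,critical,unknown \
             -o /tmp/nuclei-results.txt
```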
Now, this is one scan stage. I said that at the end we want to somehow collect these results and publish them somewhere too. What happens there, I haven't taken a screenshot, but the actual output looks very similar to this: if there's a finding above a certain severity threshold, it's sent out to a Slack channel. So this suddenly pops up saying, hey, there's a new finding for this particular machine, and it also gives you the context information: where is this machine located, what kind of machine is it, plus a description of the finding itself. Even better, if you also have an asset owner database, you can take this one step further: you can correlate the finding for the asset with the asset ownership, so you can automatically send out notifications to the owner as well, saying, hey, there's this new finding, please do something about it.
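The notification code isn't shown in the talk; a minimal sketch of such a step, assuming a Slack incoming webhook whose URL is injected from a Kubernetes secret, could be:

```yaml
- name: notify-slack
  inputs:
    parameters:
      - name: finding                      # hypothetical: a short text summary of the finding
  script:
    image: curlimages/curl:latest
    command: [sh]
    # assumption: SLACK_WEBHOOK_URL is injected into the container from a secret
    source: |
      curl -X POST -H 'Content-type: application/json' \
           --data "{\"text\": \"New finding: {{inputs.parameters.finding}}\"}" \
           "$SLACK_WEBHOOK_URL"
```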
And as mentioned, if there are no findings, you won't be bothered; if there are findings, you'll see this popping up, and maybe notifications are even sent out automatically, so I'm not even in the loop anymore: I know what's happening, but I don't have to deal with it anymore. That's the very rough overview of how it works. The point is, this is not very complicated; it just takes a little bit of time to implement, and then you have something running automatically in the background, all the time. Now, taking a step back: what are the individual building blocks I need to build something like this?
As mentioned, the software stack: Argo Workflows is in this case the relevant component for the orchestration. Argo itself runs on top of Kubernetes, so I need a Kubernetes cluster, and I need container images with all the tools I need, and on top of that my scripts, which implement the business logic, or in this case the automation logic. If I don't want to use Argo, I can also use, for example, Tekton, which is also Kubernetes-native and functionally very equivalent; that doesn't make a difference. If you don't want to host this at all, there are also other offerings, whether a cloud provider's native offering, or something completely different like Apache Airflow, which is also an orchestration engine.
Either way, the idea is always to use the workflow orchestration engine to do most of the job; you only have to connect the dots, that's it. Now, how do we trigger a workflow, or an automation? The EASM example is something you can do based on time: maybe once a week, once every two days, maybe even once a day I want to trigger it and run it automatically. In Argo that's called a cron workflow, because you use a cron definition to define how often and when it's executed.
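In Argo that is its own resource type; here is a sketch for a weekly run, assuming the EASM workflow from part one is stored as a workflow template named easm-scan:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: easm-scan-weekly
spec:
  schedule: "0 3 * * 1"          # every Monday at 03:00
  workflowSpec:
    workflowTemplateRef:
      name: easm-scan            # hypothetical template holding the EASM workflow
```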
But that is not the only way. I can also launch it based on some external event: something happens, and I want this external event to trigger a workflow, or an automation. Argo itself does not support that natively, but there's an extension called Argo Events that allows me to do that: it can consume an external event and then launch a workflow based on it. So, a specific example. There was a security incident at Okta, I think two or three years ago. Their support system had been breached, and the problem was that their support tickets, or some of them, contained customer credentials, tokens to be specific. So the attackers breached the support system, grabbed the tokens, and then probably tried to move into the customer environments. The point is: a ticketing system can contain credentials, and they shouldn't be in there, but how can I know that? So what we want to do here is: whenever somebody creates or updates a ticket, or adds an attachment, I want to know whether there's a credential somewhere in there, and this is something I can implement. Every time somebody creates or updates a ticket, the system can notify a webhook; webhooks are something every ticketing system I know supports. So you create or update a ticket, and the ticketing system notifies the webhook with the information which ticket has been updated or created. What we do next is push this webhook payload onto a message queue; the message contains this information: ticket X has been updated or created. And the next step is where Argo Events comes in: it can monitor this queue, see, oh, there's a new event, in this case from the ticketing system, grab that message, and then launch a workflow which is ultimately triggered by this message. What I can do then in the workflow: I take this message, which contains the information about the ticket; the workflow contacts the ticketing system and grabs the ticket itself, with all the information and attachments it has, and then performs a credential scan over it.
If there's a credential in the ticket, I'll be notified again and can respond to it: for example, if it was uploaded by the customer, we notify the customer, we can redact the credential from the ticket, and so on. This is just one example of an event-based automation where the trigger is not time but an external event, in this case a ticket being created or updated. And the time from the ticket being created until you get the notification is maybe a few seconds at most, so you can respond very quickly here.
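A hedged sketch of the Argo Events side of this: a Sensor that watches the queue's event source and submits a workflow, passing the ticket ID from the event payload. All the names and the payload field here are assumptions:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: ticket-credential-scan
spec:
  dependencies:
    - name: ticket-event
      eventSourceName: ticketing-queue   # hypothetical EventSource reading the message queue
      eventName: ticket-updated
  triggers:
    - template:
        name: launch-credential-scan
        argoWorkflow:
          operation: submit
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: credential-scan-
              spec:
                workflowTemplateRef:
                  name: credential-scan  # hypothetical workflow template
          parameters:
            # copy the ticket ID from the event payload into the workflow's first argument
            - src:
                dependencyName: ticket-event
                dataKey: body.ticketId   # hypothetical payload field
              dest: spec.arguments.parameters.0.value
```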
There are more examples. In the EASM scan at the beginning, the address data collection was really fast, and that's because the data was scraped from an asset database; that's one example of an automation. There are tools like CloudQuery that can read data from external systems that have an API, for example your cloud provider environments. So once a day I can launch an automation that uses CloudQuery to grab the data from my cloud environments: which virtual machines there are, how the networks are configured, what the firewall rules look like, and write this information into a database. CloudQuery is very flexible here: you can write into a classical SQL database, into a graph database, whatever you want. And then I have this information sitting there in the SQL database.
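For reference, a CloudQuery sync is itself driven by a small YAML spec, run with cloudquery sync config.yml. A sketch, with the plugin versions and the connection string as assumptions:

```yaml
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "v22.0.0"                       # assumption: pin whatever version you use
  tables: ["aws_ec2_instances", "aws_ec2_security_groups"]
  destinations: ["postgresql"]
---
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  version: "v5.0.0"                        # assumption
  spec:
    connection_string: "${PG_CONNECTION_STRING}"  # read from the environment
```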
Now, when I have the data in a SQL database, I can actually start implementing compliance checks with it: for example, give me all disk images that are not encrypted, or give me all firewall rules that are open to the internet and associated with a virtual machine. I have this data in the database, so I can just write a database query that identifies these kinds of misconfigurations, and I can launch it immediately after the asset data collection. So I've automated compliance checks as part of that.
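For example, the unencrypted-disk check could be a step like the following sketch; the table and column names depend on the CloudQuery plugin version you use, so treat them as illustrative:

```yaml
- name: check-unencrypted-volumes
  script:
    image: postgres:16                     # only used for the psql client
    command: [sh]
    source: |
      # disk volumes that are not encrypted (illustrative CloudQuery AWS schema)
      psql "$PG_CONNECTION_STRING" -c "
        SELECT account_id, region, volume_id
        FROM aws_ec2_ebs_volumes
        WHERE encrypted = false;"
```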
From a timing perspective: the asset data collection, once you have the infrastructure in place, meaning Kubernetes and Argo, can be implemented in maybe two days. The compliance checks are also just a matter of a few days. The external attack surface monitoring, depending on how many data sources you have and how many checks or scanners you want to use, takes maybe a few weeks, but that's it. Some more examples. The credential scans: we already mentioned the ticketing system, but that's not the only source. Examples of problems I have to deal with are source code repositories, particularly those that don't have a CI/CD pipeline, so you don't know what's sitting in there; this is also where I can do a credential scan. It's usually something the CI/CD pipeline does, but if you don't have the pipeline, you're somewhat lost, and this is how you can basically enforce it for everyone. It's especially a problem for repositories that only contain scripts; I've seen plenty of bash scripts with hardcoded credentials. The effort for this is very small, and I can do it, for example, whenever a repository has had some activity in the last X hours: then I'm going to scan it.
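A sketch of such a repository scan step, using gitleaks as one possible scanner; the image and the repository parameter are assumptions:

```yaml
- name: repo-credential-scan
  inputs:
    parameters:
      - name: repo-url                     # hypothetical: the repository to scan
  script:
    image: alpine/git:latest               # assumption: any image with git plus gitleaks installed
    command: [sh]
    source: |
      # clone the recently active repository and scan it for hardcoded credentials
      git clone --depth 50 "{{inputs.parameters.repo-url}}" /tmp/repo
      gitleaks detect --source /tmp/repo --report-path /tmp/gitleaks.json
```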
Another example are dangling DNS records. So what's a dangling DNS record? I have a domain, for example example.com, and this domain points to an IP address. Today I'm the owner of this IP address, but tomorrow somebody decommissions that system and forgets about the DNS name, so the DNS name still points to an IP address that we no longer own, and somebody else could grab it. The idea here is to identify these kinds of dangling references, and when I have the DNS information and the asset information inside the database, this is easily doable with, again, just a database query: give me all DNS records that point to something that's no longer in my database; that's a dangling DNS reference.
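Against a hypothetical schema with dns_records and assets tables, that query could be as simple as this sketch:

```yaml
- name: dangling-dns-check
  script:
    image: postgres:16
    command: [sh]
    source: |
      # DNS records pointing at addresses that no longer appear in the asset database
      # (dns_records and assets are hypothetical table names)
      psql "$PG_CONNECTION_STRING" -c "
        SELECT d.name, d.target_ip
        FROM dns_records d
        LEFT JOIN assets a ON a.public_ip = d.target_ip
        WHERE a.public_ip IS NULL;"
```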
Another example: user audits. That's particularly something you have to do for certain compliance programs: for example, once a quarter I have to check whether permissions are still okay, so that people don't have too many permissions. This is also something I can automate. In this case I'm grabbing data from the HR system, with the employees' information and which departments they work for; on the other end I'm grabbing permission data from the different systems; and then I'm correlating this data: somebody was working in department X last week, but now he's moved to another department, yet he still has the old permissions, so he has permissions for systems he should no longer have access to. That's something I can also check automatically now, using this data correlation, and I can do it completely automated, with a very high fidelity rate. And that's not even everything; there's a lot more you can do. The only condition is: whatever you want to do must be expressible in code, and you must be able to somehow grab the information you need to do so.
So, to wrap up. First: can you automate everything? No, you cannot, it's not possible. But there are a lot of things you can automate, and the nice thing about automation is: as a human, I'm not working 24/7, but the automation is. It's always running in the background, doing these checks, and just alerting me if there's a finding. Another advantage is that I can increase the frequency of these checks: if I used to check something manually once a quarter or once a month, I can now run it once a week, once a day, even once every hour, or in almost real time if I use the event-based launch. Also, something like external attack surface monitoring is not something I could really do manually; I need automation in order to do it at all. I think that's the nice thing: you can now do things that were not possible manually before. And the best thing: most of what I need to build something like this is available as open source, so I can just start with almost no cost associated in the beginning. One last word of advice: when you want to implement external attack surface monitoring, it's very important, when you're scanning data centers or service providers, to really check your acceptable use policy to see whether you're allowed to do that. Usually you are allowed, but you have to notify them in advance that you're doing it, like: hey, every Tuesday, or every first of the month, I'm going to scan my systems; and they have to approve that. And with that, I'm at the end of the talk, and I hope we still have some time for questions. [Applause] Yes, we absolutely have time for questions. Any
questions? We need to press the button. Okay, thank you for the presentation. Two questions. What happens to alarms when they are repetitive? Because it's automatic scanning, you probably want to acknowledge them and not get the same alert again and again. And I guess: what happens to the reports? You said you build this workflow and it outputs JSON, but you want to represent this information somehow, and it's different tools, different formats; how do you glue this together? Thank you. Yeah, so first the question about the alerting: that depends on your alerting system. Ideally you're using an alerting system where an alert pops up the first time and is then not reported again, so it's silenced after the first notification. And two, it depends on how you want to report. Sending to Slack is one way of doing it, in a Slack team channel; the second option is to send a Slack message to the person who owns the asset; the third option is to send an email; the fourth option is to create tickets automatically, when you have the asset ownership information; and the fifth option, as mentioned, are the documents, the spreadsheet or the Word document, but those are more for internal reference; they're not really used by the end user, the end user here being the owner of the asset. Any more
questions? Yes. Sorry, one question: how big is your organization, how big is your team, and how many in your team can manage this? It's not a nasty question; I'm working on this myself, I know how much time it costs me, and I'm not as far as you are. The organization has 300 people; the security team has three people. Okay, thanks. And how many of these three are working on this, or have the time to? [answer inaudible] Okay, thanks. But to continue a little bit: if you have a problem with the operational overhead, there are also as-a-service offerings which make it easier for you. Understood, perfect, thanks. Thank you. I want to ask how to deal with shadow IT, or non-managed IT that's not directly managed by the central IT. If you're working in an engineering organization, engineers like to do things themselves, because they know better. How do you deal with that? Yeah, it depends on where these assets are located. If they're in a cloud, you can directly read them; that's the easy part. The problem is classical data centers, and first of all: if you don't know something exists, then you have a problem; there's nothing you can do. What happens here is that when people request resources, like in a data center, it goes via IT, so IT is a point of contact for that, because they know everything. So I'm only grabbing the address ranges, and that's it: whether there are 10 machines sitting there or 100 doesn't make a difference, they will be automatically scanned. But you need a partner in the organization who really knows where the assets are
located. We do have time for one more question. Otherwise, I guess you're around at the conference for some discussions? Yeah. And that's it, thank you very much.