Risk Based Response With SOAR - Tom Wise

BSides Newcastle · 19:45 · Published 2024-01

Hi everyone, welcome to my talk. Sorry, I'm in the way of the projector. I'm Tom Wise. I am one of the engineering leads for a company called Adarma, which offers managed services and consulting services. I am also, and there is a reason for mentioning it, a Sergeant at Arms at a local HEMA club, which is a Historical European Martial Arts club, and there is a bit of shameless self-promotion there with the pictures as well. So what is SOAR? Well, SOAR is security orchestration, automation and response: basically, dealing with security events with machine capability, generally backed up by Python or Java or some sort of language underneath. But

SOAR is also a weapon in your cyber arsenal. It's something you can use both offensively and defensively against your attackers, and at machine speed as well, because we're now talking about parallelization at quite a high level. Like any weapon, though, without the proper training and technique it can be highly inefficient and risky to use, especially at the sort of speed and volume that we use it at nowadays. So bear with me on this one: this is a really old rule set used by people who fight in a fencing club. It says every single fencer is

obliged to start the duel according to the tradition, unlike a peasant who runs insanely for a weapon, grabs it and wants to beat with it like a witless ox. The reason I bring this in is that, in my experience over the last six or seven years with SOAR, the first thing a company does when it buys this big shiny new SOAR tool is go after phishing. Anyone who has tried to tackle phishing in the past will know it is one of the biggest beasts you can tackle, because there are all different IOC types, all different vectors, all different ways of getting through defenses. So usually they just beat at it, and then they end up, you

know, delaying things and backlogging things, when really what you need to do is focus on the basics. So this is the last slide on swords, with a demonstration, I might add, for those of you who are here. On the left here you've got what's called the Meyer Square, which teaches you four basic attacks, but it also teaches you all of the defenses to the attack that comes from the other way, just by doing the same attack. Think about this as: you find a hash in your environment, you find it's bad, you block it. What does the attacker have to do? Just change a little thing

and that hash is now a new hash, and you're doing that constant dance with the attackers at this point. What you eventually want to do is use that experience to kill the fight quickly. With all security events we want to kill it quickly: if it's a false positive, get it out of the way, preferably without even putting it in front of an analyst; if it's really bad, let's act straight away; otherwise, let's at least put it in some other bucket of categorization. To do that you use your experience to do what's called in sword fighting a master stroke: you hide behind the normal stroke, but you pull it in to gain the center and

then you threaten the adversary, and then, depending on the risk, as in how they react to that, you can do different things. If they don't react, you just go over the top and stab them in the face: fight over. If they give you a little bit of a nudge, you can pull it over the top for a killing blow to the head. If they try and go really hard, then you go around the outside and, again, you kill the fight. That is everything on sword fighting, so apologies if that wasn't what you were here for. So, the current state in security operations: obviously there are high volumes of alerts coming in at the moment, and ticket fatigue is a thing.

The amount of cognitive load that we're putting on analysts nowadays is quite high as well, but there is a skills shortage in the market. The last report is saying that worldwide there is a massive shortage, in the millions, of out-of-the-box, ready-to-go security analysts. While you're training up new people, your backlog continues to grow: you are focusing on the training, but you pulled in all these new people for a reason, and the backlog keeps growing. Then, when you do finally get them out of that training, you throw them at this backlog, and because they're generally new and you don't always have the

time to supervise them properly, fast and ill-considered decisions can be made, and things can be missed or can slip. The idea of SOAR is to provide that consistent framework, and risk-based response provides a framework to use within SOAR. The eventual outcome of all this is that your analysts, hopefully not all of them, then leave, because they're not really doing work that makes them feel good: a high percentage of what they deal with is false positives, so they're not actually getting into the nitty-gritty of security. With a future risk-aware state, once you've got risk-based response and other kinds of risk awareness in

action, the first thing you do is immediately reduce the number of tickets an analyst gets, because you can already deal with the known bad stuff quickly and get the known good stuff out of the way. What that then allows you to do is highly enrich all these events. When you've deemed that an event is viable in some way, shape or form and needs to go in front of an analyst, you can enrich it, because again you can do thousands of events at the same time. You already know how you want to enrich it, so present that information to an analyst so all they have to do is go

"yes, that's bad" or "no, that's okay". They can then also focus on the real high-end security alerts that come through, because all your false positives and lower-level stuff is generally going to be handled by this framework. That allows them to gain a better understanding of the techniques and tactics used. You can then profile your attackers, because, let's be fair, risk is different for every organization and every company, so you need to be able to tailor it. And the main thing about the framework I'm talking about today is the feedback mechanism: get your analysts to feed back into the risk-based model to allow you to enhance

that and, again, make those decisions quicker and more efficiently. Risk-based response is a term I came up with about four years ago now, I think, when I delivered a talk at the Splunk conference. What it really looks to do, and I know it's going to sound incredibly simple, is put your events into three primary categories: known good, known bad and suspicious. Known good is, for example, when you think it's a phishing campaign but it has come from internal email servers, or from vendors that you already know send out

campaigns, things like that. That's generally known good and benign, and it can just be pushed to the side and not even put in front of an analyst. Known bad: one or more of the indicators or items inside that event has been deemed to be bad by something that you trust. Those are known bad, so you're going to tackle them a different way, and speedily, to reduce the impact. Everything else in the middle, which, let's be fair, is going to be the bigger bucket, is where you're not quite sure at the moment, and even the automation wouldn't be able to determine that straight away, so we put it into the suspicious

bucket. But what it does allow you to do is still classify by a risk score: you're going to get the known good and the known bad, and the suspicious bucket is still going to be prioritized by a risk score. So how do you start with risk-based response? Well, let's be fair, your analysts will be doing this already. They will already have processes, or things that they look for in the relevant tools they use, in order to understand the impact or potential impact of an event, and that will then dictate what actions they take. But what you really want to do is try to standardize it

across the SOC. There is some standardization already, but, you know, Bob might use his knowledge a bit differently, and what you want to do is correlate all of that together so you get a standardized kind of enrichment, and then you start building this risk-based response on top. So you don't necessarily have to do anything new or buy any new tool; you just have to standardize it and bring it all together. Some examples are reputation and sandbox services: VirusTotal, ThreatConnect, etc. Some have on-prem and cloud variants; it just depends. But the good thing about SOAR is

it's generally able to access any API, and even if there isn't an API there are other ways of doing things with SOAR, for example SSH, or a Python library that allows you to communicate with things. Pardon me. There are also things like reaching into CMDBs and change tools: if you get an event about something happening on the network, can you reach into your change system and see if it was an approved change? Because if you can, that's a known good, and I don't need to see that anymore. And then you also get the data lakes and your SIEMs, such as Splunk, like we saw this morning, which I

was happy to see, Sentinel, and obviously AWS with their security data lake coming soon as well. So, start cooking. When I talk about risk-based response and your own special sauce: as I said before, you get vendors that will give you their risk scores, but you don't necessarily have access to how they've formulated that score. So of course bring it into the risk-based response model, but don't rely on just one vendor's score, because they might not be aware of your specific bespoke risk requirements. So again, as I said before: capture all of the things that your SOC does with existing integrations, and perform a gap analysis: are we actually

getting everything we need to make this consistently deployable in an automation tool? Build a scoring matrix, which I'll talk about in a second, to gather all those data points, and then assign some scoring to good, bad and suspicious categories. The main thing you need to decide before you really start laying down any code is: what is my scoring system? Am I going to go on a 1 to 10 basis? Am I going to go 1 to 100? And what does each score mean? What is my known bad, act straight away? What is my benign, I'm not even going to send that to an analyst? And what is my everything else in the

middle? So this is a very basic example of a scoring matrix. It's a paper-based or spreadsheet-based exercise that you would need to do. I recommend version controlling it and keeping it as a document that can be organically updated, because this will lay the foundation for the code that you write into your automation tool to work out "if this is this, then add this score", that sort of thing. Obviously you would never use just one data point, but I'm trying to show that you've got the four standard IOC types with maybe some of the well-known enrichment services, but then you've also got other things you can go after, like usernames, device enrichment and header enrichment. There are all sorts of

additional enrichment opportunities that you could capture on this spreadsheet or add to it as you go forward. The process at the moment is probably well known to anyone who has ever worked inside a SOC. It can take you anywhere from two to four minutes to gather information on just one indicator, so if you've got an event that has 10, 20 or more indicators, you can imagine that is going to take up a lot of time. Even if we're talking 20 to 30 minutes, that's too long just to get to the point of understanding how bad it is. A lot of the risk and prioritization decisions are left to the

analyst at that time, because they're copying and pasting and trying to understand as they go, and you can't really understand the risk or potential impact until you've at least understood what the event is actually about and how bad it might be. False positives, as I've discussed already, make up a large proportion of wasted time in a SOC as well, so if there is some way we can get rid of even half of those, your SOC analysts are already enabled to do a lot more. As I said before, if your average triage is about 30 minutes for most events, then in ransomware that's the difference between operational and compromised.
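As a rough illustration of the scoring matrix described a moment ago, the spreadsheet could be mirrored in code as a simple lookup table. This is only a sketch under stated assumptions: the indicator types, source names, verdict labels and point values below are all hypothetical and are not taken from the talk's slides.

```python
# Hypothetical scoring matrix: (indicator type, enrichment source, verdict) -> points.
# Every key and number here is an illustrative assumption, not a value from the talk.
SCORING_MATRIX = {
    ("file_hash", "reputation_service", "malicious"): 50,
    ("file_hash", "reputation_service", "unknown"): 10,
    ("domain", "reputation_service", "malicious"): 40,
    ("domain", "internal_allowlist", "trusted"): -30,
    ("ip", "reputation_service", "malicious"): 30,
    ("url", "sandbox", "malicious"): 50,
}


def score_findings(findings):
    """Sum the matrix points for every (type, source, verdict) finding on one event."""
    # Combinations that are not in the matrix contribute nothing to the score.
    return sum(SCORING_MATRIX.get(finding, 0) for finding in findings)


# One event with three enrichment findings.
event_findings = [
    ("domain", "reputation_service", "malicious"),
    ("ip", "reputation_service", "malicious"),
    ("domain", "internal_allowlist", "trusted"),
]
event_score = score_findings(event_findings)
print(event_score)
```

Keeping the matrix as plain data, separate from the function that applies it, is one way to let the version-controlled spreadsheet and the playbook code stay in step.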

Again, you want to be able to finish that fight quickly: understand whether this is good, whether it's bad, or whether it needs more investigation. And if it needs more investigation and it's a very risky one that hasn't quite made the threshold for bad, it's still going to be the first one tackled by your analysts. So with a simple, early implementation of risk-based response, you automatically get all of your events enriched, so there's no more copying and pasting into events for the basic stuff. You might still need to go out and do some extra intelligence gathering, but all of your basic enrichment, and the presenting of that data to an analyst, is all now

done in parallel, for multiple events at the same time. Again, this is before they've even touched it: the automation does all of this for them. It will then work out the score and provide that prioritization, and if you set those scoring limits up in your playbook you get the three buckets we were talking about. The queue, even if it is a suspicious queue, is still prioritized by risk score, and you've already reduced the number of false positives coming into the SOC just by implementing something like this. We implemented it for one customer, for one use case, phishing, ironically, and we saw a 67%

reduction in false positives actually going to the analysts. Put that back into the time spent on an average event, which is half an hour per event, and this was a large financial organization, so they saved many hours with a simple implementation of risk-based response. With a more mature implementation, you've got all the benefits of the early implementation, but now high-severity events are acted upon immediately, because you've become more confident that the risk scores the SOAR platform is providing are valid. So you're like, okay, that one can now be almost end-to-end automated. I would always recommend maybe leaving at least one checkpoint in there

like an approver of some description, before taking action on a firewall or something like that, unless of course you are absolutely sure that all those decisions are correct. Sorry, I got distracted by the train going by. Your analysts' decisions also become a lot quicker and more accurate, because you're presenting them with all the information they need and they're ready to go. They're not having to go off and copy and paste; all they need to do is understand what you've presented to them and make that decision: yes, this is good, leave it alone, or no, this is bad, let's do something. But then the final

bit of the framework, as I said earlier, is the feedback loop. This is simply a spreadsheet exercise that can evolve, because, let's be fair, attack vectors evolve and attackers evolve, so there will always be new things to look out for. So just keep feeding back into this model: keep that spreadsheet version controlled and always keep it up to date. The main benefit of SOAR in general, but especially after you've implemented something like this, is that your analysts are freed up now, so they can get on with a lot of other tasks and they can also feed back into it. I've seen customers do monthly competitions where the SOC will

recommend what they want to automate. For most of them it's "automate the annoying": whatever the SOC gets asked to do and rolls its eyes every time, automate it with a SOAR platform. I didn't realize it was going to be that dark. So, because we're heavily reliant on this risk score that we're calculating: trust but verify. What that means is, because you're potentially going to make some pretty big decisions or take some pretty big response actions based on this score, you need to know that the score has been calculated properly. To do so, what you need to do is go back over it: once you've finished

all your enrichment, have the platform go back over it and check for any failures. If there were any failures, inform the analyst: say, look, we've come up with this score based on what we know, but there were some issues in that calculation, so please go back and verify it. It might end up being a false positive, or the priority might end up rising, but just let them know, because if we're going to make fully automated decisions off this risk score, we need to know that it's definitely correct. Then, as you get more confident, like I say, you can remove that checkpoint and start fully automating things. I don't know how we're doing for time; I said I'd shave a

little bit of time off. You can't see these pictures very well, so I do apologize. As you saw at the beginning, I have a Splunk fez: I'm part of the SplunkTrust, which is just community members. This is what a playbook looks like inside Splunk SOAR. It's an all-enrichment playbook, one enrichment playbook to rule them all, where I basically have a bunch of the other types of playbooks all running at the same time. So I've got a file one there, I think that's a domain or URL one, and then another one and another one there, and at the end, as you can see in the middle, I just aggregate all those

scores together, because I'm not providing scores per indicator; I want a contextual score for the entire event, because that allows me to do that initial prioritization. I also have these playbooks set up so that they have a gateway at the beginning of every playbook that asks, "have I even got any work to do?" This gets around race conditions: if you sent them all down the same playbook and some of them didn't work, and you had the aggregation piece waiting for them all to finish, then it wouldn't work. To get past that, I build the playbooks so that they will still complete properly, so that if something

is waiting for everything to complete, it will complete in time. That's basically best practice within this platform at least, and it probably should be in most other ones. Then this bit is all about getting the information, making it pretty and presenting it back to an analyst in something as simple as a table. Depending on the SOAR platform you're using, you could present it in all sorts of different ways. This one supports Markdown, so it could be Markdown, HTML or any other way you can present that information. It's got to be easily digestible: an analyst needs to be able to go, yeah,

that's bad. And then we just do a little bit of code. Again, it's really simple: get the data points and use if statements, and this is where we go back to your scoring matrix: if it equals this, then okay, that's five points. Accumulate those scores into a single list, aggregate it at the end, and pass it out as a domain score or an IP score; then do further aggregation to get your container score, or your event score, sorry, and pass out the information at the end. Sorry, I'm going through some of these a little bit quicker. So this is a very basic

playbook that will get you the categorization we were talking about. You've got the playbook to rule them all, which obviously has a few playbooks underneath it, and once that all completes I have a decision that just asks, what's the score? If it's high it'll go one way, if it's low it'll go the other way, and otherwise it will escalate it to an analyst and put it in front of them. I realize that was very fast, so thank you for bearing with me. So this is just a quick recap, the too-long-didn't-read. Start off with enrichment. I know phishing is a big Moby Dick sort of thing, let's go

after it, but if you can get the enrichment right, and get enhanced enrichment right, then all of the use cases just bolt on the side of it, because, let's be fair, every use case has an enrichment requirement in some way, shape or form. Yeah, just finishing up in 30 seconds or so. Create a scoring matrix and version control it. Like I say, you might need to roll it back, or you might need to ask, why did we assign that score back then when the score is higher now? Because it does change; things will change. If you've got a threat team that is

constantly monitoring for things like this, you do get things that get downgraded and upgraded as well, so keep an eye on it and use it as a living document. Build further context with the risk-based response model: use new data points, or even new ones from existing integrations, to get a better and more refined score. As I said earlier, you can get the normal kind of indicator enrichment, but consider user, device, header and any other type of enrichment that you have access to from a SOAR platform. Once it's installed and running nicely, you can just bolt on all of the use cases. And the really important thing

is validating that risk score, because you've got to have confidence. I've seen customers have a Skynet sort of worry about SOAR taking over the world, and they worry that it's just going to start blocking stuff that it shouldn't. It could do, if you write things badly in it. So if you can get confidence in this score, and prove even to people who don't understand the technology that it's making the right decision, then you get a lot more buy-in and a lot more help to build your SOAR capabilities. And that's everything. Thank you for listening.
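To tie the recap together, here is a minimal sketch of the decision step and the "trust but verify" check from the talk. It assumes a hypothetical 1 to 100 scale with made-up thresholds, and represents enrichment results as plain dictionaries with invented field names. It is not the Splunk SOAR playbook API, only an illustration of the logic.

```python
# Hypothetical thresholds on a 1-100 scale; the talk leaves the scale and
# the cut-offs for each organization to decide.
KNOWN_BAD_THRESHOLD = 80
KNOWN_GOOD_THRESHOLD = 20


def triage(results):
    """Aggregate per-step scores into one event score, pick a bucket, flag failures.

    Each result is a dict with illustrative keys: "source", "score", "failed".
    """
    # Only steps that completed contribute to the contextual event score.
    event_score = sum(r["score"] for r in results if not r["failed"])
    failed_sources = [r["source"] for r in results if r["failed"]]

    if event_score >= KNOWN_BAD_THRESHOLD:
        bucket = "known_bad"
    elif event_score <= KNOWN_GOOD_THRESHOLD:
        bucket = "known_good"
    else:
        bucket = "suspicious"

    return {
        "score": event_score,
        "bucket": bucket,
        # Trust but verify: if any enrichment step failed, the score may be
        # incomplete, so ask an analyst to check it before acting on it.
        "needs_verification": bool(failed_sources),
        "failed_sources": failed_sources,
    }


results = [
    {"source": "domain_reputation", "score": 45, "failed": False},
    {"source": "file_sandbox", "score": 40, "failed": False},
    {"source": "ip_reputation", "score": 0, "failed": True},
]
decision = triage(results)
print(decision)
```

In this sketch a failed lookup does not change the bucket; it only raises a flag, so an approver checkpoint can decide whether the automated response should still go ahead.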