
Thank you so much for sticking around this afternoon. My name is Exona, and I'm really excited to be here; this is my first time at a security community event in Kosovo. Thank you to the organizers and to all the speakers; I really enjoyed the talks throughout the day. I'll be talking about how we're using automation to scale threat detection and response. Here's a rough agenda: I'll say a little about who I am and what I do, then we'll go through a review of what detection and response really is. I haven't heard much about it today, more on the application and vulnerability side, so we'll do a short overview of what that looks like. Then we'll talk about why we even care about automation and why it matters, how we might apply it to different environments, the trade-offs around it, and finally some use case examples as a more practical guide to how you might incorporate it into your environment.

As mentioned, I'm a security engineer at Meta, formerly known as Facebook, where I've been for about five years. I studied computer science at California State University, East Bay, so I lived in the San Francisco Bay Area for a while, and more recently I moved to London. I'm really passionate about diversity, specifically women's representation in infosec; I've always been outnumbered, and I'd like to change that and help where I can. I'm involved in the Women in CyberSecurity organization and in internal initiatives at the company to build a community around that representation. When I'm not working or in front of a computer, I enjoy being outside, traveling, and spending time with my family, and I'm passionate about well-being and health, cooking healthy meals, things like that. So now you know a little bit about me; let's talk about detection and response.
I'm part of a team that responds to malicious insider or external threats to company data, infrastructure, and assets in general. What we do specifically follows a life cycle. To find any badness in the environment, we first have to ingest the necessary telemetry. This might be network data such as DNS logs, or host data, maybe from an EDR (endpoint detection and response) system that monitors and aggregates data for you. To do any kind of hunting, detecting, or responding, you want to make sure that data is reliably ingested and that you're not missing logs.

The next step is hunting on that data. You want to be proactive, not just wait for a partner team to reach out and say there's a vulnerability being actively exploited that you know nothing about, while you actually run those kinds of servers and they're unpatched. So you want to be proactive about identifying TTPs, which stands for tactics, techniques, and procedures: open source or threat intelligence in general about threat actors, indicators of compromise you might look for, or simply understanding your environment, what kinds of machines and servers you have running and what anomalous would look like there. That can get really detailed and in depth, but in general this is what we identify as hunting.

Once you understand what your environment looks like, what badness looks like, and maybe you're feeding in threat intelligence you can match against, we get to detection, where you have some sort of detection engineering framework, whatever your company uses, to write a query or run some type of logic that identifies the event when it happens and triggers an alert for review. At that stage you're in the response part of the life cycle, where someone, most likely an investigator or analyst, grabs the alert, looks at it, and tries to determine whether it is indeed badness or actually a false positive. That involves a lot of manual analysis and investigation to correlate information and understand what really happened, because the flagged event on its own is quite limited; you can't make much sense out of a single log line, like one file event. This might look different depending on the organization: if you're a very small company you might outsource some of it, say to a SOC that investigates the alerts and only reaches out to you if they determine something is bad. I just want to point out that this may vary; this is simply what I'm used to at a bigger company.

If we incorporate automation, here is what the life cycle might look like, and again this isn't conclusive, just my approach. After detection, when an event is flagged, say a potentially compromised host, this is the point where ideally you have some type of automation framework that picks up the alert and does things automatically before it even comes up for manual review by an investigator or engineer. This might be data aggregation, checks against intel, or performing common steps that would otherwise be manual. Those steps can be derived through retroactive analysis of your past investigations: if you see that a lot of people keep performing the same action, you can automate it. The last step I added here is the post-mortem phase.
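To make that enrichment idea concrete, here is a minimal Python sketch of an automation hook that runs before an alert reaches a human. Everything here is a hypothetical placeholder, not any real framework's API: `Alert`, `KNOWN_BAD`, and `fetch_related_events` stand in for your alert schema, a threat-intel feed, and a log-store query respectively.

```python
from dataclasses import dataclass, field

@dataclass
class Alert:
    # Minimal alert shape as it might arrive from a detection framework.
    rule: str
    host: str
    indicator: str
    enrichment: dict = field(default_factory=dict)

# Hypothetical threat-intel set; in practice this would be a feed lookup.
KNOWN_BAD = {"evil.example.com"}

def fetch_related_events(host: str) -> list:
    # Placeholder for a real query against DNS/EDR logs.
    return [f"dns lookup from {host}"]

def enrich(alert: Alert) -> Alert:
    """Automatic steps run before the alert is queued for manual review."""
    # 1. Check the flagged indicator against threat intel.
    alert.enrichment["intel_match"] = alert.indicator in KNOWN_BAD
    # 2. Aggregate related context so the analyst doesn't query by hand.
    alert.enrichment["related_events"] = fetch_related_events(alert.host)
    return alert
```

The enriched alert then lands in the review queue with the intel verdict and related context already attached, which is exactly the "do things automatically before manual review" step described above.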
In the post-mortem phase, ideally you have a feedback loop into all the other components of the cycle. This is where you should be able to go back to your automation and say it could have done better. Once something comes up for manual review, the investigator or engineer should be able to feed findings back into detection or automation, depending on where the limitations were.

The next question is why you should invest in automation. During my career so far I've had the chance to focus on different sub-areas of detection and response, including a lot of automation, and I've had the opportunity to think about the trade-offs. Sometimes my job is really exciting; other times it's a bit boring, and I want to automate the boring stuff so I can do more of the exciting stuff. If I have to investigate the same alert over and over again, I'd rather have automation do it. Manual work is not efficient: over time you won't be able to scale to, say, hundreds of alerts per day, and manually running the same queries is wasteful. With that comes load management: you'd have to hire more people to review things manually, rather than investing in an engineer to automate some of those things away.

Directly related is indirect false-negative reduction. With a lot of manual review of the same things, people run into what's called alert fatigue: you look at alerts all day, you get so used to one that you think "ah, this is always a false positive" and close it out, but this time it was a true positive, and that has led to a false negative, meaning something malicious was missed. That happens all the time and can lead to bigger incidents for a company.

Next is a smaller margin of human error. If you as a person run SQL, or however you do your investigations and analysis, you as a human are more prone to missing uppercase versus lowercase, or a file extension, or whatever it may be. If you're proactive about having pre-built queries that a person can simply click to run, you reduce that margin of error.

Over time, because you've invested proactively in automation, you'll see impact that lets you put resources into other high-priority things, such as onboarding new signal from parts of the company's infrastructure you haven't touched before. Truthfully, there isn't always exciting work; sometimes you have to deal with low-fidelity rules that are very false-positive prone.

Last but not least, and this isn't an exhaustive list, just things I think are quite common, there are detection limitations. Sometimes the detection phase of the life cycle is limiting: you aren't able to correlate enough events to identify whether something bad happened. For example, a parent process several steps up the chain spawning a particular process is quite unusual, but you may not be able to express that up front, because most detection engineering frameworks limit how dense a piece of logic you can run. So you can use automation to sift through things more easily once an alert has been escalated from a basic heuristic.
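The human-error point can be made concrete with a tiny hypothetical helper: a pre-built, normalized query means the analyst clicks a button instead of hand-typing a filter that silently misses a case variation. The event shape (`dict` with a `"file"` field) is an assumption for this sketch, not a real log schema.

```python
def find_file_events(events, filename):
    """Pre-built, case-insensitive filename match, so an analyst can't
    miss 'Invoice.EXE' by manually querying for 'invoice.exe'."""
    needle = filename.lower()
    return [e for e in events if e.get("file", "").lower() == needle]
```

Run once by an engineer, reviewed, and then reused on every alert, a helper like this removes one whole class of one-off typos from investigations.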
So how do I incorporate automation into my life cycle? As I mentioned, I'm used to a bigger-scale company where things are more often than not built in-house for the environment, so I'm not deeply familiar with a lot of the tools out there. But I know that, for example, Jupyter notebooks are quite commonly used in the industry for this purpose. Commercial tooling is also available: often if you buy an EDR, or you have a SIEM or something like that, it will come with accompanying tooling or add-ons you can fit to your needs. You can also influence partner teams, or hire folks with more of a software engineering background, to build these tools on top of what you have; that's what I'm used to most of the time.

Even though I've talked a lot about the good things about automation, because I'm a fan of it, there are trade-offs to consider, and it's not always the answer; you have to think about what works for you. If you're just starting to build your program and you don't have a lot of resources, you likely want to invest your cost and capacity into actually having data and aggregating logs in a clean, well-formatted way; that's probably where your resources should go. If you're at a point where things are more balanced and you have the capacity to onboard new technology, then I would vouch for it. You should also think about the frequency of the actions performed: you don't really want to automate something you do once a month that only takes ten minutes, because the reward isn't that great. And there will be some maintenance overhead; the code the automation runs on needs to be maintained. So I just wanted to call out that there are negatives to consider as well.

With that, we're at the use case examples. I wanted to talk about phishing because it's one of the most talked-about, prevalent things in the real world; it's one of the most common initial infection vectors, used even by sophisticated threat actors. But phishing alerts are very false-positive prone, it's sometimes difficult for them to be meaningful, and they can get boring. In this case, think of the playbook, the documentation you give an engineer on how to investigate this type of alert, and ask how you can write it into code, how you can script it. Think about the things you will very likely need in order to investigate this kind of alert. Highly likely, you'll need the email contents, the email headers, and sender information, and whether it contained interactive content such as URLs or attachments. If it's not so much social-engineering phishing, where they want you to call them back, but rather a malicious email with content, you'll probably want to prefetch information that shows whether a user interacted with it: did they click the links, can you see that in your network logs, was anything downloaded, what happened. And if you do have an infection, you'll need ways to remediate the inbox or the user.

The way I've listed these is that you can very likely automate the "highly likely" items: prefetch all that information for someone to review, so instead of running those queries manually they're already there, and it will probably take them two or three minutes to analyze. If anything stands out in the email headers, a security professional is probably going to see it right away instead of spending time digging for it, even if you're unable to automate the rest of the process, meaning actually determining whether it's bad.
Here I wanted to show a case that might be fully automated. There are other things you might check, but these are some of them. You want to determine whether an email is indeed malicious, and some of the checks you might do are: the email is unexpected, the sender is uncommon, the headers look off (maybe sender addresses don't match, SPF fails, and so on), it's trying to impersonate someone you actually communicate with but looks slightly different, or the email has interactive content. Maybe you also have automated analysis in place: companies often have dynamic analysis available where you can submit something to be detonated in a sandbox, and if you can automate that step, it will come back saying whether it was credential harvesting or otherwise interesting.

The other check is whether someone actually interacted with it, because if we have a malicious email but nobody did anything with it, we don't really care, other than perhaps adding those IOCs to some sort of tracking or blocklisting, if you do that. So in the end, if the email looks suspicious, something is off with it, our automated analysis shows it's actually bad, and no one interacted with it, then we can simply run remediation actions, such as deleting the email, adding the IOCs (for example domains) to blocklisting or threat intel tracking, and closing the case as a true positive: the email was indeed malicious even though it did not lead to compromise.
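The decision logic just described could be sketched roughly as follows. The field names and the three outcome labels are my assumptions for this sketch, not any real product's schema; the key property is that full auto-remediation only happens when the email is judged bad *and* nobody interacted with it.

```python
def triage_email(email: dict) -> str:
    """Return 'close_benign', 'escalate', or 'auto_remediate'.

    Expected fields (assumed for this sketch):
      suspicious_headers: bool  -- mismatched sender, SPF failure, etc.
      sandbox_verdict: str      -- 'malicious' or 'benign' from dynamic analysis
      user_interacted: bool     -- clicks/downloads seen in network logs
    """
    looks_bad = (email["suspicious_headers"]
                 or email["sandbox_verdict"] == "malicious")
    if not looks_bad:
        return "close_benign"
    if email["user_interacted"]:
        # Someone clicked: a human must scope possible compromise.
        return "escalate"
    # Malicious but untouched: delete the email, blocklist the IOCs,
    # and close as a true positive with no compromise.
    return "auto_remediate"
```

Note the deliberate asymmetry: the automation is only trusted to close cases at the extremes, and anything involving actual user interaction still goes to a person.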
The other case I wanted to talk about is malware, because it's also very prevalent. If you work on any kind of incident response team and you're responsible for a company's assets, such as user laptops, you're going to run into a lot of commodity malware, adware, and general unwanted software like PUPs, and depending on the size of your company that quickly becomes a big volume of alerts. Again, think about how you can codify your workflow or the playbook you work from, flagging things proactively before the alert is raised for manual review. This might include: how prevalent is this event in your environment, and what does anomalous look like, more on the host than the network depending on your environment? You might want to grab suspicious events from the same time window related to downloads or network connections. Maybe you have dynamic analysis for binaries in place that you can leverage, or you can advocate for it to be implemented in the future. Maybe you're flagging hashes against things marked bad in VirusTotal, but the indicators are outdated and you want to improve that; you don't necessarily want to dump them into your threat intel store, where you'd end up with a big pile of indicators you can't manage to update regularly. Instead you might use automation, reviewed and tested over time, to maintain allow lists to some extent. There's a lot you could do; these are just some things to think about.

In the case of a true positive, not everything is automatable, and we don't necessarily want to automate everything, because you need to scope an incident out; you can't just remove the binary and close the case. Instead, maybe you can leverage automation to prefetch that information.
That prefetched information points an analyst or investigator in the right direction: OK, you actually need to contain this incident, and what does that look like? Maybe you want to isolate the host, maybe you want to quarantine just the file. Leverage automation in any way that saves time.

To recap everything I talked about: in my opinion automation is crucial, especially for scaling an environment, and you should at least try to incorporate it as you build out your program, whatever your environment looks like. There are pros and cons to consider, not everything will be automatable, and there is still a lot of need for human expertise, especially in incident response, once something bad has happened and you're figuring out to what extent and how to remediate and contain it. With that, thank you so much for listening; I know it's quite late.

[Host] That was great, thanks so much. Does anybody have any questions? Maria?

[Audience] First of all, thank you, it was a really interesting talk, especially seeing it from another perspective. I wanted to ask, since you mentioned automation a lot: are there any cases where automation has actually backfired? For example, you automated something and it turned out it was not a false positive, or something similar?

[Speaker] Yeah, definitely, and that's why I say not everything is automatable, only to some extent. I'll share a case, going back to phishing and its limitations. You could have automation check whether there was any interactive content in the email, or whether it was just plain text and not really interesting, more on the social engineering side. Our automation decided it was fine: the person didn't respond, and nothing was flagged as malicious, so we didn't delete the email and just closed the alert out. But then the user saw it later, responded, and
fell for the social engineering attempt.