
GT - That Escalated Quickly: A System for Alert Prioritization

BSides Las Vegas · 46:29 · Published 2022-09
About this talk
GT - That Escalated Quickly: A System for Alert Prioritization - Ben U. Gelman. Ground Truth @ 14:00-14:55, BSidesLV 2022, Lucky 13, 08/09/2022
Transcript [en]

The talk right now is That Escalated Quickly: A System for Alert Prioritization, and it's with Ben Gelman of Sophos. So please welcome him.

Alrighty, quick audio check; hopefully everyone can hear me okay. Hi, I'm Ben Gelman, and I'm presenting That Escalated Quickly: A System for Alert Prioritization. We have plenty of time here together, so take a seat, sit back, relax, and enjoy the show.

First, let me acknowledge all the people who worked on this project. It takes a lot of people with a lot of different specialties to put together a project with this many moving parts and get it working in a realistic scenario, so thank you to everyone who worked on it.

Let me start by talking a little bit about myself. I've been a research scientist at Sophos for a little over the last year, and most of my work has focused on integrating machine learning in a way that analysts, real humans, can use to improve the efficiency of security operations centers, or SOCs for short. We want to see this machine learning really get used by real humans; that's the goal of this research, and I'm going to talk about a lot of it today. Before that, I did five years in government-funded research and development, which is where I started in the intersection of machine learning and cyber security. The first thing I did there was use machine learning to analyze source code: we attempted to categorize it, explain it, and segment it in logical ways. Eventually we took those techniques, which were pretty successful, and started using them on binaries, and then we went further and combined the two projects for reverse engineering, pairing binaries with source code and creating an explainable system for reverse engineers to use. I worked on a variety of other miscellaneous tasks there as well. Before all that, I did two years of post-grad research at academic institutions. I didn't do any cyber security there, but some of you may recognize the labs I worked at; they looked kind of like this. No offense to any academic researchers who are here. All of my schooling was in computer science, with a focus on machine learning.

But anyway, enough about me; let's talk about the talk. I just want to go over everything I'm going to say today so you get a general idea of where we are at various points in the presentation and can keep everything in context, because there are a lot of moving parts here. First, I'll cover the problem context: I'll tell you how our system, That Escalated Quickly, or TEQ for short, operates in a real-world SOC, so you can see what it actually does. Then we'll go back and describe why we need it and what the main problems affecting SOCs are. After that, I'll go over the actual system, its four main modules, and how those modules solve the problems we identified in the context. Then we'll cover the experimental setup: how we got the system up and running, the kind of data we used, and how we planned to evaluate it effectively. Finally we'll get to the fun part, the results and the key takeaways.

Okay, let's jump right into the problem context. Many of you are probably already familiar with how a SOC operates, but I'll go over it at a high level anyway.

We at Sophos operate a SOC, and we have a lot of customers we're trying to protect. From those customers' devices we're constantly collecting events using different sensors, and those events go to a single central platform we call Sophos Central. As the name suggests, this is where we collect all the information from customer devices to perform analysis. Sitting in Sophos Central is a whole array of handwritten rules that domain experts have built to determine how severe these collected events are. This is a bit of a simplification, but basically these handwritten rules can assign a high, medium, or low severity to these events, or in other words, these alerts. When a high-severity alert occurs, we open up a thing we call an incident, and an incident grabs all the alerts from around that time period and packages them up in one little box. Every single one of these incidents needs to be manually inspected by a cyber security analyst in order to resolve the potentially malicious behavior happening on the customer's devices.

The main issue is that the rules that assign these high severities are very generous with how they deal them out, and the reason is that we don't want to miss any actual malicious behavior on the customer's devices. So we assign lots of high severities, which opens up a lot of incidents, and that leads to some busy analysts. What we're trying to do is have our system, That Escalated Quickly, or TEQ, sit in this incident-creation process. TEQ re-evaluates the severity of these alerts. To give you an example: a high-severity alert comes in, TEQ looks at it and realizes it's not actually important, and that results in an incident that no longer has any high-severity alerts, which means the system can deprioritize that case heavily; it might not even be worth the analyst's time to look at it at all. Not only that, TEQ can also look within a single incident and point an analyst to the specific alerts it thinks they should look at. Some incidents you just have to open: there are actual threats that the analysts need to work through. But even if they really do have to look at an incident, at least they can get through it faster. Overall this leads to some happier analysts, and, not to spoil all the results just yet, but to give you a reason to keep listening: we see a reduction of about 47 percent, on average, in these false-positive or useless cases.

Okay, so that's how the TEQ system fits into a real-world SOC. Now, why does this matter? Some of the reasons are obvious: if your analysts are more efficient and don't waste time on cases that don't need their attention, you can have more customers, and more customers means more money. And by more money I mean the analysts can work to protect the world against cyber threats and help the people who need them. But it's not just for business reasons. Having to grind through cases they've already seen before is frustrating, demotivating, and tedious.

It's bad for job retention and for the job satisfaction of these analysts, so letting them work on cases that really need their attention matters for their work life as well.

Now that we've seen where TEQ fits in and why it matters, we want to address the four main problems that cause these inefficiencies in SOCs. These are the four issues we identify that TEQ attempts to solve: sensor diversity, false positives, evolving threats, and human integration. I'm going to go through every one of these in detail so we really understand where the problems come from.

The first issue is sensor diversity. At first, sensor diversity may not sound like a bad thing: SOCs employ lots of different sensors, they catch lots of different behaviors, and that's good, right? The problem is that these sensors are not standardized. Every sensor captures a different piece of information, focuses on a different thing, or outputs its data in a different way, and that creates challenges from both an engineering perspective and a machine learning perspective. Every time you add a sensor, you're going to get data output in a new way, so somebody has to make sure that data flows into a database and is tabulated in a way a machine learning model, or even just an analyst, can understand. There can also be lots of missing or changing information as you add sensors, remove sensors, or tune them, or simply because different sensors look at different things. This table shows a subset of two fields across two different alerts, just to give an example of these issues. Alert number one comes from a sensor that picks up a PowerShell command, and if you look at the command-line field, there is indeed a PowerShell command line there. But for the file-path feature, which needs to exist because other sensors do look at file paths, there's no information the sensor picks up; it's just a piece of missing data. Alert number two comes from a sensor for Mimikatz. It can't pick up any command-line information, but it does find a file path that points to a powerkatz.dll file. This missing data is going to be an issue if you want to train a machine learning model on it; eventually you'll have to deal with it somehow.

But that's not the only issue with sensor diversity. The other problem is tuning. In this graph we have a set of our most active sensors and the proportion of alerts they generate; the most active sensor generates 20 percent of all alerts. Now, that's not necessarily good or bad, it's just the way it is. Maybe all those alerts are really good, but that's exactly the problem: you don't know until you test it. And if you have customers with different requirements, or you just want your sensors to be in line with each other, you have to constantly tune these sensors as things change. Sometimes the cyber-threat landscape shifts and the sensor tuning you used before no longer works. It's a constant manual effort to battle the threat landscape with your sensors, and it's an expensive process.

Okay, so the next issue is false positives.

The best way to understand this one is through a little story. Imagine you've got an analyst, and an alert generates this very first command you see at the top here. The analyst looks at the command, goes to the customer's devices, and realizes, oh, this isn't actually a problem. So they decide to whitelist the command: there's no reason to ever look at it again, it was just a waste of their time. Then over the next week, these next five commands come in, and they're pretty close to that first command, but not quite the same. The analyst looks at them, goes to the customer devices, realizes it's the same issue they had before, and says, okay, we actually need to expand the whitelist to capture lots of different near-duplicates of this command. So they write a regular expression for the whitelist, and now that regex will catch all the similar commands, right? Weeks and weeks go by and everything's great: the regular expression is capturing all the commands, there are no issues, things are good. By now the analyst has mostly forgotten they even wrote this regular expression; maybe they've even forgotten how they resolved the case in the first place. Then, finally, this command down here comes in, and it's just a little bit different from all the others. The analyst looks at the command, goes to the customer's devices, and while they're resolving the incident they realize: wait a second, I've done this before. In fact, I even wrote a regular expression for this. And then they realize it wasn't enough to cover this near-duplicate either, so once again they have to rewrite the regular expression just to fix the whitelist.

Now, I flew in here from Washington, DC, and if I had a dollar for every time I missed a backslash in a regular expression, I probably would have just kept flying to Hawaii. Which is to say: this is not a fun task, and it's extremely tedious to constantly update these whitelists. The reality is that it isn't even feasible. There are going to be tens of thousands, hundreds of thousands, of circumstances where you get near-duplicates like this, and maintaining this forever becomes more and more costly. To really hammer that point home, here's a graph of our 20 most active sensors and how precise they are, meaning how often they actually generate true-positive alerts. As you can see, the numbers are not 1.0, which means it's not good enough.
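To make the whitelist problem concrete, here's a toy sketch in Python. The commands, paths, and pattern are all invented for illustration; they are not Sophos's actual rules or data.

```python
import re

# Hypothetical commands an analyst has investigated and deemed benign.
seen = [
    r"powershell.exe -ExecutionPolicy Bypass -File C:\scripts\backup.ps1",
    r"powershell.exe -ExecutionPolicy Bypass -File C:\scripts\backup_v2.ps1",
    r"powershell.exe -ExecutionPolicy Bypass -File C:\scripts\backup_old.ps1",
]

# The analyst's whitelist regex, written to cover the near-duplicates.
allow = re.compile(
    r"powershell\.exe -ExecutionPolicy Bypass -File C:\\scripts\\backup\w*\.ps1"
)

assert all(allow.fullmatch(c) for c in seen)  # covers everything seen so far

# Weeks later a slightly different variant arrives and slips through,
# forcing the analyst to dig up and rewrite the pattern yet again.
variant = r'powershell.exe -ExecutionPolicy Bypass -File "C:\scripts\backup v3.ps1"'
print(allow.fullmatch(variant))  # None: the quotes and the space defeat the regex
```

Scaling this manual pattern maintenance across hundreds of thousands of near-duplicate families is exactly the cost the talk is describing.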

Okay, the next issue is evolving threats. The command line at the top here is a good example of an evolving threat. This is a LOLBin, a living-off-the-land binary, a type of attack where the attacker uses a binary that's already on the system in order to do something malicious. Now, I know LOLBins are not brand new; they've been around for a while. But at some point they were new, and that caused an issue, because a lot of the binaries you thought you could trust, since they came with the operating system, were suddenly being used to perpetrate attacks, and that can cause a significant shift in the landscape. Down here is a graph of alerts for one customer over several months. You can see that in November and December there's a baseline level of alerts, which is very small. Then in January, February, and March we see a giant spike in what we think is malicious behavior: tons of sensors are firing, there are new threats on this customer. Then it dies down; in April and May we go back to the base level. These kinds of things can happen unexpectedly, the landscape is always changing, and that's something a SOC needs to account for.

Okay, the final issue is human integration. The reality is that people get used to working a certain way, and for better or worse, they get good at it. They get used to their workflows, they stick to them, and they don't like to change. I like to say that if the entire world had only computer scientists left, half the world would be Vim users, the other half would be Emacs users, and the Nano users would live on the North Pole. What I'm trying to say is that people don't like change, and when you try to enforce a change, even if it's likely to improve their efficiency, it's not easy to integrate. There's a lot of research on things like active-learning machine learning systems, with constant feedback between humans, the machine learning models, numbers, statistics, and their original workflows, and the research shows it's great. But if nobody adopts it, the benefits you get out of it are exactly zero. That's an issue that needs to be solved when you want to improve someone's workflow.

Okay, so those are the four main problems we're facing. Now let's talk about the system, That Escalated Quickly, and how we actually address all of those problems. Here's an overview of the system, and I know there's a lot going on here. We're going to break it down piece by piece, but let me start with a high-level overview.

You get alerts; they go through a feature-extraction process; they go to machine learning for training; they go to a triage system, which presents that machine learning knowledge to analysts in a super lightweight and effective way that doesn't disturb their workflows; and then the analysts use their actual workflows to resolve those cases. We use a very simple feedback loop to bring that new knowledge back to the alerts, and the process repeats again and again, constantly improving the machine learning models.

Okay, let's start breaking down what the system actually does, starting with the feature-extraction module right here. The feature-extraction module is designed to deal with the sensor-diversity issue, and it has two main components. The first is the automatic featurization framework, which tries to turn the contents of the alerts we saw a couple of slides ago into features for machine learning models. It does this by taking semi-structured data and automatically attempting to understand it. As I mentioned earlier, one of the main challenges with having too many sensors is that you need to integrate them all into some sort of database. To avoid that challenge completely, we just take sensors, and whatever data they can possibly output, and dump it into a JSON. That may seem a little chaotic from a database perspective, but it ends up working really well for this automatic featurization, because the framework goes through these JSONs, auto-parses and auto-flattens them, and tries to deal with whatever it can find, creating statistical distributions over what does exist. Even if you have missing data, like these nulls here, or keys missing from the JSON entirely, it doesn't matter: it builds statistical distributions over everything that's there and uses those distributions to determine whether the features will be good for machine learning. It also deals with changing data types, missing data, and even long-tail distributions.
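A minimal sketch of what the auto-parse-and-flatten step might look like. The field names and alert contents here are hypothetical, and the real framework's statistics are far richer than a list of observed values, but the shape of the idea is the same: flatten whatever exists, skip what doesn't.

```python
from collections import defaultdict

def flatten(alert, prefix=""):
    """Recursively flatten a nested alert JSON into dotted-key/value pairs."""
    out = {}
    for key, value in alert.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "."))
        else:
            out[name] = value
    return out

# Hypothetical alerts from two different sensors; keys differ and some are null,
# exactly like the PowerShell vs. Mimikatz example from the slides.
alerts = [
    {"sensor": "ps_cmd", "data": {"command_line": "powershell -enc ...", "file_path": None}},
    {"sensor": "mimikatz", "data": {"file_path": "C:\\tmp\\powerkatz.dll"}},
]

# Collect per-feature value distributions over whatever actually exists;
# missing keys and nulls simply contribute nothing to the distribution.
dist = defaultdict(list)
for alert in alerts:
    for key, value in flatten(alert).items():
        if value is not None:
            dist[key].append(value)

print(dict(dist))
```

From distributions like these, a framework can decide per key whether enough consistent data exists for the key to be worth turning into a model feature.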

I won't go into all the details of the algorithm here, but we have a paper in submission that explains it all. The next component is the temporal feature computation, and the best way to understand it is with an example. Imagine you have two different customers, and both get an alert that's basically the same. However, customer number one has seen an alert that looks like this a thousand times, and customer number two has never seen an alert like it on their system. Knowing that context about the customer changes the way you understand the new alert. For the customer that's seen it a thousand times, it's probably not an issue; for the customer that's never seen it before, there may be a new attack vector being perpetrated against them, and you may want to treat it seriously. That's the main idea: generate context. We use a lot of different predicates, like customers, machines, sensor types, and so on. Again, I won't go through all the details here, but that's the idea.

Okay, the next component of the TEQ system is the machine learning module. We have two different models that deal with the two types of featurization we have: the automatic featurization framework
that grabs the contents of alerts goes straight to what we call the content model, and the contextual data goes to the context model. Excellent naming, I know. We then take an ensemble, which combines the scores from those two models to generate one final alert-level score, and this is that idea of re-evaluating the severity of the alerts. This helps deal with both false positives and evolving threats. One of the issues with false positives was all those near-duplicate cases, and machine learning is one of the perfect use cases for that, because it doesn't require manual intervention to write rules that find near-duplicates: using statistics and a large amount of data, it automatically builds the knowledge to detect them. For evolving threats, the machine learning helps in multiple ways. First, machine learning models generalize, at least a little bit, for the most part, which means certain threats will already be detected just by the nature of that generalization. But even if the models can't detect every single zero-day threat, which they obviously won't, we can get new data and train on it without any other human intervention, just by re-running the pipeline, and the machine learning modules will adapt to that new information as it comes. So we take a lot of the manual effort out of evaluating these new and evolving threats.
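Before moving on, the temporal feature computation described a moment ago boils down to asking, for each new alert, how often a given predicate (this customer, this machine, and so on) has seen that kind of alert before. Here is a rough sketch; the class and field names are assumptions for illustration, not the actual TEQ implementation.

```python
from collections import Counter

class ContextCounter:
    """Counts how often each (predicate, alert-type) pair has been seen,
    so a new alert can be scored against its own history."""

    def __init__(self):
        self.counts = Counter()

    def context_features(self, alert):
        # Read the features first, so the alert doesn't count itself.
        feats = {
            "seen_by_customer": self.counts[("customer", alert["customer"], alert["type"])],
            "seen_by_machine": self.counts[("machine", alert["machine"], alert["type"])],
        }
        self.counts[("customer", alert["customer"], alert["type"])] += 1
        self.counts[("machine", alert["machine"], alert["type"])] += 1
        return feats

ctx = ContextCounter()
for _ in range(1000):  # customer 1 has seen this alert type many times
    ctx.context_features({"customer": "c1", "machine": "m1", "type": "ps_cmd"})

# The same alert type looks routine for customer 1 (counts of 1000)
# and novel for customer 2 (counts of 0).
print(ctx.context_features({"customer": "c1", "machine": "m1", "type": "ps_cmd"}))
print(ctx.context_features({"customer": "c2", "machine": "m9", "type": "ps_cmd"}))
```

The context model then consumes counts like these instead of the raw alert contents, which is what lets the same alert score differently for different customers.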

Okay, the next piece is the triage module, which attempts to present scores to human analysts in a really lightweight fashion. It takes the alert-level scores from the model, computes incident-level scores, and presents both in three main ways. The first is suppression, which can simply block out cases we don't want the analyst to see. The next is incident prioritization, which ranks all the incidents that analysts need to look at and tells them which ones to take first. And then we have within-incident alert prioritization: even if you can't deprioritize an incident, you can look within it and point an analyst at the specific pieces of information they should examine to make their job faster. Because the system is so lightweight, as you'll see later, it deals with the human-integration issue.

Finally, we have the feedback loop. We do ask the analysts to do one extra thing on top of their day-to-day workflow, and that's basically just to check a little box saying whether the incident was worth their time. We call this the actionable label. The analysts, as part of their normal jobs, just mark whether the incident was worth their time, and we propagate that label back to all the alerts that make up the incident. All of those alerts now also get an actionable label, and because they do, we can feed this new information back into the system and retrain the models to make them better. The system goes around and around, and hopefully this diagram now makes a little more sense.

Okay, so that's the entire system. Now let's talk about how we actually get it running. But before we do, I want to discuss this actionable label in a little more detail, because it's crucial to the functionality of the system. Again, the actionable label attempts to answer the question: is an incident worth an analyst's time?

There are three main scenarios where we make that call. The first is incidents that require any kind of manual remediation: if an analyst actually has to go in and manually fix something, we consider that an actionable incident. Here's an example on the right from a real incident, where the SOC team investigated a detection on the host for ProxyShell exploitation and Lemon Duck malware; they escalated to the client due to the persistence of the Lemon Duck malware and got the incident resolved manually. The next obvious case is incidents triggered by false alarms, where there's no actual malicious behavior happening at all. Those are non-actionable; they're just a waste of the analyst's time. Here's an example where the SOC team received a VBS-extension alert for the host, and upon investigation it was just related to a PDF driver; no action was required. This was simply a waste of their time, and cases like this happen very frequently. Finally, here's a circumstance you may not expect, but which we find is also important to the system: incidents triggered by true-positive alerts that were successfully contained by automated defense infrastructure. I know this might sound insane, but human analysts don't have to resolve every single case manually; we do have systems that automatically provide protection. In the example on the right, the file that triggered the detection was cleaned by the antivirus solution, and the file was never executed on the host, so no further action was required. Clearly there was malicious activity, but the analysts didn't have to do anything. There's a reason we have automated defense infrastructure, and that's to save analyst time; it doesn't make sense for them to keep looking at these cases over and over. So these kinds of incidents are also not actionable. And just as a reminder, we take these incident-level labels and propagate them to every single alert that makes up the incident.
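A minimal sketch of that label propagation, with hypothetical incident and alert records: the single incident-level checkbox fans out to every member alert, producing fresh labeled rows for the next retrain.

```python
# Hypothetical incident: one analyst decision, several member alerts.
incident = {
    "id": "inc-001",
    "alerts": [
        {"id": "a1", "severity": "high"},
        {"id": "a2", "severity": "low"},
        {"id": "a3", "severity": "medium"},
    ],
}

def propagate_label(incident, actionable):
    """Copy the incident-level 'actionable' label onto every member alert,
    so each alert becomes a labeled training example."""
    incident["actionable"] = actionable
    for alert in incident["alerts"]:
        alert["actionable"] = actionable
    return incident["alerts"]

# The analyst checks the box: this incident was NOT worth their time.
labeled = propagate_label(incident, actionable=False)
print([(a["id"], a["actionable"]) for a in labeled])
```

One checkbox per incident is the entire extra burden placed on the analyst, which is what keeps the feedback loop lightweight enough to actually get adopted.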

That's going to matter for the results. But before that, let's talk about the data. The data we use to train our model in this prototype is a six-month dataset covering over 3,170 customers, more than 14,600 endpoints, and 2,400 sensors. It's really important to note that we use a time split on our data: the first five months serve as training data, and the last month is a held-out test set. In these cyber security situations it's really important to split out your test set by time, because you need to be able to measure how you do against new threats, and because you can only ever train a model on data you've seen before. If we wanted to deploy a model right now, we couldn't train on data from the future, so the usual machine learning habit of shuffling your entire dataset to create train and test splits doesn't work here. What we end up with is a label distribution that looks like this: about 250,000 alerts and approximately 29,000 incidents. The blue bars are the number of non-actionable items and the yellow bars are the number of actionable items, for the training and test sets. The main thing to note is that there are generally fewer actionable items than non-actionable ones.
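The time-based split just described can be sketched in a few lines. The dates and records here are made up; the point is only that a cutoff date, not a random shuffle, separates train from test, so the model never sees the future.

```python
from datetime import date

# Hypothetical labeled alert records spanning the six-month window.
records = [
    {"ts": date(2021, 11, 15), "features": [0.1], "actionable": 0},
    {"ts": date(2022, 1, 20), "features": [0.7], "actionable": 1},
    {"ts": date(2022, 4, 2), "features": [0.3], "actionable": 0},
    {"ts": date(2022, 4, 28), "features": [0.9], "actionable": 1},
]

# Time split: everything before the cutoff trains the model, everything
# at or after it is held out for testing. No shuffling, because a deployed
# model can never train on data from the future.
cutoff = date(2022, 4, 1)
train = [r for r in records if r["ts"] < cutoff]
test = [r for r in records if r["ts"] >= cutoff]

print(len(train), len(test))  # 2 2
```

A shuffled split would leak future near-duplicates into training and make the evaluation look better than any real deployment could be.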

Okay, so now let's talk about the machine learning strategies we use to actually utilize this data. For the machine learning module we use three main algorithms for both the content and the context models: logistic regression, random forest, and XGBoost. For the ensemble model, which combines the knowledge from those two models, we test four different strategies. The first is called unified, a term we made up, which is just a model that takes features from both of the other models as additional input. The other three are simple aggregations: maximum, mean, and weighted sum. For example, if the content model outputs one score and the context model outputs another, the ensemble could just take the maximum of the two; a simple strategy. The ensemble model outputs the final alert-level score, but we also want a score for the entire incident. Remember that an incident is just a group of alerts, so if you have ten alerts within the incident, you now have ten scores, and here again we use simple aggregation: maximum, mean, and median. We just take, say, the max of those ten scores, and that becomes the score for the entire incident.
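Here is a compact sketch of how the maximum ensemble and the incident-level aggregation fit together. The scores are invented, and the real content and context models are trained classifiers rather than hard-coded lists.

```python
from statistics import mean, median

# Hypothetical per-alert scores from the two models, one entry per alert
# in a three-alert incident.
content_scores = [0.2, 0.9, 0.4]
context_scores = [0.3, 0.6, 0.8]

# Maximum ensemble: each alert's final score is the larger of its two
# model scores (the strategy reported as best in the talk).
alert_scores = [max(a, b) for a, b in zip(content_scores, context_scores)]

# Incident-level aggregation: collapse the member-alert scores with a
# simple statistic; max, mean, and median are the candidates.
incident_score = {
    "max": max(alert_scores),
    "mean": mean(alert_scores),
    "median": median(alert_scores),
}
print(alert_scores, incident_score["max"])
```

The same handful of aggregations handles both levels, which keeps the scoring pipeline simple enough to re-run end to end whenever new labels arrive.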

Finally, for evaluation, we use ROC AUCs and precision-recall AUCs, and we build a simulated deployment where we can see the difference in what an analyst's day-to-day work would look like if the model had been there the whole time; that's evaluated using the three triage capabilities.

Okay, now let's get to the fun part and actually talk about the results. First I'll present the machine learning numbers, but only at a really high level, because these numbers are hard to contextualize; we mostly use them for tuning the models and making sure we output the best ones. Basically, at the alert level we find that the content model performs best with logistic regression, the context model performs best with XGBoost, and the maximum ensemble strategy, just taking the larger score from the two models, is the best and outperforms each of the individual models. Once we see that this ensemble performs best and we output these alert-level scores, we need to see how the aggregated score performs. Here, the mean and the maximum of the alert-level scores perform approximately the same, and they tend to be the best strategies for the incident-level results. Again, it's really hard to contextualize what these numbers mean, so let's look at the triage results and see how these numbers manifest in the real world.

The first triage system I want to present is incident prioritization, in keeping with the name of the talk. The way this works is that whenever an incident is created, it gets added to an analyst's work queue, and the analyst needs to go through their queue and resolve their incidents. Generally the analysts have discretion over which incidents they take, but for the most part they just take them as they come: a new incident arrives, they take whatever is there. Now, if we look at this graph, the left side is the baseline, where analysts just take cases as they come. The blue bar is the amount of time that non-actionable cases spend in the queue, and the orange bar is the amount of time that actionable cases spend in the queue. To do a little math for you, that's approximately 20 minutes; it takes about 20 minutes of an actionable case just sitting there, festering, unseen, until someone finally gets to it. Every single one of these false positives comes at the expense of real actionable cases sitting in the queue. So what we do is take the TEQ incident-level scores and re-sort the analyst queues, and what we see is a 36 percent reduction in the amount of time actionable cases spend in that queue: basically, the orange bar goes down. Analysts get to those actionable cases 36 percent faster than they would have, which means customers get the attention they need much faster.

Now, there's a natural extension to this incident prioritization: if you really can rank every single incident, what happens if you just remove the least important ones? That's the basis for the suppression system. I know there's a lot going on in this next graph, and it's super important and really cool, so I'm going to go over it slowly, piece by piece, so you can see how it works.
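A rough sketch of the queue re-sorting and suppression ideas, with invented incident scores. The 0.1 threshold is purely illustrative; a real cutoff would be tuned against how many actionable cases you can tolerate missing.

```python
# Hypothetical analyst queue: incidents with TEQ-style incident-level scores.
queue = [
    {"id": "inc-1", "score": 0.12},  # likely a false positive
    {"id": "inc-2", "score": 0.95},  # likely actionable
    {"id": "inc-3", "score": 0.48},
    {"id": "inc-4", "score": 0.03},
]

# Incident prioritization: work the queue highest-score first instead of
# first-come-first-served, so actionable cases stop festering at the back.
prioritized = sorted(queue, key=lambda inc: inc["score"], reverse=True)

# Suppression: drop incidents scoring below the threshold entirely, so the
# analyst never has to open them at all.
THRESHOLD = 0.1
visible = [inc for inc in prioritized if inc["score"] >= THRESHOLD]

print([inc["id"] for inc in visible])  # ['inc-2', 'inc-3', 'inc-1']
```

Re-sorting changes only the order of work; suppression changes its volume, which is where the large false-positive reductions come from.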

over it super slowly and every little piece so you can see how it works let's look at this very bottom left orange bar right here this dark orange bar is the amount of actionable incidents that the analyst had to deal with on a single day this light orange bar is the amount of false positive incidents that the analysts had to deal with on that day and now these blue bars are comparisons with what would happen if you use the suppression system with the tech model so this dark blue bar is the amount of actionable cases that are left that day after using the suppression and the light blue bar is the amount of false positive cases that are left after

using the suppression so the really striking thing you'll notice here is that there's a huge difference between the light orange and the light blue bars this is the reduction in false positive incidents that day and as you can see this trend kind of continues throughout the month so like on this third day there's no reduction in actionable cases we don't miss any but we still see a tremendous reduction in the amount of false positives if you average this over the entire month you see a 47 reduction in false positives while still retaining 97.5 percent of the actionable incidents uh yeah so the final piece of triage is within incident over prioritization so in order to test this

what we did was we took four incidents three of which were actionable and one of which was not actionable and we provided two orderings anonymously to actual cyber security analysts and the two orderings were chronological which is just the alerts in the order in which they came which is how it's normally done and then the alerts ordered by the text score and we asked the domain experts to take a look at these cases and we didn't tell them which ordering that they were seeing what we found or what the analysts found was that with the tech ordering they had to look through 14 percent fewer alerts to determine whether the case was even worth their time
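As a rough sketch of how the suppression numbers described above might be computed, here is a small Python example. The field names, scores, and threshold are all illustrative; this is not the actual TEQ implementation.

```python
# Illustrative sketch: evaluating score-based incident suppression.
# Each incident pairs a model score with a ground-truth "actionable" label.

def evaluate_suppression(incidents, threshold):
    """Suppress incidents scoring below `threshold` and report the
    false-positive reduction and the fraction of actionable cases kept."""
    suppressed = [i for i in incidents if i["score"] < threshold]
    kept = [i for i in incidents if i["score"] >= threshold]

    total_fp = sum(1 for i in incidents if not i["actionable"])
    suppressed_fp = sum(1 for i in suppressed if not i["actionable"])

    total_actionable = sum(1 for i in incidents if i["actionable"])
    kept_actionable = sum(1 for i in kept if i["actionable"])

    return {
        "fp_reduction": suppressed_fp / total_fp if total_fp else 0.0,
        "actionable_retention": (
            kept_actionable / total_actionable if total_actionable else 1.0
        ),
    }

# Toy day of incidents (invented data, not from the talk).
incidents = [
    {"score": 0.99, "actionable": True},
    {"score": 0.85, "actionable": True},
    {"score": 0.40, "actionable": False},
    {"score": 0.05, "actionable": False},
    {"score": 0.02, "actionable": False},
]
print(evaluate_suppression(incidents, threshold=0.1))
```

In the talk's terms, `fp_reduction` corresponds to the gap between the light orange and light blue bars, and `actionable_retention` to how much of the dark orange bar survives.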

Of course, in order to resolve the case, the chronological ordering is still useful, and this doesn't account for that time; but just to determine if they should even bother taking the case, they could look at 14 percent fewer alerts. Okay, so that's the last piece of triage. Here are just a couple of examples I like to go over of what the model scores actually look like in practice, just for fun. For this top case over here, the model scores a 0.99336-something, and I'm not going to read through all the analyst notes, but I'll summarize: basically, what we found is that there's

this metasploit framework.latest.msi file that was constantly run on the customer's device, and the analyst had to go through and manually fix the issue. Here are some of the low model scores. The analysts were obviously fed up with these cases, because the only things they wrote about them were "benign activity," "known false positive," "known false positive," "there was nothing to do there," et cetera. Now, it's worth noting that of course the model is not perfect. This is a machine learning system, and there are always going to be errors. There are some false negatives: as you saw from the suppression graph, we still missed a couple percent of actionable cases,

and in these instances the model scores are relatively low, but not low enough. I don't know if you were here this morning, but my colleague Josh talked about how machine learning by itself is not enough to operate in the real world; you need guardrails, humans, and other techniques that constantly adjust for the inadequacies of machine learning. So even though the machine learning may miss certain incidents, we make sure that we don't, using other guardrails. The key takeaways here are that TEQ is designed to fight alert fatigue's root causes, sensor diversity, false positives, evolving threats, and human integration, and we think that the system delivers on

that promise. We see 47 percent of false positive incidents suppressed while maintaining high detection rates, we see actionable incidents spending 36 percent less time in queue, and we see that analysts have to look at 14 percent fewer alerts to determine if a case is worth their time. That's all I have for today. Thanks for listening, and I'm open to any questions, comments, discussion, et cetera. Yes, over there. I might have trouble hearing you... oh, we have a microphone, I think. Yes?

[Inaudible audience question.]

Yes, okay, so let me rephrase the question. My colleague here was asking about the feedback loop system: is it potentially biased, because cases that are suppressed, or that don't make it to the analysts, are not going to be re-reviewed, and therefore only a biased sample of cases feeds into the feedback loop? That's a good question, thank you. The answer is that alert data doesn't actually disappear; it stays in the alert database, and the model is continuously retrained on it. Generally this doesn't cause an issue, because if we really determine that those cases are false positives and they never

really need to be looked at again, then it's worth it to just keep the labels we already have on those alerts. Changing the label back to actionable is possible if you want to manually adjust them, but it's generally not necessary, because a confident false positive usually stays a confident false positive. Hopefully that answers your question. Oh, the false negatives? Oh, yes, okay, I'm sorry, I misunderstood. My bad, let me rephrase the question again. What he was asking was: what happens with these false negatives? If you decide to use the suppression system and these

cases never make it to the analysts, then the labels don't get fixed. The answer is that in reality we mostly focus on the incident prioritization system rather than the suppression; we actually try not to use the suppression at all, and currently it's still a prototype that's not being used in the real SOC. When you have the prioritization system, the cases that are de-prioritized, as if they would be suppressed, the analysts still actually get to see at the end of their day. Those particular cases do have a slightly longer queue time, but we make sure to have a guardrail over those false negatives

so that there is actually a person still looking at them. Yep, yes, right here.
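A minimal sketch of the guardrail just described: de-prioritized incidents are pushed to the back of the analysts' queue rather than dropped, so every case is still seen by a person. The threshold and field names are hypothetical.

```python
# Illustrative sketch: prioritization with a guardrail. Low-scoring
# incidents are deferred to the end of the queue, never suppressed.

def build_queue(incidents, defer_below=0.1):
    """Order incidents by descending model score; anything scoring under
    `defer_below` goes to the back of the queue but is never dropped."""
    ranked = sorted(incidents, key=lambda i: i["score"], reverse=True)
    prioritized = [i for i in ranked if i["score"] >= defer_below]
    deferred = [i for i in ranked if i["score"] < defer_below]
    return prioritized + deferred  # every incident is still reviewed

queue = build_queue([
    {"id": "a", "score": 0.02},
    {"id": "b", "score": 0.97},
    {"id": "c", "score": 0.55},
])
print([i["id"] for i in queue])  # → ['b', 'c', 'a']
```

The trade-off matches what the speaker describes: deferred cases wait slightly longer, but a suppression-style false negative can still be caught by a human.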

Oh, sure. Okay, yeah, so the question was: how are we featurizing these different things, like command lines and file paths? Right now there are a couple of different strategies. The first and easiest attempt was to treat everything as a categorical feature, basically just checking the uniqueness of the strings. But something we're actually prototyping and working on now is an automated natural language processing module for determining which NLP techniques are best for each type of string, because the amount of diversity in the strings is immense. There are hundreds of different fields; some of them look like actual

text, and some of them look like command lines and file paths. So we have different parsers for things like command lines, to deal with slashes, whereas for text you would use a normal text processor. That's some of the research we're working on now: doing this without any intervention, just having a system that figures it out on its own. Yep?
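To make the per-field featurization idea concrete, here is a toy dispatcher that guesses a field's type from sample values and picks a tokenizer accordingly. The heuristics and category names are purely illustrative, not the Sophos prototype, which the speaker says learns this automatically.

```python
# Toy sketch: choosing a featurization strategy per alert field.
# Real systems would learn or configure this; the heuristics below
# are invented for illustration only.
import re

def choose_tokenizer(sample_values):
    """Guess a field type from sample strings and return (kind, tokenizer)."""
    joined = " ".join(sample_values)
    if re.search(r"[\\/](?:\w+[\\/])+", joined):       # path-like separators
        return "path", lambda s: re.split(r"[\\/]+", s)
    if re.search(r"\s-{1,2}\w+", joined):              # flag-like tokens
        return "command_line", lambda s: s.split()
    if len(set(sample_values)) <= 2:                   # few unique values
        return "categorical", lambda s: [s]
    return "text", lambda s: re.findall(r"\w+", s.lower())

kind, tok = choose_tokenizer([r"C:\Users\admin\run.exe", r"C:\Temp\a.dll"])
print(kind, tok(r"C:\Temp\a.dll"))  # → path ['C:', 'Temp', 'a.dll']
```

This mirrors the answer above: path-like fields get a slash-aware parser, command lines get whitespace splitting, low-cardinality fields stay categorical, and everything else falls back to a plain text tokenizer.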

[Inaudible audience question.]

Right, okay, so to rephrase the question... sorry, let me just think about how to rephrase that. What you're saying was that the alerting rules... I'm sorry, can you actually just repeat the question?

Okay, yeah, the question is: is there a lack of expressivity in the alerting rules themselves? The answer is yes. That's fundamentally one of the problems with the sensors: they're written by people who are looking at very specific things, trying their best to use their domain expertise to capture malicious behavior. That's how the sensors are built, but there's a limitation to what humans can find. The reason the feedback system works, even though the sensors and the alerting rules don't change, is that the machine learning is able to work at a more complex level than specific human decisions on certain

fields, because maybe you can look at 10 or 11 or 30 different human-written fields, and the combination of those fields can uniquely identify a new threat. This is one of the reasons that manually adjusting all the sensors is infeasible: a human can't really take into account a hundred different features and write a specific alerting rule that's that focused, or if they can, it's going to be extremely expensive, whereas the machine learning learns that kind of thing statistically through a lot of data and that feedback loop. Yeah, thank you, great question.
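To illustrate the contrast just described, here is a toy example: a hand-written rule can only express conditions on a few fields, while even a simple learned linear model combines many weighted features into one score. All field names and weights are invented for illustration.

```python
# Toy illustration: a hand-written rule checks a few fields, while a
# learned model scores a weighted combination of many. All field names
# and weights here are invented.
import math

def hand_written_rule(alert):
    # A human-authored rule typically expresses conditions on 2-3 fields.
    return (alert.get("process") == "powershell.exe"
            and "-enc" in alert.get("cmdline", ""))

def learned_score(features, weights, bias=0.0):
    # A linear model happily combines dozens or hundreds of features.
    z = bias + sum(weights[k] * v for k, v in features.items() if k in weights)
    return 1.0 / (1.0 + math.exp(-z))  # logistic probability

weights = {"rare_parent": 2.1, "encoded_cmdline": 1.7,
           "new_file_path": 0.9, "off_hours": 0.4, "signed_binary": -1.5}
features = {"rare_parent": 1, "encoded_cmdline": 1,
            "new_file_path": 0, "off_hours": 1, "signed_binary": 0}
print(round(learned_score(features, weights), 3))
```

The point is not the model class (the talk doesn't specify one), but that the weighted combination of many weak signals can flag a threat no single-field rule would catch, and those weights are fitted from data rather than hand-tuned.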

Okay, if there are no more questions, thanks again.