
AI and Machine Learning in Network Security - Igor Mezic

BSides KC · 2022 · 53:12 · 411 views · Published 2022-10
About this talk
Igor Mezic surveys the evolution of AI and machine learning in network security, from rule-based and statistical methods to generative and hybrid approaches. He argues that first and second-wave AI systems—rules and statistical learning—struggle with scalability and contextual reasoning, while third-wave contextual AI addresses these gaps by combining perception, abstraction, and reasoning to detect anomalous behavior without excessive tuning.
AI and Machine Learning in Network Security - Igor Mezic. The use of machine learning (ML) in network security has grown over the last decade, from academic research to the level where many vendors are enhancing their products with elements of ML. There is also much discussion of artificial intelligence (AI) in this context. I will describe the relationship between AI and ML, and discuss the waves of AI development that progressively increased its utility in many fields. I will also discuss why many attempts to use the so-called first- and second-wave AI in network security, while providing benefits, introduced additional issues or could not be scaled to provide a fuller solution. Finally, I will argue that current efforts in third-wave (generative and hybrid AI) solutions have provided approaches that solve these issues.

Igor Mezic (Chief Scientist at Mixmode.AI): Professor Mezic works in the fields of artificial intelligence (AI), dynamical systems, control theory, and applications to network security, energy-efficient design, and operations in complex systems. He did his Ph.D. in Dynamical Systems at the California Institute of Technology. Dr. Mezic was a postdoctoral researcher at the Mathematics Institute, University of Warwick, UK in 1994-95. From 1995 to 1999 he was a member of the College of Engineering at the University of California, Santa Barbara, where he is currently a Distinguished Professor. In 2000-2001 he worked as an Associate Professor at Harvard University in the Division of Engineering and Applied Sciences. He won the Alfred P. Sloan Fellowship, the NSF CAREER Award, and the George S. Axelby Outstanding Paper Award from IEEE. He also won the United Technologies Senior Vice President for Science and Technology Special Achievement Prize in 2007.
For his work on analysis and control of complex systems, he was named Fellow of the American Physical Society, Fellow of the Society for Industrial and Applied Mathematics, and Fellow of the Institute of Electrical and Electronics Engineers. He is the recipient of the 2021 Crawford Prize, awarded every two years to a researcher in dynamical systems theory. Dr. Mezic is the Director of the Center for Energy Efficient Design and Head of the Buildings and Design Solutions Group at the Institute for Energy Efficiency at the University of California, Santa Barbara. He holds 10 US patents. He founded Aimdyn, Inc. in 2003 and is the CTO and Chief Scientist of Mixmode.ai.
Transcript [en]

Thank you. Wow, this is an incredible venue; I'm very happy to be here. It's one of the first ones after a long, long break, and I have to come here and play with my band one time, we'll see. All right, so I'm Igor Mezic, and I'm here to talk about AI in cybersecurity. If you can't hear me back there, or you can't see something, please let me know and I'll try to explain a little better what's over there. I'm still a professor in engineering at the University of California in Santa Barbara, and I'm Chief Scientist and CTO at MixMode, and indeed we have strived to include some of the aspects of

artificial intelligence, AI, in cybersecurity. So that's what I'm going to talk about today. I'm largely going to talk non-specifically about the kinds of things that we have applied, but if you have any questions on the product or anything like that, please ask me later; this is going to be non-vendor-specific. I'm going to try to give you an impression of where AI is today in general, where it stands in cybersecurity, and what types of AI we might consider in cybersecurity. And the topic seems important, the reason being that there is a lot of discussion of artificial intelligence in general, and there is also a lot of discussion of artificial intelligence in

cybersecurity in particular. May I say there's a lot of hype (and I get some smiles). So I'm going to try to circumvent that hype and describe what the approaches are, and maybe point out to you some of the things that work and don't work. I'm going to start by telling you a bit about the plan of the talk, and then describe the situation. You might have encountered this acronym, SPIN: situation, problem, implication, needs. So I'm going to try to go through this to connect the AI needs that we have to the situation that we

have in cybersecurity today. So I'm going to talk about that now. I'm going to describe artificial intelligence versus machine learning versus deep learning; many of you in the room might already know the difference, but it's something that is sometimes a little bit confusing. I'm going to talk about rule-based systems; if you're already working in cybersecurity, if you're an analyst, you have already encountered those, they're not new to you. Then I'll talk a bit about statistical, second-wave machine learning, and a very important topic, which is false positives versus false negatives: what are they, and how can we in general avoid them? And you will see that the solution seems to lie

in something that's called contextual AI, and then something that we see on our screens all the time in cybersecurity but maybe don't think about as much, which is time series, and how the data evolves in time. The reason why this is important is that any regression-type approach cannot really capture causality. Yes, I said it. If you have any graphs that are showing a relationship between this variable and that observable, they show you the correlation. The fact that I woke up today at 6:00 my time and had coffee, those two things are correlated, but the fact that I woke

up at 6:00 and the network traffic on my computer started up, and I had coffee at the same time: having coffee and increased traffic on your computer are not causally related, right? They're not. So correlation is not causation. I'll speak about that a little bit in this context, and about why we need time series. The reason I'm spending a little bit of time on that in the intro is that it is important to what we do today: we understand that causality can only be captured if we understand the time flows, and that's an essential part of the approach that we have. And then I'll talk about generative models, which are

fundamentally based on these time series, and then some fun and games: I'll talk about how one can infer a network structure automatically from this type of AI approach, and then something that is not necessarily immediately related to network security, but it is something that we have done in the past, and that is an application to online games. So I hope some of you might enjoy that. All right, so let me lay out the situation the way I see it, the way people that I work with see it, and talk a little bit about cybersecurity in the context of situation, problem, implication,

need. What we see today is that the software space is dominated by solutions providing data collection for compliance, and that's great, we absolutely need that, but to a large extent there is no pattern extraction, there is no underlying intelligence that is actually getting something out of that data at a large scale. All of you here in this room are cybersecurity professionals, and I'm sure you know the numbers: we are lacking millions and millions of people in this field worldwide, and that's not going to go away. And then I'm going to describe pretty carefully the first- and second-wave solutions in AI. One of the aspects of the situation

is that our tools use the first- and second-wave AI tools: rules and statistical solutions. I'm going to argue, hopefully successfully, that we need some of these third-wave, contextual-reasoning-type tools, and once again, I'm going to describe what this means in some detail as I go. So that's the situation. What's the problem? You could say, well, if the amount of data collected is relatively small, we wouldn't really have a problem, but that's not true: the amount of data collected is absolutely massive. In your experience, you've probably already seen the amount of false positives and, unfortunately, false negatives. False positives are really taxing; I'll give you some statistics,

but false negatives are very, very damaging, and false positives are also damaging and taxing. All right, number three is quite important: threat actors are not only starting to use AI but are using it in their own activities more and more, and I'm going to make a prediction that this will exponentially increase in the next few years. We are in an unfortunate situation geopolitically. I'm going to become a little grave now, because I do think the situation is bad: the nation states have started putting massive amounts of resources into threats, and that was not the case before. What that means is they can use any AI that their countries are producing to hurt us. That's just what it is, and

it's going to continue, and it's going to be really bad. And given the fact that they're using AI, well, as the proverb goes, we can't fight that with sticks, so we need to have the tools that enable us to fight back. You might have noticed that very few products out there have systematic ranking of threats; that's a lack of AI, really. Humans are extremely good at ranking threats; we're extremely good at sifting through what is dangerous to us and what is not. We don't have tools out there that do that for us, yet. And then I talked about regression and

correlation. What would be ideal in cybersecurity? Ideally, you could tell an attack by just some kind of network behavior that is happening out there, without the threat actor ever coming into our network, ever even coming close. That's the ideal. So prediction is really the ideal, if you think about it; everything else is secondary. I'd argue that we have very close to 0% application of the predictive capabilities of modern science in the products that we see. It's a problem. So what are the modern threats? APTs: this is related to nation states; hackers from organizations like that like to hang in our systems for a long time and then do damage

when they desire to do so. Zero-days, the unknown ones: I would actually say, a statistic is a statistic, but I think 80% of successful attacks come from zero-days. That also means that we took really good care of the ones that can be detected by signatures, right, if it's 80/20, but that's the place where the cybersecurity products need to be: in a place where they can detect zero-day threats to reduce that margin, and APTs. Cybersecurity teams, because of false positives and the other aspects of what I've talked about, are definitely overworked, trying to overcome the inadequacies of tools. That's one of the major

themes. Threat actors are actually gaining ground because of the lack of analysts and the lack of tools, and the detection of causality is actually not something that is systematized. We do it when we go after threats, right, we look for causality, but at least I haven't seen many tools (I know obviously we're trying to develop some) that actually give us the ability to detect, with high precision, causality in threat progression. All right, so I'm going to summarize here and say what the needs are. I'll talk about supervised

versus unsupervised AI. There are a lot of words here that, for those who haven't seen artificial intelligence aspects before, maybe don't mean too much, but I will define them as I go. Unsupervised AI just means that it can work without human intervention; it just goes and learns on its own, because the data is massive. That's one of the problems that I identified: the data is massive, and we cannot label it all; it's not possible to label all the data. We need AI frameworks that can actually detect causality, that can detect the progression of threats. A solution that performs automated threat ranking would make things simpler for us. And then,

last but not least, many machine learning algorithms out there can be easily spoofed, and that's a problem, because in this particular context, if our algorithms get spoofed, we might get the false negative rates going up. All right, so that's the summary of where we are, but let me now go to some of the technical stuff, unless anyone has any questions at this point as far as the summary is concerned. If not, I'm going to happily assume that all of you agree with what I just said and move on. Okay, so there's often a question as to what is AI versus what is machine learning versus what is deep

learning, and very often deep learning today is conflated with AI, so it's thought of as being the same, but this I found to be an interesting set of descriptions that corresponds to what I think. So artificial intelligence is just a technique which enables machines to mimic human behavior. Many of you know about the Turing test. I don't particularly agree with the Turing test, which basically says, well, if you put something behind a curtain and you talk to it and it responds like a human, then it's intelligent, or at least artificially intelligent. We can build machines that pass the Turing test today relatively easily; they can be very sophisticated parrots, because we can just process very

large amounts of data. So I don't think that's all there is. A true test for artificial intelligence would really be that the reasoning is also capable of handling curveballs and doing all kinds of things that humans do very, very well, like recognition of deception and that sort of stuff, which is much more important in cybersecurity. So I'd argue we need a new Turing test, but putting that aside, what is machine learning? Machine learning is a set of mathematical algorithms that use methods to enable the machine's performance to mimic human behavior. So machine learning is the underlying set of algorithms. Now, I have 30 years

of a career in science, so I've been building algorithms for a very, very long time, and this whole thing about AI happened in the last 10 to 15 years. Of course, there were researchers even in the '80s, when I was a student, who were doing this, and I'll show you some history of that. But we tend to call almost everything AI today, and I think that's a bit unfair to the general scope of algorithms; algorithms are very, very important even if they're not used in AI, and I'll make a little pitch for that. Deep learning is a subset of machine learning methodologies, and it's a

particularly successful one for recognition of images (if you use your iPhone, it recognizes your face) and a variety of other static recognition tasks, and even in speech, although a little bit less, to tell you the truth; the good old classical methods are still really good in speech recognition and are still being used. But deep learning is a particular set of algorithms that are very, very good, and I like them for various particular technical reasons, but they are a subset of machine learning. There are many ways of putting these together into artificial intelligence, and deep learning is one of those. I don't think you'll be able to see this very well in the

back, there is no way, so I'll tell you: it's the history of AI, the timeline from the 1940s, when the war effort spurred a lot of the algorithmic development that we would call AI. Right after the war there was a massive amount of funding injected into these kinds of ideas, and they basically went along two lines. One was rule-based, that was Marvin Minsky at MIT, and the other one was the perceptron, what turned out today to be the neural network approach. Guess which one won originally, in the '60s? Anybody? Rules or neural networks? Rules won, and there was a very nasty fight. It's actually kind of fun for all of us who

are in this field to take a look at what the history was; it was a very, very nasty scientific fight as to what is right and what is wrong. Of course, today no one would suspect that, because deep learning just dominates everything, but the idea was that human intelligence is about logic, and logic is about rules, and therefore rules are what intelligence is. And so we had expert systems somewhere in the '80s. But the people on the other side, on the neural network side, didn't stop either, although they didn't have any funding, so they invented something that's called backpropagation. Backpropagation is a nice

way, and it's going to sound funny and simple to you, but it's a nice way of differentiating: how far can you get with knowing just first-year calculus? Of course it's much more complicated than that. Neural networks are deep networks, and you need to differentiate the cost function, something that you want to optimize, in order to find out how to update the coefficients in it. And so if you have a massive network, differentiating with respect to all these different variables that you have is a very, very hard thing. So please don't tell my friends who work on backpropagation that I joked

about it a little bit, because it really is a very sophisticated method in differentiation. I'm also trying to point out how interesting the aspects of mathematics that go into this are, and what kinds of things went into this kind of development. The whole backpropagation idea exploded in about 2005: Geoff Hinton and Ruslan Salakhutdinov wrote this particular algorithm and published it, in Nature I think, and then the thing absolutely exploded. Between 2006 and today you have seen just the win of the deep neural network approach to AI, deep neural networks as AI, and here we are today.
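The chain-rule bookkeeping just described can be sketched in a few lines. This is an illustrative toy only (a one-neuron "network" with a squared loss and made-up data, not anything from the talk); a real deep network repeats exactly this gradient computation through many layers:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, b, x, y, lr=0.5):
    """One backpropagation step for y_hat = sigmoid(w*x + b), loss = (y_hat - y)^2."""
    y_hat = sigmoid(w * x + b)
    # Chain rule: dL/dw = dL/dy_hat * dy_hat/dz * dz/dw
    dL_dyhat = 2.0 * (y_hat - y)
    dyhat_dz = y_hat * (1.0 - y_hat)   # derivative of the sigmoid
    return w - lr * dL_dyhat * dyhat_dz * x, b - lr * dL_dyhat * dyhat_dz

w, b = 0.0, 0.0
for _ in range(500):
    w, b = train_step(w, b, x=1.0, y=1.0)   # push the output toward the target 1.0
print(sigmoid(w * 1.0 + b))
```

With millions of coefficients the same idea applies, differentiated layer by layer, which is why making it practical took serious engineering.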

There are a lot of debates; everybody talks about it in positive or negative terms; we are talking about defining general intelligence and whether it's dangerous to us and all these things. I'm going to stay completely away from that discussion, because I think if we apply it correctly in a particular area, where our adversaries are applying it in a threatening way, we're going to stay on the good side and get some benefit out of it. All right, so I've already mentioned the three waves. A lot of this research has been funded by DARPA, the Defense Advanced Research Projects Agency, which also, of course, gave us the

internet and transistors. So this is their classification as to what the waves of AI really are. What you're seeing in the graphs on the left are the different types of features that an AI system should have. Perceiving: that is really taking the data and finding the patterns. Your eyes don't take into account every photon that comes into them; there are massive numbers of photons coming in, but they try to select the edges and the features. So that's perception, the initial processing of data. Then learning: we learn the relationships. Then abstraction: can the learned

relationships be abstracted, to maybe even get transferred to another place where the learning is going to be made easier? And then reasoning: reasoning really is about logic, about rules, and that's what Minsky originally thought the right thing might be. And if you take a look at the right side over there, one reason why rules were so attractive early on is that we didn't have very large computational capability, so writing if-then statements and AND statements and OR statements and XOR and all of this was not foreign to the way we would do computing at the time. Then the learning, the statistical function approximation:

in those systems, the deep neural network is on the right. What you're seeing on the left are the inputs, x1 to xn, and then the outputs, y1 to yn, and in between there's this processing structure that tries to connect the input and the output. One example in cybersecurity could be a large outgoing file: is that a threat or not? So you give it large amounts of data showing some files that are actually part of exfiltration and others that are not, and the deep neural network, given massive amounts of information about which one is threatening and which one is not, is

going to be able to recognize, with some precision, what happens next. As I have described this process, you have probably been thinking about problems in it, and you already see it: what if my network changes, and suddenly somebody is sending files that they didn't send yesterday, but it's completely legit, because maybe they are now part of another organization? The false positive comes, and that's a problem. The method that I've just described is called a supervised method: you give the system a lot of inputs and connected outputs, and you're expecting it to learn on its own the relationship between inputs and outputs. It does have to extract features; that's what the intermediate layers are for.
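The supervised setup just described can be sketched with a toy perceptron on the "large outgoing file" example. The sizes and labels below are invented, purely to show labeled inputs and outputs being learned:

```python
# Each sample is (size_in_megabytes, bias_term); label 1 = exfiltration, 0 = benign.
def perceptron(samples, labels, epochs=50, lr=0.1):
    w = [0.0, 0.0]                       # weight for size, plus a bias weight
    for _ in range(epochs):
        for (size, bias), y in zip(samples, labels):
            pred = 1 if w[0] * size + w[1] * bias > 0 else 0
            err = y - pred               # supervised signal: label minus prediction
            w[0] += lr * err * size
            w[1] += lr * err * bias
    return w

X = [(1, 1.0), (2, 1.0), (3, 1.0), (40, 1.0), (50, 1.0), (60, 1.0)]
y = [0, 0, 0, 1, 1, 1]                   # hand-labeled training data
w = perceptron(X, y)

def classify(size):
    return 1 if w[0] * size + w[1] > 0 else 0

print(classify(55), classify(2))  # → 1 0
```

If the network's normal behavior shifts, say 45 MB transfers become routine, this frozen boundary starts flagging legitimate traffic, which is exactly the false-positive problem mentioned above.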

It's therefore very good at perceiving and learning, but it has very little ability for abstraction and reasoning. So to a certain extent I've described problems with both: rules are very, very rigid, and they sit just at the reasoning level; statistical methods have a very good ability to learn, but they can't really reason, there is no reasoning aspect to them. This bottom layer here is something that DARPA called the third-wave AI; in the meantime they have invested a really serious amount of money in it, and we are getting somewhere with that investment and that contextual adaptation. Let me just go back to the large outbound file

alert. If a 5 MB file goes out of an organization in the middle of the day, that is very, very different from the same-size file going out at 3:00 at night in an organization that shuts down at 6:00 in the afternoon, when people largely go home. Context is important, right? And unfortunately, the first- and second-wave systems have little ability to recognize context. You could say, well, I could make a deep learner figure out what the context is; I could label the data by saying day and night. Sure, yes, you can, but that requires a lot of your work. And so how about another piece of context, and another piece of

context? Start counting the dimensions of those various pieces (is it the weekend?), and pretty soon you're in large numbers of dimensions, and you can't compute; you can't gather and aggregate the data, clean it up, ETL it, and put it in order for it to be processed. So the third-wave systems are really in this diagram on the left, which I'm sure is also pretty dim: perceive and learn lead to the development of a contextual model, an algorithm in the middle, that then enables the system to abstract and reason. And in the best of all worlds, this happens without

human help, right? Without any input from us, without any input from analysts. We don't want to waste time on that; we just want to give the AI system the ability to learn, and get the reasoning out of it as to what exactly happened. Okay, so let's talk about rules, and note that some rules are of course really good. There is no problem with many different rules, which I actually call masks: if you are trying to put in permits for incoming connections to a VM, and that's a particular mask that you want to put on, that's perfectly fine, and you should do it. There is in principle no

reason to have a self-learning AI do that, except that you might want the self-learning AI to watch whether the permits have been given correctly, or whether any of these connections is doing something weird. Zero trust, right? So I'll go to zero trust for a second; it's very interesting to me. Of course, things like two-factor authentication are making our lives better; we feel more secure. I joke about it a little bit, but I just said it, right? We feel more secure, like taking off shoes at the airport. Remember, a massive percentage of attacks are actually coming from inside threats, so if you think about

it, there are very few real threats that this two-factor authentication actually captures, in a sense. So what makes us feel secure? A real zero trust system should tell us when something that's going on in the network is unusual, not normal, even if the person has all the credentials in the world. So I would posit, and maybe this is a little bit radical, that all of the things that we are hearing about zero trust today out there are really about authentication and those kinds of methods. Fine; I'm never opposed to providing another layer of security, but let's think about what really helps in this situation. What helps in this situation is trying to

weed out the abnormal behavior, and that perception of something unusual happening is, for many of you who are analysts, exactly what you're doing, right? So human intelligence, again, is very good at this aspect of perception; we should get AI to help us more in that aspect so that we can be more effective at it. Anyhow, there are good rules, and there are also bad ones. I covered up the vendor because I don't want you to know who it is, but look at this rule: large outbound transfer by high-risk user. Okay, so you have to determine who a high-risk user is, fine, and it says it detects an outbound transfer of 200,000 bytes or more by a

high-risk user. What is 200,000 bytes to you? How does that get determined? Well, okay, it's a threshold; we can move it back and forth, but every time the network configuration changes, that's a thing that needs to be somehow adjusted, adapted. So the amount of maintenance that these systems require is forcing organizations to hire more and more machine learning people, which, great, a machine learning person is awesome, but think about it: it's not core cybersecurity, is it? The data analysis is becoming a large part of it, but of course we need an overlap there.
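One way to see the maintenance problem is to contrast the fixed rule with a threshold re-derived from the network's own recent traffic. This is a hypothetical sketch: the 200,000-byte figure comes from the rule above, while the baseline traffic numbers are invented:

```python
STATIC_LIMIT = 200_000                    # the vendor's hard-coded byte threshold

def static_rule(nbytes):
    return nbytes >= STATIC_LIMIT

def adaptive_limit(recent_sizes, pct=0.99):
    """Flag only the top 1% of recently observed outbound sizes, so the
    threshold moves when the network's normal behavior changes."""
    ordered = sorted(recent_sizes)
    return ordered[int(pct * (len(ordered) - 1))]

baseline = [10_000 * i for i in range(1, 101)]   # normal traffic: 10 KB .. 1 MB
limit = adaptive_limit(baseline)                 # 990,000 bytes for this baseline
print(static_rule(500_000), 500_000 >= limit)    # → True False
```

The static rule fires on a transfer that the network's own history says is ordinary; the percentile threshold gets recomputed from data instead of re-tuned by hand every time the configuration changes.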

And of course, again, we are going to have a problem with needing to educate more and more people who have a background in both, and there are not many to come by at this point in time. I'm sure many of you in this room do have it; what I'm saying is that out there, that is not common. So that's the problem that we face with rules. So, machine learning: this is from a really nice paper, and if you have an interest in this, I really urge you to read it. It's at the top left, and if you want to find it, you can send me an email and I'll point you to it. So

statistical machine learning today gets split into what is called shallow learning, and that's not trying to put it down (well, I don't know, maybe the inventors of this word actually did want to put it down), but these shallow-learning methods are actually really good for some aspects; and then deep learning, which I've already described a little bit. Deep learning consists of layers: you put in some inputs and you have some outputs, and you want to connect them through a sequence of layers that extract features, so they perceive. Then there's supervised and unsupervised shallow learning. Supervised would be the situation in which we give the machine learning some inputs and outputs; we

label them, and then it learns. Unsupervised, we don't give them anything. For example, unsupervised would be: you just get a bunch of data, large files and small files, and you're trying to determine what is a large file and what is a small file based on some threshold, and you cluster them into two groups. So clustering is one of the methodologies in unsupervised learning. Then you have deep learning, and I'm not going to read through all the different approaches that are on the right; there are many, and they're all algorithmic, but I will say something about where they have been applied in network security.
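That clustering example a moment ago, splitting file sizes into a "small" group and a "large" group with no labels at all, can be sketched with a tiny one-dimensional k-means (k = 2); the sizes are invented:

```python
def kmeans_1d(values, iters=20):
    """Two-centroid k-means on scalars; no labels are ever given."""
    lo, hi = min(values), max(values)            # initialize the two centroids
    for _ in range(iters):
        small = [v for v in values if abs(v - lo) <= abs(v - hi)]
        large = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = sum(small) / len(small), sum(large) / len(large)
    return lo, hi

sizes = [1, 2, 3, 2, 48, 52, 50, 49]             # file sizes in MB, made up
lo, hi = kmeans_1d(sizes)

def label(v):
    return "large" if abs(v - hi) < abs(v - lo) else "small"

print(label(4), label(47))  # → small large
```

The point is that the algorithm finds the split itself from the data, rather than a human picking the threshold.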

So, for specific cybersecurity threats, here we have intrusion detection, malware analysis, and spam detection; under intrusion detection I have network botnets, domain generation, and so on. The pattern that you're seeing in this table is that shallow learning has been deployed quite a bit for all of these, from left to right, but deep learning has not yet; it's a newer methodology, and it's getting deployed as we speak. Now, this is a little joke on deep learning, but it's actually a factual example. One of the aspects of deep learning that is not so savory

for us in cybersecurity is the fact that it can be easily spoofed. This is a well-known feature; I'm not trying to trash it or anything, it's well known, and this is an example that actually happened. A deep learning network was supposed to recognize a panda, and it recognizes the panda superbly, no false positives. Then you add 1% noise, a really tiny amount of noise, on top of this, and the network, the AI, says with 99.3% confidence that it's a gibbon; that's on the top right. This is something that we call instability in machine learning.
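A linear toy model shows why the panda-to-gibbon trick works: nudge every input feature slightly in the direction that increases the score, which is the idea behind the fast gradient sign method. The weights and inputs below are invented; the point is that many tiny per-feature changes add up to a large score change in high dimensions:

```python
import random

def score(w, x):
    # A linear "detector": positive score reads as one class, negative as the other.
    return sum(wi * xi for wi, xi in zip(w, x))

random.seed(0)
n = 100                                           # number of input features
w = [random.choice([0.1, -0.1]) for _ in range(n)]
x = [random.uniform(-1.0, 1.0) for _ in range(n)]

# For a linear model the gradient of the score with respect to x is w itself,
# so the sign of the gradient is just the sign of each weight.
eps = 0.05                                        # a 5% nudge per feature
x_adv = [xi + eps * (1.0 if wi > 0 else -1.0) for xi, wi in zip(x, w)]

# Every feature moved by only 0.05, yet the score moved by
# eps * sum(|w_i|) = 0.05 * 10 = 0.5, all in the attacker's chosen direction.
print(score(w, x_adv) - score(w, x))
```

Deep networks are locally close to linear in this sense, which is why a perturbation invisible to the eye can flip the classification.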

With very small amounts of input perturbation, we can push the AI system to give us, in this case, a false negative, something that would be very dangerous to us; the analogy here would be a non-threatening actor versus a threatening actor. All right, so that is a problem. I spoke about false positives and false negatives, so let's try and define them a little bit and see where the problems are. This table says true and false, and positive and negative. A true positive is: there is a threat existing and you detect it. A false positive is, of course: there is no threat and you still detect one. And so,

I'm hearing this commercial on the radio these days (a person of my age thinks about these things) where a particular cancer gets detected by a test, and they say 92% of the time it tells you whether you have cancer; it gives you a result. And it's like, okay, what is it really? Because if people walk by me and I tell them, you have cancer, you have cancer, you have cancer, then 100% of the time I'll catch everybody who has the disease, even though there might be only one or two people who have it; all of the rest are the false positives.
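The arithmetic behind that intuition is the base-rate effect. Here is a sketch with invented numbers: a test that is right 92% of the time in both directions, applied to a condition only 1% of people have:

```python
def precision(sensitivity, specificity, base_rate, population):
    """Fraction of positive results that are true positives."""
    sick = population * base_rate
    healthy = population - sick
    true_pos = sensitivity * sick                 # cases the test catches
    false_pos = (1.0 - specificity) * healthy     # healthy people flagged anyway
    return true_pos / (true_pos + false_pos)

# 92% sensitive, 92% specific, 1% base rate: only about 10% of alarms are real.
print(precision(0.92, 0.92, 0.01, 100_000))
```

Swap "disease" for "attack" and the same arithmetic explains alert fatigue: when real threats are rare, even a fairly accurate detector produces mostly false positives.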

And unfortunately, this happens to us in cybersecurity a lot; the alerts themselves contain a lot of false positives. Here is a quote from Bitdefender: close to half of security analyst teams battle false positive rates of 50% or higher. So people are spending, what, 25% of their time just trying to figure out something that is pretty much a rabbit hole; it's not a good situation. All right, so mathematically, and there is a little bit of math here, how can you compute those? Let's go with the large outbound file transfer and say, okay, if I have a large file and it's

malicious, I'm going to give it a label of one, y equals one; if I have a large file and it's not malicious, I'm going to give it a zero. So you look at the distribution of files, and every time a malicious one comes up you mark a one, and you get a distribution over file size, that's on the horizontal axis: if it's 5 megabytes, this is how many malicious ones I have; if it's 50 megabytes, this is how many I have. Then you put a threshold, this line here, and you calculate how much mass lies to the right of that

threshold, and that gives you the true positives. Now the other distribution is: I have a file of this size, and I said it was malicious, but it's not. That's a false positive. So that's another distribution, and you ask how many of those you have: integrate to the right again, and if that amount is very tiny, that's great, I have very few false positives. Here's the problem with "large outbound file transfer": it's a very flat distribution. If a file is going out at 3:00 at night, as I said, even at 5 megabytes it might be suspicious, or at least the probability that it's malicious might be larger

than at 5:00 in the afternoon. I'll literally prove this to you with data on the next slide. So the trouble is that these kinds of systems have, well, maybe "no skill" is too strong, but a very flat curve on the false positive versus true positive graph. You really want to be up in the corner where your true positives are 100% and your false positives are zero. The rule-based systems, because of the nature of the network, because of the lack of context, day versus night and all these things, sit close to the straight diagonal line, which basically tells you that if you picked at random whether something is malicious or

not, you would get the same result. Now, once again, it is better than that straight line, I'm not saying it's exactly the diagonal, but unfortunately it's pretty low in that top-left quadrant. Here is some data. On the vertical you're seeing the number of bytes; on the horizontal, time; and what you're seeing in green, I don't know if you can see it well, is the "large outbound file transfer" alert on a particular system. As a human analyst, your eyes immediately tell you that during Saturday and Sunday no alerts pop up, but as soon as Monday hits,

Tuesday, Wednesday, the large outbound file transfer alerts pop up like crazy. And the alert is telling you nothing. It's just telling you that the overall volume of data, because that's what's on the vertical, is larger; when the overall volume is small, it gives you nothing. So that's what happens. This next one does a little bit better; it's from some work we have done on what are called generative models. What the generative models try to do is capture the patterns in the data, and then capture the deviation from that pattern.
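The threshold picture from a couple of slides back, two overlapping score distributions with everything to the right of a cutoff counted as an alert, can be sketched as follows; the file sizes and counts are invented for illustration:

```python
import random

random.seed(0)

# Hypothetical outbound file sizes in MB. Without context, benign and
# malicious transfers have heavily overlapping size distributions.
benign    = [random.gauss(40, 20) for _ in range(5000)]
malicious = [random.gauss(50, 20) for _ in range(5000)]

def roc_point(threshold):
    """Integrate both distributions to the right of the threshold."""
    tpr = sum(x >= threshold for x in malicious) / len(malicious)
    fpr = sum(x >= threshold for x in benign) / len(benign)
    return fpr, tpr

# Sweep the threshold: each setting is one possible alerting rule.
curve = [roc_point(t) for t in range(0, 121, 10)]
for fpr, tpr in curve:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
# With this much overlap the points stay close to the diagonal
# FPR == TPR: the rule barely beats picking at random.
```

No choice of threshold rescues a context-free rule here; moving the cutoff only trades false negatives for false positives along the same flat curve.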

That orange curve is now the trend through the data. And what is the trend? People come into work in the morning at 9:00, traffic goes up, stays relatively flat for a while, as you've seen from the previous data, and then drops at 5:00 or 6:00 in the afternoon. That's the trend. The rest of it, when am I sending emails out, when am I getting emails and files and transfers from somebody else, is pretty random. But I need a way to quantify the random, that spread shown in green. If I have that, I have a very good way, informed by context, context in this

case being time, to determine whether something is normal or abnormal. For example, if a point falls outside of that green band, which captures the random deviation around my trend, then it might be high risk and I might go hunting. It would be great if the AI also gives me all the information around that event so I can know what happened there. So it does some reasoning. That's basically the type of model we need to keep developing. It has perception: it figures out trends from the data and learns them. It abstracts them: the way it works on one stream of data, it can work on another. Forget

large outbound file transfer; I can just measure the bytes coming in and out of the system, an inbound or outbound transfer, or I could give it a stream of data that measures authentication attempts. No difference: the abstract template of the method is exactly the same. And then it has the ability to reason over that: well, if a user was created, and you track that user, and the user is doing something very unusual, then you should really go and figure out what files are being taken out of the system and all of this other contextual stuff. So, some reasoning, enough to get me to the point where I can

do my job very effectively if I'm looking for some level of intrusion. There are other things you can do. Again, this is very dim, but this is IP addresses on the horizontal and IP addresses on the vertical, and this is one of the patterns that the AI extracted from the network. You can clearly see that the IPs on the bottom right talk to the IPs on the top left, and vice versa. This is automatic detection of the talk patterns between parts of the network; in fact, you can quite effectively get the IP ranges out of it automatically.
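The trend-plus-spread idea, use time of day as context, learn a band of normal variation per hour, and flag what falls outside it, can be sketched like this (all traffic numbers are invented, and a per-hour mean and standard deviation stand in for a real generative model):

```python
import random
import statistics

random.seed(1)

# Two weeks of hypothetical hourly outbound byte counts: a 9:00-17:00
# workday trend plus random noise around it.
def volume(hour):
    base = 1000 if 9 <= hour % 24 < 17 else 100
    return base + random.gauss(0, 30)

history = [volume(h) for h in range(24 * 14)]

# Context = time of day: learn a mean (trend) and spread per hour-of-day.
by_hour = {h: [] for h in range(24)}
for h, v in enumerate(history):
    by_hour[h % 24].append(v)
band = {h: (statistics.mean(vs), statistics.stdev(vs))
        for h, vs in by_hour.items()}

def risk(hour, value, k=3.0):
    """Flag values outside the trend +/- k*sigma band for that hour."""
    mu, sigma = band[hour % 24]
    return abs(value - mu) > k * sigma

# A 1000-byte transfer is unremarkable at noon, when the trend itself
# sits near 1000, but far outside the learned band at 03:00.
print(risk(12, 1000), risk(3, 1000))
```

The same volume gets opposite verdicts depending on when it happens, which is exactly what the context-free "large outbound file transfer" rule cannot express.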

The human analyst can work with those. I'll go on just a little bit, maybe a couple of minutes more. All right, this is some math that enables you to do that; I'll skip it, but if anybody is interested I will talk about it. It is something from our own internal research. And I will challenge you here: the top is what our system detected, in a particular set of data, to be the normal behavior. Horizontal is time, top is data. That's the normal behavior, with the trend, which you can probably see with your eyes, right in the middle,

kind of oscillating, and then all the noise around it. The bottom has an injection of malicious data. Where was it injected? I know the top and bottom are actually different. Where was it injected? Perfect, yes, that is the correct answer. I hope you couldn't figure it out with your eyes. Actually, our director of engineering at the time figured it out, and I don't know how; I'm thinking it was a wild guess. So Juli actually figured it out. It's in intervals four and five, and what you're seeing in red are the risk scores for intervals four and five

on this particular set of data: they are nine, eight, and ten. For everything else, the risk scores, on the right, are very, very tiny, one or two. And the meaning of that, by the way, is semantic: it means that the AI system doing this had 90-plus percent confidence that the injection is in those intervals. So it does have a semantic meaning, and from that perspective, those are the things that need to be coupled to other aspects of the AI, reasoned over, and hunted. So my pitch is for these kinds of approaches, where we

get data, create a generative model, I've shown you one, and reason over it. Then get data again: do we have success? Did we correctly predict, did we correctly match, did we correctly diminish the threats? If yes, well, then we can label it; we can say, this is what I've just seen. But note that there is no labeling in advance, and that's a little bit more like parenting, isn't it? Something walks by, and a toddler sees: well, it has four legs, a head, whiskers, it's something, and they call it something in their head. A parent comes by and says "a cat." And then a tiger walks by: a cat, yes, but the dangerous one, a different classification. The point is that true

artificial intelligence is about labeling after the fact, not prior to the fact, and labeling in advance causes a lot of other problems besides the false positives, false negatives, and everything else. You might be amused that we have actually applied this to a game. Anybody playing this game out there? All right, you're not going to be happy that we were able to win. So let me say something about this. A lot of hype has been associated with games; today chess players even cheat with AI. If you're following, there is a big brouhaha right now with Carlsen, where somebody else

is supposed to be cheating in this or that match because they used artificial intelligence. But think about it, you're all in cyber security: those are easy games. Chess is predefined; there is a fixed set of rules, and if you can compute seven steps ahead you're going to win, and computers can compute seven steps ahead. That's it. I don't want to be dismissive, but there is no artificial intelligence there of the type that I have described, one that constantly gets data, constantly reforms its model, constantly adapts. In an online game, a new player might come in; a player has free will and might decide to just try something new, or rather upgrade their weaponry or whatever. Online games are much more interesting that way.

So we played with one, and when we get the generative model, what you're seeing on the bottom is a criterion for how good our AI was at recognizing the patterns. Sorry: on the vertical is how good the prediction from the pattern-recognition AI is, and on the horizontal is what comes from actually playing the game. You see that the line is pretty diagonal, so the two were the same: the model was able to capture what's

going on in the game the same way it captures what's going on in the network. And I couldn't resist, let's see if this movie is going to play, I couldn't resist showing you this soft robotic arm that we also trained. This is not network security, but it's from my lab. We just randomly perturb the arm, like a small child sort of wiggles their arms to figure out what they can do with them, and then after a while we gave it a task: to follow a stick.

So this is even beyond recognizing and reasoning; this is about controlling, about making the system do a specific task. In our context, in cyber security, that could mean actively playing against the intruders, against the threat actors in there. Maybe you can see it: there is a stick being shown to the arm, and it's happily moving around and following it. So you can do quite a bit with these generative models. I hope nobody is developing an attachment to the soft robotic arm; humans are known to do that to inanimate objects when they behave like living things. But anyhow, that's sort of the final part of it.
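The loop argued for throughout, get data, model the normal pattern, predict, compare, score the deviation, and label only after the fact, can be caricatured in a few lines. A running-window mean stands in for the generative model; this is a sketch of the idea, not MixMode's actual method:

```python
from collections import deque

class OnlineAnomalyLoop:
    """Schematic of the loop: predict -> compare -> score -> update.
    A running-window mean and spread stand in for a generative model."""

    def __init__(self, window=100):
        self.history = deque(maxlen=window)

    def step(self, observation):
        if len(self.history) >= 2:
            # Predict: the model's expectation is the recent mean.
            mean = sum(self.history) / len(self.history)
            var = sum((x - mean) ** 2 for x in self.history) / (len(self.history) - 1)
            std = var ** 0.5 or 1.0
            # Compare and score: deviation in sigmas, capped at 10.
            risk_score = min(10, round(abs(observation - mean) / std))
        else:
            risk_score = 0  # not enough history to judge yet
        # Update: only now does the observation join "normal"; a real
        # system would down-weight or quarantine high-risk points.
        self.history.append(observation)
        return risk_score

loop = OnlineAnomalyLoop()
for x in [10, 11, 9, 10, 12, 10, 11, 300]:  # the last value is injected
    score = loop.step(x)
print("risk score of injected point:", score)
# -> risk score of injected point: 10
```

Nothing is labeled in advance: the injected point earns a high score purely because it breaks the pattern the model has learned so far.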

Hopefully I've convinced you that we do need a particular brand of AI in order to advance our security posture, and that this includes bringing in some novel types of algorithms, AI that avoids some of the problems we had in the past. It is very interesting to me that the algorithms we are talking about are constantly making predictions about what's going to happen: they get data, they compare it, and they say, this is normal, because I've already created a model that tells me it's normal, versus, this is abnormal, I'm going to give it a risk score. There's a book

that I highly recommend, called "A Thousand Brains," whose author, Jeff Hawkins, was by the way the Palm Pilot designer, if anybody remembers that. There are many people here, and I'm very glad about it, who don't remember it because they're way too young, but a few of you might. That interesting book argues that our brains do exactly this, prediction and comparison, all the time, and we've strived to develop methodologies that do a very similar thing. Then, just as a last point: what kinds of questions should you ask anybody who says "I have AI in my system" that is actually helping you

out? When the system is figuring out normal versus abnormal, is the baseline dependent on clustering, labeling, or human intervention? Who is responsible for the training and for the maintenance? Are you going to have to hire a lot of people to be responsible for that maintenance? How does the system behave if the rules are completely off: does it actually have any perception and learning abilities without them? And how does it respond to zero-day attacks? I think those are the important things to think about in this context. And as I said, I hope I've convinced you that we do have some solutions, either

already there or coming up, that the AI community has worked on and is now deploying in cyber security, and that all of you can either develop further or use. That's all. Thank you for your attention.

[Applause]