AI & ML In Cyber Security—Welcome Back To 1999 —Security Hasn’t Changed

Name: AI & ML In Cyber Security—Welcome Back To 1999 —Security Hasn’t Changed
Uploaded: 2017-05-17
Duration: 53 min 14 s

BSides Vancouver | 53:14 | 483 views | Published 2017-05

About this talk

RAFFAEL MARTY // MARCH 13 // TRACK 1 // 9:15 TO 10:15

We are writing the year 2017. Cyber security has been a discipline for many years and thousands of security companies are offering solutions to deter and block malicious actors in order to keep our businesses operating and our data confidential. But fundamentally, cyber security has not changed during the last two decades. We are still running Snort and Bro. Firewalls are fundamentally still the same. People get hacked for their poor passwords and we collect logs that we don’t know what to do with. In this talk I will paint a slightly provocative and dark picture of security. Fundamentally, nothing has really changed. We’ll have a look at machine learning and artificial intelligence and see how those techniques are used today. Do they have the potential to change anything? How will the future look with those technologies? I will show some practical examples of machine learning and motivate that simpler approaches generally win. Maybe we find some hope in visualization? Or maybe augmented reality? We still have a ways to go.

Transcript [en]

[Music] Good morning everybody, welcome. I'm super excited to be here. Is this really loud in your ears? The last time I was up here, I think, was about ten years ago for a first conference. I haven't been back since, so I'm glad to be back. Unfortunately it's just for today; they already called me back to some other off-site I have to run back to. But yeah, super excited to be here and talk to you guys about artificial intelligence and machine learning a little bit. Who has been to RSA in here? Show of hands. A few of you. I guess the rest of you are aware of the RSA Conference in San Francisco: a crazy event where companies are showing their security

products. I was walking around on the show floor and got really upset, because pretty much every company had machine learning somewhere in their marketing, and half of them were talking about artificial intelligence. The problem was, as soon as I actually went up to the booth, asked them what they're doing, and looked behind the curtains (and I've looked at a lot of those companies and what they do), it's really not machine learning, it's not artificial intelligence, what they're doing. So this is how this all came about. I'm like: all right, let's just talk about this topic. And in putting together the title I was going to be a little provocative and say: you know what, if we look at security in the last 15 years,

not that much has actually changed. So this is the topic that I want to consider a little bit with you guys in the next 45 minutes to an hour. I have to say this: what I'm talking about is not in any way associated with Sophos, which is my day job. This is more of a fun thing I'm doing; it has nothing to do with the company Sophos at all. So who am I? Some of you I've met before; most of you probably don't know who I am. Currently I run security analytics for Sophos, a company you probably know. I have spent pretty much my entire career in the space of data analysis

for cyber security. I worked at IBM Research, where we worked on the first SIEM product, really, Tivoli Risk Manager. Not sure if anyone remembers the Tivoli line of products or still has to use it; sorry. I worked for ArcSight, for Splunk, then started my own company, Loggly, where we basically built a cloud-based log management service; maybe some of you have used or are using Loggly. And then in the last five years, or rather the four years before Sophos, I was running PixlCloud, doing consulting for large organizations, working on big data deployments, working on data science problems with large global companies, and figuring out how we get access to the

data, how we make the data that we're collecting understandable, and so forth. So a lot of the things that I will talk about today also come from that. I have a little bit of a knack for visualizing security data; I've been visualizing data for probably about a decade at this point, with a bunch of visualization tools and open source things, and I will show you some examples in the talk here also. Outside of that, I have a little bit of an interest in Zen meditation, so if anyone wants to talk about that, I'll be almost more excited than about security. All right, so a little bit of a provocative premise for the talk: it is

in the title, the subtitle, that cyber security hasn't really changed in the last 15 years, right? I know this is not necessarily true. We have definitely made some inroads, and we've been able to come up with new technologies and things like that. But I will show you a whole bunch of examples where I think we're really not that different from 15 or 20 years ago. We still have a lot of the same problems, as I will show you. And looking at what other disciplines have done, and then seeing that all these companies are trying to do machine learning and artificial intelligence and I don't know what... If we look at other... the slides are

refreshing. If you look at other disciplines, like weather forecasting or forecasting election results, for example, those are two areas where I would think that after hundreds of years of weather forecasting and thousands of years of trying to predict election results, we would actually be a little better off at predicting who is going to win elections. Look at the presidential election in the US, right? People were saying there was an 80% chance that Hillary Clinton was going to win, and then what happened? An 80 percent chance; it was not like 60 or 65, no, it was 80%. We got it completely wrong. And it is a problem where I think we understand the inputs and what we're looking for, but we still can't predict

things. Now companies out there in the security space are saying: well, we have a much more complicated world, where we're trying to model how each of you uses computers and machines and interacts with systems, and then we find anomalies through some kind of algorithm called artificial intelligence. So if we can't tell weather and elections, how are we going to solve security? We're just going to miss a few slides here; I'm just going to talk to them. So what I want to do quickly is look at a few areas in security, be a little provocative, and show you that we haven't really changed that much. We're going to look at firewalls and IDSs and so on,

and I'm going to be very provocative in saying nothing has really changed. Then I will briefly talk about machine learning and artificial intelligence, get a little bit into visualization, and then I want to look a little bit forward at what I see in the field: interesting technologies and interesting approaches in cyber security that I think are probably going to make a little bit of a difference out there in the next years. Here we go. So, first topic: firewalls. If you have configured a firewall lately, and you configured one maybe 15 years ago, it's still pretty much the same, right? You get the box out of the packaging, you install it, and then you have to start writing rules. Well, it's

really still very much the same as back in the day, whether you configure iptables or pf or something like that. You have to figure out what the right rule set is. Suddenly the rules get really long. You have problems where there are rules in there that never trigger because you wrote them in the wrong order; you have rules in there that are stale because for some project you had to add an exception. So there are entire products that help you audit firewall rules to eliminate all this waste that you have in there. So really not that much has changed in terms of writing rules. When I ask people, hey, you have a firewall, how do you analyze your log

files, I generally get blank stares. They're like: we look at the logs every now and then when there's a problem. But do you proactively analyze your logs? Show of hands: who proactively looks at their logs? There's a few of you, all right. But then I dig a little deeper: OK, what do you do? Well, you might look for certain IOCs, you might do some command line stuff, maybe you have a product you put it into, like a Splunk or an ArcSight or something like that. But I hardly ever see a sophisticated way of really understanding what's happening. Here's an example that I've been using for many years now on visualizing some firewall logs.

This is called a treemap. It is showing you the traffic of, I think, about a day on a small firewall. There's not that much going on here, but you see the red areas are the firewall blocks and the green is the traffic that has passed through, and there are certain patterns in here that you can find very quickly. It helps you focus your attention on the areas that might be of interest in that firewall log, like the big red block here in the middle. I always challenge my students when I teach workshops at Black Hat on visual analytics, and I ask them: what is this? There's traffic from some IP address going to a

multicast address on port 427. And I ask people: what is 427, anyone? And I have analysts in the room that do this all day long, looking at log files. 427 is Bonjour on the Mac. It's this beautiful protocol where the machine basically says: hey, I'm up here, who wants to talk to me? By the way, this is my name and these are the capabilities that I have; do you want to connect to me? So fortunately that was blocked on the firewall; it wasn't let out. But that's probably a misconfiguration of a system right there. So this is a way that we could look at logs, but I don't really see many products that have

something like this in them. If you look at IDS/IPS, we still have a ton of false positives. We have a hard time prioritizing alerts: if you get data out of Snort or your IDS, well, which alerts are you going to work on, which ones really matter? SSL is a problem, right? You have encrypted traffic, so you can't really look into the packets; you have to do termination, it takes a lot of CPU cycles, it's not that easy. And then signature writing is still a very, very manual process, so you have to have very skilled people writing your signatures; they have to be optimized for speed and all these kinds of things. So really the same as a

long time ago. Threat intelligence: all right, we didn't have that in 1999, or we didn't call it that, right? But if you think about it, threat intelligence is nothing else than an IPS signature. If you look at Emerging Threats and you download the Snort rule set, there's a whole bunch of rules that are just lists of IP addresses that are known botnet addresses, or domains that are blocked, and things like that. So it's really just a way to share some of these indicators a little more easily. Some of these lists get pretty big today, right? You might have 100,000 indicators or domains you want to check. Well, I challenge you: how do you do a matching of all your

incoming log traffic? Say you have 100,000 events a second: how do you match that up with a list of a hundred thousand elements? It's actually a pretty hard problem to solve. You need to have some interesting algorithms, maybe something like Bloom filters, and some pretty beefy hardware to do that. I have companies that told me that they would pay pretty good money if I could solve this, and there are companies, maybe one of the sponsors, that do something like that. Then also, the next thing I usually ask companies about threat intel: well, do you ever age out your old indicators? An IP address doesn't stay bad forever. Usually the infrastructure is up and active for five minutes, ten minutes, an hour,
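
The matching problem raised above (a fast event stream checked against a large indicator list) can be sketched with a small Bloom filter. This is a minimal illustration, not something shown in the talk: the `BloomFilter` class, the `match_events` helper, the `dst` event field, and the sizing parameters are all hypothetical, and the exact `set` stands in for whatever slower authoritative lookup confirms a hit.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def match_events(events, indicators, num_bits=2_000_000, num_hashes=7):
    """Return the events whose 'dst' field is a listed indicator.

    The Bloom filter is the cheap first check; the exact set (an assumed
    stand-in for a slower lookup) confirms hits, so filter false positives
    never reach the analyst.
    """
    exact = set(indicators)
    bloom = BloomFilter(num_bits, num_hashes)
    for indicator in exact:
        bloom.add(indicator)
    return [e for e in events if e["dst"] in bloom and e["dst"] in exact]
```

Most events miss the filter and are rejected after a handful of bit tests, which is why the structure is attractive at high event rates; the cost is that entries cannot be removed from a plain Bloom filter, so it has to be rebuilt when the list changes.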

maybe a day or two, but then it's cycled, so the IP address is going to be good again; say in Amazon, it's going to be a legitimate service. So you have to age these things out. And often the companies I talk to say that if something is older than five minutes, from an IP perspective, the intel is useless. So how do you share something very, very quickly, almost in real time? Data leak prevention, another one. Have you ever looked at one of these systems under the hood and configured some of these rules, like on the command line? Some of the stuff is horrible. They probably have gotten a little better, but in the end they're just IDS engines. Yes, there's
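
The aging-out of indicators mentioned above can be sketched as a store that forgets an indicator once it has not been reported for a configurable time. This is a hedged illustration only: the `IndicatorStore` class and its method names are hypothetical, and the 300-second default simply mirrors the "five minutes" figure quoted in the talk.

```python
import time


class IndicatorStore:
    """Keeps indicators only while they are fresher than max_age_seconds."""

    def __init__(self, max_age_seconds: float = 300.0) -> None:
        self.max_age = max_age_seconds
        self._last_seen = {}  # indicator -> timestamp of most recent report

    def report(self, indicator: str, now: float = None) -> None:
        # Record (or refresh) the time the indicator was last reported bad.
        self._last_seen[indicator] = time.time() if now is None else now

    def is_active(self, indicator: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        seen = self._last_seen.get(indicator)
        if seen is None:
            return False
        if now - seen > self.max_age:
            # Too old: the address may belong to a legitimate service again.
            del self._last_seen[indicator]
            return False
        return True
```

Passing `now` explicitly keeps the sketch testable; in a live pipeline the event timestamp, not the wall clock, is usually the better choice for `now`.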

some interesting technology where you can register documents, and it will actually look at the text and find it even if you move it from a Word into an Excel document, with some intelligent hashing and things like that. But in the end DLP is really not that exciting. Passwords, right? We're still using passwords, we've used them forever, and it's still one of the biggest problems we have. Sure, there's some two-factor authentication stuff out there, there are interesting products that are trying to solve the problem of passwords, but in the end we still have a very big problem with passwords. There are zero-knowledge protocols, if you guys have studied crypto a little bit, right? There are protocols where the password actually

is never sent on the wire, but you can still make sure that the other party actually knows the secret. I haven't seen a product really implementing that, and we have had this technology for years. Again, the problem is the underlying public key infrastructure: we haven't really solved the problem of being able to really identify someone and make sure that you're really the other person. But that's something that probably should come from the government or something like that. Well, speaking of accounts: in a Windows environment you can generally go up to a machine and log in with your AD credentials on my machine, or I can go to your machine. Well, that is maybe

OK from a security perspective, but if you want to do analysis, suddenly things get really interesting, because now I have to correlate the IP logs, all the flows, to the user that's actually on the system. So I need to get the AD logs and correlate all that, and that gets pretty complicated. Why not just say no one else should use my laptop? This is my machine; I probably don't want to have any of you log in here with your credentials. Then there's a guest account on here also, which makes things pretty bad as well. If I let you use my computer and then there's some bad traffic, and then we do some NetFlow,

now suddenly I'm going to be convicted for the stuff you did, right? Why don't we stop that? And then, why the hell is your laptop allowed to talk to mine right now on this network? There's no reason you should be connecting directly to my network or to my computer. So there are a few things that are just kind of interesting. Or something like BCP 38: filtering on the border firewall or the border router to prevent spoofing is something that's not that hard to explain. There are some large ISPs that tell me that it's not that easy, but in general this is sort of the hygiene of the Internet, and we're still not doing it. We don't

have to do it, and most people are just lazy or they don't have the incentive to really do it. Vulnerability scanners; I think we call it vulnerability management now, right? Have you looked at a vulnerability scanner lately and looked at their interfaces? It's horrible: it's lists of vulnerabilities. There is an open-source tool; I pasted in one of my screenshots. This is a visualization of some vulnerabilities on different machines. I have not seen something like this in many places. There are entire companies that do nothing else than build interfaces on top of all these different scanners, right? There's a couple of companies out there

that consume that scanner data and then map it up differently. Why aren't those capabilities baked into the vulnerability scanners themselves? I don't get it. SIEM: who's using a SIEM? Splunk, ArcSight, QRadar, whatever you have. A bunch of you. Jeez, we had those in 1999, right? We still struggle with parsing data; it's one of the biggest problems. One of the things: a lot of you, if you actually use a SIEM, probably have the issue of source and destination being the wrong way around. You do an analysis and your source is not the source but is actually the destination. If you look at NetFlow, it's a humongous problem figuring out what's the real source of a connection and what's the destination. We haven't solved that yet. We

have a hard time writing rules: how do you write detection rules, correlation rules, across the different data sets to really come up with interesting data? We don't know how to prioritize alerts. You have lists and lists of alerts that you have to go through, and it's just a pain. It's a lot of work, and we have to work on a lot of low-priority stuff just to find out: oh, that's right, that was Joe accessing the Internet again, it's OK, don't worry about it. Well, then we have security analytics. We definitely didn't call it that back in the day. But if you look at security analytics, broadly said, it's correlation, it's UEBA, user and entity behavior

analytics, which is basically just monitoring what a machine or an entity is doing in the network: is it suddenly changing its behavior so that it looks suspicious at some point? It's a little bit of orchestration, automation, maybe incident response. We had all these pieces. Guess what: your SIEM can actually do monitoring of the behavior of machines. I talk about ArcSight because that's what I know best, but every other SIEM can do it too. You use a data monitor; you can actually monitor, statistically, the number of bytes transferred or the volumes of traffic, and you can set triggers on that. So then I tell that to security analytics companies. First they say: the SIEM cannot do this. And I'm like:

yeah, let's sit down, let's do it, I can show you right now. And then they're like: well, but we make it easier. OK, maybe that's a value proposition, but it's not really changing anything. All right, so that's just a few examples to maybe be a little provocative and let you guys discuss this in the next couple of days also. And I want to be clear: there are definitely things that have changed and have gotten better. But in general there are a lot of things you can still build, so there's still a lot of opportunity here. So let's talk a little bit about machine learning and artificial intelligence. I will give you a brief definition of what that actually is, and

I want to discuss a little bit whether it is really going to solve a lot of our underlying problems in security. Should we really focus all of our attention on it because it's what's going on in the future? So here are a couple of definitions. Statistics: it's about quantifying numbers. We have all done statistics in math, right? You look at averages and standard deviations and things like that. Data mining is about explaining patterns. I will show you some visualizations in a second where I have some graphs, and data mining can help you understand what these graphs are about. Machine learning is about predicting with models. What does that mean? Well, a data scientist goes ahead and builds a model based on his

understanding of the domain and the data. To build a model, you say: here's what I'm trying to understand, here is how I model the world. So you have to give the system an understanding of the world, and then the machine goes in and optimizes that and helps you figure out: OK, in this world setup, what are the rules, how does it connect together, how can we make it work? And then artificial intelligence (I'll give you a different definition in a second): it's really something that kind of reasons by itself, right? OK, I'll show you some examples. So if you look at machine learning and data mining, there's a number of different subcategories, and

here are seven of them... six. The first one is anomaly detection, where you're trying to find the anomalies in your traffic or whatever you're looking at, right? Well, a few years ago at a previous company, I went to my engineers and said: hey, I want you to help me find anomalies in this kind of data. What I had is this: I basically ran netstat on every machine every hour or every minute, and I looked at how many open ports I have, and I just wanted to see if the number of open ports is changing, right? So I said: hey, I have this data, help me understand what is normal in here, or help me with finding

anomalies. And they said: OK, define normal for us. I'm like: OK, it's a good question. If a service has restarted, if I configure something differently, if I add a new service, these are all normal things, but they will end up changing the number of open ports at a moment in time. So think about it: whenever you're trying to do anomaly detection, can you actually characterize normal? If you can't, how can you find anomalies in there? And this is one of the core problems that we are running into with most of these systems. Then there's something called association learning. There's clustering, where you take a whole bunch of data and you're trying to find areas that are similar; I'll show you an example in a
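
The open-port example above can be made concrete with a deliberately naive baseline. This is a sketch under stated assumptions, not the approach used in the anecdote: the function name, the 24-sample trailing window, and the 3-sigma threshold are all hypothetical choices, and "normal" is simply assumed to mean "close to the recent average".

```python
from statistics import mean, pstdev


def flag_port_count_changes(counts, window=24, threshold=3.0):
    """Return the indexes whose open-port count deviates from the trailing window.

    counts: one open-port count per sampling interval for a single host.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        deviation = abs(counts[i] - mu)
        if sigma == 0:
            # Perfectly flat history: any change at all stands out.
            if deviation > 0:
                flagged.append(i)
        elif deviation / sigma > threshold:
            flagged.append(i)
    return flagged
```

The sketch also shows the problem the speaker describes: a restarted service or a newly added one is flagged exactly like an intruder's listener, because nothing in the count itself says which changes were intended.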

second. Classification, regression, summarization: in the bottom right example there, you have a very complex system that has lots of interactions and you want to simplify it, you want to summarize it, you want to get a higher-level view, for example. So these are all topics inside of machine learning. So I'll give you an example here, in the data mining space. You see here on the left-hand side network traffic, and it's an abstract space, so the x and y coordinates don't really mean much, but you see these clusters very clearly. There are different areas of traffic in here; at least the algorithm thinks so. Now the first question that you're probably going to come up with is: neat, but what is that

pink stuff, right? What is this traffic here? This is the first problem that you have with data mining, for example: how do you understand what this is that it came up with? How do you understand why those clustered together? Are these the web servers and these the mail servers? Usually that's not how it happens. There's something characteristic in the network traffic that we looked at here that clusters together, and discovering what it was is actually not that easy. The other thing I did here: I actually ran two algorithms. One of them arranges the dots here, the other one gives the color. These are just two algorithms, and you see that they actually don't

agree with each other very much. You have blue dots, or a blue dot, here, where the rest is out here, and there are some outliers over here. So you see that sometimes they agree, but sometimes they don't. There seem to be three subgroups in here, whereas this group seems to be closer to this one. So why do they not agree with each other? Why are they not closer to each other? These are all things you have to start understanding. Well, there are areas where machine learning is actually quite useful, but in general you need a pretty large corpus of data to do machine learning. One of them is analyzing malware, where you have binaries and you want to understand

which ones are malicious and which ones are OK. We have large corpuses of malware out there; we really have collections where we can learn what the features are, what the characteristics are of maliciously behaving binaries. So we have a big, big corpus; we can actually learn and start building these models, making the models better, and then classifying malware versus non-malware. So here's a screenshot of a tool called Cynomix that Invincea, which is a company we just bought, has developed, which is very interesting. It actually visualizes your binaries and lets you investigate how these binaries are similar, and classify malware that way. It also works pretty well for spam detection, I think. If you

look at that, I'm not sure we have really solved it, but if we didn't have spam filters our inboxes would look very, very different. So generally I'm pretty happy with the spam filters, except for my email account that I've had for 20 years or something, which is getting flooded with all this spam because I just subscribed to everything on the Internet, apparently. But then let's look at a topic like network traffic analysis. Who has been looking at network traffic, like NetFlow or even firewall logs? A few of you. It's not that easy, right? I've been trying all kinds of things. You can try to throw all kinds of algorithms at the problem. In

the end, what generally happens is that we don't have enough context. What I mean by that: if we look at flows, we see IP A talks to IP B. Well, how do I know if this is normal or not? If you're in a bigger company, you might have no idea what these machines are. And even if you do: are these two machines supposed to send email to each other, or is there really a file share that this other machine should access? So the problem doesn't come down to an algorithm that says 'over here is something funky'; it comes down to a matter of: well, was it intended to do this? What's the context, what's the machine about? And that's

really hard. If you want to find malicious traffic, if you're interested in doing packet analysis and figuring out 'do we see attacks in here?', well, you need labeled data, or you need a data set that actually has a lot of attacks in it so that the system can learn from it. And that has been a very, very hard problem. In 1999 there was a project called LARIAT from MIT Lincoln Labs, where they built a test setup to simulate sort of real traffic in a smaller network, which was then used for anomaly detection algorithms. If you're interested in all of that, there are entire proceedings from a conference called RAID, Recent Advances in Intrusion Detection, that has been looking at the

problem of anomaly detection for network traffic and intrusion detection for many, many years. If you look back in its proceedings, you'll find some stuff in there. But they created a pretty good data set, and all kinds of researchers started testing their algorithms and measuring their algorithms on it. It turned out that there was one feature in the PCAPs, which was the TTL, that gave everything away: you could look at the TTL and determine whether something was an attack or not. So it is really hard to build such data sets. Even if I take your company's data, how do I know that it is clean, and how do I know where the attacks are

in there? So it's really hard to build. And how do we build systems without having that data? We can just make assumptions and then go out there and test them; that's what most security products do. But if you don't have the test environment, how do you know your algorithms really work? So let's address the elephant in the room: artificial intelligence. Well, I spent quite a bit of time in the last couple of weeks reading up on: OK, what's actually happening out there in artificial intelligence? What are people doing, what are certain fields doing, what's happening in security? And I think this is very obvious, and maybe marketing

knows: just calling something artificial intelligence doesn't make it artificial intelligence, right? So I was trying to come up with a definition for this, and what I said is: well, a program that doesn't simply classify things like a machine learning algorithm, but actually comes up with novel knowledge that a security analyst finds insightful. I think that's a pretty decent definition, right? I want to have a system that looks at some data and says: look here, have you noticed this? And you're like: oh, that's interesting, I didn't realize that before. If you apply this definition to products, to systems out there, I don't think you'll find much that will actually really match this. If you go into the literature, you will find

that people distinguish three types of artificial intelligence. There is ANI, which is artificial narrow intelligence, which basically means that an algorithm is able to solve one very specific problem. And that's really falling into the machine learning space: you build a model, you try to characterize the problem, and then the machine solves it. These are all kinds of things like self-driving cars; it's Google's Go, the computer that played Go; it's Deep Blue; all these kinds of things. Even IBM Watson, I guess, is mostly in this space; it's maybe touching AGI, which is artificial general intelligence. And this is really what we probably understand as: oh yeah, this

machine is smart, I can reason with this thing and it comes up with interesting answers for me. And then, what's interesting if you read a little bit about artificial intelligence: you enter the space of artificial superintelligence. If you have looked into this space, there's probably a name that you came across, which is Ray Kurzweil, who is a futurist. He works at Google at this point, but he has written a lot about what happens going forward when systems get smarter and smarter. Then you get to the science fiction realms of: wow, the systems are so smart they can outpace us; what happens then? Here's a graph that I found that tries to explain what's going

on. On the left-hand side is sort of our view of intelligence, right? Note how there's an ant; a bird is like twice as intelligent; a chimp is a little more intelligent. And if you look at where we are with AI today, we're close to actually mimicking a chimp or a monkey. If you can't read the label here, this little thing says: 'Oh, adorable, the funny little robot can do monkey tricks.' Right? So we kind of laugh at what's happening right now in artificial intelligence. We're like: well, we're not quite there, because you still have to traverse all this distance to get to a dumb human, right? And then there's quite a bit more to get to

Einstein's level of intelligence. But this is sort of how we think about it, and some people argue that what is really happening is that we're over here: we have passed the ant, we passed the bird, and it's actually quite a big jump until you get to the chimp, which is also very hard, but it's really not that far away anymore from a human. And people predict that, whoa, this is really going to increase. If you read up on it, it's quite interesting what they predict is going to happen with systems that are going to be very, very far ahead in intelligence. But the right-hand side is definitely not where we are with security at all. And here are a couple of examples. Machine

learning, artificial intelligence, if you just want to be a little sloppy with the definitions: we use it in a lot of these different cases. I've looked at data, network traffic, all kinds of log files, for a long time, and we have tried so many algorithms. We have social network analysis, we have looked at entropy, and here are some self-organizing maps to try to learn network traffic and find what's normal, what's not normal, and classify things. It is really, really, really hard. Generally what works is simple: go back to simple statistics. You can go out there and ask security analytics product companies, 'What's the algorithm that works best for you?', and generally they will say it's the simple statistical stuff

that works, because with the other, complex algorithms you have all these exceptions, and then you don't understand what they actually learned. So keep it simple; it generally works. So here, for example, what I'm showing you is a graph of network traffic where different machines are communicating with each other, and the bigger the dots, the more traffic they have transmitted. Now the question is: in this network, do you find any data leakage? I don't know, right? I mean, you can start running clustering algorithms, because these are all connected; do social network analysis; compute betweenness centrality for all of these nodes, with all kinds of algorithms, PageRank and I don't know what. There are

entire companies that focus on graph-algorithmic analysis of traffic to find anomalies. In reality, it's really just counting: how many different machines does this machine generally connect to, on how many different ports? You take that over time, you look a little bit at the seasonality, and you can come up with some very interesting things that way. The complex stuff generally is not necessary. But what you find here is the same thing that I tried to explain earlier: you're lacking context. You don't necessarily know what these IPs are, right? Just because there's a big dot here, because one of these IP addresses has talked a lot, I don't necessarily know if that's good or bad. I need to

know what that IP is to make a statement about it. I did a project for a Fortune 50 company. We took 100 gigabytes of network flows and we tried to just dig in, hunt, visualize, analyze, run algorithms, and find what anomalies are in here. We found 10% of the traffic came from high port numbers, over 10,000, and went to high port numbers, over 10,000. Generally that's not what you would expect; you would generally expect traffic to go to low port numbers. If you have a source/destination confusion going on, where these things are turned around, you might see the difference, right, like a low port number going to a high port number, but then you can just sort of accommodate

that turnaround. But traffic from high ports to high ports is not something you expect, generally, and it was over 10 percent of all of the traffic; it was not just a little bit. Well, OK, there's all the Hadoop stuff that talks on the 50,000s. I don't know who told them how to use ports. They're fixing that, by the way; in the next version they finally go down under 10,000, so they're trying to respect the definitions again. There's some Windows stuff, like Exchange, that will do weird things and sort of negotiate ports. Skype is pretty horrible, if you've ever looked at Skype in network traffic. Jesus, those guys. So it's hard to figure out what's going on without having context. If you don't

know what these machines are and what they're supposed to do, there's no way to figure out what's normal. Any machine learning algorithm can try whatever it wants; you don't have the right data. What did work in that network is that we abstracted things. We said, OK, instead of looking at individual IP addresses, let's look at groups of machines, let's look at networks and figure out how these networks talk to each other. And suddenly we were able to see something. We were able to say: this is a secure network, here's the VPN, and anyone coming in over the VPN should never go directly to the secure network side. So you can start separating things that

way and start finding anomalies that way. But then you find your exceptions; you always have one of everything in there. You always have the thing that came from the VPN and went to the secure network: someone's vulnerability scanner or something, or some network management station, or they were testing something. There's always something happening. So without asset management, without actually having a ledger that says this is what should be going on, again, you have no chance. Let me give you another example of something where simple really works. This is an example I stole from some friends who work for Etsy. What you have here is the number of password resets over time, right? So I have a website: how many

password resets do I see? A password reset could be used by an attacker, I think: someone can try out email addresses, try to get an email back, and see whether the account exists, or something like that. So this is the traffic that you see, the number of password resets over the day. It's very cyclical, right? Now, if I ask you what we are going to do, I want to set up an alert if something anomalous happens here in the volume. Well, we could try to set a threshold here, right? We could say, if it goes over whatever this number is, 0.1 here, then I want an alert. Well, it will alert here, but if you look, because of

the seasonality, it's only this little tiny piece here that actually sort of broke out. If something anomalous happens at this time, it has to be very anomalous to actually hit my threshold. It's a pretty bad threshold that way. So what are we going to do? Well, we can use some simple statistics to basically fit a curve in here, and there are algorithms that can do that. Then what we do is we measure the distance from the curve and set up an alert on that. So if we do this again: on the left-hand side I have my curve that I fit, I measure the distance, and in the right-hand

graph I just plot the distance to the actual curve that we learned. So now I can set up a threshold on the right-hand side, and that should actually tell me if something really anomalous happens. This is an algorithm called Holt-Winters, which you will find in every library out there, to just use. You can use it for all kinds of things where you have seasonality, where you don't want to just have a static threshold: the number of things happening, how many users log in to your system, which is probably very cyclical, right? Maybe even network traffic is kind of cyclical. So you don't need very complicated systems to learn something

like that; you could use something like Holt-Winters to do this. A lot of companies, even security analytics companies, are using something like this to learn traffic. Is it foolproof? Absolutely not, right? If the whole volume starts going up and changes this curve, the algorithm will learn that new curve. But if it's just a couple of outliers, it's actually really good at finding them. So, in the realm of simple, what I found, and I'm a little biased, is that visualization can really help us in looking at data and understanding data. This gentleman here is Edward Tufte, whom some of you might know. He's a very famous guy in information visualization. He has written four

beautiful books on information visualization. If you ever have a chance, look at his books; they are a really beautiful history of data visualization. And he said something that really resonated with me. He said: how can we see not to confirm what we already know, but to learn? So how can you look at a visualization of traffic, of security data, to learn something from it? Here's an example from that project I told you about, the Fortune 50 company's network traffic. What you see here are the port numbers from 0 to 1,000, so this is a thousand here, and these should all be well-known ports that are being accessed. What you see here is a week's worth of

data. Now, you can look at this and you see all kinds of interesting things. There's a lot of traffic here, and then there are all kinds of horizontal lines, which we would expect: certain well-known services are being used a lot. You have some weird brown lines here, maybe a blue line there, where at some point in time something else happens across all the ports. It's very, very noisy. You can write algorithms to find this stuff very easily, because it's so voluminous; that's not hard. But when you look at this visualization, what you might notice is: what the heck is going on over here? In the first part of the week, none of that

shows up. Then suddenly, and I'm not sure if that was a weekend there, something weird shows up over there: there's this carpet of traffic. Writing an algorithm to find something like this is actually not easy at all. Note how it also starts at, I don't know, something like 200, right? It doesn't go all the way down. There's something funky going on. The visualization shows you right away; there's no question that something is going on here and we need to investigate it. But with an algorithm, you wouldn't even notice this is in your data if you haven't written the algorithm to find something very specifically like this. So visualization can be very, very helpful for these kinds of examples.

At RSA I had an opportunity to play with the HoloLens and look at network traffic. There is a tool here that visualizes a whole bunch of network traffic, and you can interact with it. You see me there trying to click; it's a thing where you can actually take stuff and click on it, and it was really hard. You can see I'm trying to click on stuff and it doesn't react, and then suddenly ten windows pop out. But the author assured me that this was a very quick prototype. It was kind of fun, but I don't know about the analytical value. Even if they had more time to work on the tool and it became better, I'm still questioning

the use of 3D and the whole virtual reality thing to walk around in data. I think it's a great gimmick, but [inaudible]. In my last couple of slides here I want to talk a little bit about... well, I've been bashing security pretty badly so far, so I should probably make some statements on what I see as interesting, or what I'm getting excited about going forward. There are some areas that I would like to look at a little closer and that I think have some promise. The first one: we have treated security for a long time with sort of blanket statements: if it's this IP, it's bad; if the traffic looks like

this, it's bad. So we always said zero or one, and convicted it for the whole world, right? Well, what about looking at environment-specific things, looking at your network and figuring out what is anomalous in your network, how things are set up for you? That's really what the whole hunting thing, as people call it today, is about, right? You want to have hunters, people who are very experienced, who understand your network, understand what's happening in your network, and then understand what is anomalous in there. So you need to give them the right tools and start thinking about it that way. That also has a lot to do with real-time threat intelligence sharing. I think the future of threat intelligence

sharing is really in real time. It's about companies committing to release threat intelligence when they see something, either automatically or maybe by the push of a button, maybe to a trusted community, maybe to the whole world. That's where the companies I talk to have the most success with threat intelligence. For example, in Germany there is a group of 20 large organizations, including Deutsche Bank, that exchange threat intelligence in this way. They have Soltra Edge set up, they have a trust relationship established between all of them, and they even pay each other for threat intelligence that is released very quickly, and they find incredible value in it. They don't look at all these intelligence feeds out there that don't

really apply to them. Context: I've mentioned this a number of times now. I think one of the fundamental problems we have with data analysis is that we don't have enough context. We don't focus on asset management. So I ask each and every one of you: do you have a handle on the machines that are in your network, all of the machines that are in your network? A couple of you, maybe, but generally it's one of the biggest problems you will find. And this is nothing new; we've known about this for the longest time, we just keep ignoring it, like, we'll fix that later, right? So: asset management, understanding what these things are. Have you ever seen a tool doing a decent

topology map of your network, so you actually understand where something is happening, and not just, oh, there's an IP? And maybe mapping it to its switch, which would be kind of cool; there are some tools that do that. User-based policy: what a great idea, right? It's kind of obvious. Instead of having machine-based policies, what about looking at users and restricting things based on users, firewall rules based on users? What an idea. Adaptive security: again, not static blanket statements for everything. If you see a threat level going up for a machine, my machine gets more suspicious, what about changing or adapting the security posture for that machine? If it does a whole

bunch of shady stuff, maybe you want to enforce some more policies; maybe you start doing deep packet inspection on everything the machine does. So: dynamic security policy decisions. And then there's a whole other topic of capturing expert knowledge. I think we're doing a horrible job of learning from you guys, who really know how stuff works, and bringing that back into the systems that we have. If you leave the company today, there's a whole bunch of knowledge that walks out the door. I don't want you to write documentation all day, that's stupid, but how can we build systems that, while you use them, learn from what you

know and absorb that? There's some very interesting stuff there. And forget about 3D visualizations for now and focus on the 2D view; there's a lot of work to do. What will change security? A lot of this stuff has already been started and is being done right now. I think continuous authentication: I log in once and then the system just trusts me. Well, why not ask me again to log in, or, you know, for second-factor authentication, when my machine starts to do weird stuff? I'm OK with that if I'm not asked, like, ten times a day. Maybe once or twice, I'm completely fine with adding my second factor again, right? Dynamic policy decisions: I talked about that. Micro

segmentation: super interesting. There are different approaches, but basically what you do is you put something into the network kernel, and every time a network connection is being tried, it calls home to a central controller and asks: this application, this user, can he go to this type of website right now? And the controller says: yeah, cool, he's in a secure location, he just authenticated with two factors, his machine is clean, he can go there. It's a very, very interesting concept: a central controller, with all kinds of context that you can use there. Real-time threat intel. Human-assisted machine learning systems: don't just focus on making smart algorithms, but ask how you can build algorithms that humans can interact

with, so that, again, that expert knowledge can be factored into the algorithms to help the system. Crowdsourcing: maybe. We tried this back in the day in the log management space, at Splunk, with Splunkbase. I think it's still called that, but today it hosts applications. The idea of Splunkbase back in the day was that we had an interface where you look at log files, and you could click on a log entry and say, hey, I don't understand this thing, what does it mean? And you would go to this website, and it would explain the type of alert, and people could go in and say, oh, we have seen this as well, and by the way,

it means this, and you can do that to fix it. But this whole idea of people contributing to the community didn't take off. It's just hard to build a community like that, right? Why would you go in there and spend hours commenting on all these different logs? You have no incentive to do so. So that's really hard, but I think it's an interesting [inaudible]. And then there are some systems and ideas around involving the end users, instead of always escalating stuff to the security analysts in the SOC. What is the SOC going to do if they see someone logging into my server with a weird user account? Like, I don't know, Chester logs into my server

now, and he's never logged in there, and the security team gets a notification like that. What are they going to do? Is this normal or not? They have to call me, or they have to call Chester, to figure out whether it's actually legitimate or not. Well, why can't we send an automatic email to me and to him and say, hey, by the way, there was just a login; you guys figure it out. And if I find, geez, I never told him to log in here, I'll call him and ask, what the hell did you do there? If he says, well, I needed to do this and that, then all right, cool. But if he says, whoa, whoa, that was not me, well, then we can

escalate it to the SOC, and then they can go in and do the work, right? So how can we push these things off to the users for certain problems? Not everything, but I think this is a big problem: we're overloading the SOC with stuff that they should never really have to look at. And then, someone please just solve this phishing problem for me. I keep clicking on those things [inaudible]. So, back, sort of full circle, to the title of this talk. At least we talked a little bit about machine learning and artificial intelligence. I mainly told you that what's out there is really not that complicated and not that sophisticated at this point. So

how will it help? Well, we need good data, right? Garbage in, garbage out. If we don't have it, it's going to be really hard. We still have the problem that we need these data formats; we don't have logging standards. Oh my goodness: CEF, CEE from MITRE, the Common Information Model from Splunk. People have tried to standardize these freaking log files so many times, but it's hard getting everybody on the same page [inaudible]. But it's a fundamental thing we have to do in order to do anything on the data analysis side. If we don't know what the data is, we can't write any algorithm. So

let's solve some of these. Well, probably someone is going to ask about deep learning, so let's just address it. Deep learning is an algorithm out there; it's a machine learning algorithm. It has gotten tons of attention, and all kinds of fields have been, well, revolutionized: image recognition and sound or voice recognition have made humongous leaps with deep learning. But it's nothing new; we had neural networks a long time ago. They've been around for, I don't know how many, definitely 20 years. Deep learning comes up with some interesting ways of making things work, and it's great for things like malware classification, where, again, we have a large corpus of data. Some of the

basics of deep learning: I am not going to claim that I'm an expert in it at all, but one of the things it really helps with is sort of eliminating the feature engineering step. So for the data scientists among you, this is one of the benefits. But applying deep learning to just generic security problems is actually not easy, and in a lot of cases not possible. It will not help you analyze your network traffic or your flows much better to understand where there's anomalous communication going on. If you're interested in that kind of stuff, sort of the data science side, I gave a talk a couple of years ago at KDD, the Knowledge Discovery and Data Mining conference, where I talked about some of the

inherent problems that we have in cybersecurity with machine learning. There are all kinds of things there that I'm happy to discuss over a beer or coffee or something like that. So where should we use it? It's great for classification problems where we have corpuses of data, because we know what is good and what's bad and we can train these systems. There's a lot of very good work out there. And one of the most exciting things to me: there are a couple of startups out there, and actually IBM Watson is trying to do this as well, working on an automated level-one analyst. So, basically, the people who get all the alerts from the

systems and have to triage them and say: this one looks kind of interesting, false positive, false positive. There's some really, really good work; in a couple of startups I know of, there are people that I really trust, who know what's going on, and they're getting really close to automating that kind of stuff. So that's super sexy to me. And look for systems that have humans in the loop. With any system that just says, oh, we have the perfect algorithm, get rid of the analysts, I usually walk out of the room. All right, a couple more things. If you're interested in visualization, there's a community, SecViz; go browse around, and contribute if you have anything. If you're interested in more on the big data

analytics and visualization side, I teach a class at Black Hat every year; come join me for that. And with that, I'll open it up to any questions.
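The seasonal alerting idea from the talk (fit a curve to a cyclical volume such as password resets, then put the threshold on the distance from that curve instead of on the raw count) can be sketched as follows. This is a minimal hand-rolled additive Holt-Winters in Python, not the code behind the slides; the smoothing constants, the synthetic hourly series, and the threshold of 20 are all assumptions chosen for illustration.

```python
import math


def holt_winters_residuals(series, season_len, alpha=0.3, beta=0.05, gamma=0.2):
    """One-step-ahead forecast errors from additive Holt-Winters smoothing."""
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")
    # Initialise level, trend and seasonal offsets from the first two seasons.
    first = series[:season_len]
    second = series[season_len:2 * season_len]
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / (season_len ** 2)
    seasonal = [x - level for x in first]
    residuals = [0.0] * season_len  # no forecast during the warm-up season
    for t in range(season_len, len(series)):
        s = seasonal[t % season_len]
        forecast = level + trend + s
        residuals.append(series[t] - forecast)
        # Standard additive updates for level, trend and seasonal component.
        prev_level = level
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (series[t] - level) + (1 - gamma) * s
    return residuals


def anomalies(series, season_len, threshold):
    """Indices where the distance from the learned curve exceeds the threshold."""
    residuals = holt_winters_residuals(series, season_len)
    return [t for t, r in enumerate(residuals) if abs(r) > threshold]


# Synthetic data (assumption): one week of hourly counts with a daily cycle.
series = [100 + 50 * math.sin(2 * math.pi * t / 24) for t in range(7 * 24)]
# A burst in the nightly trough: the value stays far below the daily peak,
# so a static threshold set near the peak would never fire on it.
series[114] += 40

flagged = anomalies(series, season_len=24, threshold=20.0)
print(flagged)
```

In practice a library implementation (for example, exponential smoothing in a statistics package) would replace the hand-rolled loop; the point is only that the alert is defined on the residual, which is what makes a modest burst in the quiet part of the cycle visible.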

How do we do this? Do people come up here?

No hard questions. You had an open source visualization tool for vulnerability management; what was that called? [inaudible] No, no, was it... it can't be that. It's a visualization something, something like that. I can find it. If you don't find it, tweet me or email me and I'll find it. Other questions? You all want to get to the break for coffee? All right, I'll be around as well, if anyone wants to grab me. All right, with that, thanks for being here. Thanks, Elizabeth.
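The "it's really just counting" point from the talk (how many different machines, on how many different ports, does a host generally connect to, taken over time) can be sketched like this. It is an illustrative Python sketch under stated assumptions: the flow record layout `(day, src, dst, dport)`, the function names, and the baseline rule (mean plus `k` standard deviations of the host's own history) are hypothetical choices, and seasonality is ignored for brevity.

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_fanout(flows):
    """Map (src, day) -> (distinct destination hosts, distinct destination ports)."""
    dsts = defaultdict(set)
    ports = defaultdict(set)
    for day, src, dst, dport in flows:
        dsts[(src, day)].add(dst)
        ports[(src, day)].add(dport)
    return {key: (len(dsts[key]), len(ports[key])) for key in dsts}


def unusual_days(flows, k=3.0, min_days=5):
    """Flag days on which a host exceeds its own historical fan-out baseline."""
    history = defaultdict(list)
    ordered = sorted(daily_fanout(flows).items(), key=lambda kv: kv[0][1])
    for (src, day), counts in ordered:
        history[src].append((day, counts))
    flagged = []
    for src, days in history.items():
        for i, (day, counts) in enumerate(days):
            if i < min_days:
                continue  # not enough history for this host yet
            for metric, name in ((0, "hosts"), (1, "ports")):
                past = [c[metric] for _, c in days[:i]]
                # Floor the deviation at 1 so a flat history still leaves headroom.
                limit = mean(past) + k * max(pstdev(past), 1.0)
                if counts[metric] > limit:
                    flagged.append((src, day, name, counts[metric]))
    return flagged


# Hypothetical flows: one host talks to 3 peers a day for a week,
# then to 40 peers on day 7.
flows = [(day, "10.0.0.5", f"10.0.1.{i}", 443) for day in range(7) for i in range(3)]
flows += [(7, "10.0.0.5", f"10.0.2.{i}", 443) for i in range(40)]

print(unusual_days(flows))
```

As the talk stresses, a flagged host is only a starting point: without context about what the machine is supposed to do, the count alone does not say whether the fan-out is good or bad.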
