GT - Deep Learning Neural Networks – Our Fun Attempt At Building One - Ladi Adefala

Name: GT - Deep Learning Neural Networks – Our Fun Attempt At Building One - Ladi Adefala
Uploaded: 2017-08-27
Duration: 38 min 50 s
Description: GT - Deep Learning Neural Networks – Our Fun Attempt At Building One - Ladi Adefala Ground Truth BSidesLV 2017 - Tuscany Hotel - July 25, 2017




welcome to our next talk we've got Ladi to talk to us about deep learning neural networks right that's him as you can tell we got a crowded room at East East Knicks anybody any open seats we got one open seat right here looks like okay well they'll come back quick all right a couple things cell phones please do not use them in here we are streaming live hopefully a lot of time for questions luckily he is leading into lunch so we can give his talk some time for questions after that we've got to thank our sponsors real quick we have VerSprite Protiviti Tenable Amazon Source of Knowledge without all their help we couldn't put this together or

the help of the volunteers all right without that what's gone on supposed to clap for this this is gonna be weird because I'm gonna be obstructing the view from people so I'm gonna try to stay away what happened to my music geez unbelievable unbelievable you know wondering so I apologize first of all I really really have to apologize for the bad guys bad dancing so please don't you know just please just bear with me on that my biological neurons just fire off anytime they hear music so I can't really help it I just kind of move and groove what is it about music that makes us want to move & groove anyway what is it about

music that makes those biological neurons fire off anytime they hear it is it the words of a song does it the the rhythm of the song of the beat of a song is that the memories of the videos I'm not sure what it is but I'll tell you this the guys that Spotify and YouTube they figure this out they can predict to a reasonable degree of accuracy what songs I'm gonna jam sit for example they know I'm also gonna jam some good old country with Kenny Rogers some of you you know a country has come hah you guys oh they know I'm actually gonna jam some rock and roll Bob Seger people that know Bob see I like that old time

biological neurons are the inspiration for artificial neural nets which is the topic of our conversation today and before we get into it though what I want to do is give a brief intro about me and then walk you through the agenda for today so my name is Ladi Adefala I'm a security strategist with FortiGuard Labs and forgive the cut off of my hair there I'm a security strategist at FortiGuard Labs FortiGuard Labs is the threat intelligence and research arm of Fortinet for those that are not familiar I also serve as an adjunct faculty at Webster University where I teach a couple of masters level courses in cybersecurity malware reverse engineering being one of them

network forensics critical infrastructure for the agenda today what I'm gonna do is I'm gonna walk you through my practical attempt at building a neural net that's essentially what it is and then at the end of this I'm also going to remind us all of the importance of knowing why we do what we do I don't know about you sometimes I forget knowing the why let's jump right into it so I mentioned earlier we're using artificial neural nets to provide video and music recommendations right Spotify the question then becomes can we apply neural nets to a more complex problem problems of a more personal nature such as the one that Amy Webb was having several years ago in her early thirties

she was looking for mr. right so what did she do she went online and signed up with an online dating website the dating algorithm paired her up with will call him Steve the IT guy Steve the IT guy and because they had a common love for music they had a common love math and data Steve the IT guy takes her out on the first dates to one of the most expensive restaurants in Philly and during the course of the night he orders multiple entrees lots of expensive bottles of wine as the night was wrapping up Steve decides to go to the bathroom and leaves her there with the check for 1,300 bucks true story I kid you not for $1,300 one

$1,300 for $1,300 so what did she do she built her own model using the data from the website to be able to classify good and bad dates but she had to manually define the features she wanted in that particular date some of us will call that hand engineering features that was before neural nets I will say right now I would argue that if she had that problem today she would have actually applied artificial neural nets to that problem because we're seeing them being applied in even more complex problems problems such as self-driving cars that's a complex problem we're seeing them applied more in cybersecurity so with all of this what I thought is I'd like to get a really good understanding

of what's going on under the hood and the short way to do that was to go attempt to build one and I've got Gabe Bassett here to thank for that he's the reason for all of this so I wanted to build one but to do that first I needed a problem so the problem I chose was the first problem that came to mind which was if my clicker works oh don't worry about it I got it I got it I think I hope yes I got it so the problem I wanted to focus on was the URL classification problem what is URL classification I wanted to see if I could build a neural net that would

classify a URL as either malicious or benign but solely based on the characters of the URL nothing more no additional context just characters coming in classify it clean or dirty that was the problem the reason I was doing this again was I wanted to make sure I understood what was going on under the hood what's the process behind building a neural net and what I discovered is this there are five key elements to building a neural net the first is research this is an active space there's always active research going on so you got to figure that out so that's the first the second is data the more data you have the better the performance of your model

the third the model architecture how are you gonna stack up these Lego blocks to build your model the fourth you got to go implement it you got to build the code and then the fifth is you got to train your model so that you can improve the performance I'm gonna walk through each of these relatively quickly so we're gonna jump right into step one research the central question I had it's probably the same question most of you guys have is what in the world are neural nets I went into research thinking about that here's what I found out neural nets are actually a collection of artificial neurons that are capable of learning from example or experience just like we

humans learn from example and experience that's what these things are and they're typically organized in the structure that you see here where you have an input layer you've got one or more hidden layers and then you've got an output layer the input layer is where you suck in all of the input which is in our case the URLs the hidden layer one or more hidden layers is where I like to say the magic happens that's where all the computational stuff actually goes on I'm not gonna go into the math but that's where all of that happens and by the way the more layers you have in the hidden segment the deeper your network

is hence the term deep learning the output layer just focuses on producing the result the result in our case would be this URL is either malicious or benign are you guys familiar with American Ninja Warrior any American Ninja Warrior fans in the house okay great great show at the heart of what they're doing at American Ninja Warrior is they are classifying athletes as either ninjas or not ninjas sometimes you do these things you never know what's gonna elicit laughs that's the heart of what they're trying to do ninjas or not ninjas and the way they do that is they have athletes go through a variety of obstacle courses correct they go through a variety of obstacle

courses those courses are organized in stages so you go through stage 1 stage 2 stage 3 stage 4 until you get to a point where they can say with a reasonable degree of accuracy this dude is a ninja right okay what I'd like for you to do now is this think of the layers in this neural net as the stages of that American ninja competition and within the layers we have artificial neurons the circles that you're looking at think of them as obstacle courses that your data has to be able to go against to get past it and oh by the way as that data is moving from one layer to the next you're asking

it's asking the question hmm am I pretty sure that this is malicious or not maybe I am maybe I'm not but I want to be more sure or surer if that's a term then I move it to the next layer and it keeps going on that way speaking of American Ninja Warrior here is a picture of Isaac Caldiero if my Mac would only stop coming up with alerts here's a picture of Isaac Caldiero he actually won American Ninja Warrior in 2015 phenomenal story if you guys haven't heard about the story of this guy he was a busboy then became one of the best climbers in the world and then went on to win American Ninja Warrior what

you're seeing him here is he's on one of those obstacle courses that I said this one's called the body prop the body prop requires a lot of core strength as you can probably imagine I couldn't do this I don't know about you guys but to give you a better sense of some of these obstacles I'm gonna play a short clip so you guys can see what goes on with American Ninja Warrior our first runner of the year

[Music]

oh look how she says I was so discombobulated I'm like that's not a word I use often from the rolling log just like American Ninja Warrior has different types of these obstacle courses there's also different types of neural nets it's not all just one plain vanilla there's a whole zoo of these things but I'm going to touch on three of them for you so the first most basic type of neural net is called a feed-forward neural net and what it is is data just moves from one of those neurons to the next there's no looping around no fancy gymnastics and by the way you can think of that as a basic obstacle right you do ten

push-ups whatever that is the next one is called a convolutional neural net and I'm not gonna lie guys this one's a little convoluted no pun intended a little convoluted it's convoluted because there's so many terms that are associated with it there's max pooling there's pooling layers there's filters there's strides there's all these things I'm like what in the world so think about it this way to simplify just think about this particular neural net as a high-intensity neural net let me give you an example use a workout analogy again you know sometimes if you're working out you just want to focus say on your strength training or you want to focus on upper body or cardio or

whatever but you filter out everything else what these neural nets do is they filter out parts of the input and they focus on the subset of the input with a lot of intensity and they process that and then what they do is they rinse and repeat across the entire input structure that's coming in that's essentially what they do they're very common in image processing you will see these a lot if you're trying to classify images like the fancy cat or the talking dog or whatever you will see that a lot now I'm gonna talk about the third one the recurrent neural net RNNs also known as long short-term memory cells LSTMs
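The "filter a subset of the input, then rinse and repeat across the whole input" idea can be sketched in a few lines of plain Python. This is only an illustration, not the speaker's code: the function names `conv1d` and `max_pool`, the input values, and the filter weights are all made up for the example, and a real convolutional layer would learn its filter weights during training.

```python
def conv1d(sequence, kernel, stride=1):
    """Slide `kernel` across `sequence` and return one dot product per window."""
    width = len(kernel)
    outputs = []
    for start in range(0, len(sequence) - width + 1, stride):
        window = sequence[start:start + width]
        outputs.append(sum(x * w for x, w in zip(window, kernel)))
    return outputs


def max_pool(values, size=2):
    """Keep only the largest value in each non-overlapping block of `size`."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


# Hypothetical numeric input and filter weights, chosen only for illustration.
signal = [0, 1, 2, 3, 2, 1, 0]
kernel = [1, 0, -1]  # first value minus last value of each 3-wide window

features = conv1d(signal, kernel)    # one number per window position
pooled = max_pool(features, size=2)  # shorter summary of the features

print(features)
print(pooled)
```

The filter only ever looks at three values at a time, and the same three weights are reused at every position, which is the "rinse and repeat" part. Max pooling then shrinks the result while keeping the strongest responses.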

again again again slow down big words let's boil it down simplify think of these neural nets as ones that have the capability to remember and to forget just like a human does it's got a capability to remember and to forget this is going to be very important moving into the next couple of presentation slides I've got for you and essentially what you're looking for again in this is you want to make sure that you have a capability to remember past input and how that performed moving into the next one at this point I figured step one of five correct we have research we have data we're going to build a model architecture

we're going to actually implement our model architecture and we're going to train the model five steps we're at step one of five we go to step two data here's what you're looking at from a sample data perspective so a couple of things I want to share with you around data the more data you have I already said that the better your model's gonna perform in this URL classification problem that I was building I didn't know this until I started building it of course you have to combine both the malicious and the clean data into one data set file and you have to label them as such so what you're seeing here if you can see the structure of this is

you can see my URLs right and the IDs that I have from an index perspective the top half where you see the one those are my malicious so I designated a label of one for my malicious URLs and then the bottom half is where the designation is zero for my clean URLs the sources are there we don't have time to get into the sources but yeah PhishTank is a good source Common Crawl for clean and you can get that data so that's the first thing the other crucial crucial thing you got to remember crucial you have to designate parts of your data set as training and parts of it as validation in my case I designated

a 75% 25% split I'll explain the relevance of that when we get into training the model here I ran into a problem all right one of the reasons we wanted to do this was to share with you some of the problems and some of the pain that I went through so you don't have to go through it one of the problems I ran into was when I was doing the data wrangling I use pandas are you guys familiar with pandas on the Python side yeah great awesome Swiss Army knife for data pre-processing love it love it love it but here it came up with a problem around my fields so I'd actually jammed multiple fields in

there and it actually pointed me to that so once I went back and fixed it it was all good I'm like step two yeah we got step two of five this is how I was feeling I was like on top of the world here's the problem anytime you start feeling like this when you're just in step two of five you know there's just something waiting in the wings just waiting to bite you you just know I mean from experience you just know model architecture step three of five here's the model architecture for the neural net that I actually built I'm gonna walk you guys through first layer input layer right everybody check check

right we understand what the input layer is it sucks in all the input in our case URLs the next segment I wanted to have two layers in my hidden layer that's where all the magic happens and what I decided to do is I wanted to use the filter remember the high intensity neural net that I described earlier I wanted to use that because I wanted to see if there'd be patterns in the characters of the URL that would actually show up so I needed some high intensity filtering for that and then the next thing I wanted to do from an overall memory perspective I wanted to be able to remember some of the URLs

because you guys know how it is with URLs sometimes they're active they're malicious and then 48 hours later they're clean the bad guys have you know dumped them and they're not using them anymore so I wanted that capability the last layer was the output layer what's the function of the output layer it classifies it as either 1 or 0 meaning bad or good this was what I was dealing with the problem with this is for a first-timer this is just not what you want because you know I didn't have a clue how I was gonna implement this stuff I had no clue how I was actually gonna write the code so what I ended up doing was this I felt by the

way like I really needed some help from an overall perspective so what I ended up doing was I came across a couple of research folks that have done work on sentiment analysis so what does sentiment analysis do it's got a body of text and what you're trying to do is you're trying to classify if this is a good review or if this is a bad review and I thought this is what I'm trying to do here as well the only difference is instead of words I'm dealing with characters so here I wanted to classify a bunch of characters as either good or bad so what I found was these guys they had a

model similar to my architecture and the beauty of it is they had starter code that was unbelievable so I was able to overcome that I don't know how I'm gonna implement this by modeling after Zhang and LeCun you can see them and another guy called vert fest incredible research so once I got past that here's what I ended up doing from a code stack perspective so here's what the code stack actually looks like we don't have time to go into much of the detail but I do want to say it was fundamentally Python-based TensorFlow oh so good the best thing in the world I'm so glad I got introduced to that thanks to Gabe

again for that TensorFlow is Google's machine learning deep learning library and it's available you just import it into Python it works wonders if there's something you're trying to accomplish in pure Python TensorFlow will probably do it let's say it's 100 lines in Python TensorFlow would do it in like half the code and then on top of that I discovered oh another wonderful wonderful library called Keras Keras will allow you to do stuff that in TensorFlow is half the code you can do it in a single line of code in Keras here's an example of one of the code snippets from my actual neural net you can read the details there but

it's just one line of code that defines that convoluted or convolutional neural net that's just one line of code so step four of five I was like this is awesome right step four of five you guys following along with me now I move to step five this is where the abject failure or mad success that we put into the abstract comes in this is the training process what is training training is a process that you use to improve the learning of your model as you learn through a number of iterations and steps the model should improve so that's all you're doing with training now remember what I said about data I said you got to designate your data as what

training and validation this is where you use the training and validation designation because in the training process the model will take that training data and it will try to adjust the scores and the weights so that it can learn better and so the more you learn the better it is and what you want to see the goal is to make sure your training accuracy goes steadily up as you learn more you become better at predicting and classifying these URLs you also want to see the training loss which is the errors that you get from your model you want to see that go steadily down as you go through this process if you're if

your training and validation phases are good then you can move into the test phase the test phase all it means is you can suck in a brand new URL or a set of URLs that the model has never seen before it's never seen it before you suck it in and then you can predict it with a reasonable degree of accuracy up in the 95 98 percent range what do you guys think we came up with abject failure or mad success TensorFlow gives you TensorBoard which is a visualization tool all included I'm telling you man that thing is wonderful so you don't have to deal with trying to do a separate visualization it's all in one nice package this is a

view of TensorBoard the only problem with that is can anyone tell there's nothing in it what happened to my data well I should be seeing something going up what I discovered is first of all you've got to make sure you point the logs into the right directory if you don't have the right directory you're not gonna see anything but TensorBoard also has a toggle button for the runs and that toggle button if you toggle it off you won't see anything you toggle it on you see something yeah it sounds funny now but trust me in the heat of battle this is really disconcerting this is kind of

like what we got I got five minutes okay perfect we're doing well on time we're almost at five right so I'm like what in the world so once I figured out the toggle button oh right I mean you guys laugh trust me you'll be emailing me later saying yeah I ran into that same problem yes I got my training accuracy results and what you can see here is the x-axis is the number of what we call the training steps epochs right it's the number of times we'd be training and you could see the accuracy going up going up steadily with each one the first time it saw the URLs it didn't know anything so it

couldn't really classify the accuracy was really low and of course the training loss going steadily down remember right this is just based on characters there's no other context in this URL but the characters this is magical that the model can do this nothing else it's classifying this with this level of training loss remember what I said when you start jumping up and down mm-hmm I go to validation accuracy I see a straight line I'm like what in the world is a straight line so I go research you know what a straight line is well it means you have too little data to learn from this is called overfitting in the deep learning world

it's overfitting and what does overfitting mean it means well your model can basically classify anything at this point it's not generalizing properly bottom line is this is not good this is not good bottom line is this is not good so I was like okay what am I gonna do all right so if the problem is I need more data I was messing with data in the hundreds before I said let me go up as soon as I went to 10,000 URLs boom oh sorry I got excited it was so good he's like some validation logs right it's like yeah this thing is actually doing what it's supposed to do based on characters I'm not doing anything else
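The symptom described here (training accuracy climbing while validation accuracy sits on a straight line) can be checked mechanically once the per-epoch numbers are logged. The sketch below is an assumption-laden illustration, not the speaker's code: `looks_flat`, `diagnose`, the tolerance value, and the accuracy lists are all hypothetical.

```python
def looks_flat(history, tolerance=0.01):
    """Return True when a metric barely moves across epochs."""
    return max(history) - min(history) <= tolerance


def diagnose(train_acc, val_acc, tolerance=0.01):
    """Flag the pattern from the talk: training accuracy rises, validation does not."""
    train_improved = train_acc[-1] - train_acc[0] > tolerance
    if train_improved and looks_flat(val_acc, tolerance):
        return "possible overfitting: validation accuracy is flat"
    return "validation accuracy is moving with training"


# Hypothetical per-epoch accuracies, invented only to exercise the check.
small_run = diagnose([0.55, 0.70, 0.85, 0.95], [0.50, 0.50, 0.50, 0.50])
larger_run = diagnose([0.60, 0.75, 0.88, 0.94], [0.58, 0.72, 0.85, 0.91])

print(small_run)
print(larger_run)
```

A check like this only reports the symptom. The remedy in the talk was more data, going from hundreds of URLs to 10,000.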

I'm not giving any context though just based on the characters this was phenomenal here's what I want to do I want to go is my snow on our face okay so going back to the Amy Webb story okay she built her own model she was looking for mr. right and she actually ended up finding mr. right they're married now and I think one of the reasons why she was able to do that is because she stayed centered on knowing why she was doing what she was doing and for us even for me up until now the only thing I've told you guys in the entire time we have is what I've done how I've

done it I haven't actually told you why I was doing it and I think most of us in this room share a common why let me explain what I mean there's a shortage of data geeks you can't walk down the street and be like oh yeah I just want a data scientist please no you can't do that there's a shortage of cyber talent you can't do that either it's not like I have a whole lot of cadre I mean I'm preaching to the choir here you guys know this two minutes awesome deep learning has actually gone wide now it's more easily accessible to most of us we need to be able to leverage this and fundamentally

the heart of what we're doing from a why perspective it's not just to protect ourselves our families or the companies we work for it's really just even to protect the overall community and sometimes I think I don't know again I don't know about you guys sometimes I forget that I want to share a very short video with you that best illustrates understanding and reminding us of why we do what we need to do to to pursue a career in comedy or people no matter what a profession is sure pastor Craig is people who walked up to him on a regular basis say hey what can I do to become a pastor people are always asking what can I do

the key isn't what you do the key is why people always want to know the what but the truth is all you have to know is your why when you know your why you have more options as to what you do an example is this my why is to comedically inspire people to walk in purpose my what looks like stand-up comedy it looks like movies it looks like being on late-night TV having my own TV show I have a lot of options for what but my why never changes and when you understand clearly when you understand your why your what has more impact I have an example one of

my Western I'm currently doing is I have a web series now called a microneedle breakdown it's on YouTube we drove Auto Wednesdays and basically what a great time is it's one of my live stand-up comedy performances whatever I'm going to concert anywhere in the country in the middle of my show I will stop doing what I'm doing and just talk to audience members and comedy just kind of happens well we're in winston-salem want to show you a clip we're on winston-salem and this guy I talked to him he said he's a teacher and it's break time we're sitting down he said he was a teacher he teaches music and I was like well you know can you see and check

out what happens so your musical director yes sir all right so uh let me get a couple to get a couple bars of like a Amazing Grace key to the first part and if good in Gray's house we go read by that ruckus a Jose now what's the keeping a person is your uncle just got jail you got a shot in the fact we use a kid on the saying let me see the good version real quick if you know for I'm about to see if 10 exist

[Applause] [Laughter]

[Applause] [Music] [Applause]

[Music] [Laughter]

so the first time I asked him to sing he knew what he was doing the second time I asked him to sing he knew why he was doing it when you understand your why your what has more impact because you're walking in or towards your purpose the key is to understand the why so now I'm thank you so much guys I really appreciate the opportunity many thanks to Gabe Bassett here for bringing this alive all right thank you

so here's the fun part though do we have time for questions if it's just a lunch thing no one's coming to use the room because this is the fun part for me because now I can really interact with you guys and answer can I say what are you trying do you're trying to see the dad no I try to sing I actually sing a church yeah if you guys want to ask questions because I went through that relatively quickly man no questions I mean you guys suck what is up with you guys no questions all right we got one alright let's let's talk let's get a question I'm like what in the world I mean this is a fun part for me man jeez

yes I'm just curious about hardware did you have to use CUDA cores was there proprietary technology involved oh GPU CPU excellent question you see these are things I couldn't cover in the talk what hardware did I use it was important for me to use my personal computer so what I did was my CPU on my Mac is what I used for this work no GPUs no TPUs none of that stuff TPUs are the new ones from Google if you guys are not familiar they're called tensor processing units unlike GPUs they're actually optimized for neural nets and I didn't use any of that this Mac I got the whole thing on there so great

and I couldn't yeah I couldn't talk about all that because I don't have 25 minutes awesome thank you for asking that question yes more questions oh you're in luck neural networks are like inherently bad at learning strings did you do any type of pre-processing or did you use like pre-built functions for processing the strings before you put them in okay so um let me repeat the question because I think this is a crucial question that you're asking so your question is statement first neural nets are incredibly bad at processing input strings right strings okay and so then the question is did I do a lot of pre-processing to organize my strings the simple answer to

that is yes so remember my step two of the whole thing where I said data processing or data pre-processing and data wrangling yes I did but it wasn't as complicated as you would think because between what I had in TensorFlow and pandas all of those problems just go away they just disappear the libraries in there they just did they really did I was thinking I'd have to do like UTF stuff and I'd have to be able to figure out you know because within URLs you may have you know percent 20 and none of that stuff it took care of everything at least in my model okay did that answer your question thank you
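For readers wondering what "pre-processing strings" means for a character-level model, one common approach is to map each character to an integer and pad every URL to the same length. The speaker says the libraries handled this for him, so the following is only a standard-library illustration under stated assumptions: the vocabulary (printable ASCII), the reserved indices, the `encode_url` name, and the example URL are all hypothetical.

```python
import string

# Assumed vocabulary: printable ASCII characters. Index 0 is reserved for
# padding and index 1 for any character outside the vocabulary.
PAD, UNKNOWN = 0, 1
VOCAB = {ch: i + 2 for i, ch in enumerate(string.printable)}


def encode_url(url, max_len=20):
    """Turn a URL into a fixed-length list of integer character ids."""
    ids = [VOCAB.get(ch, UNKNOWN) for ch in url[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical URL containing a percent-encoded space.
encoded = encode_url("http://a.co/%20", max_len=20)
print(encoded)
```

Note that `%20` needs no special handling in this scheme: `%`, `2`, and `0` are simply three more characters with their own ids, which is consistent with the speaker's remark that he did not have to deal with it separately.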

yeah I have a question have you posted this online or have you done any you know publicity of the research online that you've done and I knew someone was gonna ask about that we even talked about it someone's gonna ask me about that where is the code where's the blog I have not I am debating well it was gonna depend on how this went whether people wanna hear about it man I mean there's no point of me posting stuff people don't want to hear about so maybe I'll consider it now are you saying this will be of value if I actually post some okay go ahead so I noticed in one of the

validation slides you mentioned that you had 10,000 samples as a validation set can you talk about the size of the training set used and how long it took and how many epochs you had to run it okay so um let me repeat the question again in terms of just my designation of training and validation set I think what you want to see is how big was the entire dataset and by the way when I said 10,000 URLs even though I was using that number to correct my validation error that was the entire set so my initial set was really small because I was afraid my little CPU here wouldn't be able to handle it and plus I

was just focusing on the process let's just figure out what we'd find out so I was in the hundreds I was less than a thousand in the first run and then once I had that problem the validation problem I bumped my total data set to 10,000 so that means I had 7,500 malicious right 75 percent split and then 25 percent clean twenty-five hundred URLs are clean and I'm probably gonna keep bumping that up until I see where my CPU can max out I don't know but here's the important thing you can obviously go into the millions but then you have to get GPU TPU you got to

go higher from a hardware perspective but the process doesn't change you still got to do it the same thing that we're doing here any question questions question oh you got one do we have a mic ok thank you
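The data preparation discussed in this answer (one combined file, label 1 for malicious and 0 for clean, then a 75/25 designation) can be sketched with the standard library alone. This assumes the 75/25 figure refers to the training/validation designation described in the data step of the talk. The function names, the placeholder URLs, and the even class balance are hypothetical and are not the speaker's data.

```python
import random


def build_dataset(malicious_urls, clean_urls):
    """Combine both sources into (url, label) rows: 1 = malicious, 0 = clean."""
    return [(u, 1) for u in malicious_urls] + [(u, 0) for u in clean_urls]


def split_dataset(rows, train_fraction=0.75, seed=0):
    """Shuffle a copy of the rows, then cut it into training and validation parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder URLs standing in for real feeds; 10,000 rows in total.
malicious = [f"http://bad.example/{i}" for i in range(5000)]
clean = [f"http://good.example/{i}" for i in range(5000)]

rows = build_dataset(malicious, clean)
train, validation = split_dataset(rows)

print(len(train), len(validation))
```

Shuffling before the cut matters because the combined file lists all malicious rows first and all clean rows second. Without it, the validation portion would contain only one class.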

[Music] how would you go about implementing these things for other logs that we find in cybersecurity great question did you guys hear the question how can we apply this to other aspects of cyber security such as log analysis for example and the way I was thinking about this looking ahead logs for the most part are just what text right strings characters whatever you want to call it right correct well numbers yeah I can string a number right I could string it up but my point is if this worked let's see how we can apply it to log analysis and oh by the way you know what I was talking about memory having

that memory capability right guess what it's gonna be very crucial from a log perspective because if there's a log at time T that said X at time T plus four that's correlated with that log you need kind of that memory to be able to say okay I saw the same thing or something similar that happened a while ago so I'm gonna use that input as part of my classification so I haven't done that yet but I figured if we can at least start here right baby steps then we can start you know building towards that for next year is my god that would be awesome for next year at least trying it out and see what happens I don't think

it's one more question if that's all right with the chair question I'm good looking forward at you know where the future of this work goes to does it kind of scare you what it could potentially do at some point yo well me I haven't thought that far yet but you're probably right we should probably start thinking about that because I mean here's the thing guys this looks like a basic and it is a basic attempt but the power and what we just did here and I use we because we just did it there's nothing else the only thing I saw was characters and the thing okay this is malicious this is not just characters so to your point I haven't

thought that far yet but I'll be sure to remember your question all right thank you very much thanks guys I appreciate enjoy your life awesome
