
Machine Learning: Too Smart for its Own Good

BSidesSF 2018 · 25:06 · Published 2018-04
About this talk
Thomas Phillips - Machine Learning: Too Smart for its Own Good

Wouldn't it be awesome to build a machine learning device that ran on tubes, valves, and gears? Terms like machine learning, deep learning, and neural nets are often brought up as if they are a magical cure for security problems. Unfortunately, machine learning systems have fundamental, inescapable limitations. Exploring the limitations is normally done through discussion of the mathematics involved. Instead of using math, we will explore the limitations using a steampunk model. In this presentation we cover the essential elements of neural nets used for machine learning, and go over how to build a physical implementation of a machine learning system using tubes, valves, and gears. With the model we will then explore how and why the machine generates false positives.
Transcript [en]

[Music]

hi, I'm Tom, how are you doing? Please bear with me if I fumble the technology here. So, I'm going to talk about machine learning. I've done lots of stuff in the past and I do lots of stuff today; lately I spend a lot of my time worrying about how to engage and neutralize adversaries that have breached a system. I've done a lot of machine learning work in the past, and I've looked at how we can use machine learning today. I got involved with neural nets back in the 1980s, and by that time machine learning was already pretty much old school; the field actually got started back in the 1950s, I think, and it was studied quite a bit. What

really made it take off were the speed of CPUs and how much storage we had. Now we can solve systems of equations much faster, and we have much more data to analyze. However, it's not a panacea. As you've seen in the news recently, machine learning can break down in interesting ways, sometimes in ways we don't like. There's a lot of material here and it's going to be kind of superficial; I'm trying not to make this a computer science talk or a mathematics talk. I want to focus on the security aspects, specifically: why do we get false alarms? What I'm talking about

for machine learning also generalizes to other types of systems that detect things, classifiers in general. With that, the things you should take away from this are: if you've never used machine learning before, what is it, how does it work, and where do these false positives come from? And for people who actually use machine learning, either in production or just toying with it, some things to think about in terms of feature spaces: what should I train it with, how should I define what it's operating on, and so on. There are three parts to this talk, and it's going to be kind of an experimental talk; we're going to

see how it goes. In the first part I'm going to give a very gentle introduction to machine learning; specifically I'm going to be looking at artificial neural nets, and even more specifically one particular kind of artificial neural net. The second part of this talk is the steampunk apparatus for machine learning. This is my pneumatic neural network, and I've got one set up right here; people in the back might not be able to see it. When I start the second part, I want to invite whoever is brave enough to come up to the table to play with it. There's a bunch of knobs

up here. So if we go back to that first page, you see that picture: that's what's up here now, and it's actually a physical implementation of a neural net, the thing that you would use to train and classify things. We'll talk about the problems that you can solve with it. And, I'm sorry, the third part: we're going to solve some real problems with it. We're going to ask what this thing can solve, what it can't solve, why it can't solve it, and what we would have to do so it could solve the problems we've got. So, machine learning. I'm going to

go kind of fast. We have a lot of security applications for machine learning. For example: a bunch of code, is it malicious? The login times of users, do they look unusual? The patterns on our network, do they look okay for a Tuesday afternoon? Does the data flow look normal, or does it look like we're exfiltrating something? Does it look like a malicious or unauthorized payload might have just been downloaded? Here's an example of a problem that many of you might have faced personally: you're looking at a piece of code, maybe a binary dump of something, and you need to know, what is this? Is it malicious or not? Is this

just some core dump that ended up on the system, or is there some kind of malicious payload hiding in it? I'll give you a spoiler: this one is not malicious; it's actually a binary dump of a small, benign program. Or you might have a bunch of log files with timestamps and a bunch of stuff in them: did anything unusual happen during this time? Were there any intrusions? These are hard problems, very hard problems; they take a person a long time to figure out. So let's use machine learning and see if we can solve these kinds of problems. What does that entail? Machine learning is basically this: you've got

a bunch of inputs, and you feed them to your machine learning system, your neural network, your AI system, whatever you want to call it, and it's going to spit out a bunch of output. That's okay, but we want the right output. So what we do is give it a bunch of input along with a bunch of output and say: this input matches that output. That's the training phase of machine learning. And the third part is that you're probably not going to get it perfect, so let's just get it close enough. That's what we're trying to do with it.
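That input-output matching can be sketched in a few lines of code. This is my own toy illustration, not from the talk, using the classic perceptron update rule: a single artificial neuron whose weights get nudged whenever its output disagrees with the desired output.

```python
# A minimal sketch of supervised training: one neuron, perceptron-style
# updates. The data set and learning rate are illustrative.

def predict(weights, bias, inputs):
    # Weighted sum of inputs plus a bias, squashed to a 0/1 decision.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train(samples, n_inputs, epochs=20, rate=0.1):
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # "Turn the knobs": nudge each weight toward the right answer.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Toy task: output 1 exactly when the first input is on (linearly separable).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
w, b = train(data, 2)
print([predict(w, b, x) for x, _ in data])  # [0, 0, 1, 1]
```

"Close enough" shows up here too: training stops after a fixed number of passes, whether or not every sample is classified correctly.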

there are some buzzwords here about how you can put neural nets together and what they're made of; it's not that important right now. The most important thing to take away is that there's not one right way to do it. I might preach that machine learning is done this way, and someone else will say it's done that way; there are so many different ways you can put these things together, so many different types of equations you can use, so many different training algorithms. Here's an example of a neural net, a simple one. The circles you're looking at, we'll call them neurons, and they hold a state: is it turned on

or is it turned off? The ones on the far left are the inputs, so that may be our malicious code. You're seeing a simple one here, and I'll show you in another slide what it looks like when it gets more complex. The inputs define your feature space; these are the things that you're actually measuring and feeding into your machine learning system. Then those features, or inputs, on the left feed into another layer of neurons, and those become features in their own right, not necessarily features that you can understand, but they are

features, and those features feed into the final output, which will give you something like a yes or a no, or it's blue or it's red, for example. The more layers you have, that's what deep learning is: deep learning is when you have many layers of neurons. (I see the slides are getting misaligned here... all right.) Okay, so the topology, how those neurons are wired together, is part art, part science, part black magic: figuring out how many neurons I have, how they're linked together, what kinds of inputs I'm going to use. Here are some examples of neural nets

that have been used. The one in the bottom right was out of a textbook, just demonstrating neural nets and machine learning. The one in the upper right was out of, I believe, a geology paper; the geologists had used this neural net to train a machine learning system to recognize certain types of soil. And the one on the far left was a neural net designed to recognize Pascal programs. I don't know how many programmers are here, but Pascal is a programming language that isn't used much anymore; it was used more many years ago. There are references, if you're interested in that stuff, when you get

the slides. So, machine learning is going to learn these patterns. We give it the input, we tell it to train itself, to tune the knobs, and in the second part you're going to see that there are real knobs involved in our physical implementation, and then it's going to give you the output. And we hope that the patterns don't change, right? We hope that if this happens, then that happens, and it's always going to be that way, and if it isn't, then that means something's broken, for example a malicious intrusion: breaking the law. So it learns the rules, the patterns, and then who breaks the rules? The bad

guys, the adversaries. So that's the hope: that we're going to have this machine learning system that learns how our system behaves on a regular basis, learns what's normal, and then when we see something abnormal, hey, it must be a bad guy, because the rules have been broken. Except, hold on, there are other things that actually break the rules, and now we're getting our first indication that maybe something is wrong with this way of thinking, with taking a machine learning system and using it as the end-all-be-all of our security. Because there are outages, there are upgrades, there are policy changes, business policy changes, there's employee turnover. Any time

one of these things changes, it changes the relationship between the inputs and the outputs. The inputs are things like the stuff in our log files and the binaries we're seeing on the file system, and the outputs are our judgment as to whether it's okay or not, whether it's normal or not, whether it's malicious or not, whether it's acceptable use or not. So now I'm going to go to the second part. That was a general overview of machine learning. It's deep stuff, and there's a lot of math involved, and I figured most people would go to sleep if I went into all the math. So I thought: hey, this

is BSides San Francisco, and the theme this year is steampunk, and I don't have a steampunk outfit, I'm sorry; I see a couple of people have got them. But the best thing I can do is actually build an apparatus that uses tubes. I would have liked to use a little steam engine, a pneumatic steam engine, except I came on an airplane and I don't think they'd let it on the airplane. So right now I'm going to invite however many volunteers are brave enough to come up to that table over there and actually play with this apparatus, which is plugged in and running right now. Are there any volunteers? Anyone bold? All right,

just right up front. So you folks are going to see this thing firsthand; everyone else is just going to see these fuzzy pictures on the screen. I'm going to go through these slides, and you can watch as I go along while they fiddle with the knobs, and you're going to see what all these things are for. In our apparatus we have an air pump, because we need a source of energy; in this case it's an aquarium air pump, an 80-gallon aquarium air pump, with a little dial that adjusts how much air is flowing. It's plugged in, so

that should be live. We've got four-way metal gang valves, three of them in this case. A gang valve for an aquarium system is one tube in and four tubes out, with four little valves, and you'll see a schematic diagram in a moment. And there's tubing; we're not using wires, this is all air. It would be steam if we could, but I don't think that would be practical. And then we have a comparator, which was custom made by my son out of Legos. The challenge is that we've got two tubes at the end of our system, and I'll show you some diagrams here: two tubes, and each one is

blowing air, and we need to know which one is blowing more air than the other. So there's a Lego device up here, with a little plastic BB in the middle, a plastic ball; one tube is blowing it in one direction and the other tube is blowing it in the other direction. And then there's a bias: you see that thing underneath, you can adjust it to adjust the slope, which will bring the little ball down to one side. The ball will naturally fall down to zero unless one tube blows it up to one. Now let's look at some schematics. What we're modeling with this physical model

is a single neuron, really. The one on the right there, the output neuron, is the neuron that we're modeling, and then there are four inputs. Those inputs aren't really neurons; they're not accepting anything, we're just turning them on or off, and in this case we're modeling a 4-bit number. We're not going to analyze any malicious code with this pneumatic neural network, but there are some interesting problems we can do with it that involve 4-bit inputs. Basically, what it's doing is fusing those four inputs together. There's an equation behind this, but if you look at

it, it's actually a physical commingling of the air inputs. Here's a gang valve: the blue circles are what an air hose connects to, and the little yellow things represent valves that you can turn on or off, and you can actually put them in halfway states too. In an artificial neural net, these would be considered the weights: a neuron is tied to a neuron, and there's a weight, so this neuron has so much of an impact on another one. And this is a more elaborate diagram of what these folks are playing with right now on the table.
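In equation form, what the valves and comparator implement is the standard artificial-neuron rule: a weighted sum of the inputs plus a bias, passed through a threshold. A sketch, with valve settings made up for illustration:

```python
def neuron(inputs, weights, bias):
    # Each open valve scales its input's air flow (a weight); positive
    # weights feed the "uphill" side, negative ones the "downhill" side,
    # and the adjustable ramp under the comparator acts as the bias.
    airflow = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The comparator: which way does the ball roll?
    return 1 if airflow > 0 else 0

# Illustrative setting: only the last valve open, ramp tilted downhill.
print(neuron((1, 0, 1, 1), weights=(0, 0, 0, 1.0), bias=-0.5))  # 1
print(neuron((1, 0, 1, 0), weights=(0, 0, 0, 1.0), bias=-0.5))  # 0
```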

We've got the air pump going in on the left. The four valves on the left represent our input, our 4-bit number: you turn them on or off, and they're actually labeled, so in terms of binary you have a bit for eight, a bit for four, a bit for two, and a bit for one. There are four tubes coming out of there. In theory, what I really should have done is put little T-splitters on those, with one side of each going to the top and one side to the bottom, but that's more

expensive and it's more stuff to carry in my backpack. So instead there are just four hoses, and you can disconnect them and reconnect them to whichever one you want. One of those splitters on the right represents a positive bias and the other represents a negative bias; another way to put it is that one of them will push the ball in one direction and the other will push the ball in the other direction. And the valves on the splitters on the right, just before the green triangle, which is our comparator, each represent a weight: how much impact does its input have on the

output. And then the comparator, again, is a little Lego thing; maybe afterwards you all can come up and look at it. It blows the ball in one direction or the other. So now we're going to take this, and these folks are going to solve some problems with it. We've got 4-bit inputs, so here are three interesting problems to solve. One: can you make it so the ball will tell you whether the number is even or odd? Another challenge: can you make it so the ball will tell you whether the number is greater than three? And the final challenge: can you make it so the ball will tell

you whether the number is prime or not? A prime number cannot be divided by anything other than one and itself, and the first prime number is two; zero and one technically are not prime. Now let's look at the answers while they fiddle with it. With this simple pneumatic neural network we've built, the first two problems give us what are called linearly separable data. Say the question we're trying to answer is: who can be heard through the microphone? It's either those folks or me, and we're not going to let them talk, so it's me. We can draw a

line right here, and we can say everyone on that side of the line cannot be heard on the microphone, but anyone on this side of the line can: linearly separable. The even-odd problem is linearly separable. We can divide the numbers up by looking at the ones bit: if the ones bit is on, the number is odd; if the ones bit is off, it's even. You can very neatly divide your data into two piles, and that's what this thing can do; there's a solution on the next slide. Greater than three is also linearly separable with this device. What we do

is look at the first two bits: if either of those first two bits is set, the number is greater than three; if both are turned off, it's not. So here's a solution for even or odd. You look at that ones bit, and you push the ball in one direction or the other; we push it in the positive direction, and that's the uphill thing with the plus one. Here the tube quite literally blows it up the hill, and the others

have no weights, so we actually disconnect those tubes; they don't do anything. All we care about is that one bit: if it's set, it blows the ball uphill, and if not, there's a bias, a ramp, and the ball rolls downhill. A very simple problem, but remember, what we're trying to solve are hard problems, like: is this malicious code, or were the login patterns on Monday normal or not? So let's make it a little harder and see what happens. Is it greater than three? Again, it's easy to make the data linearly separable: we put plus-one weights on the four and the eight bits. If either of those is set,

it blows the ball uphill; if neither is set, the ball naturally rolls downhill, because we have a little ramp. And then there's the hard problem. If you take the numbers zero through fifteen and ask which ones are prime, they're divided up here with their bit patterns, and now this is kind of like a forensics challenge: here's what we're seeing, how can we decide whether it's good or bad, whether we want the ball to roll uphill or downhill? I left out zero and one and called them undefined, so that I can illustrate a point about false positives in a later slide.
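The two linearly separable solutions just described translate directly into weight settings. A sketch; the exact weight and bias values are my own choices, and any values with the same signs would do:

```python
def ball(inputs, weights, bias):
    # The single-neuron rule of the apparatus: a positive total blows
    # the ball uphill (1), otherwise it rolls down the ramp (0).
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def bits(n):
    # A number 0-15 as the four labeled valve inputs (8, 4, 2, 1).
    return ((n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1)

# Even or odd: weight only the ones bit; the other tubes are disconnected.
def is_odd(n):
    return ball(bits(n), weights=(0, 0, 0, 1), bias=-0.5)

# Greater than three: plus-one weights on the eights and fours bits.
def is_gt3(n):
    return ball(bits(n), weights=(1, 1, 0, 0), bias=-0.5)

print([n for n in range(16) if is_odd(n)])  # [1, 3, 5, 7, 9, 11, 13, 15]
print([n for n in range(16) if is_gt3(n)])  # 4 through 15
```

The negative bias plays the role of the ramp: with no air flowing, the ball rolls down to zero by default.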

So, does anyone have any ideas on how you might separate these? Or you folks up here: any ideas on how you would adjust the knobs to tell you whether it's prime or not? No? Okay, while they're working on the binary, let's take a look at this slide. I'm going to skip ahead. Here is a possible solution, and the first thing you should notice is that it's not a simple neural net anymore: I cheated, I added more layers. We could probably do it in two layers, but I'm showing it in three layers right now. One

way to solve this, because there's never one right way, is to train, and when I say train, these folks are training the system right now. If you ever wonder what a machine learning system is doing when you train it, it's literally doing this: it's going through and experimenting, then going back and saying, I turned the knob and the ball went this way or that way, it didn't work, so I need to adjust. It's trying to find local minima and maxima for each of these weights. So we can train this intermediate layer of neurons and have the top one decide: is

it 15 or not? That's a pretty easy one to figure out. We can say: is it nine or not? That's fairly straightforward. We can say: is it odd or not? We've already done even-odd. And we can say: is it two? If you train that middle layer like that, then you can have a neural network that will tell you whether that 4-bit number is prime or not, except there's going to be an exception, and I believe one is going to be a false positive. What we can do with this is say: if the number is odd, and it's not nine, and it's not 15, then it must be prime.
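The layered rule can be sketched the same way: four hidden "feature" neurons (is it 15, is it nine, is it odd, is it two) feeding one output neuron. The weight values below are my own illustration of that logic, not numbers from the slide:

```python
def step(total):
    return 1 if total > 0 else 0

def neuron(inputs, weights, bias):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def bits(n):
    # A number 0-15 as four input bits (8, 4, 2, 1).
    return ((n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1)

def looks_prime(n):
    x = bits(n)
    # Hidden layer: the four intermediate features from the talk.
    is_15  = neuron(x, (1, 1, 1, 1), -3.5)     # all four bits set
    is_9   = neuron(x, (1, -1, -1, 1), -1.5)   # exactly 1001
    is_odd = neuron(x, (0, 0, 0, 1), -0.5)     # ones bit set
    is_2   = neuron(x, (-1, -1, 1, -1), -0.5)  # exactly 0010
    # Output layer: (odd and not nine and not 15) or exactly two.
    return step(is_odd - is_9 - is_15 + is_2 - 0.5)

print([n for n in range(16) if looks_prime(n)])
# [1, 2, 3, 5, 7, 11, 13] -- one false positive: 1 is not prime
```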

Except one is not prime, so one would be a false positive in this setup. Why do we have this false positive? The first thing to note is that we had to pull out extra features there. When we draw a line and talk about linear separability, we're talking about separability in a single dimension: it's over there, or it's over here. Here we're talking about something that's linearly separable in four dimensions. We've got to check whether it's 15, that's one dimension; we've got to check whether

it's nine or not, that's another dimension; we've got to check whether it's odd or not; and we've got to check whether it's two or not. And if we wanted to be really thorough, we'd also check whether it's one or not; that would be five dimensions if we added another, or we could pick that up in the next layer of neurons. As our data become more complex and harder to organize, we have to add extra layers to our neural network, which means adding additional dimensions to our problem space. So now we've got to think in four

dimensions, or five dimensions, or a thousand dimensions. You might hear about people training machine learning systems using thousands of different inputs simultaneously; that's a system working in thousands of dimensions, something very difficult to imagine geometrically. And each of those intermediate layers represents features, features that you might not have thought of, but that the machine found through essentially trial and error. There was something in the news recently about, I think it was playing Q*bert, the video game: a system found a way to rack up the score, and it did so by exploiting a bug in the main implementation of

Q*bert that nobody knew about; it found it because it was just randomly going around and exploring. So here are our three main points. Higher-dimensional problems require more layers of neurons, and that's where this notion of deep learning comes from. More layers require more data for training. And too many layers means we can never train enough. That's a little too much to explain in a short talk, but the gist is that in a higher-dimensional space there are more possible places where the data could land as outliers, and if you don't have sufficient training data, you're never able to train the system on

all those possible outliers. Why is that important? Think back to the who-breaks-the-rules slide: we think it's the bad guys, but wait, there are outages and the rest, and we're trying to train our system to tell us whether Monday was normal or not. (Five minutes left? Okay.) Out of any given year there are only 52 Mondays. So how much can we train it? If we're training it to recognize images of dogs, or cat videos, that's one thing, because there's no shortage of those and people will just keep making them. But if it's for your organization, and

you're trying to figure out whether the machines were working correctly on this particular Monday, there's not much data to train it on, and in this high-dimensional space that creates a problem. So where are false positives coming from? There are two problems. Either we don't have enough layers in our artificial neural network, so we can't adequately separate the data, and some data are going to land on the wrong side of the line, or the hyperplane; or we've got so many layers that we can't find enough data to train it. We may get faster computers, but if there are only so many Mondays, then we're stuck.
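The scale of that mismatch is easy to show with a back-of-the-envelope calculation; the feature counts below are hypothetical, not from the talk. With d on/off features there are 2^d possible states, and a year of Mondays gives you at most 52 samples.

```python
# How much of the state space can 52 Mondays of data ever cover?
mondays = 52
for d in (4, 10, 20):  # number of binary features (illustrative)
    states = 2 ** d
    covered = min(mondays, states) / states
    print(f"{d} features: {states} states, at most {covered:.4%} covered")
```

At 4 features the 16 states are fully coverable; at 20 features, 52 samples can touch at most about 0.005% of the possible states, leaving almost everything as a potential outlier.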

So the takeaway here: sometimes machine learning works, but the system will be fragile at best when an adversary is involved. It's great for the natural sciences, great for problems where your data sets are fairly predictable and fairly stable, but it doesn't work well when your environment is changing on a regular basis, like it is in a business, or when you have an adversary who's deliberately injecting data into your system, trying to break it. He's still tinkering, cool. And that's all I've got. Did I run over? [Applause]