
GT - F Your ML Model

BSides Las Vegas · 55:47 · Published 2023-10
About this talk
GT - F Your ML Model - Colt Blackmore Ground Truth @ 10:30 BSidesLV 2023 - 8/8/2023
Transcript [en]

Good morning, welcome to BSides Las Vegas. Obviously, if this is not your final destination, please deboard the airplane and find somewhere else, because you've definitely broken the time-loop continuum. This is "F Your Machine Learning Model" by Colt Blackmore. A few announcements before we begin: we'd like to thank our sponsors, especially our diamond sponsor Adobe and our gold sponsors BlueCat, prre, and Toyota; it's your support that makes this conference possible. Please silence your cell phones, and as a courtesy to your speakers, if you're going to ask a question, please move to the microphone and raise your hand; when we're ready, we'll call on you. And to alleviate some of the time crunch we had getting things set up, I'll hand it over.

Thank you. All right, how's that volume? Can't hear me? Can't hear me? That's just mean. All right, we've still got people wandering in, but that's all right; I'm going to meander a bit at the beginning here. I have a theory about why I ended up going first, so I'll attempt to describe it. I was looking at the schedule this morning, and there are, by my count, a baker's dozen talks here in Ground Truth over the next couple of days. They aren't all about machine learning, but a bunch of them are; about a third, actually, exactly a third. And I can only assume that maybe some impish organizer made that decision with the implicit understanding that we would start things off with a bang by, let's say, crapping on machine learning from a great height. For the life of me, I can't figure out what could possibly lead somebody to such a belief. Certainly not the title of the talk, or the description. But as a matter of fact, I have nothing but love in my heart for machine learning. So rather than do a typical sort of speaker intro, I'm going to do an origin story, and I'm actually really curious to see whether my experience here is unique or actually pretty common with all of you who do data-sciency stuff. So, show of hands: how many of you remember the exact moment that you
first encountered machine learning? That's way fewer than I thought. Interesting. All right, let's whittle it down still a little bit; I'm curious if anybody's going to be left. Those of you who just raised your hand: if you found your way to machine learning on your own, like it wasn't a school assignment or a task you were given at work or something like that, put your hand back up. That feels like more hands than there were before. You guys are a terrible audience. Ridiculous. But that's good. All right, so we have something in common: I also remember the exact moment I first encountered machine learning. There used to be a website called Gamasutra. It was a video game industry site, so not for fans of video games, but for people who worked, or at least aspired to work, in the industry. And sometime back around 2009, I don't remember the exact year or date, obviously, but I remember the moment, they published an article on this new thing that people were starting to use in video games called machine learning. The only thing I remember about that article is the example that they led off with, because it was so damn cool. There was a hospital in Canada. I'm almost positive it was the Toronto Hospital for Sick Children, but the article is long since gone from the internet so I can't verify that; I'm pretty sure that's what it was, and they're attached to the University of Toronto. They were using machine learning to detect when kids would get sick before it actually happened. And again, this is 2009, right? So the state of the art at that point, compared to today, not so good. It was a basic time-series model; the feature space was quite small, I want to say around two dozen features. If I put you guys on the spot right now and asked you to name features we could use for some kind of model like this, they were using the exact kinds of things that you would think of: heart rate, temperature, oxygen
level, skin conductance, blood pressure, those kinds of things. So, about two dozen of those, and with that in place they were able to determine, with a reasonable degree of accuracy, 70 to 80%, about 24 hours in advance, when one of these kids would become symptomatic. It's not like you're figuring out that a kid's going to get sick before they're sick; you're figuring out that they're already sick, they're just not showing it yet. And of course, by knowing that 24 hours in advance, you can apply early care, you can reduce the impact of the illness, and the long and short of this is literally saving babies. That's, I think we could all agree, not a bad thing. So machine learning: actually pretty cool. That was, to that point in my life as a technical person, probably the coolest thing I'd heard of. I didn't have any kind of background in statistical methods at that point; I don't think I'd even heard of linear regression, for example. So I didn't know anything, but it was an interesting enough example to dive right in and start working on this stuff. And so a year and change later, maybe, I made my first malware detection model, and it worked quite well. This was 2010, and it worked well enough, in fact, that about five years later, when I was working at Palo Alto Networks, we took that thing that I'd built five years before, well, kind of a stripped-down version of it that wasn't quite as good, and we shipped that in a couple of different products. So again: machine learning, literally saving babies. I more or less built my whole career on it, so I can't say too many mean things. Nothing in my heart but love for machine learning; it's pretty great. But there's got to be a "but," right? So: but machine learning is not the best solution to every problem. In fact, there are whole classes of problems where machine learning isn't even a good solution, and actually there are cases where you can prove this mathematically.
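A model of the kind described above, a handful of vital-sign features feeding a simple classifier, can be sketched with synthetic data. To be clear, this is a purely hypothetical illustration, not the Toronto system (whose details, as noted, are lost with the article); the feature list, drift values, and training setup are all invented:

```python
import math
import random

random.seed(0)

# Hypothetical sketch of a vitals-based "will become symptomatic" classifier:
# synthetic data plus a hand-rolled logistic regression. NOT the real system.
FEATURES = ["heart_rate", "temperature", "oxygen_level", "skin_conductance"]

def synth_patient(presymptomatic):
    """One synthetic patient's vitals. Presymptomatic kids drift slightly
    on every vital (drift magnitudes invented for illustration)."""
    base = [80.0, 37.0, 98.0, 5.0]
    drift = [12.0, 0.8, -3.0, 1.5] if presymptomatic else [0.0] * 4
    return [b + d + random.gauss(0, 1) for b, d in zip(base, drift)]

data = [(synth_patient(label), label) for label in [1] * 50 + [0] * 50]

# Standardize each feature so no single vital dominates the gradient.
cols = list(zip(*(x for x, _ in data)))
mu = [sum(c) / len(c) for c in cols]
sd = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 for c, m in zip(cols, mu)]
data = [([(v - m) / s for v, m, s in zip(x, mu, sd)], y) for x, y in data]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))  # clamp: no overflow

# Train logistic regression with plain stochastic gradient descent.
w, b = [0.0] * len(FEATURES), 0.0
for _ in range(2000):
    for x, y in data:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
        w = [wi - 0.001 * err * xi for wi, xi in zip(w, x)]
        b -= 0.001 * err

predict = lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5
accuracy = sum(predict(x) == bool(y) for x, y in data) / len(data)
print(f"training accuracy: {accuracy:.0%}")
```

A real system would of course be evaluated on held-out patients, and on precision and recall rather than training accuracy, but the shape of the thing is this small.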
You can look at things like inapproximability results, and in certain instances you can prove that machine learning is just going to be a terrible approach to a problem, because the answer it gives you can't be guaranteed to be more than, like, 50% of the optimal answer, or 60% of the optimal answer. That's just kind of how things are. Meanwhile, there is this big old wide world of AI out there beyond machine learning, often very different from machine learning, but sometimes similar, that in a lot of these cases where machine learning is not effective can be used to tackle the same problems, and can do it better than machine learning can. So what I've been wondering over the last five or six years, as I've become more familiar with these other areas of AI, is: what the hell is going on in cybersecurity, where we don't hear people talking about these other methods, and we don't see them using these other methods? Why is everybody so fixated on machine learning? We could speculate about a lot of different reasons why that might be the case, but the long and short of it is, this is where we are. I think a good microcosm of the problem is actually self-driving, and since I started with a clown slide, I figured we might as well have another clown, and every clown deserves a nose, so there you go, Elon. Self-driving: if you ask, you know, Joe on the street, or even probably the average technical person, they're going to just immediately associate that with machine learning. And we know that that's not entirely unreasonable; machine learning is a big part of what goes on in self-driving, but it's very far from the only part. There are sort of three foundational systems that exist in self-driving, and machine learning is really, foundationally, responsible for one of them: the perception systems, the car's ability, or whatever you're driving, I guess it doesn't have to be a car, but its ability to see, to sense its surroundings, to know
there's a sign and it's a stop sign or a yield sign or a stoplight, to see lanes and lane markers, to see other cars, all these kinds of things. Machine learning drives all of that, so it's totally fair to associate ML with self-driving, sure, but it's only one of these three core systems, and the others are equally interesting, and we can find meaningful ways to apply them to cybersecurity.

So for example, planning systems are quite important. If you're not familiar with automated planning, or AI planning, which has fewer syllables: planning systems create a logical representation of the world and our capabilities within it, to allow us to reason about how to achieve things within that world. Really basic example: I wish I had an attached mic so I could move around the room to illustrate this better, but we do automated planning, or human planning I guess, in our heads all day long, every day. If I have a goal, which is, say, to advance to the next slide, I have multiple ways I can do that. I brought a clicker thinking I might be able to walk around, and so if I was over there, I could use the clicker to do it. The other option, of course, is to be at the laptop, and then I can use the keys; that works too. All of those actions I could take have their own dependencies: I can't use the clicker if the battery's dead, and I can't press the key if I'm on the other side of the room, so my location and the location of the laptop come into play. But this is what planning is: a big logical representation of the world, and a system for navigating that and being able to achieve things within that world. We're going to talk a bunch more about that.

The third pillar of self-driving is control systems. Control systems are really where the rubber meets the road. If planning tells you when to change lanes and when to turn right and left (it's sort of like the Google Maps of this whole thing), the control system is the thing that hits
the gas, hits the brakes, turns the steering wheel. These are usually formulated as mathematical optimization problems, and they have some kind of physics-based constraints. So, gas, brakes, turning the steering wheel, sure, but if you hit the gas too hard you might fishtail and run into a wall; if you brake too hard you have problems; if you steer too hard you have problems. Physical constraints come into play there, and you get some really interesting problems. My favorite example of control systems from, let's say, the last decade actually has nothing to do with cars. It comes from SpaceX, another Musk company, and the vertical landings of rockets, which are just a hellacious control system optimization problem. Of course, Elon Musk wants everybody to think that he's Tony Stark and he solves all these problems himself. We know that that is not the case; in fact, we know exactly who at SpaceX is responsible for solving this problem and making things happen. It is another Blackmore, no relation to this Blackmore that I know of: Lars Blackmore, formerly of NASA JPL, where he worked on a team that explored this kind of stuff. Now he's at SpaceX leading the team there that explores this kind of stuff, and he is the main guy who's been responsible for making the vertical landings of the SpaceX rockets real. The way he went about that, and the way the people he worked with at NASA and the other people at SpaceX all went about it as a team, is really, really interesting. If you're familiar with mathematical optimization, you probably already know there are these two sort of broad categories of functions that you generally have to deal with. One of those categories is convex functions. Convexity is a really nice property for a function to have; it means that when you look at the solution surface for the function, you get a nice bowl shape. So if you drop a marble in at any
point on the function, it's going to fall down to the bottom and rest there. It's really easy to find whatever optimum of the function you care about, so it's nice and easy to deal with. Then you have the sort of hormonal-teenager function, where it's non-convex: it is all over the damn place, and you really just have to watch yourself around it because it gets angry for no reason, all that kind of stuff. In this case, if you were to drop a marble in from an arbitrary point on the function, you have no idea where it's going to come to rest. It could be a local minimum, it could be the global minimum, it could be all over the place. When it's an important problem like landing rockets, wherever it lands might be good enough and you land your rocket safely, but it also might not be good enough, and your rocket explodes. There aren't people on the rocket, so that's not the end of the world, but it's also not exactly the goal that you're hoping to achieve. So what Lars and the NASA folks and the SpaceX folks figured out is a way to relax the non-convex function of the hellacious rocket-landing problem into a convex version. We call that a relaxation of the function, and this isn't particularly interesting in itself, because the way you do mathematical optimization is often to find relaxations, solve those, and use them to bound the original function, to sort of zero in on the ultimate answer. But what they figured out how to do was find a relaxation where, when you find the solution for the relaxed version, it's guaranteed to also be a global solution for the original problem, which is pretty damn cool. So instead of trying to tackle something like this with neural networks, where you have no guarantees around the results and you have to figure out how do I even run this in a rocket doing things that rockets do, which are maybe not amenable to, you know, holding Nvidia GPUs or whatever, they found a precise mathematical way to approach it.
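The marble analogy can be made concrete with a few lines of gradient descent. This is an illustrative sketch of convex versus non-convex behavior only, not the Blackmore relaxation itself (which convexifies the actual landing problem's constraints); the two example functions are invented for the demo:

```python
def descend(grad, x0, lr=0.01, steps=5000):
    """Plain gradient descent: roll a marble downhill from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Convex bowl: f(x) = x^2, gradient 2x. One global minimum at x = 0,
# so every starting point rolls to the same place.
convex_grad = lambda x: 2 * x

# Non-convex: f(x) = x^4 - 3x^2 + x. Two basins with different minima,
# so where the marble comes to rest depends on where you drop it.
nonconvex_grad = lambda x: 4 * x**3 - 6 * x + 1

print(descend(convex_grad, 5.0), descend(convex_grad, -7.0))        # both ~0
print(descend(nonconvex_grad, -2.0), descend(nonconvex_grad, 2.0))  # ~-1.30 vs ~1.13
```

Dropped at -2 the marble settles near x ≈ -1.30 (which happens to be the global minimum here); dropped at +2 it gets stuck in the other basin near x ≈ 1.13. On a convex function that failure mode simply cannot happen, which is why a relaxation with a global-optimality guarantee is such a big deal.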
And now they land rockets, like, you know, three times a week or whatever; it's kind of routine for them. We're not going to talk about control systems per se today, but we are going to talk more about mathematical optimization, because it is an important tool in our toolbox for dealing with security problems. But we are going to start with automated planning, which is a lot of fun because it's linked to video games.

Automated planning is not new; it's kind of an ancient and venerable field. People are still doing cutting-edge research in it, and most of that deals with real-time systems, so things like self-driving and robotics. That's where all the hard problems are, because navigating a world in real time while reasoning logically about it is not a trivial thing to do. So cool work is being done there. But the area where I was first introduced to AI planning, and where I've spent the most time with it, is video games, because you can do really cool stuff with this in video games. The example I want to call your attention to is the game F.E.A.R. This is not a new game, if you're curious; it's 20 years old, I think originally published in 2003. The AI lead on F.E.A.R. was a dude named Jeff Orkin. He went on to do his PhD at MIT and turned out to be quite the AI and computer science guy, but back at this point in his life he was building AI for video games. What he did is he looked at the way that people did AI in games up to that point, which was really, really basic: things like behavior trees or finite state machines, which have to be manually, painstakingly, explicitly encoded by human beings. It's a terrible, terrible approach, and he didn't like it, and the results it delivered, he didn't like those either; nobody likes those. So he started by taking a system from Stanford called STRIPS, if you're familiar with it, the Stanford Research Institute Problem Solver, and he, so to speak, stripped a
bunch of stuff out of it and then enhanced it with some other stuff to make it work in video games. From there he was able to build a system that basically blew everybody's hair back. Even today, people go back and play the original F.E.A.R., and they feel like, when they're playing against the computer, they're actually playing against other human beings. It has a real sort of lifelike quality to it; it's very dynamic and interesting. It just feels like there's somebody else on the other side of this thing, to the point that it even weirds some people out a little bit. The reason we know so much about F.E.A.R., actually, is because Jeff did a talk like this at the Game Developers Conference and published a paper on it. That paper was called "Three States and a Plan: The A.I. of F.E.A.R." I encourage anybody who's interested to go read it, because it's very approachable. But the long and short of it is pretty simple. When you boil AI planning down to its core, there are really just a few things. You have states. States can be as simple as basic propositional logic: you can have variables, X and Y, or meaningful names, and they can be true or false, or you can give them ternary values and put unknown or null in there. You can also be much more specific; you can make planning as complex as you want it to be. A state could be coordinates in a coordinate system, it could be the temperature in a room, it could be a color; it could be really anything you want to reason about. You can build it however you want. So you've got your states, and then you have actions. Actions are things that you can do within the world that, generally, are going to transform one state into another state. So if I want to advance the slide, whether I use the clicker or the keyboard, I take the action to advance it, and now I've changed the state from the previous slide to the new slide. It's just a
transformation of states. Then you combine these two things together using logic to get these really complex, interesting, emergent behaviors. The way that works is: you have your initial state, which would be like the state of this room as it is right now; you have a goal state, which is whatever changes I would like to make to the room; then I look at all the actions available to me to make those changes, and I reason about how to execute those actions to make the changes real. And now I've transformed the state of the room into whatever I want it to be. It's pretty straightforward. The implementation they did for F.E.A.R., they gave it an awesome name: it's called GOAP. I like it so much that there's a dedicated slide for it. There's no reason for there to be; I just spent an hour with Midjourney trying to get it to make text, and it was absolutely worth it. It's like goop and soap put together, really clean slime, I don't know, but I love it. So, GOAP. GOAP is really cool, and I wanted to have a video to show you guys so you can get a feel for how dynamic these really simple implementations of AI planning are. The problem is, when you take, like, the first-person-shooter version, unless you're the one playing the game, it's just a lot of visual data to process; it's not easy to make sense of. What I found instead, which is actually kind of awesome, is some random person on Reddit, a hobbyist game developer, who had been struggling to get AI to work in the hobby project he was working on. He had done finite state machines, had done behavior trees, had done all these classic things, and none of them were working really well. So this person discovered GOAP and did a quick implementation and was just like, holy crap, this works really, really well. So they made a video and then wrote it up and posted it to Reddit; "you guys, you don't understand, everybody should be using this" was more or less the tone of it. It's easy to Google; you
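The initial-state / goal-state / actions loop described here fits in a few dozen lines of code. Below is a hypothetical miniature of a GOAP/STRIPS-style planner, not Orkin's actual implementation (his uses A* search with per-action costs; plain breadth-first search keeps the sketch short), using the advance-the-slide example from earlier; the proposition and action names are invented:

```python
from collections import deque

# A state is a frozenset of true propositions. An action is a tuple of
# (name, preconditions, facts it adds, facts it deletes). The planner
# breadth-first searches action sequences until the goal propositions hold.
def plan(initial, goal, actions):
    frontier = deque([(frozenset(initial), [])])
    seen = {frozenset(initial)}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:                      # every goal fact is true
            return steps
        for name, pre, add, dele in actions:
            if pre <= state:                   # preconditions satisfied
                nxt = (state - dele) | add     # apply the action's effects
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None                                # goal unreachable

# The "advance the slide" example: two routes to the same goal, each with
# its own dependencies (location, clicker battery).
actions = [
    ("walk_to_laptop", {"in_room"}, {"at_laptop"}, {"at_back"}),
    ("press_key", {"at_laptop"}, {"next_slide"}, set()),
    ("use_clicker", {"clicker_charged"}, {"next_slide"}, set()),
]

# Clicker battery is dead, so the planner has to walk over and press a key.
print(plan({"in_room", "at_back"}, {"next_slide"}, actions))
# -> ['walk_to_laptop', 'press_key']
```

Give the same planner a charged clicker in the initial state and it finds the one-step plan instead; give it an impossible goal and it returns None. That is the appeal Orkin was after: you declare facts and actions, and the behavior emerges from search rather than from a hand-built state machine.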