
GT - Know Thy Operator - Misty Blowers

BSides Las Vegas · 38:11 · Published 2016-12
About this talk
GT - Know Thy Operator - Misty Blowers. Ground Truth, BSidesLV 2014 - Tuscany Hotel - August 05, 2014
Transcript [en]

All right, next up we've got Dr. Misty Blowers from the Air Force Research Lab. Misty is one of the creators of the security track at the Genetic and Evolutionary Computation Conference, and probably the first person I've ever met doing machine learning work in security. Is that about right? Misty, you're up.

[inaudible] you have lessons learned there [inaudible] motivation, drive, and then this is a process that I was able to look at. These processes are similar [inaudible].

So the motivation of this research is, like I said before, that a lot of attention has been paid to this [inaudible].

So, these are some examples. I'm not going to go into any of them in crazy detail, maybe just one, but if you're interested, you can actually learn about some of these things in greater detail [inaudible]. There's a wealth of information out there on them. How many people have heard recently about Operation Dragonfly?

So, I'm going to pick on Stuxnet, although with some of those other vulnerabilities on that previous slide, you can see this as well. But back to the motivation: worms like Stuxnet can spread to multiple facilities and communities, whether or not that was the intention of the developers.

So, by one assessment, the worm was brought in on authorized media into the facility. There are rumors that it lay dormant from 2005 until the middle of [inaudible].

In order to carry out an attack like that, the attacker must have in-depth knowledge of the programmable logic controller architecture and very specific plant knowledge. And so we're going to talk a little bit today about how someone might walk into a mill or a manufacturing facility and gain that type of intimate knowledge.

So this is an overview of the industrial process that I've been doing this work on. It's located in upstate New York, and it actually makes container board. The raw material here is 100% recycled paper fibers. Think about that: that is an incredible amount of variability, and an incredible amount of data. So it was very challenging, and there was a lot to characterize. I was very familiar with the process, because I actually worked at this mill for about 15 years before going through the program. One of the kinds of unreported events that was interesting: there was a visual inspection where the operators watched the raw material come in, and the bales

that were being recycled came from Asian markets. The operators knew those would have contamination in them, because of where they come from; they recycle a lot more there than they do here. So capturing that knowledge is important, because when you're trying to characterize an anomaly at a given time in that process, you don't want to [inaudible]. So, a few things about the process: it was under semi-automatic control, and the operators monitor this process around the clock. It only shuts down a handful of times a year, and those shutdowns are scheduled. The other thing I found is that it's not always the case that the shutdowns stay on the schedule.

But during those shutdowns, we tried to capture a lot of the quality data. So, the data I had available: for most of the time there was a data historian, and it was actually on a PC with limited connectivity to the outside world; it was running Windows 3.1, I think. And there's operator logs.

We didn't have those in an electronic format; we literally had to key the data in by hand, because there's valuable information coming through them [inaudible]. Typically the operator logs are actually checked. And then there's management, the plant engineers, and the operators' own knowledge.

And then a very important part of the data that we incorporated in the model, I feel, is the online and laboratory product quality control measurements. There are online sensors that actually characterize the quality of product coming through the system. That is important, because if you see your system performing in a way that is characterized as a good state of health, or optimal quality, but you're also continuing to lose product quality, that might be an indication of a compromise. And then I'll talk about the software tools I used.

[inaudible] and then there's the semi-supervised learning system itself. The semi-supervised learning system can also be described by some as a data mining platform, because it really goes through every step of the way: integrating different data types together, extracting information from that data, and sending that information to the learning system. We've explored multiple different learning systems in the lab, and I'm going to talk a little bit about a few of them.

And so, this is a diagram of the mill. It's kind of intimidating to sit down and look at this and have to develop models from it. This is a simplified process diagram. As I mentioned earlier, this is a DCS, a distributed control system, which is somewhat protected

because it is contained at this location. However, there are vulnerabilities there, all the same. In my prior job, I worked on an assessment team, and I actually had access [inaudible] to look at both of these and see how they talk to each other. So I love this picture here. This is really the concept: trying to extract process knowledge. The plant historian collects information from all the different controllers across this process. We initially narrowed it down to some key variables, and this was by subject matter knowledge and engineering judgment.

The granularity of this collected data could go as fine as [inaudible]. And if you think about it, when you're building your models to characterize the behavior of this process, that is a lot of data to work with [inaudible]. So I used [inaudible], and then, of course, the operator logs that I mentioned. And I saw a lot of things. There were a lot of things that were working well, and there were some things that were lacking.

There were some things that could have been upgraded to be a little bit more effective, because we do see some variability in the process, but [inaudible]. And the lag time: you have about two hours of lag time from raw material pulping until it comes out at the end of the machine. In this particular mill, I was using information from multiple points across that production process. So if something goes wrong with, say, a pump or the furnace, then you have the ability to fall back on a buffer of storage, so that you can keep the back end running while you bring the front end back up. But now you've got this disconnect in the continuous timeline: when does the product that was sitting in that storage actually flow through the rest of the line? So if you're trying to build a model, you really need to take that lag into account. And the operators are simply aware of different disturbances that are not detected by the sensors. The operators know, for example, [inaudible].
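As a concrete illustration of that lag handling, here is a minimal pandas sketch. The file name, tag names, and the fixed two-hour offset are illustrative assumptions, not details from the mill:

```python
import pandas as pd

# Hypothetical historian extract: one row per minute, indexed by timestamp.
# Tag names here are made up for illustration.
df = pd.read_csv("historian_extract.csv",
                 parse_dates=["timestamp"], index_col="timestamp")

# Roughly two hours of transit time from pulping to the end of the
# machine, so upstream readings are shifted forward in time before
# being compared with downstream quality measurements.
TRANSIT = pd.Timedelta(hours=2)

aligned = pd.DataFrame({
    # Upstream tags, shifted so each row describes the same slug of stock.
    "pulper_flow":  df["pulper_flow"].shift(freq=TRANSIT),
    "furnace_temp": df["furnace_temp"].shift(freq=TRANSIT),
    # Downstream quality stays on its own clock.
    "sheet_quality": df["sheet_quality"],
}).dropna()  # drop the edges where the shift leaves gaps
```

The storage buffers she describes make the real lag variable, so a fixed shift like this is only a first approximation.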

So, not to lose you here, but with every control loop you have an output. You have the process value. You have the setpoint. You have the status. One of the things that I found interesting about Stuxnet was that status value: it really says, am I in manual operation, or am I in automatic operation? But because of the way Stuxnet targeted that programmable logic controller, it really got into that ladder logic, and it could report one mode of operation while doing something else. So that's why we feel that using information from multiple sensor points across the process, combined with information about the quality of the product, increases the effectiveness of our ability to detect this type of attack. So again, if you get the temporal alignment and the lag time right, then you've got a very good data set.
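A rough sketch of that multi-point consistency idea follows. The loop layout, the 5% tolerance, and the decision rule are guesses for illustration, not the method from the talk:

```python
from dataclasses import dataclass

@dataclass
class LoopSample:
    """One reading from a control loop: setpoint, process value,
    controller output, and the reported mode ('auto' or 'manual')."""
    name: str
    sp: float
    pv: float
    out: float
    mode: str

def loop_suspicious(s: LoopSample, tol: float = 0.05) -> bool:
    """Flag a loop whose reported state doesn't hang together: a loop
    claiming 'auto' should normally be tracking its setpoint, and a
    large SP/PV gap is the kind of inconsistency a spoofed status
    word could otherwise hide. The 5% tolerance is illustrative."""
    gap = abs(s.pv - s.sp) / max(abs(s.sp), 1e-9)
    return s.mode == "auto" and gap > tol

def plant_suspicious(loops: list[LoopSample], quality_ok: bool) -> bool:
    """Cross-check many loops at once, the way the talk suggests:
    any single inconsistent loop is a flag, and so is every loop
    reporting healthy operation while product quality degrades."""
    return any(loop_suspicious(s) for s in loops) or not quality_ok
```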

So, where is variability found? It's not just the raw material, though that's a big one; in different seasons, you might have different blends of raw material to really try to optimize the process, [inaudible].

And there is variation in the operating equipment: parts wear over time, there's faulty equipment, faulty sensors, sensors that were never calibrated properly. We're also going to talk a little bit at the end of the talk about operator response: as you go from one shift to another, the operators might respond in different ways to the same event happening on that operating floor. So let's talk a little bit about machine learning for baselining. Supervised learning, and then unsupervised learning; we'll talk a little bit about both. So in supervised learning, you have labels. If you have a set of characteristics of a car, for example, that's a good example

because they're all the same type of object. If you look at different types of cars, you might see different features: this set of features lets the car go 169 miles per hour; another set of features only lets this car go 150 miles per hour. And so then, when you're presented with an unknown vehicle, the model can predict how fast that vehicle can go. Unsupervised learning is learning by observation. You don't actually give it the labels; you just allow the learning mechanism to decide how to group the data.
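That distinction fits in a few lines of Python with scikit-learn; the car numbers below are made up purely to show the two workflows:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Toy "car" data: rows are cars, columns are features
# (say, horsepower and weight). Values are invented.
X = np.array([[300, 1500], [250, 1600], [120, 1200], [140, 1300]])
top_speed = np.array([169, 150, 110, 115])  # labels, in mph

# Supervised: features plus labels -> predict the speed of a new car.
model = LinearRegression().fit(X, top_speed)
print(model.predict([[200, 1400]]))

# Unsupervised: features only -> let the algorithm group the cars.
print(KMeans(n_clusters=2, n_init=10).fit_predict(X))
```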

So, supervised learning was used to characterize the labeled variables. We actually were able to identify, for these kinds of data points, times when things went wrong in the production process: there was some faulty equipment [inaudible]. The first part of this was actually taking the different sensor points across the process and correlating them with the times when the state of health of the overall process is poor. To do this, you run a correlation against the output: it basically looks to see, for all the different variables that you're considering, if you're trying to predict this output, whether those variables move that output variable. And then you train your artificial neural network on the variables that do.
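A small sketch of that screen-then-train flow; the synthetic data, the correlation cutoff, and the network size are stand-ins, since the talk doesn't give the actual values:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for historian data: 500 time steps of 6 sensor
# tags, plus an output variable tracking the state-of-health measure.
X = rng.normal(size=(500, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=500)

# Step 1: correlate each candidate sensor with the output and keep
# only the tags that appear to move it.
corr = np.array([abs(np.corrcoef(X[:, i], y)[0, 1])
                 for i in range(X.shape[1])])
keep = corr > 0.3  # illustrative cutoff

# Step 2: train a small artificial neural network on the kept tags.
ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                   random_state=0).fit(X[:, keep], y)
print("kept tags:", np.flatnonzero(keep),
      "R^2:", ann.score(X[:, keep], y))
```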

[inaudible]. The other part is characterizing the behavior of the process itself. The software clusters the data, and what's interesting about this is that it allows a per-event adjustment of the alarming. In a lot of these cluster analysis techniques, you see almost uniform clusters develop. What we do here is we say: okay, we'll cluster on the times when we know that the state of health of the industrial facility is poor.

For certain areas that are characteristic of events of high interest, we actually adjust the threshold bounds around those clusters, so that we're reducing the false alarm rate and, in some cases, increasing the sensitivity for detecting certain event spaces. So this is a picture of [inaudible], and we actually have a patent on this: being able to do a prioritized detection of events based on the statistics of the data. And why is this an advantage over the current state of the art? The current state of the art has thresholds set at a specific, fixed point, and when you're monitoring a particular process, you might get an alert telling you only whether

a single point across that process has exceeded a threshold bound. This is a combination of the different locations across the process, using the history of the whole; it is a more global, rather than localized, view of the anomaly. So we're going to break this down. This is basically how it goes together: we have the operator logs, and we load them in simultaneously with the training data. The training data is from the plant's sensors. And why do I say semi-supervised? You do have the data labeled to say what the normal state of health is, and which of these data sets represent times when the state of health of that specific process is poor. And

along with that, we have information coming through the operator logs, which line up with different timestamps, in which the operators have reported what they saw happen when there was a failure in the process, or when a significant event occurred, and they give their account of what they believe happened. And that gets folded into the training process. All right, so let's go through the steps.
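One plausible way to do that folding-in with pandas, assuming hypothetical file layouts (the real mill data is not public), is to attach each sensor record to the nearest operator entry by timestamp:

```python
import pandas as pd

# Hypothetical files. operator_log.csv is assumed to have columns:
# timestamp, note, state_of_health ('good' or 'poor').
sensors = pd.read_csv("historian.csv", parse_dates=["timestamp"])
logs = pd.read_csv("operator_log.csv", parse_dates=["timestamp"])

# Attach each sensor record to the most recent operator entry, so the
# free-text account and the health label line up with process data.
labeled = pd.merge_asof(
    sensors.sort_values("timestamp"),
    logs.sort_values("timestamp"),
    on="timestamp",
    direction="backward",
    tolerance=pd.Timedelta(minutes=30),  # illustrative matching window
)
```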

The first step is the correlation, the weighting of the features. We use statistical measures to weigh these different features, the sensor points across the mill, against the output variable, which is going to be the state of health of the process. And we found it most helpful and most robust to use something from biology; in biology they call it [inaudible]. What you're basically doing there is comparing the two classes of information: you compare when the state of health is good to when the state of health is bad, and you look at the minimum and maximum bounds in each one of those classes. Then the measure actually counts how many times the bounds of the one class exceed the bounds of the other class. If you do that for all of your variables, you start to see which variables provide the most class separation. So that's the data mining question: what features are the most important? You'll hear people talk about principal component analysis; we tried that and did not find it to be as helpful or as robust at separating the two classes. And then I think the question came up in the last presentation as well about k: how do you determine the number of clusters? We had another component within our software that we call the k evaluator, which will evaluate the impact of that cluster count and run through the data

to see how many clusters is optimal. So although the interface is not refined yet, this is something that a plant operator might use. You can see that they're presented with what we call a feature relative strength, and what this is really telling you is which features we've found most useful for predicting when a poor state of health is going to happen. And what we think is of value to the operator in this particular interface is that the operator can select the top-ranking features, but they can also select maybe a feature down here that didn't rise to the top, so that they can explore different possibilities.
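The name of the biology-derived measure is unclear in the recording, so the following is only a guess at the idea she describes, counting how many points of each class fall outside the other class's min/max envelope, one feature at a time:

```python
import numpy as np

def bound_separation(x_good: np.ndarray, x_bad: np.ndarray) -> int:
    """Count points of each class that fall outside the [min, max]
    bounds of the other class; higher means better separation."""
    outside_bad = np.sum((x_good < x_bad.min()) | (x_good > x_bad.max()))
    outside_good = np.sum((x_bad < x_good.min()) | (x_bad > x_good.max()))
    return int(outside_bad + outside_good)

def rank_features(X: np.ndarray, healthy: np.ndarray) -> np.ndarray:
    """Rank sensor columns by class separation, best first.
    `healthy` is a boolean label per row (good vs. poor health)."""
    scores = [bound_separation(X[healthy, j], X[~healthy, j])
              for j in range(X.shape[1])]
    return np.argsort(scores)[::-1]
```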

So again, this part is unsupervised learning. We get the initial threshold values for the clusters by doing an optimization, a multi-objective optimization method, where the initial threshold values look to minimize the number of false alarms. So if you picture this in your mind: we're not just mapping the times when the state of health is poor, we're also mapping the times when the state of health is good, and that is all taken into consideration for the thresholds.
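As a sketch of that trade-off, here is a simple scalarized stand-in for the multi-objective optimization: score candidate alarm radii for one cluster by the poor-health points they capture, minus heavily weighted false alarms on good-health data. The weights and search grid are invented:

```python
import numpy as np

def tune_radius(center, poor_pts, good_pts,
                scales=np.linspace(0.5, 3.0, 26)):
    """Pick one cluster's alarm radius. Each candidate earns credit
    for 'poor health' points it captures and pays a penalty for
    'good health' points it would false-alarm on."""
    d_poor = np.linalg.norm(poor_pts - center, axis=1)
    d_good = np.linalg.norm(good_pts - center, axis=1)
    best, best_score = None, -np.inf
    for r in scales * d_poor.mean():
        score = np.sum(d_poor <= r) - 5.0 * np.sum(d_good <= r)
        if score > best_score:
            best, best_score = r, score
    return best
```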

Here you can see we've drawn boundaries around the different clusters. And we show the pie charts to depict the fact that you can mouse over any particular point within the clusters, and it tells you which features were selected and what those feature values were. These pie charts are also a kind of annotation on a per-cluster basis. So if you have a cluster that's forming here, you have cross-referenced, using the logs that you've gotten from the operator, to make this pie chart. And you can go in and look and say: okay, this is when the state of health was poor, and this is what the operator reported as happening.

So, as an end user, you might want to expand the thresholds around that particular region, because maybe this is super important to you; whereas this region over here, yeah, I already know that's something that happens, and I'm going to minimize that, because that's not a concern.

So, the clustering function is the third piece of the software. This is a k-means algorithm. It starts with a random initialization and slowly moves the means of the clusters until you get this near-optimal result, where the closest points are gathered around the mean values. And again, these are the clusters in this demonstration here.
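For reference, the loop she outlines fits in a few lines of NumPy:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal k-means: random initialization, then alternately
    assign points to the nearest mean and move each mean to the
    centroid of its assigned points."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign every point to its closest current mean.
        d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each mean to the centroid of its points.
        new = np.array([X[labels == j].mean(axis=0)
                        if np.any(labels == j) else means[j]
                        for j in range(k)])
        if np.allclose(new, means):
            break
        means = new
    return means, labels
```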

So again, back to the threshold application: selecting the data and seeing the clusters. Then the validation step is performed: the data is assigned to the classes, and that's when we adjust the threshold bounds. So right now, again, we need some help with our interfaces, because it's really much easier to show you than to tell you. But right now we have sliders, and for each cluster that's formed, you can actually adjust the threshold bounds with a slider. You can double-click

on the threshold number itself, which brings up the dialog that we showed. So incorporating the expert knowledge was significantly important. It is the operator that knows what is behind a per-event adjustment, how it really is associated with what happened on the floor, and that allows you, the analyst, to go back and adjust the threshold bounds. This was found to be very important for tuning the sensitivity of the system. And this is currently how we present the state of health: it gives you the event record, keyed by events, and the summary of that event. So in these test scenarios that we have here, these weren't the result of a cyber attack; they were the result of the multiple other things

that can take the process down. To actually simulate an attack, we would have to have that data.

So, you also want to consider the insider threat, both intentional and unintentional. With the software tools we've developed, you can characterize operator actions and determine what influence they had. So we looked at the different shifts, and we actually developed some algorithms to help us automate this process. When an operator actually switches the system from automatic to manual, we can see what actions they took and whether or not that caused an event in the process. So there are a few things you can do with that, especially as you refine the system; there might be an operator out there that, you know, needs a little bit more training. So in terms of this, this is an

area where you can start to establish boundaries for behavior-based intrusion detection and response systems. Tools like this may help to prevent those kinds of incidents in the future. It's also a useful method for reviewing operational practices: it allows the defender to [inaudible] and gives them valuable insight into the process.
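A sketch of how that automatic-to-manual check might be automated with pandas; the file names, columns, and the one-hour look-ahead window are hypothetical:

```python
import pandas as pd

# Hypothetical logs. loop_status.csv: timestamp, loop, mode
# ('auto'/'manual'); process_events.csv: timestamp, description.
status = pd.read_csv("loop_status.csv", parse_dates=["timestamp"])
events = pd.read_csv("process_events.csv", parse_dates=["timestamp"])

# Find each switch from automatic to manual, per control loop.
status = status.sort_values(["loop", "timestamp"])
prev = status.groupby("loop")["mode"].shift()
switches = status[(prev == "auto") & (status["mode"] == "manual")]

# For each switch, ask whether a process event followed soon after --
# the automated version of asking what the operator's action caused.
followed = pd.merge_asof(
    switches.sort_values("timestamp"),
    events.sort_values("timestamp")
          .rename(columns={"timestamp": "event_time"}),
    left_on="timestamp", right_on="event_time",
    direction="forward", tolerance=pd.Timedelta(hours=1),
)
print(followed[["timestamp", "loop", "event_time", "description"]])
```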

I had two years' worth of plant data [inaudible].

So, to do this right, you need to have some historical knowledge to build the model. However, you can't have a complete picture of every state, which we don't have for this process, so we brought a lot of human validation into these models to help us determine how these clusters should be adjusted, depending on what the operators reported.

So that's why I say that the quality measures incorporated into the models are important. If you were to do something like that in this type of environment, [inaudible].

So that incorporation of the online quality measures of the actual product being produced is important to include in the models [inaudible]. Anything else? Yes.

So that's why, again, the quality measures are important. And there's the lag: there are the online quality measures, versus the offline quality measures that you get from the lab work. So there's going to be a lag there, right? In the amount of time it would take to actually detect it that way, the product quality would have already gone off. But what you're saying is that people trust that that programmable logic controller is giving the information it should be giving, that it's giving accurate information about what it should be doing. So that multi-point observation across the process would help. We used a

version of something like that with some training data. And also providing [inaudible]. Did I use [inaudible]? Yes. Have we used [inaudible]? No, we have not done that. That would basically be, like, a next level of research, to have someone [inaudible].

Now that your research is publicly available, [inaudible]? There are a lot of bugs we need to work out before we can, like, hand this off to industry. We work closely with this mill still today. We actually have [inaudible]; the lab is [inaudible] a lot of their technology [inaudible]. So we actually have two students that are going to look at basically [inaudible] these systems [inaudible]. Thank you.

Thank you.