Automated Wargaming Of A Chemical Plant

Name: Automated Wargaming Of A Chemical Plant
Uploaded: 2024-02-06
Duration: 48 min 1 s

BSides London · 2024 · 48:01 · Published 2024-02

Tags

Difficulty: Advanced

Style: Talk

About this talk

A researcher with dual expertise in metallurgy and cybersecurity applies reinforcement learning to simulate red-team and blue-team engagements against an industrial chemical plant control system. By training discrete and continuous agents in a modeled testbed with 12 manipulated variables and 41 sensors, the study reveals that adversaries can force plant shutdown in under three minutes—faster than operational response times. The work highlights gaps in anomaly detection, the challenge of multi-parameter visualization, and future directions including stateful neural networks and particle filtering for threat prediction.

Transcript [en]

Thank you very much. As you say, welcome to the last talk of the day. Well done, you've all got through a lot of talks, I'm sure. I do have to temper some expectations at the start of this: it was only ever a model of a chemical plant, just in case you read that and thought, wow, I got to play with an actual plant. I'm afraid I didn't. They wouldn't let me. I did ask, but such is life. Okay, so who am I? For the purposes of this work, anyway, I'm a doctoral researcher at Brunel University, which is a fancy way of saying postgraduate student. That's kind of part-time now, because my full-time job

is a security analyst at Threat Labs. We have some of my colleagues at the front. We do red teaming, SOC, pentesting, all that jazz. It's very fun, come talk to us. In my prior life, rather than doing computer science stuff, I actually used to be a metallurgist at a chemical plant, so that's where this has come from, in a way. It's informed some of the studies, but I also like computers, so that's where I made the move. Just FYI, if you think the job of a metallurgist is hard, for the most part it's not. You look at something and say, that's rust, that is. There you go, there's

your degree. That said, one link there is that there's actually quite a lot of materials fraud that goes on in these things. You might think everyone's kind of nice, but if you start looking for fake stuff in expensive equipment, I guarantee you'll find it. So if you're thinking of a career change the other way, that's something to bear in mind. Okay, first some history, because this is my talk and I want to do it, and because I think it will be useful in informing our current approaches. Really, this whole thing started with the Industrial Revolution. We worked out how to utilize coal and steel and steam to replace manual labor. That was a huge thing. I

feel like we don't talk about it enough, but there you go. So we did that, and then swiftly after that we went even further. We invented flammable liquids, and we invented the processes of moving them around and using them. We started with Bakelite and other petrochemicals, we started with combustion engines, and as you can tell, that's completely revolutionized society. But it did come with a downside, and I'm not talking about carbon dioxide. For the purposes of this talk, I'm talking about the general concept of risk. With these things you do get some fairly funky interactions. My specialty was acids rather than hydrocarbons, but you do get aromatics, which do God knows what. You have very

interesting atmospheres. Before all this stuff we just had air, or hot air. Now you have something like 100° vapor-phase equilibria of acetone and other weird things, and working out what that does is hard. If you really want to scare yourself, become acquainted with the concept of a boiling liquid expanding vapor explosion (BLEVE), which is when all those atmospheres start to meet each other. This little picture, in fact, isn't a BLEVE. Anyone know what that was? Yes. How close were you at the time? Yeah, it was quite fun. I was a few miles away, but still underneath the cloud. So previously, when all this stuff was done with manual labor, it

couldn't go that wrong. When we introduce all these chemicals and all this hyperscaling, just like computers really, it has the potential to go wrong at a larger scale. The interesting thing, however, is that for all the amazing number of ways you can die in a chemical plant, they're actually really safe. At the place I went to, the biggest hazard by a long way was 50-plus-year-olds spontaneously dying of heart attacks. So I think that's something to be proud of, in relative terms. So how did we do this? Well, we start out with control theory. This has analog origins, with something like Maxwell's paper on governors. This is a fairly simple

device. It's just a centrifugal clutch type thing. If your device is spinning faster, the balls go out, and that moves some springs and reduces the input. That's kind of our first automatic negative feedback thing. A possible fun etymology fact: that may be where the phrase "balls out" comes from. Take them out and you have no feedback, so the governor's just going full speed. That paper's actually a criticism of this mechanism, but never mind. However, the 20th century is where we really start to get into the meat of it, and where it starts to become relevant to our conference, because we start working out that these computer things can do useful stuff. So

we started talking about Laplace transforms and complex numbers, and something called the Nyquist stability criterion. Anyone got an idea what that funky-looking Nyquist plot is? That's the pitch response of an F-16 fighter jet. What we did is we just thought of it like another plant that has an input and outputs, and if you know how to read these diagrams you can say whether your machine is stable or unstable. You can then do all the really complicated maths necessary to keep it stable on something like a PLC. Great. We discretize things, we have some advances in matrix-based control theory, and we have this great paper in 1978, Doyle's, on

guaranteed phase and stability margins. I don't much like the actual paper, but the abstract is great. It is: "There are none." It's basically saying you can't prove safety in the context of arbitrary noise, which I would have thought intuitive, but there you go, 1978 is when we proved it. So we got all this stuff, and then once we have these PLCs we start integrating them into the higher-order business processes. These things will maintain a set point, but sometimes you might want to change that. Say it's winter and we need a different fuel formulation, or a refinery somewhere else is overproducing or underproducing, so we need to coordinate that across

multiple sites. That leads us to integration and the kind of platonic ideal of, you may have heard this in the talk earlier, the Purdue reference architecture. It kind of looks like the OSI model in a way, almost, maybe. You start out at the very bottom with actual physical devices: sensors and actuators, things that move. They might not even have computers in them. Just above that you have fairly low-level things: PLCs, safety instrumented systems, which are always fun, base stations. Things that are probably running real-time OSes and doing a fairly limited subset of things in bytecode, and they're just following higher-level orders. You then have human-machine interfaces and engineering workstations

that help to present this to users. If you're interested, that's not a mistake: there do seem to be some disagreements about where engineering workstations should go. These engineering workstations and HMIs are where you might start to see the transition to, you know, just Windows boxes, and things that you might consider within the normal realm of IT. But because they're in a plant, they'll be at least 15 years out of date. I feel like that can just be taken as read now. Anytime people say OT or control systems, they go, ah, so it's 20 years old. And then as you go up, you go through a DMZ and then into your

business processes and the wider cloud and squiggly stuff. Okay, so what do we also do as current defenses for safety at the high level? You may have heard these terms before, if you were in the train talk. This is UK-specific regulation. Obviously different jurisdictions have different names for it, but they tend to be the same kind of thing. They're basically saying: report to the government, and have a process to enumerate your risks, much like cyber risks really. We do something called a HAZOP, which is just very methodical: for each part, what happens if it does too much of what it's supposed to do, not enough, stops entirely, and honestly a couple of other

things that I forget, but it's going through every single component. Then we do something called a safety case, which is a really long document. 10,000 pages and 10 million would not be unheard of for these things. All that is, is a piece of paper arguing why you're safe to operate, so these are kind of legal instruments. One thing I do want to bring up is that in safety you can talk about something called the Swiss cheese model, which is basically saying: we have a layer of defense here and we accept that it's not perfect, but beneath that we have a second layer, and for accidents you need these

metaphorical holes to line up. I know it's a bit abstract, but what we're saying is that all of these different layers of defenses, static defenses rather than intrusion-and-response type stuff, can still be useful. That, to me, is the interesting point, because it's different to security, where if you just have lots of static defenses that might slow your attacker down, but they will learn how to get through all of them. Cool. So after that I'm just going to go through a couple of historical attacks and incidents. Sometimes one that was claimed to be the former turns out to be the latter; that was actually quite common. These will be necessarily brief. It pains

me to have cut a couple of them, but such is life. Okay, so probably one of the most famous that we hear about in ICS circles is Maroochy. That was just a sewage plant in Australia, and it's a very classic case of a contractor who was fired, and they didn't revoke any of his keys or tokens. I say that, but I'm pretty certain this was Telnet, since it's 1999. So he logged in and started kicking out legitimate operators, opening valves willy-nilly, and just wrecking the place. In total there was about 250,000 gallons of sewage released directly into the environment. The fine then was 600,000 Australian dollars, so it's probably

about double that in total external costs to society, and you'll have to adjust that for 2023, so probably about triple. Okay, the most famous, of course, which no talk can be complete without, is Stuxnet. This was definitely an advanced persistent threat actor. It famously got through an air gap using infected USB sticks. Also, if you've looked at the source code, they implemented a lot of safety checks to make sure they were only targeting Iranian plants, particularly the Natanz plant. Once it got in, the thing I was interested in was the end effect, how it did damage, and this is a lovely bit of physics. What you do is you cycle these

things up and down, so repeatedly past their resonance frequency, and at the same time you report to the control room that everything's okay. This does seem to be quite an interesting communication difficulty with chemical engineers: they don't seem to get the idea that if someone has arbitrary code on your PLC, you can't ask the PLC and expect it to be truthful. It'll just say things are okay. That's a practical tip for you if you're actually talking to chemical engineers; it does seem to be one of the difficulty points. But whilst this does sound very scary and dramatic, and in a way it was, one of the key lessons from this,

I think, is that human operators were present in the actual rooms where these were, and they noticed the side channel of the frequent changes in noise. So they said, I don't care if your thing says it's fine, it's clearly not. The end result was that whilst this slowed the enrichment process, it didn't actually stop it, so it's pretty debatable whether the actors achieved their goals in this one. Let's move on to the oil sector. We had a big refinery in Saudi Arabia infected by the Triton malware in 2017. This was quite scary for me personally, because the safety instrumented systems were directly hit, and they're something that, in comparison

to regular PLCs, really doesn't need to talk to the outside world all that much. The interesting thing about that was, whilst that happened, whoever did it didn't bother to put in the payload. We still don't know, really; of course, I'm only going off the same public information you have. But they went to all that effort, and you can take some guesses as to who it was, and then nothing happens. Another great one that made the news in the oil sector, of course, was the Colonial Pipeline. That was actually only the IT being hit, but, and here's a very common thing, the OT operators will be very sensible and they

will say, if we can't guarantee that our stuff is unaffected, then of course we'll shut it off. That sounds good, but alarm fatigue is a thing, and that did mean that this had a real, roughly eight-figure cost that could have been avoided. It's still the right call, but yeah. Okay, so moving on to steel. Some people have decided this isn't a chemical plant; I disagree. There are some interesting and conflicting reports. There was one that was supposedly a German steel mill hit in 2014. This source came directly from the German government, or a subdepartment of it, and if I'm honest I think this may have been a translation error, because there are no other records

of it, there are no private records, and there's no obvious dip in output from Germany at the time, which you would have expected. The report is great, it reads like a great description of how you would attack the plant, but I'm not sure that's what it actually was. However, various steel plants in Iran have been hit. This is a still from something released last year by the group Predatory Sparrow. They quite nicely waited for people to leave the room, and then they dumped the entire contents of this steel billet on the floor. So that will be fun for the operators. I honestly couldn't

put a dollar figure on how long that's out of action for and the cost, but it's going to be a lot. Okay, finally, water. Again, this is another case of an attack that probably wasn't an attack. This was Oldsmar. There was an apparently open TeamViewer session that was just accessed by an unknown actor, and the sodium hydroxide concentration was set to a very high value. Honestly, from later reports, for a start, just because the set point was made very high doesn't necessarily mean the system can actually achieve that effect. From the calculations I did, you'd probably only be doubling the concentration, and it would be nasty for the 50,000 or so people drinking this

water, but not like the oh-my-God-everyone's-dead thing that was going around the news at the time. To pour further cold water on it, as it turns out, that was almost certainly a legitimate operator who was momentarily confused. We did slightly more recently have some activists deface some PLCs in a pumping station in Pennsylvania. This was a smaller plant, only about 3,000 people affected, and once again it was actually a successful case of reverting to a manual backup. There were people there checking the pump rates and so forth without using the PLC. It still worked, it was fine. So, what next in this field?

So we already have some systematic desktop exercises and talking to regulators. What I want to do is complement these with a probabilistic method, so what we can do is try a lot of different situations at the same time, and go beyond this Swiss cheese model, because like I said, if it's an attacker it's not coincidence: they will deliberately find the worst ways through. Okay, so now on to the meat of it, some automated wargaming. I'm afraid some maths has to happen. I've tried to keep it to a bare minimum, but there is some. So if you're a budding researcher these days and you want to do a computational method, what are you going to do? You're probably

going to use machine learning, right? In particular, I'm going to use a subtype called reinforcement learning. This one's interesting because it doesn't have a static data set to learn from. Instead, we have a mock environment, which in our case is the chemical plant, and it will send an observation to our agent, which is the thing that's doing the learning and trying to control something. It will also send a reward, which is just a scalar number; higher rewards are better. The agent, because we're academics and we like Greek letters, implements a policy that we call π, which just takes that observation, sometimes called a state, the terminology will vary, and says: given this state, what

do I do? So it's basically a giant if statement, and it turns that into an action. The action is sent to the environment, and the environment computes a time step. What the agent wants to do is maximize the long-term reward. You often get the criticism: would an agent do the best thing right now, even if the best thing was figuratively jumping off a cliff? The answer, from a naive perspective, is yes. That's why the agent will learn over multiple runs of long sequences of rewards, and it will apply a discount factor, to say: what I really want to maximize is the reward I get now, plus the reward I think I'm going to get next,

plus the reward after that, with some form of discount factor to say: weight the ones in the future less. You also have some form of noise term, just to make the agent occasionally do random things. That's to let it explore what it can do more. Okay, so here's my setup. We're going to go for an assumed breach of a particular testbed. The extent will depend on the scenario, and I'm going to do multiple scenarios. It is, I'm afraid, going to be protocol agnostic. Initially I did actually put in some Modbus emulators, but they don't really change the results, because you just get the same conclusion of, well, the
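As an aside, the agent loop described in the talk (observation in, action out, a scalar reward back, future rewards discounted, and occasional random actions for exploration) can be sketched roughly as follows. This is a minimal illustration and not the speaker's code: `ToyPlant`, the rule-based `policy`, the discount factor of 0.99 and the 10% noise rate are all assumptions made up for the example.

```python
import random

GAMMA = 0.99  # assumed discount factor; the talk does not give a value


def discounted_return(rewards, gamma=GAMMA):
    """Sum rewards, weighting later steps less: r0 + g*r1 + g^2*r2 + ..."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


class ToyPlant:
    """Hypothetical stand-in for the plant model: hold one level near 0.5."""

    def __init__(self):
        self.level = 0.5

    def step(self, action):
        # The action is a valve change in [-1, 1]; the level moves with it.
        self.level = min(1.0, max(0.0, self.level + 0.1 * action))
        reward = -abs(self.level - 0.5)  # scalar reward, higher is better
        return self.level, reward


def policy(observation):
    """pi: map the observed state to an action (here, one simple rule)."""
    return 1.0 if observation < 0.5 else -1.0


def run_episode(env, steps=20, noise=0.1, rng=None):
    rng = rng or random.Random(0)
    observation, rewards = env.level, []
    for _ in range(steps):
        action = policy(observation)
        if rng.random() < noise:  # exploration: occasionally act randomly
            action = rng.uniform(-1.0, 1.0)
        observation, reward = env.step(action)
        rewards.append(reward)
    return rewards
```

Calling `discounted_return(run_episode(ToyPlant()))` gives the single number a learning agent would try to push upwards over many runs.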

attacker can do anything. They can do anything regardless of whether they're using Modbus or SIMATIC or similar, so it's just a maintenance headache, and I took it out. So we're going to go through multiple scenarios with this testbed. That's the testbed. It's actually, believe it or not, a fairly simple plant. It is kind of made up. It's a very common one to use as an example of whether you can control a plant in general, but the internal chemical details don't matter all that much. Mainly you want to look at these three main vessels: the reactor, the separator and the stripper. Again, what they do is actually not particularly

relevant, but the levels and the situation inside them will be key. It's got 12 manipulated variables, which are the things that you can alter, and 41 things that you measure, so those are your inputs and outputs. What you're trying to do is produce a lot of G and a lot of H. Again, these are made-up products, but roughly speaking they're an organic chemical of weight about 100. The last thing I want to point out here is that my version already has a traditional control system implemented with PLCs. Okay, so this being wargaming, we can't just have one agent, we have two. Actually, conceptually that's probably

one of the easier parts. Instead of the environment sending its thing to one agent, it sends two observations and takes two

actions. And we start with our scenarios. I'm defining those as: what can the blue team do, what can the red team do, and what does the red team want to do. So we are trying to mimic some different threat actors here, and different levels of blue team competence. The cheeky bit is that this capability also has a huge impact on how you would program these agents; I'll talk about that in a minute. For the red team, the intent is going to change what its reward is. For the blue team, it's always going to have the same reward: it's going to try and keep the plant operational and running and producing. And the red team by default is just

going to oppose that, so if I don't say otherwise, that's what it's trying to do, but it can have different subgoals instead. Okay, so action spaces, or capabilities. There are two different ways to program these agents. There is a really simple way, which is saying: I've just got a menu of things I can do at each time step. In this case, the blue team can reset a PLC and run it on a kind of manual backup for an hour, which would give the SOC time to, you know, remediate a smaller infection, say. Or it could shut down the whole plant and incur a huge cost, but hopefully give your IT guys enough time to completely remove an

infection. Or it could just do nothing, which is what you'd like to do. The red team, if they're a fairly low-capability actor, are going to be doing obvious things. They're not going to be subtle, so they're either going to change things to high values or low values. I've given them the ability to alter any of the measured variables, any of the manipulated variables, or any of the set points. So this actor is not sophisticated, but they do have pretty much full access to the plant. For a more sophisticated one, this is where the blue and red teams both have fine-grained control, so they can continuously change a variable and make like

small changes rather than massive ones. The blue team has full control of the plant. The red team does not, but it can spoof signals and it can still change set points. The reason for splitting it this way is, again, the discrete ones, as I've called them here, aka the low-capability, fairly basic ones, where you just have a menu of things to do, are fairly easy to program, because you just go: what's the value of each of these options? Great, pick the one that has the highest value. Easy. The continuous agents require something called deep deterministic policy gradients. It's a form of gradient descent, and I banged my

head against the wall for hours trying to get a slide that could explain this well. It's very difficult, so we're just going to accept it for what it is for now. If you want to have a go at programming those, you're welcome to try. Okay, we're also going to do one last variant for the blue team, where the blue team will be assisted by its own process model. So hopefully, instead of learning from the past, it can ask this twin what is going to happen over the next 48 hours, given the measurement that I've just seen, and react based on that. Okay, so finally, the reward functions. Overall, the blue team
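As an aside on the discrete, menu-style agents: "work out the value of each option and pick the highest" can be sketched in a few lines. This is only an illustration under assumptions. The three action names paraphrase the blue-team menu from the talk, while the value table, the step size and the update rule are hypothetical and are not taken from the speaker's implementation.

```python
import random

# Blue-team menu as described in the talk; the identifiers are made up.
BLUE_ACTIONS = ["do_nothing", "reset_plc", "shut_down_plant"]


def choose_action(values, noise=0.0, rng=random):
    """Pick the highest-valued option, or a random one with probability `noise`."""
    if rng.random() < noise:
        return rng.choice(list(values))
    return max(values, key=values.get)


def update_value(values, action, observed_return, step_size=0.1):
    """Nudge the stored estimate for `action` towards the return actually seen."""
    values[action] += step_size * (observed_return - values[action])
```

A continuous agent cannot enumerate its options like this, which is why the talk turns to a gradient-based method for that case.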

will get some fake money for producing stuff, provided it produces a product that is good, so it has to be within a certain balance of chemicals. It will pay for the utilities that it uses during that time. It will pay an environmental penalty for some of the purge that it dumps into the environment; I've arbitrarily decided that G is going to be extra harmful, so you get a penalty for that. You also get a fixed cost for the CO2. So the blue team is trying to optimize for all these things, but for the most part it's probably not going to outperform the traditional control system. As I said, the red team will just have the inverse of all these things, or

a subtype. So the red team is trying to say: don't produce anything, or produce it out of spec, which turned out to be quite common, or shut the plant down, or ideally, if it can, overpressurize the reactor, at which point you would get a huge reward for the amount of damage done. Okay, so this was the initial set of hyperparameters I used for training these networks. I think that's the latest set as well; it might be a couple of generations old. Unfortunately this part does seem to be mostly trial and error still. One thing I will say is you can get good results even with a fairly small network, or at least what you'll
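As an aside, the blue and red reward functions described in the talk might look something like the sketch below. The structure follows the description (revenue only for in-spec product, minus utilities, a purge penalty on G, a fixed CO2 cost, and a red reward that inverts the blue one with a bonus for overpressure). Every number, field name and constant here is a made-up placeholder, not a value from the study.

```python
from dataclasses import dataclass


@dataclass
class PlantStep:
    product_kg: float      # product made during this time step
    in_spec: bool          # product within the required chemical balance
    utilities_cost: float  # cost of utilities used during this step
    purge_g_kg: float      # amount of component G lost through the purge
    shut_down: bool = False
    reactor_overpressure: bool = False


# All prices and penalties below are hypothetical placeholders.
PRICE_PER_KG = 2.0
G_PURGE_PENALTY_PER_KG = 5.0
CO2_FIXED_COST = 1.0
OVERPRESSURE_BONUS = 1000.0


def blue_reward(step: PlantStep) -> float:
    """Fake money earned by the defender for one time step."""
    sellable = step.in_spec and not step.shut_down
    revenue = PRICE_PER_KG * step.product_kg if sellable else 0.0
    purge_penalty = G_PURGE_PENALTY_PER_KG * step.purge_g_kg
    return revenue - step.utilities_cost - purge_penalty - CO2_FIXED_COST


def red_reward(step: PlantStep) -> float:
    """Default attacker intent: the inverse of the blue reward."""
    reward = -blue_reward(step)
    if step.reactor_overpressure:
        reward += OVERPRESSURE_BONUS  # large payoff for physical damage
    return reward
```

With these placeholder numbers, an out-of-spec step earns the blue team nothing but still costs it utilities and penalties, which matches the talk's observation that forcing production out of spec was a common red-team outcome.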

find is that networks will often coerce a lot of their parameters to zero, so just adding more layers does not necessarily help. Okay, so on to the results. This is our low-capability scenario. This is before training, so as you'd expect, at the start they're just acting effectively randomly. They have this noise term, which will decay as training goes on. They're just exploring what happens. Here they are, baby-fresh agents: they have no knowledge of the world, they're just acting randomly and seeing what happens. What happened in this one is that the production was out of spec, so the blue team has gotten essentially no reward here, apart from occasional

spikes where I guess it must have drifted into spec. The control system was fooled: it had its set point changed by the red team and went, oh, I'm supposed to make the stripper level really low, and so it does so. In this variant it actually seems to commonly fail by draining the reactor in order to do that. It still manages to fail within about 200 seconds, which is about 3 minutes, which is frankly not enough for an incident response team to do anything about it. A common feature we've seen, actually, is this blue trace, which is coolant, and that's basically gone to zero. So very frequently in these

scenarios the reactor coolant was also switched off. Okay, after training, the summary of it is: the red team won. At the start we had it going down in about 200 seconds; now the red team can take it down in about 20. That's definitely not recoverable. It's actually using the same sort of actions. In particular, it was changing a set point for the reactor level and saying to the control system, you should drain the reactor, and the control system then does that. That's easy enough. Again, it changed the coolant, so it decided we shouldn't have any coolant either. Unfortunately for the blue team, all this agent learned to do is basically to do nothing. So it's

actually at the very bottom here, and it's choosing an action that is very safe, so it won't cause the plant to fall over, but it won't effectively respond to red team activity either. That said, the blue team does manage to produce some product for 20 seconds in this one, so it gets a small win, I guess. One thing I found is that sometimes the red team would go haywire, even in a situation where I had no blue team involved at all, just the red team. So this was something after training; it was supposed to take the plant down, as we saw previously, but in fact it's managed to miss that

strategy and has just gone randomly haywire over here, but it's not really been very effective at taking down this plant. That's a shame. I would say it has managed to burn a lot of carbon here, I think, so it has been rewarded for taking away a lot of the operator's money and producing no product, so that's kind of a good goal if you're the red team. The control system trace: this is not quite the same one, but a very similar one. I just want to show what was happening to the control system. Again, this had a set of PLCs, or mimicked PLCs, and this error is saying how far

away from my set points am I. What we see is that when the set point is changed, the error suddenly becomes massive, and the control system freaks out and basically saturates. That's predictable for this one, but it was interesting when we get on to the continuous one. The other interesting thing from this one is we started seeing an oscillation in reactor pressure at around 165 seconds, and that was constant across a lot of different scenarios. That's not something I think anyone's ever seen in this testbed before. Okay, so we're moving on to the continuous models. What I was trying to get these agents to do here is find ways to have a big effect on the

plant by doing lots of little actions together, and one thing in particular I was trying to mimic was fatigue losses. In a real plant, what you can do is just turn a valve up and down a bit, continuously, and that will wear it out eventually. That's great, because a lot of the current defenses say things must stay within a certain range, right? They must stay within these safe states, but they can transition between them as much as you want, and that means that's a potential avenue to evade current defenses. So I thought it might be interesting to see if the red team can learn this. One thing we found is that whilst the red team can

do every single action simultaneously, these agents really seem to collapse onto doing just one thing. I'm not sure why that is exactly; it seems to be quite a common feature of neural networks, but there we go. So this was pre-training. It's still mostly noise, and the red team manages to break the plant in 80 seconds, which is quite fast. What it didn't really manage to do is learn how the plant works. This dotted line over here is, roughly speaking, a loss. It's saying how accurately do I know what the plant is going to do next; a lower number is better. As you can see, at the start

it's actually rising, so it's becoming worse at predicting the future. After training it does settle down, but it's still at like 10 to the 9, which is quite a high number. It's arbitrary units, but it's still high. That's showing there's still quite a lot of confusion in this agent. So if you're good at programming neural networks, talk to me; if you can get a better one, great. This one learns to do just one thing, and it has still managed to completely overwhelm all of the control system with just this one change. That's the purge gas fraction, which should be a fairly minor thing that the chemical

equivalent of a blue team wouldn't necessarily look at, but they managed to kill the plant with it. Again, we start seeing an oscillation at 165 seconds. Never seen this before in this plant, but it's quite cool. And then finally, the version I had that involved a twin. This is pre-training, so the blue team won't be very good yet. Again we see the same oscillation, which seems to be linked to the loss of coolant. It's saying that even if there's nothing going into the reactor, if you lose coolant, I guess you will start this oscillation. And one thing we found was that the total loss, so the inaccuracy of

this blue team agent, was slightly lower than the fully continuous one. It was still quite high, but it also often tended to immediately spike just before failure. That's quite a good indicator for the future: if your twin is no longer accurately predicting what's happening, that's a good way of saying things are going badly, right? Normally your twin should be making accurate predictions. Okay, so my conclusions from this. Most of the red-blue engagements, with the exception of the low-capability ones, actually converged on a shutdown in about 165 seconds, so 3 minutes. That's not really good enough for a blue team; they need a bit more time. I did get a few that extended

that to about 10 minutes. That might be enough to do some things, but probably not enough for a full IT response. Discrete engagements could be even faster. The things I learned: the red team didn't pay attention to its reward function all that much, and the blue team mainly learned to do nothing, so these agents could probably be a bit better. Like I said, if you're interested in programming neural networks, talk to me. And finally, visualizing 51-parameter control schemes is a real challenge. We did have a live-updating version of this one, but unfortunately papers need static things, and even managing that as a dashboard, there's still a lot of information, which is probably why we

normally um subtype it and have teams doing this sort of thing okay so for future work I'm looking at particle filters which is a fancy way of saying do all of those twins in parallel so the way that works is instead of doing one you run a bunch of twins at the same time and then you do some um probability logic um some Bayesian stuff to go what's the most likely explanation what's the best fit then assume that and work based on that prediction um I'd like to integrate some better prediction engines into this so for example all of these neural networks were stateless again that's the thing that's um I think would be a great avenue for

further research because real chemical plants of course aren't they change state slowly in particular if you're trying to detect an anomaly you can have a piece of logic saying alarm if you're outside of these ranges but in reality a plant that's oscillating wildly within those ranges is still alarming and indeed probably you're having spoofed data um I'd be tempted to integrate more layers of the Purdue model particularly alarms because if you go to real plants if someone is if an alarm is blaring you don't go in that has a real impact on operations and also that leads to alarm fatigue which I'm sure at least some of us are aware aware of maybe integrate some of the more human reactions so

currently this was just like defaulting to an open loop control which is fairly primitive humans could could help that a bit better so I'll be interested in integrating that and finally some more complex plants like particularly the vinyl acetate model which has been used as a test bed for security before okay that's all thank you thank you and are there any

questions thank you um thanks for the talk um I just have a question about um do you think that that any of these models could potentially accurately predict any targets that any threat actors might have within a real life plant and and you think that it could be maybe incorporated with a lot more variables like you said earlier that sorry could you repeat the first part of the question oh yeah um uh do you think that that the models that you created could accurately represent um a threat actor targeting a plant and do you think that it could potentially um reveal like a potential um aims and targets that that that threat actor might have for the plant in

particular um I think I hope so so my main goal for this was to um make the testbed right so the um it's based on OpenAI Gym actually and it's um yeah just adapted to do blue team red team stuff I think OpenAI Gym themselves have done some odd things in that space so I'd have to recode it but such is life um potentially yes with a good enough neural network um again that's why I put these in as I hope they're good proof of concepts but you can basically spend forever tuning neural networks um does still seem to be a bit of a black art so I think ultimately yes they could but as

with a lot of um machine learning type questions whether that happens in two years or 20 I couldn't tell you that for sure yeah yeah um there's a microphone coming up thank you uh and I I do have a question because you're using reinforcement learning if you could um for example in your method is it possible to apply um multiple agents on both sides for example blue team two two three blue teams competing for the reward to prevent the passivity of the uh blue team and for red team also potentially to prevent the fact that reinforcement learning sometimes just picks one thing and just keeps repeating it to get the the same reward because it knows that

the same reward is going to keep going it definitely falls into the trap of yeah repeating the one thing that gets a reward yes um I interest interesting idea I'd say I wouldn't do that for the blue team what I would be very tempted to do is have multiple state prediction engines in it so I'd have the same um state to action mapper but with lots of things saying what do I think the state of the plant is going to be what I think the reward is going to be um and then you could do something either picking the you could do something with some Bayesian maths in it I'm not sure what to pick the most likely of those that's

that's basically what the particle filter would be um multiple red teams is even more interesting I I suspect that would just be even more

chaos thanks so um these systems that you're setting up are pretty much at the level where an attacker is already in the system and is able to take these actions so to sort of use that metaphor for an IT situation this is when the attacker already has domain admin and is about to execute the ransomware pretty much yeah these at that stage there are ransomware mitigation techniques but they're really a best effort situation so is this model you're looking at to try and work out best effort actions or is there something different in the sort of OT and Below model that means it's it's better than best effort um yes I I'd say that's roughly

correct yeah I I would have said when I was going into this I could probably tell you at least some of the results which are yeah if you've got the equivalent of domain admin as we saw the red team has a very decided advantage um we have some very very very rare runs where the blue team can respond and take the plant to a whole hour so we effectively decided if the blue team can run it for an hour that's considered a win um yeah that's not a long time in chemical plant stuff really um I did also have one implementation where the red team had control of just one PLC um but yes we abandoned that after a while

I'm not really sure why in retrospect but yeah that might even be a good um test bed for assumed partial breach hey um thanks it was a really interesting talk and really incredible what you put together um so the neural networks that you used were like quite shallow and quite small um and there were a lot of like optimized for reinforcement learning networks out there and I was just wondering if that there because the networks were quite simple that what you end up is that chaos ends up winning because the networks are all so simple that it's harder it's more complex to fix the problem than it is to just break it um and that sort of maybe by like really

working on maybe more optimization on that side or a different choice of model you might get different results um I'd certainly partially agree so my reason for being cautious and having relatively shallow networks is they do tend to um collapse quite quickly I'm sure if you've seen this you can have a network with billions and billions of parameters and if you don't get the architecture right you end up with almost all of them doing nothing right I don't know if you've seen that um that's quite common so I wanted to avoid that um I put yeah the softmax layer in to um try and convince it to do one action because that was easier what I did find

is even without that layer in it was still um tending to do just the one action even if it could do 15 different things um what was the second part of the question again oh is chaos wins right um yes I'd certainly agree so the real challenge for these networks is they need to understand what's going on before they can properly remediate um also if I'm going to be honest um this was done on a fairly low spec desktop so there was limited bandwidth for training large networks so it's going if this took three times as long to train but didn't produce notably different behavior go back to the simpler one okay um I think we need to end now

um thank you very much for your talk Martin and a big round of applause and if there are any further questions I think Martin would be happy to answer them in person yeah
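
The particle filter idea from the future work part of the talk (run a bunch of twins at the same time, weigh them with some Bayesian logic, then work from the most likely explanation) can be sketched roughly as below. This is only a minimal illustration under stated assumptions, not the speaker's code: the one-variable `twin_step` model, the hidden `setpoint` parameter, the noise levels and every function name are hypothetical and chosen just to keep the example small.

```python
import math
import random


def twin_step(temp, setpoint):
    """Hypothetical one-variable twin: temperature moves 10% of the way to `setpoint`."""
    return temp + 0.1 * (setpoint - temp)


def run_particle_filter(observations, n_twins=500, sensor_std=0.5, seed=0):
    """Estimate the hidden `setpoint` from noisy temperature readings."""
    rng = random.Random(seed)
    # Each particle is one twin: [predicted temperature, hypothesised setpoint].
    # Assumption: the starting temperature (0.0) is known, the setpoint is not.
    twins = [[0.0, rng.uniform(0.0, 100.0)] for _ in range(n_twins)]
    estimate = None
    for reading in observations:
        # Predict: advance every twin by one step under its own hypothesis.
        for twin in twins:
            twin[0] = twin_step(twin[0], twin[1])
        # Weigh: Gaussian log-likelihood of the reading under each twin.
        log_w = [-0.5 * ((reading - t[0]) / sensor_std) ** 2 for t in twins]
        top = max(log_w)
        # Subtracting the maximum keeps the best twin at weight 1 (no underflow to all zeros).
        weights = [math.exp(lw - top) for lw in log_w]
        total = sum(weights)
        # Estimate: weighted mean of the hypothesised setpoints.
        estimate = sum(w * t[1] for w, t in zip(weights, twins)) / total
        # Resample: keep twins in proportion to their weight, and jitter the
        # hypothesis slightly so the population does not collapse to one value.
        chosen = rng.choices(twins, weights=weights, k=n_twins)
        twins = [[t[0], t[1] + rng.gauss(0.0, 0.5)] for t in chosen]
    return estimate
```

To try it, generate readings by stepping `twin_step` with a known setpoint, add Gaussian noise, and pass the list to `run_particle_filter`; the returned value is the filter's estimate of that setpoint. A real plant twin would track many variables at once, so the likelihood and resampling steps would need to be adapted accordingly.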
