
BSidesCharm 2019 - Hunting for Threats in Industrial Environments and Other Scary Places

BSides Charm, 49:02, 8 views, published 2021-05
About this talk
Threat hunting in Industrial Control Systems is a proactive tactic that can be employed by network defenders to gain familiarity with network terrain and to seek out malicious behavior, presence of vulnerabilities, or otherwise unknown activity. Unique constraints in operational technology environments present significantly different challenges than more standard computing environments. This presentation provides the audience with an inside look into challenges that ICS threat hunters face.

Presenters: Nick Tsamis and Marc Seitz (@SubtleThreat)

Nick Tsamis works as a Principal Threat Analyst within Dragos' Threat Operations Center, where he focuses on hunting malicious activities on the world's critical infrastructure. He brings real-world experience hunting on production systems to continuously improve threat hunt execution. Nick is passionate about automating complex workflows to increase analytic efficiency and relevance.

Marc Seitz works as a Threat Analyst within Dragos' Threat Operations Center, where he coordinates industrial control system cyber test lab functions and performs threat hunting services in ICS networks. Marc is a specialist in designing and implementing innovative simulated industrial environments for the purpose of providing a safe and realistic training and attack simulation experience.
Transcript [en]

like to get this thing going. So welcome to the start-up after lunch here. My name is Marc Seitz, and this is Nick Tsamis. We are from Dragos. Dragos is an industrial cybersecurity threat detection company: we have an intelligence team, we have a Threat Operations Center, and we actually have a software product as well. So that's the background on us. This is not a vendor pitch at all. What we're going to go into is threat hunting in industrial environments: some of the constraints, some of the differences, some of the similarities in what we see on a daily basis, and

sort of the methodology and approach that we like to take to how we actually threat hunt. As you can see, we want this to be a fun presentation. We know it's right after lunch on the second day of the conference, so there are only 20 or so slides here. We will make this interactive and leave time for some discussion afterward. So let's get into it. The first part is: what exactly is ICS? A lot of the time people think of electrical power and oil and gas, right? Those are kind

of the big ones that come to mind. But what else is there? We have water treatment facilities, and that's kind of a big deal. Manufacturing: that one is steel manufacturing, right? They're going to roll out the cold-rolled steel and do some of those things. But what else? Pharmaceuticals: a lot of industrial equipment, a lot of precise measurements, a lot of batching that needs to be done to make sure we get the right drugs through there. Also food processing and food manufacturing. Those are all going to be part of it. You guys

have all seen the How It's Made videos. That's all done by some sort of industrial controller, doing the same repeatable tasks. Also transportation, right? Shipping. If you have a chance, check out some YouTube videos on the newest shipping ports being built in Shanghai and Singapore, where they're making artificial islands just because the automation processes make it faster and more reliable to get ships in and out. Again, that's all done by industrial equipment and controllers. Trains are another great example: those all have to run, and they have to understand exactly where they are on the tracks and what they're moving around. And most importantly,

beer, right? Champagne and beer, obviously you all want a piece of that. That's all done by industrial equipment. These are industrial environments, critical infrastructure, arguably the most important critical infrastructure there. So that's just to give you a frame of reference for the wider scope of industrial environments when we talk about them. So generally everyone's a special snowflake, and that's some of what we'll cover in this presentation. We'll specifically talk about the constraints. What we're really going to hammer at is what makes ICS threat hunting different from traditional, or IT, systems, as we call them. We want to make sure that, if you come away

with anything, it's the fact that a one-size-fits-all approach across both of those computing environments is not viable, and we'll present some of those challenges. And that's a really good point you bring up: everyone is a special snowflake in that domain. While we do see similarities in vendors and protocols per industry vertical (for instance, food manufacturing typically relies on the same kinds of vendors), every environment has nuances, and we'll talk about some of the approaches we use to get around that. So Marc gave a quick background on a couple of different industrial environments. By a show of hands, when we think of ICS, do we all think of those

environments? Is that normal, or is that revelatory to a couple of people? That's normal, okay. I just want to gauge the room: we want to give that high-level overview, but we also want to give some technical details. When we say operational technology, who in this room would say, "I'm familiar with operational technology"? Okay. Am I not loud enough? No? Repeat the last question? Oh, the last question, got you. So the question was: how specific are industry verticals? Do we see the same vendors, do we see the same protocols? And again, the answer is that everyone is a special snowflake. We do see trends within manufacturing and within transportation; we're going to see some

trends there. And last question: if I say "Modbus function code analysis," by a show of hands, who would say, "I sort of understand what that means"? Okay, cool. So we want to give an introduction and show some of the challenges we deal with and how we get around them. A brief background: we're not going to get into intel, and we're not going to get into the specifics of threat hunting, but here is a brief background on ICS-tailored incidents in the past couple of years. I won't go into detail on any of these, but there is a main takeaway here, and I'll direct you to a lot of the research that we have in the

public space on our website. We put out a lot of good information for the public on what happened in these incidents and what the main takeaways are. The main takeaway from the couple of points we see here is that, in the public domain, there's a small sample of evolving threats. As you mentioned, the threats we're seeing are tailored. Because everyone is unique in some capacity, the landscape we need to look at is very specific to the kinds of domains and systems we're looking to protect. You want to talk about what threat hunting is? Absolutely. So before we really get a chance

to talk about the constraints or limitations of hunting in these environments, we have to make sure, one, that we understand what ICS and operational technology are, and we've started to cover that a little bit. Now we need to make sure we're on the same page about what threat hunting is. So, a fun definition, because everybody loves definitions, and of course we're going to bold it, we're going to highlight it, we do all that fun sort of stuff. So we know it's a proactive approach, it's analyst-driven, and, as you can read through here, it's about TTPs. But really, what does that mean? This is something we've done

a lot of work on, so maybe it makes sense to start with what it is not, because a lot of the time we're not arguing about the definition, we're arguing about its actual implementation. So say we bring up something like a notification: we see an SMB file share access. That is not threat hunting. We're seeing some sort of alert, and that's not a proactive approach; we have something telling us there might be something wrong with the system. Same thing with "hey, there's an RDP connection." We know that remote access could potentially be damaging to an

environment. There are a lot of things you can do with it, there's a lot of lateral movement you can do, but that is still not threat hunting. You can threat hunt on RDP connections, but coming at it from an alert perspective, having something tell you what's going on, is a little bit different. So if you just want to read that for a second. One of the things we want to talk about is hypothesis development. When you're talking about the proactive, analyst-driven steps, the manual process of "I'm going to look through data," I want to target my approach, I want to focus what I'm going to be doing. So I have to come up with a hypothesis. I

have to figure out: hey, I think an adversary is leveraging an approved VPN connection. I have VPN connections in my environment. They could be from contractors, they could be from people working from home, whatever it is, using remote connections. Now, when we talk about remote connections, that's RDP, that's VNC, there's WMI, all kinds of different protocols. We're essentially opening up our scope there, but refining it to remote connections. Now talk about the next action: they're using those remote connections to move laterally to change a PLC program. Now I have an action. I'm hunting for some sort of program that's going to change in my environment,

based on a PLC asset identification. And now I'm going to say that that program is going to cause a PLC stop. Maybe I know that if that PLC stops, I see downtime, or I see risk to the people I have on the manufacturing floor. Whatever the consequence is, I know a PLC stop is going to be damaging to my environment. And I'm going to say: during the Fourth of July we have a lot of people out, and that may be more of a time for people to try to get in. That is essentially

what we're looking at. We want to take these actions and say: yep, we're looking for VPN connections, we're looking for remote access, maybe specifically RDP because that's what we use in our environment, and we're chaining those behaviors together. This gives me a hypothesis with some artifacts that I can now go look for. So, a couple of things to note about threat hunting. We're looking for those unique behaviors. If I do something manually, I'm hopefully looking for something that is unique, in the sense that I'm doing it for the first time in my environment.
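The chained hypothesis described above (approved VPN connection, remote access, PLC program change, PLC stop, during a holiday window) can be written down as a small structured record, so that each behavior has something concrete to search for. This is only a sketch: the class names, fields, and artifact descriptions are hypothetical illustrations and are not taken from the talk or from any Dragos tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HuntStep:
    behavior: str  # what the adversary is hypothesized to do
    artifact: str  # illustrative evidence an analyst might search for


@dataclass
class HuntHypothesis:
    statement: str
    time_window: str
    steps: List[HuntStep] = field(default_factory=list)

    def checklist(self) -> List[str]:
        """Render the chained behaviors as an ordered list of things to look for."""
        return [
            f"{number}. {step.behavior} -> look for: {step.artifact}"
            for number, step in enumerate(self.steps, start=1)
        ]


# The chain follows the talk; the artifact wording is an assumption for illustration.
hypothesis = HuntHypothesis(
    statement="An adversary is leveraging an approved VPN connection to stop a PLC",
    time_window="Fourth of July holiday",
    steps=[
        HuntStep("Uses an approved VPN connection", "VPN sessions in the window"),
        HuntStep("Moves laterally over remote access (RDP)", "RDP connections from VPN hosts"),
        HuntStep("Changes a PLC program", "program changes on identified PLC assets"),
        HuntStep("Causes a PLC stop", "stop events or unexpected downtime on that PLC"),
    ],
)

for line in hypothesis.checklist():
    print(line)
```

Writing the chain down this way keeps the hunt analyst-driven: the record only organizes what to look for, and the searching itself is still manual until a step has been investigated and is worth automating.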

After the fact, I want to automate that, and I want to go looking for new things. We'll talk about an automation footprint, but I want to be continually and proactively searching through data. I want to have hypothesis development. I want to see what my coverage is across, say, the ICS kill chain, so I can look at coverage in my environment, automating after the investigation. And again, it's important to mention, before we really get into the OT and IT side of things: this is not an automated process. Threat hunting is not something you can automate. You can automate

threat hunting after the fact. We talked about file share access: maybe you do a hunt that looks for file shares being accessed. Great, you now have an automation footprint that gives you the ability to triage that notification. Same thing with the remote connections: there should be a hypothesis that leads you to that. So eventually, when you see an alert like this, those are the steps I want to go through. I want to be smart about how I use threat hunting in my environment. So that gives the basis of what we want threat hunting to be: a proactive approach, to be

something that is unknown territory for your environment. It may be something that someone has been able to automate before, but you really need to do some of that manual work up front, because, again, we talked about the special snowflake cases. Yes? "What is the PLC program?" So, a PLC is a programmable logic controller. It's essentially a computer that is designed to do specific tasks, and you can upload a program to it to perform those tasks. We won't get too far into that, but essentially these have limited functionality and serve very specific purposes, very specific tasks. So, to generalize that quickly, just to talk about what we lump into

operational technology: a PLC is one example of a controller, something that physically changes the environment it lives in. Effectively it's an embedded device that typically runs some kind of proprietary software, speaks proprietary protocols, and controls processes. At the end of the day, all of that lives under the family of what we typically call operational technology. We can't have an operational technology presentation without the IT-versus-OT slide, so that's what this is. Traditionally we view these computing environments as completely separate. We see IT as this bubble: it's the things we deal with on a daily basis, mobile devices, Windows workstations, email servers. And then we put this other bubble on the

other side, saying this is OT: this is where PLCs live, this is where remote terminal units live, this is where hardened industrial components and switching infrastructure live. And then somehow, in the middle here, this is where the dragons live. Everything is neatly and nicely organized, and we say: cool, when we go do an OT threat hunt, we're going to go look at that, because that's where bad things happen. In reality, that's not how it works. In reality, IT versus OT is like a pizza: we have a heterogeneous mixture of various computing components mixed together. If we have a pepperoni pizza, I'm not going to bite into one piece of

dough, and then have my next bite be all sauce, and my next bite be all pepperoni, and miss the cheese in there. Similarly, when we think about this from a threat hunting perspective, we need to be prepared to understand the entire contents of a slice of pizza at all times. Operational technology isn't just anything that is not IT. It includes standard technologies that are commonplace within IT: SMB and DNS typically have a very important role in industrial environments. So how we understand how all those various components play together is very important for how we threat hunt. In this case we've got some IT scattered around, and you have a bunch of other

stuff that you've never seen, that we'd classify as OT, or that we didn't even know was there to begin with. The second IT-versus-OT slide: this is what we call the Purdue reference model. Basically, it's a way to conceptually organize the computing that occurs in an industrial space. At the bottom, level zero is where we have field devices: these are sensors, these are pump controls, and they actuate and sense in the physical world. Those speak to level one devices, the controllers, and a PLC is an example here. Above that, level two is the supervisory plane, in which we have humans actually monitoring processes. We have human-machine interfaces so I

can see what the computing components are actually sensing from the physical world; it's how we bridge that human gap. Above that we have operations and support. As we move up this stack, we're getting a little more IT, a little more commonplace. The facility network is probably where a lot of your more standardized IT components are going to live, when we think of standard computing environments. And then level five is enterprise. So we have our enterprise, and we might have a link down into an operational environment where more of that specialized OT equipment is going to live. This is typically how IT and OT environments are promised. In reality, it's significantly more compressed. This is how it's typically

delivered. And that's why, if we take an approach of "we're going to look at all the IT stuff today, and tomorrow we're going to look at all the OT stuff," this is where a lot of threat hunts fail: we turn that IT lens or that OT lens off at a given time, rather than understanding both of those environments throughout an entire hunt. Put more simply, it's a promise to Ron Swanson, and we got it, Ron Swanson. So with that little bit of background, we wanted to talk about some of the unique constraints, what makes IT versus OT different, and we have four bullet points that we want to talk through.
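The Purdue levels walked through above can be captured in a few lines, which makes it easier to keep both the IT and the OT lens on during a single hunt. This is a sketch under assumptions: the talk names levels zero, one, two, and five explicitly, so the numbers 3 and 4 for operations/support and the facility network are inferred from the order in which they were described, and the asset names and the `levels_between` helper are hypothetical.

```python
from enum import IntEnum


class PurdueLevel(IntEnum):
    FIELD_DEVICES = 0           # sensors, pump controls
    CONTROLLERS = 1             # for example, PLCs
    SUPERVISORY = 2             # human-machine interfaces
    OPERATIONS_AND_SUPPORT = 3  # number inferred from the order in the talk
    FACILITY_NETWORK = 4        # number inferred; more standardized IT components
    ENTERPRISE = 5


def levels_between(a: PurdueLevel, b: PurdueLevel) -> int:
    """Count the Purdue levels that sit between two communicating assets."""
    return max(abs(int(a) - int(b)) - 1, 0)


# Hypothetical asset inventory; names and placements are made up for illustration.
assets = {
    "hmi-01": PurdueLevel.SUPERVISORY,
    "plc-01": PurdueLevel.CONTROLLERS,
    "mail-server": PurdueLevel.ENTERPRISE,
}

for src, dst in [("hmi-01", "plc-01"), ("mail-server", "plc-01")]:
    gap = levels_between(assets[src], assets[dst])
    print(f"{src} -> {dst}: {gap} level(s) in between")
```

In a compressed real-world network, a conversation with several levels in between is not automatically malicious, but it is the kind of observation that can seed a hypothesis.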

Absolutely. And just to add more meat to that: coming into the ICS space, you're doing some of your first threat hunts and you start to see some of the weirdness there, and it's like, ah, we are looking at Ron Swanson. It really is the same person; you just have to look at things through a different lens. So, talking about legacy systems: that's one of the first constraints that we see in this environment. When we talk about legacy, we will address the old Windows

versions, things that are way out of date, way out of patching, all that kind of stuff. Things that we see typically: Windows XP, and we've seen some Windows 95. These are systems that are put in the environment and just keep running. It's the mindset of "if it's not broke, don't fix it." From the engineering side, from the plant side, there's no reason to go upgrade that box, or fix it, or patch it, because they need to keep the process running. If you're an oil and gas refinery, you need to make sure that you

are continually refining oil 24/7/365. That is the mission of that plant. So when it comes to changing that Windows device, when cybersecurity folks come in and say, "Hey, we need to patch that device," the answer is, "Well, how is it going to affect my production, my output right now?" Because if you leave me down for an hour while you're trying to troubleshoot and figure things out, I've now lost an hour of production time, and now you become responsible for that, as a plant owner. The other things we want to talk about with legacy systems are things that we run into all the time: plaintext, hard-coded passwords

readily available inside of manuals, lack of encryption, lack of authentication, lack of logging capabilities. Those are things we typically see from all kinds of devices. At another level, though, even if you wanted to make something tunnel through SSH instead, because you want to securely access your boxes, you want to patch, you want to do some of those things, there are devices that legitimately just do not support that. Telnet is the way that you will access that device, because SSH is not a protocol it is able to speak. You have to understand those limitations

and, when you see Telnet in the environment, have the reaction not be "oh my gosh, that's crazy" but "okay, let's understand what's inside of it; what more can we understand about that device?" One of the war stories I want to put behind this: we were performing a threat hunt at a solar and natural gas facility down in California. We were working through some of their devices, looking at the backbone of their environment, and we found a bunch of Cisco devices. Their entire Cisco backbone chain was massively out of date and unpatched. We went back to them, and we kind of knew this was going

to happen, but we said, "Hey, our recommendations are these three options. What do you think you can do about it?" They said, "Actually, we know that is a thing. We can't do anything about it, because the vendor that supports those devices and supports our control system won't let us update without invalidating our support contract." So for that kind of facility, if you have downtime and it's something the vendor needs to come and fix, you need them on contract, you need them to get in, they need to be boots on the ground to fix it so you're back up with your operations. For them, they have to keep that system

out of date and unpatched, or they have to rebuild their facility and bake a new contract in with it, and then you're talking three to five years of rebuilding a plant, not "hey, we're just going to keep operations up." So that's the big thing we want to talk about with that first item, legacy systems. The second part I want to talk about is proprietary devices, software, and protocols: trying to get an understanding of what the vendors in the space are actually doing, not from the security perspective, but just from the perspective of keeping everything running. How do we get the cookies actually shipped out to the trucks and

making it all the way through their cycle, anything to that degree. A couple I want to throw out here, starting with Rockwell. You may or may not be familiar with them; these are essentially automation vendors, or control system vendors. Rockwell tightly controls access to their customer portal. They really don't have much that's public that you can just go and grab online, and they charge a premium to access some of their extra data, even if you are a customer. Schneider implements their own. Some of you said you're familiar with Modbus. Modbus is an open standard protocol; it's

very widely used and commonly accepted. Well, Schneider actually made their own version of that. So now, not only do you have to understand regular Modbus, you have to understand the changes they've implemented on top of it. Emerson: essentially, their protocol stack is built on old protocol stacks. So again, not only do you have to know the new things they've done, you also have to know what the base protocol does, and you're again trying to figure out where you go to find that information. Yokogawa: some of the same things. What we talk about there is how we actually figure out this information. Is there stuff available

online? Do we potentially have to reverse engineer it ourselves, or can we get some sort of partnership where we figure those things out? It's all a necessity for us. One of the biggest challenges we've now seen is that vendors coming out with new products boast interoperability across all sorts of other vendors. So Honeywell now has the ability to integrate with Yokogawa and Schneider and Rockwell. Great, now I have to understand all those different devices when I go to threat hunt in these environments. So really, what we're trying to show you is that this presents a challenge to the threat hunter in this environment. And again, it's going to be

different, because each facility you go into is going to be a little bit different as well. So it's ultimately harder to find what is not normal, because you have to figure out what normal is first, and even understand what the facility defines as normal. Another constraint that we see more heavily in an operational environment is the notion of a system identity. At the end of the day, operational technology is keeping a facility up and running. It's what makes a refinery a refinery; it's what makes a train run. In typical computing environments, in IT, you might just

have a whole host of workstations, and as users change their roles or execute their job functions throughout the day, they use different software, different technologies, different protocols. But at the end of the day, the workstation doesn't really change; you have a standard IT build. In OT, identities really matter. We have a workstation for a very specific purpose. If that workstation has the HMI software, the human-machine interface software, I can't just do the job of an engineer on it; that's an operator's machine. If I want to change the software that lives on an embedded device, I need a specialized piece of hardware and associated software for that.

So identities very much matter. Why that presents a challenge is that we need to understand, from the technical perspective, what that identity is really trying to do: this device is trying to execute these kinds of functions. And given the challenges Marc identified, if we have closed protocols or protocols we don't understand, it's our responsibility to go figure out what they actually mean, so I can see what that device is trying to do. Similarly, as these devices execute functions, they're all typically pretty critical. If I have a component in the operational technology space, it's probably there for a reason. One of the trends we've been seeing over the past year or so, and you have probably

seen it in the news, is a lot of opportunistic ransomware targeting industrial firms. In typical IT environments, if I have backups, great: I can isolate the ransomware incident, I can flash those machines, and I can get users back up and running. I'm going to put the same build on there and say, "Hey, sorry, we're going to resync your email outbox and resync your messages." At the end of the day, no problem. In OT, I need to go figure out what software packages were on that thing. I need to figure out the exact configuration, the software build that I need to put specifically on that unique

machine, and these present significantly different challenges, because that's not a "hit the flash button and we're good to go" situation. Similarly, one further point: tying network actions to system identities is important. Being blind to protocols is one such challenge for us, but even if we understand a protocol, we still need to build context behind it. Identifying that a device is speaking Modbus is significantly different from knowing that this device is querying registers and actually executing register changes on its subordinate devices. Building that context and having our homework done ahead of time is really important for the notion of a system identity. The fourth thing we'll talk about is

uncertainty. In industrial environments, the systems are typically rather deterministic, through either operational constraints or artificial constraints, because nobody has tried anything unique in these environments. If I built a system 30 years ago, I don't have a bunch of hands in the pot trying new things. I don't have operators downloading new software packages because they can make animated GIFs out of my videos. I don't have that kind of uncertain behavior in my operational environment. When we go into these environments for a pen test or a threat hunt, that brings uncertainty to operators, and that uncertainty causes a lot of hesitation among the people who own these systems, rightfully so.

So in order to get around that uncertainty, we need to use tribal knowledge, the expertise that comes from being in these environments, to understand the challenges and build unique tactics to get around them. Take things that are commonplace in IT environments, like being able to fire off a scan to collect a bunch of information. Even if we understand how to collect the appropriate information, there's hesitation around that, because that kind of scan has never been fired off before: "We've never done this kind of thing, and I'm not going to do that on my operational environment for the first time ever." Log collection: in IT, we can typically say, let me go look at the auth log, let me go look at

the event log. In OT, we don't always know how to pull those, or we don't even know whether logging is enabled on some of these devices. It's similar with debug information. We like to call the availability of information very special, and this comes back to one of the original questions at the beginning: everything is a special snowflake. Depending on the environment, on how it was implemented and on how operators are actually running the system, you might have a trove of information, or you might not. You might have no idea what kind of debugging information is available to you, but it might be sitting on a random syslog server. Or, on the other side of things, you

might have absolutely no information available to you. So I want to talk through one specific use case. We talked about how proprietary protocols present a challenge; we'll talk through an open protocol here. Modbus is a fully open protocol, and it's integrated into a lot of standard security and network analysis tools out there. Zeek, formerly Bro, has a Modbus parser built into it. We can use these tools in their default configuration to get some kind of information based on network flows. But if we just use them in their standard configuration and don't understand how the protocol is used operationally, that typically yields insufficient information. So on the right side here: is anyone

familiar with what a Modbus log looks like, or is this new to everyone? Okay, this is generally new. What we see here is similar to a conn log from Zeek. What we've done is analyze a source IP address speaking to a destination IP address over port 502, which is what Modbus speaks on. There's a concept within the Modbus protocol of a function code, and when we see "read holding registers" here, that's what that function code means. So in this case 192.168.112.21 is executing a read holding registers command to 192.168.112.50, and to .52 and .51 also. This is what Bro in its default configuration will give us. So we say,

"Cool, great, we're seeing the exact same behavior from one device to three different devices." If we just use this information with no context, we don't have a lot to go off of. However, if we dig a little deeper, if I start building some operational understanding into this, I can develop a deeper hypothesis and say: based on my understanding, I know that the master PLC should only be requesting from starting register x50. If that's the case, I know all of its remote PLCs should only be providing information back on just that register. I can look deeper in the packet and pull that information out. In this case, if we plot it over time,

great, we see that every single read holding registers request starts at five zero. Further, with some additional information: we know from the Modbus spec that for this function code you give it a starting register and a number of registers to read back. So if I build that one step further, I can say the master PLC should only be requesting six registers. Now, if I build this kind of analysis from a threat hunt perspective, and my hypothesis is that the master PLC is reading more information, or doing some information collection that it doesn't necessarily need to do, I can immediately see some things stick out here, and in two instances we're reading
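The baseline check described here, where one master should only issue read holding registers requests with a known starting register and a known register count, can be sketched as follows. Everything in the sketch is an assumption for illustration: the `ReadRequest` record stands in for fields an analyst has already pulled out of the packets (a default Modbus log does not carry them, per the talk), the sample records are invented, and the starting register is written as decimal 50 even though the "x50" on the slide may have been hexadecimal.

```python
from dataclasses import dataclass
from typing import List

READ_HOLDING_REGISTERS = 3  # Modbus function code 0x03


@dataclass(frozen=True)
class ReadRequest:
    src: str
    dst: str
    function_code: int
    start_register: int
    quantity: int  # number of registers requested


def find_unexpected_reads(
    requests: List[ReadRequest],
    master: str,
    expected_start: int,
    expected_quantity: int,
) -> List[ReadRequest]:
    """Return the master's read holding registers requests that differ from the baseline."""
    return [
        r
        for r in requests
        if r.src == master
        and r.function_code == READ_HOLDING_REGISTERS
        and (r.start_register != expected_start or r.quantity != expected_quantity)
    ]


# Invented sample traffic using the addresses mentioned in the talk.
requests = [
    ReadRequest("192.168.112.21", "192.168.112.50", 3, 50, 6),
    ReadRequest("192.168.112.21", "192.168.112.51", 3, 50, 6),
    ReadRequest("192.168.112.21", "192.168.112.52", 3, 50, 12),  # asks for more than six
    ReadRequest("192.168.112.21", "192.168.112.50", 3, 50, 6),
]

outliers = find_unexpected_reads(
    requests, master="192.168.112.21", expected_start=50, expected_quantity=6
)
for r in outliers:
    print(f"{r.src} -> {r.dst}: start={r.start_register} quantity={r.quantity}")
```

The expected values are parameters on purpose: they come from operational knowledge of the specific facility, and they will differ from one environment to the next.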

additional information that the system isn't designed for our operators don't understand why it would need to read those additional messages so this is an example of how understanding our environment understanding that operational context we can go beyond standard analysis to really build that context in ahead of time and enable us during a hunt yeah so we talked a lot about the challenges there what i want to apply that to is a real-life case study you know something that we actually ran into like let's talk through our threat hunt here so what we're looking at here is a wind farm that we did a threat hunt on what you see here is farm one

and two they connect to an energy management system right there has to be a way that you know when each wind farm is producing energy that it's being balanced it's going to the right place so that all ties into the energy management system network and then all the rest of the information goes out to the hq corporate the biz uh grant to be able to access that information to be able to make it actionable um and then you know just be able to check in on the wind farms as well so right from the get go the challenges down the left there are the proprietary and legacy challenges so e-terra habitat is anybody

familiar with what e-terra habitat is not at all right this is something new for us that we walked into that's a unique control system right e-terra habitat is a way that a lot of wind farms and a lot of the electric sector um actually do their energy management right it has a very specific way that it interacts with devices and stores information and allows things to be accessed so one that's a proprietary challenge because there isn't just you know the quick wikipedia page on here's how e-terra habitat works it's a proprietary system right a vendor is trying to compete with other vendors on my system

is better than yours so they're going to try and vault that information away and also the modbus controllers again open standard but you've got to understand what those actually are so moving to the next part right so let's zoom down into one of those wind farms what we see from that is that there's a master device there's essentially a master plc and then there's outstation plcs so it's again one computer controlling the rest of them and what that one in the center is doing is essentially giving all the commands to all the actual wind turbines that are out in various locations um you know that actually make up the wind farm
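
The Modbus hypothesis from earlier in the talk (one master PLC issuing read holding registers requests with a fixed starting register and a fixed register count) can be sketched as a small check over raw Modbus/TCP request payloads. This is only an illustrative sketch, not the speakers' tooling: the names `Baseline`, `parse_read_request`, `check_request` and `build_request` are hypothetical, and the baseline values (master 192.168.112.21, starting register 50, six registers) are placeholders taken loosely from the talk, where "x50" may well have meant hexadecimal. The only thing taken from the public Modbus specification is the framing: a 7-byte MBAP header followed by a function code, a starting address and a quantity of registers, all big-endian.

```python
import struct
from dataclasses import dataclass

READ_HOLDING_REGISTERS = 0x03  # Modbus function code 3


@dataclass(frozen=True)
class Baseline:
    """Operational context gathered before the hunt (values are assumptions)."""
    master_ip: str
    start_register: int
    register_count: int


def parse_read_request(payload: bytes):
    """Return (start, count) for a read holding registers request, else None.

    Layout: transaction id (2), protocol id (2), length (2), unit id (1),
    function code (1), starting address (2), quantity of registers (2).
    """
    if len(payload) < 12:
        return None
    _tid, proto_id, _length, _unit, func, start, count = struct.unpack(
        ">HHHBBHH", payload[:12]
    )
    # Protocol id 0 identifies Modbus; ignore every other function code here.
    if proto_id != 0 or func != READ_HOLDING_REGISTERS:
        return None
    return start, count


def check_request(src_ip: str, payload: bytes, baseline: Baseline) -> list:
    """Compare one request to the baseline and return a list of findings."""
    parsed = parse_read_request(payload)
    if parsed is None:
        return []  # this sketch only evaluates read holding registers requests
    start, count = parsed
    findings = []
    if src_ip != baseline.master_ip:
        findings.append(f"unexpected source {src_ip}")
    if start != baseline.start_register:
        findings.append(f"unexpected start register {start}")
    if count != baseline.register_count:
        findings.append(f"unexpected register count {count}")
    return findings


def build_request(start: int, count: int, tid: int = 1, unit: int = 1) -> bytes:
    """Build a synthetic request for testing (length 6 = unit id + 5-byte PDU)."""
    return struct.pack(">HHHBBHH", tid, 0, 6, unit, READ_HOLDING_REGISTERS, start, count)
```

Calling `check_request("192.168.112.21", build_request(50, 20), Baseline("192.168.112.21", 50, 6))` would return a single finding about the register count, which is the kind of outlier the default flow-level log does not surface. In practice the payloads would come from a packet capture or from an extended protocol parser rather than from `build_request`.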

okay giving us you know the directionality here we now identify that there's a system identity challenge right that third bullet point that we talked about which is you know if i'm reading this what does master actually mean what does plc mean what does that mean as an outstation device directionality wise i'm seeing that it should only ever be the master talking to the plc the plc should never return information to that well again that's an identity challenge i need to figure out what's actually going on so when i threat hunt if i ever see a plc talking back to the master giving more information what does that

mean right i have to understand that so the weirdness that we saw from the threat perspective was actually developing the hypothesis that centered around you know other connections to those plcs if we know that only the master should be talking is there something else that's talking out there we actually found that there were four ip addresses that were actually communicating directly to these plcs and that didn't make sense to us at all so immediately it's like oh shoot we've got external ip addresses talking in uh from what we know about the system identity from the network layout this should not happen at all well it's now uncertainty i'm now

uncertain about you know from what i know about the system identity should this or should this not be happening right so now my tactics are different so you know i'm looking at all right what other information can we pull right we're getting network traffic can we get host logs from those controllers is it even possible for those controllers to give us other logs can i even go request that the asset owner go and grab those right do they even feel comfortable going to those devices also it's a wind farm so you're talking about driving hundreds of miles just to go collect logs potentially that was not the case thankfully but what we found out though uh is that

they're actually vendor connections it was an external vendor out in spain that was actually performing turbine resets so what that means essentially is when you have a wind turbine that's up and the wind speed gets too much the plc will actually brake the wind turbine so it essentially slows everything down to a halt you need that turbine to be reset so the vendor actually makes that connection and resets it so it can actually go spin again in lower wind speeds so we found these are actually persistent connections that are coming in right and actually the asset owner found a contract saying that for the next three years this has

to be the case right and when you talk about the legacy systems ownership there's nothing the asset owner can do about that because when the control system was first put in this was exactly how the legal documentation was set up so that vendor will continue to have a persistent connection from a place where the asset owner that actually owns the wind farm doesn't know where those ips are coming from doesn't know the security around those things but that's how it has to be that's now an operational challenge that they have to work with and now when you're talking about threat hunting this is not an alert you get fired up about right it initially is right

because we go from that oh my gosh you know to all right we have some context really you know this is just applying a real-life use case to everything we talked about here so what are the key takeaways what are the approaches that you have to have um nick you want to hit one or two uh yeah so talking about some of the constraints we want to be solutions oriented here how do we build a tool set how do we build um a capability to help us in those challenges uh the first one is doing our homework we alluded to it a couple different times as marc mentioned if we have proprietary

protocols our homework is ahead of going into that environment let's understand how this protocol operates is this pub sub does this require two-way communications is this only a udp packet that we throw over the fence and hope that it gets there understanding all of these constraints from a protocol or a vendor level is really important we should do that ahead of time what that does is it reduces the burden on us the analysts when we're in the field so using all of the time ahead of an engagement we can build up a repository of information we can build up specific tools for that engagement similarly on the notion of a system understanding how these systems

typically operate and i'll come back to the first question um which was how unique are things if we've been in an industry before we typically know how functionally energy management systems work are we going to look at every energy management system that's ever been created no that's not scalable but understanding from a functional perspective typically how these work we can now ask more pointed questions to quickly build up that information that operational context two define a methodology and a process and document um the findings and the knowledge that you build along the way if we just go in haphazardly and don't follow a regimented process we're going to miss things how we document our hypothesis generation is

important so that when we go back into a second engagement in an electric sector environment we know what worked and we know what didn't really work if we uh understand something about a new protocol that hey this was really interesting as an adversary i would use that against this kind of system i want to be able to document that so now during my downtime i can build up a detection routine to support a higher level hypothesis all of this information we're a large team not every single person has seen every single system out there being able to share and codify that knowledge is important for us as a platform as a software development

company we do have a platform and understanding all of these things that we learn from the field codifying that into the platform is important for us so that when we want to go run an analytic to identify some additional information that supports our hunt we can actually use that because we've codified it before right and that really leads into this last point here right this is probably one of my favorite gifs ever gotta enable that hunt you've got to at least get yourself down a path where you're in you know maybe an uncomfortable situation an environment where you have no idea what's going on and there may not be a lot of research

out there you may have to do some of that work up front right take those first two steps enable it then you gotta codify the knowledge right when you do it once can you do it you know can you do it another time can you make this essentially not happen make the jump next time right you know continually running that process that's enabling the hunt codify the knowledge we have the benefit of being able to use a platform there's also a lot of research that we put out there right very very community driven where it's this is a big problem it's not going to be necessarily solved by one company

it's got to be a lot of people that are getting into these environments because we're not going to look at every energy management system right and that's just one vertical that we're going to be looking at there's a lot more that we looked at when we talked in the front i'm trying to look at more beer environments to be honest with you uh sometimes you get some cool swag from those facilities wouldn't mind some beer on the way out um but that's really you know got to codify the knowledge and get things out there um yeah so that's all we got there um questions yeah absolutely or were they coming in from some sort of

gateway yeah so the question was the external ips where were they what kind of connection did they actually have in so they were connecting over modbus there right so the way that they came in um they came in from the external side but they had local ip addresses right so then from the local side they were able to connect directly over 502 to do those actual resets how often are you able to ask like the engineers to design the system or like you know their homework it seems like a lot of that you know the baseline stuff right so question being um how often do you get a chance to work

with or ask these questions directly to the operators the engineers the ones that know the system network operators um it depends um and that's the worst answer ever so again i'll point again this is a great i swear i didn't plant the question ahead of time everyone's a special snowflake right so you're going to go into environments where the engineers have everything well documented they understand how the it works they understand which switch control actually gets packets to where they need to go on the other hand you'll have environments where nobody's ever looked at this stuff you've never had an outage you've never had to document anything before you've never had to run any solution

down from an i.t or ot perspective um so depending on the environment you have anywhere along that entire range there that said what we typically try to do is codify that ahead of time so we can give them a questionnaire right do you understand uh your routing scheme do you understand if there are any firewalls present in your environment do you understand how external vpn communications can actually route to critical ot components and based on that we can then level set with how we can change our tactics around that because one such example is if you have absolutely no visibility you can't give us an asset management inventory you don't

understand how packets get from here to here that might change uh our tactics on site to say hey we want to go to your backbone infrastructure and start tapping to build some of that context and the other thing too is even if they have filled out the questionnaire sometimes they'd say well oh well you guys didn't um okay good oh gotcha um yes so sometimes they'll fill it out and it's like yeah we know how things are supposed to work and then you get on site and say okay where's the documentation and it's like oh well we're just going to whiteboard it for you right and it's just one network guy like whiteboarding everything out it's like shoot i don't think that's right at all

um you know it's exactly that so you get a very wide range yeah so obviously uptime is important in these environments what sort of redundancy do you see in those and how does it affect your ability to help them recover so for instance if a device is compromised is there another device that has the same configuration yeah so question being what do we see from the redundancy perspective right how the control systems are built that uh you know help keep up time even after an incident so again i'll take the first stab and nick you know probably take this afterwards but it depends on the industry vertical that

you're in right the electric side being a regulated environment versus something like oil and gas or advanced manufacturing where they don't there will be redundancy built into those systems uh other times you know from an oil and gas perspective it's we have that one firewall or we have that one controller to do that job and that's it a lot of times from the incident response perspective they will have already called in the vendor you know to make sure everything's back up then where we come in after the fact is what's the root cause right a lot of times they're not following the nice sans six step picerl model of let's

identify let's scope let's take that time and everything it's just get everything back up like if we have to pay the ransomware fee like we'll just get out of the way we'll bring somebody in after the fact to figure those things out so whether they're redundant or not right they're going to have those vendor contracts they're going to figure out a way to get everything back up and running in even you know faster time than an ir team could really be on site um any thoughts on that yeah one thing i'll offer um redundancy there's a lot attached to that word right so when we're looking at putting in operational components we

look at redundancy through a certain lens and it's going to be process failure or a random ethernet cable fails so the redundancy that we put in place is yeah i might have two switches so i've got redundancy at l2 but i'm not going to think about redundancy from a cyber perspective so now if i have all the exact same hardware across everything and i know one packet can corrupt the firmware across everything i don't have you know 10 guys waiting with 10 separate laptops to all go physically flash the firmware on those similarly if i cause a hardware failure in some of these i don't have 10 devices ready to be swapped back out i might have one um so thinking of that

in the perspective of what are you building contingency plans for um is a really important consideration yeah so when you're in a setting where it might cost millions of dollars yeah that's the challenge right the question is if you're in very critical environments where cost is a thing and downtime is resulting in millions of dollars of lost revenue per hour how do you convince decision makers to uh implement change effectively is that right yeah so um it's identifying the potential threats to their risk we don't come in and say hey we're smart and we understand the exact operational risk what we can do though is identify

technological challenges how threats can pose a risk to your risk model that you've already defined and one such example when you can come into an environment and they can give us a risk model to say here's what my bad day really looks like that's a goldmine for us because now we can look at that and say how does the control system at the end of the day ultimately support what you care about being able to scope what you care about we can now build a threat profile to say here's how i can cause downtime integrity issues whatever it is to that control system in order to impact what you care about ultimately you have to make the risk

of a cyber attack outweigh essentially the effect of the downtime right if we're talking about millions per hour right from that downtime we're going to patch something versus downtime of like millions of dollars you know i don't know like 10 million so that's like 10 million a day kind of thing right you have to make them do the math and figure out it's a worthy investment for me to take my facility down now get things patched get new controllers in there do whatever it is to get to a better place yeah one question in terms of scanning so doing active

scans often you know can knock out some of your ics systems how effective do you think your passive scans are versus having to do more of your traditional scans like you do on it where you can't do those on some of your yeah so talk about the difference of active versus uh passive scanning uh which one is more useful because obviously active scanning you know may take down stuff in your ics environment again a lot of times when we look at these kinds of environments you know just establishing what is in my environment from a visibility perspective isn't there so from passive scanning there's a lot more that we can get

that just hasn't been looked into right from the it side when you have that ability to use an active scan you're not really going to push the envelope on what you can get from passive on the ot side when we look at some of those limitations and constraints from the passive side there's a lot more that we're gaining i mean even every day just from a research and development perspective it's wow i didn't know that this protocol actually passed this kind of information um a lot of the time we talk about active scanning getting host names getting um firmware versions software versions model numbers actually we're finding that in a lot of

these protocols they are passing it but it's figuring out when it's being passed how often and can we actually grab it and parse it out because a lot of those things aren't there one other thing on that too is it's very implementation specific so if you know there are certain protocols that exist out in the domain and i want to get a certain element of information out of there a lot of times we can build a matrix to say these protocols are not going to support that even if i went with an active query you're just not going to get that information you want right and i might need to get a profile of

traffic passively for a week to achieve some objectives some other objectives i want all the serial numbers okay whatever you're going to do with those we could maybe go get those but it really comes down to measuring the implementation versus what functionally can i actually achieve out of that so are we out of time one minute one minute any final question all right all right thanks for coming