Preparing for Incident Handling and Response within Industrial Control Networks

Name: Preparing for Incident Handling and Response within Industrial Control Networks
Uploaded: 2021-05-08
Duration: 23 min 59 s
Description: Mark Stacey discusses practical preparation for incident response in industrial control systems, arguing that regulatory compliance alone does not adequately prepare organizations to handle incidents. The talk covers critical planning questions, network visibility challenges, playbook development, and outsourcing considerations to enable faster, safer returns to operations.

BSides Charm · 2018 · 23:59 · 1 view · Published 2021-05

Speakers

Mark Stacey

Tags

Category: Technical

Topic: DFIR Threat Intel

Style: Talk

About this talk

Mark Stacey discusses practical preparation for incident response in industrial control systems, arguing that regulatory compliance alone does not adequately prepare organizations to handle incidents. The talk covers critical planning questions, network visibility challenges, playbook development, and outsourcing considerations to enable faster, safer returns to operations.

Original YouTube description

Preparing for Incident Handling and Response within Industrial Control Networks Most Industrial Control System (ICS) networks require Incident Response (IR) procedures. Generally, these procedures fulfill regulatory requirements and do little to actually prepare the organization for handling an incident. This lecture will concentrate on concepts that decrease required resources for IR, arm responders, and facilitate a return to operations. Presenter: Mark Stacey (@lzeroki) Mark Stacey is currently a Principal Threat Analyst with Dragos Inc where he delivers incident response, threat hunting, and adversary research for Industrial Control Systems worldwide. Prior to joining Dragos, Mark was a member of RSA's Incident Response team for 5 years where he provided incident response, discovery, and forensic services globally for private industry, financial institutions, law firms, foreign and domestic governments. Mark spent 7 years with the Department of Energy (DOE) performing cyber and intelligence analysis for various government clients. He has functioned in both cybersecurity operations and research within the intelligence community and frequently provides community education through outreach programs with federal agencies.

Transcript [en]

good to go cool so my name is mark stacey uh currently i work at dragos we're an industrial control security company uh this slide deck is kind of targeted towards ics planning for incident handling in ics but a lot of it translates over to it as well i've got uh half an hour so i'm going to spew a bunch of words really quick and i've got some slides with a lot of content on them i know some people are going to take out your phone take a picture that's all good we are shortly going to release a refined write-up on this and i'll provide all of that in uh sample documents so if we've already got one if you don't want

to take a picture of it and then translate it over via later then uh we'll release some samples so incident response uh we're not going to talk about really anything technical rather what you can do to prepare for incident response uh and when i say prepare i'm not talking about controls or visibility network host visibility anything like that it's what you can do to prepare based on what you've got now so really the overall goals for ir first is to quarantine if you've got self-propagating code worm-like behavior in ics space we worry about material release or hazardous waste release really the overall goal is operational resilience i define that as expedited economical safe return to stable state

it's literally keeping the lights on that is really the the primary motivation and then additionally minimize future incidents root cause analysis we have ir policies and procedures pretty much every company has those now the whole intent being to prepare for incident handling my argument is that the ir policies and requirements and everything don't really prepare an organization it implies checkbox security so most of the infrastructure in the u.s is privately owned and operated it is heavily regulated by the government however and a lot of the documentation when you get down to the specific sectors they have some reporting requirements time reporting requirements but overall the guidance is pretty vague and this is a good example where we have

subjective adverbs in there and adjectives like timely detection and response correct exploited vulnerabilities what does that mean a lot of them have lessons learned is that protection detection the incident response process so the guidance is somewhat arbitrary that's good incident response in a wind farm is not the same as a petrochemical plant or a nuclear energy site so it allows the industry to target their ir policies it's bad for those same reasons it provides a listing of things the company should be thinking about but really it doesn't go into much detail and i'm sure anyone who's worked in the government understands the impact that an audit can have previous work with the federal government we'd have an ig audit come in

it was two months of dedicated effort to get screenshots of group policy and everything they'd be there for two months they would leave another agency would come and audit us failing those audits can be very expensive if you fail a nerc cip audit the fines can be in the millions of dollars so it kind of draws attention over to those compliance requirements but as we all know being compliant doesn't mean you are secure or prepared a couple case studies i was working with a local electric utility a year or two ago and on the scoping call we said do you have security controls yes i had firewalls okay great do you have aggregated logging they said no but we do have all

of our logs they couldn't get them to us they weren't sure how to extract them so we got on site they gave us access to the firewall and they had 15 minute retention their we called it the uh first in all out configuration so their requirement said that they had to have logs to them it meant in pfsense you go down to that rule log hits against this rule yes you check the box and you're good to go they didn't think to check the resources on the box to see what their overwrite time was i think visibility is a big one a lot of times we say you have to have the capability for incident response
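A minimal sketch of the check the utility skipped, assuming the firewall keeps its log in a fixed-size buffer that overwrites the oldest entries first. The buffer size, average entry size, and log rate below are hypothetical numbers chosen for illustration; they are not taken from pfSense or from the engagement described in the talk.

```python
def retention_seconds(buffer_bytes: int, avg_entry_bytes: int,
                      entries_per_second: float) -> float:
    """Estimate how long an entry survives in a fixed-size log buffer
    that overwrites its oldest entries once it is full."""
    if buffer_bytes <= 0 or avg_entry_bytes <= 0:
        raise ValueError("buffer and entry sizes must be positive")
    if entries_per_second <= 0:
        return float("inf")  # nothing is being logged, so nothing is overwritten
    capacity = buffer_bytes // avg_entry_bytes  # whole entries that fit
    return capacity / entries_per_second


def meets_requirement(retention_s: float, required_s: float) -> bool:
    """True when the estimated retention covers the required window."""
    return retention_s >= required_s


if __name__ == "__main__":
    # Hypothetical values: 512 KiB buffer, 200-byte entries, 3 entries/second.
    estimate = retention_seconds(512 * 1024, 200, 3.0)
    print(f"estimated retention: {estimate / 60:.1f} minutes")
    # Hypothetical requirement: 90 days of firewall logs.
    print("meets 90-day requirement:", meets_requirement(estimate, 90 * 24 * 3600))
```

With these made-up inputs the estimate comes out to a little under fifteen minutes, which is the kind of gap a checkbox alone does not reveal.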

well full packet capture is great let's deploy it let's go a step better and put it on the core so we see ingress egress and core traffic it's really common however to have a dmz between your it and your ot or ics network and your it and the internet unless you have your proxy set up in transparent mode or even just core routers you run the risk of network address translation so do you have full visibility yes is it useful for forensics no it looks like everything is going to or coming from a handful of ips those mac addresses get overwritten by the switches i think the target case is a great example i think it was 2014 target was hacked

around christmas got to be a big deal two banks sued target their claim in the lawsuit was that if target was pci compliant they wouldn't have been hacked if you handle process credit card data pci compliance is what you need to have the license to handle that i don't know if target did it but there were some rumors that target was going to sue the vendor that they brought in to give them pci compliance so granted it's a fund recovery effort but think about that target is suing the vendor that gave them pci compliance to mitigate another lawsuit that said they wouldn't have been hacked if they were pci compliant sounds ridiculous feel free to laugh so

i think one approach is to come up with some difficult questions and see if your ir policies and documents answer these questions and i would bet that they don't i came up with some of these if you have more please shoot them to me i'm looking to grow out a list one is when is ir justified normally incident response procedures have an escalation document in there or some authority approvals but when does that policy take effect if we do tabletop exercises we know we're going to use those plans and policies we have them sitting on the table if you have three users that bring your computer to the help desk because they're acting somewhat slow you

have a junior analyst that's looking at it and he isn't sure if it's adware commodity or something more advanced when does that ir take effect and i've seen it apply when more than 10 machines are infected if it hits any part of the ics network if that machine exists somewhere in the purdue model that you've established for your network but have a launch point what happens if resources are under command and control the response is very different in it we push off any type of remediation until we have a complete picture of what took place otherwise you start blocking ips the adversary knows they'll go dormant for two to three months and then come in through a back

webshell that you haven't found yet in ics we run that risk as well however the cat the repercussions could be much more catastrophic once they know that we're on to them they could push out ransomware logic bomb or something else that can have physical impact and i think this is really the the best one up here when is a shutdown of operations warranted it's not like i t where we just disconnect from the internet uh it was in the subreddit shower thoughts a while ago someone said being a doctor is like a mechanic who has to fix the car while it's running and i think that translates well to ics incident response i don't remember who said it but reddit

doesn't give a about attribution anyway so in ics sometimes to get the logs we have to pull the box out of service and we can't do that it's it's very difficult to weigh the risk and reward until you have a complete picture of what's going on on whether or not that shutdown is warranted generally it's based on safety safety concerns and i think uh last year a very interesting exercise took place overseas where a mock utility network was brought down and they were doing technical exercise jointly with that they had a team of lawyers in london that were doing a lawsuit mock trials and lawsuits to determine liability the judges that participated in that found the electric utility was liable

for all damages to the customers that went without power that is big so the electric utility not only lost downtime which is revenue potentially had to rebuild the network and get new hardware if things were destroyed but all of those clients out there that had an impact those storing medicine that required refrigeration any potential health impact they were liable for those so very big economic impact some sectors like nuclear energy the difference between security and safety is diminishing it's almost completely gone so in those regards is incident response different than emergency response that will impact the authorities and escalation period and how do you operate if you can't trust the data being reported a well-known attack out there recently

that changed the speed of centrifuges spinning however the data reported to the scada system said everything was fine we saw with trisis that arbitrary code can be written to even safety devices that can change uh potentially change what is reported to scada so if i know this uh let's use a wind farm as an example if i know this ring network of turbines is not acting right but my scada shows it is correct how do i handle that a lot of our renewable energy clients have wind farms that span multiple states i need to have the resources to go out to those locations and the know-how to get the data needed and that can be a very uh very trying

exercise when you're trying to do incident response economically it can take a long time and then really do you have what you need to investigate root cause analysis and ask the people that will be doing the work so into the meaty bits uh i think it's it's kind of funny i've never in like eight years of incident uh response i've never gone to a client site and said hey can i get a network topology diagram and ip allocation mapping and some guy says oh yeah here i updated this last week it's always written in like 1930 by some guy who doesn't work there doesn't understand computers it's not very useful we all know that we

need to have network diagrams but nobody does so we created this list and this is what we give to our ir retainer customers now in the ics space some of it is protected by again regulatory requirements so we say don't give it to us gather it and put it somewhere that is accessible keep it updated of note in here are of course network information infrastructure details security controls things that the ir team can leverage when they come on site or when they're starting to do incident response generating playbooks playbooks are getting to be a little more popular in the it and ics sector we've already released some content on this no ben ben come sit down oh fair enough
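One way to act on "gather it and keep it updated" is to track when each artifact was last reviewed and flag the stale ones. The sketch below is only an illustration: the artifact names echo items mentioned in the talk, while the review dates and the 90-day threshold are assumptions, not part of the list the speaker describes.

```python
from datetime import date, timedelta


def stale_artifacts(last_reviewed: dict[str, date], today: date,
                    max_age_days: int = 90) -> list[str]:
    """Return the names of artifacts last reviewed more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, reviewed in last_reviewed.items()
                  if reviewed < cutoff)


if __name__ == "__main__":
    # Hypothetical inventory of IR preparation documents and review dates.
    inventory = {
        "network topology diagram": date(2017, 1, 15),
        "ip allocation mapping": date(2018, 4, 1),
        "security controls list": date(2018, 3, 20),
    }
    for name in stale_artifacts(inventory, today=date(2018, 4, 28)):
        print("needs review:", name)
```

Running something like this on a schedule gives the IR team a short list to chase before an incident rather than during one.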

so uh out on the blog posting at the top uh we've released a little more refined write-up on this but some key points that i wanted to hit on the key goal of a playbook is to facilitate your junior analyst to let them get started either gathering logs or responding triaging data right away i think key points of a playbook are what questions based on the alert should be asked and then what data do you need to answer those really the i think a key problem that a lot of organizations face internally is knowledge transfer and this is a great way to facilitate that so under contents triage techniques uh one thing that i hear commonly with

junior analysts when they start looking at encrypted traffic like ssh well it's encrypted i can't tell anything this is where the triage techniques i would say well look at source and dest information of course but look at uh bytes transfer ratio is that consistent is the periodicity between the sessions consistent that's indicative of a service instead of a human that kind of helps prioritize and then when creating playbooks if you're not sure where to start i would say start with those routine functions when you come in uh the first part of the day what reports do you look at if something in those reports looks weird what are you going to do and i hate the exercises that say uh

tell someone how to make a peanut butter and jelly sandwich but assume they don't know what bread is or peanut butter or jelly or a knife or it's not an exercise in documentation keep it bullet points keep it simple but help that junior analyst be able to get started until support can arrive the future is to leverage apis where they exist plug it in with your other tools like your case management solution so this is one screenshot for a or a sample playbook for a screenshot being exfiltrated from an hmi device and we can see under two here we have the assist me and we leverage playbooks to put in best practice information so if the receiving host is external and

bold here do not attempt to connect or scan to it so you can write in best practices to help protect the organization as well under three look at the image we have view extracted files it's not efficient if we have the analyst go to another tool and have to query down to that data to pull it so if you can leverage apis make the data presentable and then say a junior analyst looks at it and says yeah it's sensitive hmi information what should i do we have this assist me category where we provide context and when available specific threat adversary groups that are known to use that tradecraft i think this is really the the key

knowledge transfer piece because now that junior analyst can go pull a threat report on uh dymalloy or whatever activity group it was and see what else they should begin triaging finally uh maintaining in-house the incident response capabilities host forensics malware reverse engineering those are very expensive skills to keep when hopefully you don't have to use them that often so outsourcing ir incident response insurance is becoming more popular don't just establish an ir retainer with a company build that relationship i think establishing first again this should seem obvious but it's rarely done who's going to lead the incident i've done incident response engagements where we arrive on site they give us domain admin say all right you've got two weeks now

present that to the board generally it's led by some i.t manager or technical person but have that established before uh know how you're going to share information what type of communication you'll do if it will be email text whatever physical and logical access are big concerns especially in ics protection personal protection equipment that's needed what you need to get on site to get into the actual laboratory i think one of the biggest questions that you need to think about is what equipment can be used from working i.t incident response every time i arrived on site i've got my virtual machines my tools i love and the client says oh you can't connect to our network you can't

touch a keyboard because i haven't done the background checks needed for the financial sector or whatever so we're shoulder surfing throughout the entire engagement if we knew that ahead of time we could have planned appropriately one thing i would recommend for some environments like uh nuclear energy sites where everything at certain levels is considered a critical asset cables keyboard mice everything in those instances have the ir retainer staff get the training that they need beforehand have them get the nerc cip training and then get two or three laptops ask them what tools they need and put the tools on the laptops build them out stick them in a locker update them quarterly but that way they

can land and have the tools if i can't save logs to a usb drive and use my own tool suite on them i'm really not going to be efficient if i'm limited to excel data handling if they can pull that data off if they have to analyze it on your system and do them a favor take half an hour and establish a timeline of what has taken place before you call them i've been on multiple initial calls where there's 10 to 15 people and it's like well joe put this firewall rule in place and then we started having these alerts and you know joe responded to it but we're not sure if he sanitized a laptop or not

and whatever took place and it makes the ir analyst have to rebuild that timeline sit down and tell them what was found the original indicators what took place and what actions you have already done it really allows the ir team to hit the ground running and pick up the work from where you left off and also train together at dragos one thing that we really encourage is to do a tabletop exercise or some type of limited engagement to see how each company operates we assist with gridex for a couple of our retainer clients so if you have a retainer reach out take three or four hours and ask them to go through their process with you
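The half-hour timeline suggested above can be as simple as a sorted list of who did or saw what, and when. This is a sketch under assumptions: the `TimelineEntry` fields and the sample events (modeled loosely on the "joe put this firewall rule in place" anecdote, with invented timestamps) are hypothetical and only show one possible shape for the handoff.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEntry:
    when: datetime
    who: str
    what: str
    kind: str  # "indicator" for something observed, "action" for something done


def build_timeline(entries: list[TimelineEntry]) -> list[TimelineEntry]:
    """Order entries chronologically so the IR team can read them top to bottom."""
    return sorted(entries, key=lambda entry: entry.when)


def render(timeline: list[TimelineEntry]) -> str:
    """Format the timeline as plain text suitable for an email or ticket."""
    return "\n".join(
        f"{e.when:%Y-%m-%d %H:%M} [{e.kind}] {e.who}: {e.what}" for e in timeline
    )


if __name__ == "__main__":
    # Hypothetical events, entered in the order people remembered them.
    notes = [
        TimelineEntry(datetime(2018, 4, 2, 10, 15), "soc", "alerts started firing", "indicator"),
        TimelineEntry(datetime(2018, 4, 2, 9, 30), "joe", "added firewall rule", "action"),
        TimelineEntry(datetime(2018, 4, 2, 11, 0), "joe", "responded to alerts, laptop status unknown", "action"),
    ]
    print(render(build_timeline(notes)))
```

Separating indicators from actions keeps what was observed apart from what the organization already changed, which are the two things the talk asks clients to hand over.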

and i would say actually work through the process have them call the ir hotline have them initiate an actual mock engagement have them share sample data you would go back and request what logs additionally you would need and actually work through it with them so to keep it short and sweet preparing for incident response your compliance dictates you have to have ir policies and procedures those do very little i would think think of them as a good listing of what documents what things you should be thinking about but avoid that checkbox security really gather information about your network we all should be doing this we've heard it for years nobody is doing it efficiently generate

playbooks facilitate that knowledge transfer because when an actual ir event happens that is really what improves the efficiency that allows whoever gets the first alert to begin gathering logs if you have 15 minute retention well avoid that at all costs but at least then you can start grabbing that volatile information if that junior analyst knows what to do and then build relationships with the ir retainers so again we'll release uh i'm pointing on my laptop screen like you can see it we'll release some information on our blog site with some of these sample documents that you can download and tweak to your own environment and then if you have any other questions on uh companies or other examples of questions

that companies should be asking themselves to really make sure that they're prepared then shoot those who may be certainly interested and i think we've got uh five minutes for questions correction we have no minutes okay yeah
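The encrypted-traffic triage tip from the playbook discussion (check whether the byte transfer ratio and the timing between sessions are consistent, since that points to a service rather than a human) can be sketched as follows. The session tuple layout, the use of the coefficient of variation as the consistency measure, and the 0.1 threshold are all assumptions made for illustration; the talk does not prescribe a specific metric.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values: list[float]) -> float:
    """Standard deviation relative to the mean; small means 'consistent'."""
    m = mean(values)
    return pstdev(values) / m if m else float("inf")


def looks_automated(sessions: list[tuple[float, int, int]],
                    max_cv: float = 0.1) -> bool:
    """sessions: (start_seconds, bytes_sent, bytes_received) for one
    source/destination pair. Returns True when both the gaps between
    session starts and the sent/received ratios are consistent."""
    if len(sessions) < 3:
        return False  # too few sessions to judge consistency
    ordered = sorted(sessions)
    starts = [s[0] for s in ordered]
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    ratios = [sent / received for _, sent, received in ordered if received > 0]
    if len(ratios) < 3:
        return False
    return (coefficient_of_variation(gaps) <= max_cv
            and coefficient_of_variation(ratios) <= max_cv)


if __name__ == "__main__":
    # Hypothetical SSH sessions: one pair on a five-minute schedule, one ad hoc.
    scheduled = [(0, 1000, 5000), (300, 1010, 5050), (600, 990, 4950), (900, 1000, 5000)]
    ad_hoc = [(0, 200, 9000), (45, 5000, 300), (700, 40, 40), (5000, 900, 12000)]
    print("scheduled pair looks automated:", looks_automated(scheduled))
    print("ad hoc pair looks automated:", looks_automated(ad_hoc))
```

A result like this only helps prioritize; a consistent pattern still needs an analyst to decide whether the service behind it is expected.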

so great question i have a very confusing slide in here somewhere that i pulled out for time but nuclear energy generation just to show an example so the nrc is ultimately responsible for uh safety in that area so the nrc got together and did that example that i showed earlier somewhat arbitrary so the asset owners got together and said what does this mean and they went to nei and said tell us how to interpret the nrc's guidance they came out with nei 08-09 that had a little more granularity but it's still very abstract once the power hits a switch yard and it gets transferred to a high voltage for transmission then ferc takes over nrc and ferc are

independent government agencies so ferc is responsible for uh reliability well they're federal government so they didn't feel uh like it was their place to identify those requirements and so they gave the ero or that organizational responsibility to nerc part of nerc is critical infrastructure protection which is nerc cip and they really take over at that uh the transmission line for distribution out to clients so from the outside looks very confusing inside it's a little better understood and i know this font is small but if you read these descriptions they're all somewhat arbitrary the intent is to have organizations decipher what they should be thinking about unfortunately what happens is organizations check a box they do what

is needed to fulfill that requirement complete the audits and then they move on

the issue yeah um you know non-trustworthy data coming from your sensors and are you finding that your clients are open to the idea of changing

so uh the question are customers really open to uh i would say additional technologies they're solutions to validate the data being reported to the scada systems uh they're certainly open to whatever improvements they can do that won't impact operations the uh one of the main concerns is if a physical device has a firmware overwritten then there's nothing really you can put between that and the scada system that will validate that data depending on how your environment's set up you can have uh like redundant temperature sensors but really if a safety system like with trisis is responsible for monitoring certain things and that firmware is overwritten i guess short answer very contingent on your environment is a customer open to it absolutely they

they want to get better they want to be doing the right things other questions all right then i think uh are we about out of time five minutes so i've got a very small dance uh it is highly inappropriate but just hold on no i'm kidding uh i think the other track is right next door so when we uh shut this down your job is to cheer and loud crazy likes so the person next door gets kind of shaken and feels like he has to compensate so that's all i've got no other questions cool thank you guys all right all right now you're patronizing
