
And today we have Theresa Sub with us, with mission assurance applications beyond the military context. So thank you very much.
>> All right everyone, good day. I'm Theresa, also known as Tess, and today I'll be talking you through this concept called mission assurance. So, the scope of this brief: I'm going to touch on what mission assurance actually is. We'll talk about what it is in a military context, but then we'll move on to why it matters outside of that context: the critical service sectors, and what it means in cyber security in general. Then we're going to jump into a case study and show how this changes traditional incident response procedures and what we can learn from that going forward as incident response and cyber security experts. So, before we go ahead, just a quick
disclaimer. The views and opinions expressed in this program are those of the speakers and they don't necessarily reflect the views or positions of the entities they represent. So now we've got that out of the way, what is mission assurance? Mission assurance comes from the US, specifically the US Air Force. The US Air Force realized that they couldn't treat cyber security as simply focusing on individual systems without focusing on the larger picture. So they defined mission assurance as the process to protect or ensure the continued function and resilience of capabilities and assets critical to the execution of defense mission essential functions, regardless of operating environment or condition. So essentially what mission
assurance means is you have to be able to achieve your mission regardless of the cyber security events that occur on your network. So if cyber is an input to your mission, we protect the mission by protecting our cyber infrastructure. So, the military why? Well, systems don't just exist for cyber's sake. They exist for a purpose, for an underlying business need. In a military sense, cyber infrastructure exists because it forms part of military systems, military systems exist because they contribute to military missions, and military missions exist because they contribute to military effects. So if the end goal is military effects, cyber infrastructure, all the way down at the bottom, contributes to those military effects down the line. So when we think
about how we do threat hunts and incident response through this lens, it changes how we consider all our actions, and they can start to look really different. So, thinking about prioritizing the mission, here are some of the things that might be considered. One of the first components is the idea of fighting through the attack. In a cyber environment where mission assurance is the priority, we can't assume that a network will ever be in a safe or hygienic state. Not only do we have the assume-breach mentality of hunts, but we also have the assume-contested or assume-conflict mentality. So your cyber system is never not going to be contested. It's never not going to be
attacked. It's never going to be safe. So you need to be able to fight through the attack. You need to be able to assure that mission, to ensure service availability and reliability, even if it's under attack, even if there is an intrusion active in that system. That also means that sometimes you may have to sacrifice resources in your network. You're fighting through an attack. You need to make decisions, and sometimes those decisions may be hard. In order to sacrifice resources, you need to understand what's important in your network and what's less important: what will actually contribute to the mission succeeding and what might not. As part of this, you may have to limit your forensic actions. So, what I mean
by this is, because you're trying to move so quickly and maintain your position to achieve the mission, you may not have time to slow down and take assets offline for isolation and slow-time forensics, because those assets are needed to achieve the mission. If service continuity is your goal, you can't take critical infrastructure offline, which means doing long-term forensics by taking things offline is just not an option. And finally, decision superiority. That means making decisions faster than your adversary. So when you have an active intrusion set on a network, it's more important that you jump in front of them and make decisions faster than they do. That will enable you to achieve mission
success. So, we've talked about this from a military perspective, but outside of what the US Department of Defense has defined, what does this actually mean for us as industry professionals? Well, consider the sectors, including government, that must be resilient and survivable. These are things like finance, government, and national critical infrastructure. These are sectors where maintaining critical services and continuity of service is extremely important, sometimes more important than just maintaining cyber hygiene or engineering standards. These are also the sectors most likely to be targeted by advanced persistent threats. And they're areas where we're going to have an assume-breach mentality in our
threat hunts. In these sectors, remediation is not necessarily going to be more important than maintaining critical services in the event of a cyber attack or an intrusion. It's all going to be context dependent. It's going to depend on what's happening, and no two services are ever going to be valued the same. A power outage might look really different at 3:00 a.m. compared to 9:00 a.m. the next day. So we need to consider the context around these intrusions, how they affect critical services, and what that actually means. So what we're doing here is switching from a type of incident response process that's very system
focused to a process that's mission focused. And we're going to go through a case study for this: an incident response for an emergency management center. This emergency management center (EMC) runs 24/7 and it's essential for emergency operations within a town. It takes emergency calls, it does emergency vehicle dispatch, and it does emergency response management and emergency evacuation support. Here's a network map of what the EMC network looks like. Now, this is a really small network, but the principles we're going to talk about today can be applied to networks with hundreds, if not thousands, of hosts. So really focus on the aspects and the principles we talk about, not necessarily the specific
hosts that we're looking at, because from an infrastructure perspective, this is not that big. In the EMC network, we have the call center, which receives the triple zero calls that come in over VoIP. Then we have the emergency response subnet. They manage the incidents, they dispatch the emergency response, and they do liaison with the fire, ambulance, and police services. We then have the management subnet. Think your mid-level managers up to your CEO: monitoring operations, closing tickets as they come through, generally overseeing the operations of the EMC. Down the bottom, we have the administrator subnet. There you've got your system engineers and system admins working on these higher-privileged computers. Of course, then we have our servers. We
have a file server that's hosting the incident records for the emergency management center. We also have a web server which hosts publicly available emergency information. Think current updates on bushfire situations, emergencies, serious traffic jams. Now, you may notice that that web server is not sitting in a DMZ. So, as an incident responder, I'm already seeing red flags. Another thing to note about the EMC is that 30% of its workers work from home, and they RDP in and use Microsoft Teams as part of their working solution. So, let's do a mission assurance analysis on this before we jump into the incident response. The mission of this system is to coordinate emergency response operations. But what does that
actually mean? We need to break that down to make it meaningful for us as incident responders. Well, what are the four mission essential services that this actually provides? What are the four things we need to do in order to achieve that mission? One, we need to receive emergency calls from the community. Two, we need to liaise with emergency services over the phone and internet. Three, we need to be able to track and manage emergency incidents as they're occurring. And four, we need to provide emergency information to the community via the website. Once we know what these services are, we can overlay them across our infrastructure to get a better understanding of what elements of our
infrastructure are critical to the mission succeeding and what elements are less critical. So, for mission essential service one, receiving calls, that really relies on the call center having an active internet connection to receive the VoIP calls. Function two, emergency service liaison, needs the emergency response subnet to be able to liaise out through the internet as well. Three, tracking incidents, involves that file server being up and running so those incident management records can be accessed. And four, hosting the website, involves that web server down the bottom in the server subnet being up and available, with up-to-date information that can be accessed through the gateway out to the public internet.
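The overlay described here can be sketched as a small lookup structure. This is only an illustration under stated assumptions: the asset and service names are hypothetical labels for the EMC network in the talk, and it covers only the four business services, not supporting network elements such as routers or firewalls.

```python
# Hypothetical labels for the four mission essential services and the
# infrastructure each one relies on, as described in the talk.
MISSION_SERVICES = {
    "receive_emergency_calls": {"call_center_subnet", "internet_gateway"},
    "liaise_with_emergency_services": {"emergency_response_subnet", "internet_gateway"},
    "track_incidents": {"file_server"},
    "publish_emergency_information": {"web_server", "internet_gateway"},
}

# Hypothetical inventory of the business-facing parts of the network map.
ALL_ASSETS = {
    "call_center_subnet",
    "emergency_response_subnet",
    "management_subnet",
    "administrator_subnet",
    "file_server",
    "web_server",
    "internet_gateway",
}


def build_overlay(services):
    """Invert service -> assets into asset -> set of services it supports."""
    overlay = {}
    for service, assets in services.items():
        for asset in assets:
            overlay.setdefault(asset, set()).add(service)
    return overlay


def non_critical(all_assets, services):
    """Assets that support none of the mission essential services."""
    return all_assets - set(build_overlay(services))


if __name__ == "__main__":
    overlay = build_overlay(MISSION_SERVICES)
    for asset in sorted(overlay):
        print(asset, "supports", sorted(overlay[asset]))
    print("not mission critical:", sorted(non_critical(ALL_ASSETS, MISSION_SERVICES)))
```

Inverting the mapping is the useful step: during an incident the question is usually "this host is compromised, which services does it carry?", not the other way around.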
So now we've got a good idea of the key elements of this infrastructure that enable those mission elements, and those mission elements together enable the overall function of the emergency management center. We've also got a couple of cyber elements that are worth considering. These are things that are not specific to the business functions of the emergency management center, but because of the nature of how networks actually work, they're probably good to keep in mind. So that central router: if that goes down, things are probably not going to go very well for this network. Same with that firewall. And of course, Active Directory, or the domain controller, is also essential for this domain. I've also included the
administrator subnet in there because they're doing a lot of your administrative tasks. So now let's move on to the actual incident. We know what's important on this network, but what's actually happened? Well, someone has used John Dorian's account. John Dorian is one of the network administrators for this network. He tried to log in this morning and his credentials did not work. You've been called in to do an incident response, to find out what happened and to stop an intrusion if it has occurred. You've been told directly by the director of the emergency management center that the EMC has to remain operational during this time. They're still taking calls. They're still managing emergency dispatch. Nothing can go offline.
So what are your initial actions? You install your network taps, you deploy your host-based agents, and you baseline your network activity to understand what the network actually looks like at this stage. Here are your initial findings. You find where that administrator account is currently logged on. Concerningly, it's the domain controller. Also on the domain controller, you see that there's a weird registry entry sitting in Userinit, and it's a UX theme.exe. Kind of weird. It's not on any of the other servers, so it's automatically seeming a little bit suspicious. As you scan the network, you also identify that one of the hosts, AR response 10 in the emergency response
subnet, has a registry key that doesn't exist anywhere else on the network. It's a little bit different. It's a bit of an outlier. It's in the Run key. That's suspicious. BLB digital. We'll keep that in mind as something to look into. So here we'll do an initial triage. From here, we learn that admin John Dorian is logged on to the DC and that logon has occurred via RDP. And we've seen that RDP connection going out to the 192 address. Now, yes, I know that's an internal address, but just pretend it's not. That suspicious IP, now that we know what it is, is calling out to two other computers in this domain. Not a surprise. One of those is response
10, which is the one with that weird registry key, but it's also calling out to one of the computers in the call center, call 06. Strange, but we're starting to see a little bit of correlation that's becoming more and more suspicious. We do a check for process injection to try and find where execution is happening. On all three hosts, we find process injection, either in Microsoft Edge or in the Runtime Broker. So right now we have execution, we have command and control either through RDP or over HTTP, and we have some form of persistence. Things are getting pretty serious. We're definitely looking at something malicious. So, taking a mission-centric perspective
right now, instead of a traditional incident response perspective, time is really critical for us in the emergency management center. The adversary is at a critical point: they have administrator credentials, they're on the domain controller, and they've pretty much set themselves up to do whatever they need to do. So stopping the kill chain right now needs to be your priority. It's more important than going back and finding the initial way they got into the network. It's more important than finding out how John Dorian's credentials were obtained in the first place. You need to stop this intrusion set and you need to stop it now. So taking those three hosts
offline is not really an option. The call center and response 10 hosts are sitting in those two critical subnets. There are people on those hosts right now taking calls and dealing with real emergency incidents. People's lives are literally at stake. And Active Directory: there's only one DC on this network. There's no backup. We can't just take that offline and isolate it. The EMC is running at full capacity, so isolation and traditional techniques are not an option right now. Forensics is something we can come back to after the threat has been dealt with. So right now, we need to think about response actions that prioritize the functionality of the EMC over
anything else. So if forensics is about the past and mission assurance is about the future, you have to ask yourself: what's the next step in the kill chain, and how do I stop this adversary's ability to achieve their actions on objectives? So let's have a look at that critical service overlay that we saw before. The red dots are the hosts that have been identified as compromised, and we have those critical elements of the network shown here. Each of those red dots is sitting on a piece of critical service infrastructure that's helping achieve the mission effect. This is of serious concern to us. There's nothing sitting in the management subnet, which we're not particularly concerned about. It's all
in mission critical terrain, and that means we really need to make an effort at this point, because things are getting quite serious. As you're making a decision, suddenly a system administrator runs towards you. Ransomware has been deployed on the administrator subnet. All files have been encrypted and they've lost control of all the administrator hosts. It appears to be the Ryuk malware. So depending on the deployment mechanism that's just been used, you've got seconds, maybe minutes if you're lucky, before this goes across the entire domain. You have to stop this intrusion set if you can, but you might already be too late. So you do a quick post-ransomware triage with your team. The DC has mounted a
network share to the admin subnet. That's probably how the ransomware was deployed. And it's opened a new share to the file server and the mail server. There's also a share opening from call center 6 into the management subnet. You have no more information at this stage. This is all you have in order to make your decision. Remember, at this point, you don't need to know the whole story. Mission assurance isn't about understanding the entire kill chain, or understanding where the initial access, the credential access, or the lateral movement occurred. It's about getting in front of the kill chain and ensuring that the overall continuity of service for the organization is maintained. So gaps in the kill chain
are okay. We have to live with those. As long as we have enough information to make a decision, we make a decision. We care about the next step, not the past. And if we have time later, we'll do forensics. Having decision superiority means that we are making decisions faster than our adversary is. It's more important to get 80% of the intrusion set out of our network and maintain service continuity than to do 100% clearance but degrade the EMC's functions and have them lose, say, internet access. So how would we do a mission-prioritized eradication? Well, the first thing we need to do is stop the ransomware deployment. But specifically, we have to prioritize
stopping the ransomware deployment on the servers. The servers are our critical infrastructure. With the management subnet, in the end, if those hosts go down, a few managers will lose access to a couple of hosts. But those management computers were not identified as critical terrain, or as contributing to the mission functions that were essential to the emergency management center running. The file server and the web server were identified as critical infrastructure that contributes to the mission succeeding. So if they go down, the whole EMC goes down. So we need to stop it there first before we move on. We stop the ransomware deployment on the servers. Then we can move on to stopping it on
the management subnet. Third, we remove persistence across the network and we stop all other in-memory execution that's occurring. And four, we monitor the network for any IP callbacks. You will notice that I have not blocked that IP immediately. The idea behind that is that I haven't seen the full intrusion set and I've got some gaps in that kill chain. I really want to see if that IP pops up again in this network and starts communicating, and if I block that IP, I might not see that happen. So we wait, we monitor, and we see if anything comes back. So what does this actually look like on the network? Well, first we're going to
drop that RDP session on the domain controller that the administrator account is using. Then we're going to stop that injected thread in the Runtime Broker. Then we're going to move on to call 06 and stop that thread. Finally, we'll move to the registry keys and delete all persistence in the registry across the network. And then we'll stop the process injection on response 10. That should mean that all of the actions are contained, and we can stop, pause, and wait to see if any C2 comes back up. So we wait, and luckily steps one to four were successful and we don't see any network activity come back up. So now that we've done the mission
assurance priorities, we can actually move forward and do the rest of our incident response actions. This is where we can talk to system administrators about resetting credentials. We can go and do our deep-dive forensics, actually understand how this attack happened, and fill in all the gaps that we're missing. This intrusion set is much larger than the small picture that you've seen today. Mission assurance, and doing these hunts, is often like chasing shadows. You very rarely get a good understanding of what you're actually chasing until after the event. And finally, you get to reset the network and uplift the security of that network going forward. One of the things I would suggest is a
DMZ. So the adversary today was Wizard Spider. They're a financially motivated Russian-based group originally known for the development of the TrickBot malware. And they can progress through the kill chain relatively quickly, from 2 hours to 5 days on average. So depending on your target, that's a pretty quick turnaround. The software that was used for this intrusion set, some of which you saw in action and some of which happened without us seeing the evidence of it, was Emotet, TrickBot, Rubeus, AdFind, and Ryuk. So what are the things we've learned out of this? Well, mission assurance does not replace forensics. It's a perspective that you take in certain environments where service continuity is
the priority. It enables critical business functions to continue until a time where IR and forensics can take place. One of the ways I like to think about it is that mission assurance is like having someone in an ambulance, and you're trying to keep that person alive in the ambulance until they can get to a hospital and you can hand them over to a surgeon. That's what mission assurance is. You don't have to be perfect. You don't have to make all the perfect decisions. You've just got to keep that person alive until you can get them to a point where you can hand them over to a hospital. So mission assurance prioritizes the business and service
needs of the organization over the cyber needs of the system. In the end, you need to make decisions that best meet the service continuity needs of the entire organization, rather than decisions that best meet the engineering standards of the system or focus on maintaining system hygiene. Mission assurance is a lens that we use to change how we think about cyber and how we deal with intrusions when they occur. It's really useful in sectors such as government, finance, and national critical infrastructure, because they all need these mission assurance principles given their reliance on delivering services that society needs. So anything that really needs resilience, anything that needs to be
able to survive in contested environments or deliver service continuity over a long period of time is a potential candidate for these mission assurance responses. So in total, I would like to hope that everyone here has gotten a better understanding of what mission assurance is and how it differs from traditional responses, but also how it doesn't replace incident response or traditional forensics. Think of it as another tool in your tool belt, something you can use and apply within certain situations. And thank you everyone for listening.
Thank you very much for that talk, Theresa. It was very interesting. Does anyone have any questions?
>> During an incident, what do you feel is the best way to communicate what the intent is moving forward with the response team? Because one of the usual issues I'll have is the response team starts taking actions, but they've got their focus straight on, well, what's happening, what's the forensics. So what do you feel is the best way to effectively communicate the commander's intent in this case?
>> Yeah, absolutely. So, having everyone on board from the beginning of an incident that we're here to provide service continuity, and taking that all the way through all the processes. So everyone has to be on the
same page going into this, and it also requires just a lot of practice. I find that when we have teams who have done a lot of traditional incident response and forensics, they tend to really want to deep dive into information, and as the incident manager you really have to be able to pull them out and say, all I need is enough high-level information to make a decision. And they need to be okay with switching that mindset to get that information and then move on to the next task. So it's really about you as an incident manager. You kind of have to take on a little bit more mental load to
keep them on track and keep them at that higher level where they're quickly triaging information, and stop them from going down rabbit holes, because you don't have time to go down a rabbit hole.
>> Yeah.
>> I noticed that your steps one through four worked effectively in this particular example. What would have happened if they didn't?
>> Yeah. So say we did this example and there was some persistence on this network that we missed. Say a scheduled task or something like that pops back up, and we've got something that reinjects, and C2 comes back. We essentially just repeat the process, right? We assess what's going on and we make a decision about the risk to the
mission, and then we make a mission-based eradication plan, and then we go through with it. So it's always just based on what you're trying to achieve in terms of ensuring service continuity. You make an assessment of the risk to service continuity based on the information you have on the network in front of you: how likely you think it is that the threat you're seeing on that network is going to threaten that service continuity. And then you make a decision based on that about whether a remediation action is necessary and what form it should take.
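As a rough illustration of that assess-then-plan loop, the sketch below orders findings the way the case study did: anything actively spreading comes first, and within that, mission critical terrain comes before everything else. The `Finding` fields, the two-flag scoring, and the host names are simplifying assumptions made for this example, not a method given in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One observation from triage (hypothetical structure)."""
    host: str
    description: str
    mission_critical: bool     # host sits on mission critical terrain
    actively_spreading: bool   # e.g. ransomware deploying right now


def priority(finding):
    # Tuples sort element by element and False sorts before True, so
    # active threats lead, and critical terrain leads within each group.
    return (not finding.actively_spreading, not finding.mission_critical)


def eradication_plan(findings):
    """Return findings in the order they should be actioned.

    sorted() is stable, so findings with equal priority keep the order
    in which the team reported them.
    """
    return sorted(findings, key=priority)


if __name__ == "__main__":
    # Re-run the plan every time triage produces new findings,
    # for example when missed persistence brings C2 back.
    findings = [
        Finding("domain_controller", "scheduled task re-injecting", True, False),
        Finding("management_subnet", "ransomware share opening", False, True),
        Finding("file_server", "ransomware share mounted", True, True),
    ]
    for step, f in enumerate(eradication_plan(findings), start=1):
        print(step, f.host, "-", f.description)
```

A real decision would weigh far more context than two flags; the point of the sketch is only that the ordering is driven by risk to service continuity and is recomputed whenever the picture changes.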
>> Thanks for the talk. Just a question: how do you determine what assets or systems are mission critical? Because again, with critical infrastructure you've got energy providers, distribution, transportation and so forth.
>> Yeah, 100%. So it really comes down to understanding it from a systems analysis perspective. Taking it from that really high-level strategic view, it's not really my job as an incident manager to make that decision. I just do it tactically at the system level. But I would assume that there are people who determine what national critical infrastructure is and what those systems should look like. But then as we come in to do it at
that lower system level, we can go in and say, okay, based on what the mission is, or what the business continuity service that I need to protect is, you can break that down into what that means practically in terms of tasks. And then you can look at your cyber infrastructure and say, how does my cyber infrastructure achieve those tasks? I usually just get a highlighter and highlight which elements of the cyber infrastructure on the network map actually achieve each of those tasks. And then I know, like, this server is directly supporting task one. This router is a key router that supports traffic between these two subnets, which
need to talk to each other in order for task two to be completed. And you just go through methodically and do it that way. By the end, you'll have a network map with some areas that are really highlighted, and that'll tell you which areas of the network are really important, and you'll have some areas that just kind of hang off the side and aren't highlighted at all. We still don't want them to be hacked, but if something happened to them, the mission's not going to fail. Your network's not going to cascade and fall apart if they go down. So it just helps you understand
the threat level on your network.
>> Excellent. And I guess time for one more from over there.
>> Yeah. Just wondering, in your experience, have you seen IT and business teams practice using mission assurance through wargaming scenarios, so they can practice that kind of mindset before the real incident happens?
>> Not really, no.
>> Not really? Not in practice?
>> Not in practice, no. It tends to be something that comes from the cyber teams, that they bring when it's...
>> Ah, okay. Yeah. So I guess most of the driving would come from incident managers, just to manage how the scenario plays out.
>> Yes. So it definitely comes from the cyber elements coming in.
>> From the cyber element. Yeah. Okay. Yep. Cool.
>> Awesome. Well, let's thank Theresa again.
>> Thank you.