
Good afternoon, everyone. My name is Margaret Farrow and I am the head of corporate security at La Decora, and if you want to tweet or post on Mastodon or whatever, my pronouns are they/them. Today we are going to talk about backup plans for your backup plans for your backup plans. This image is a NASA photo of Katherine Johnson at her desk. She calculated Alan Shepard's trajectory for the flight where he became the first American in space, among many other voyages, but one of her specialties was calculating backup navigation charts for astronauts to use in the case of electronic failures. So we hope to channel that energy as we dedicate our efforts to hopefully less life-or-death backup plans here today. You might notice the image citations as we go through are all NASA image numbers; these are all from a really cool gallery where you can find these and many more space images.

Today I'm going to start you off with what I'm calling acronym soup. This is a BCDR talk. I didn't say that in the intro, so you may have been tricked into coming here, but we're going to start off by discussing some core terms and concepts in BCDR and other continuity planning, just to make sure that everybody is on board. Then I will have an opportunity for people to ask terminology questions. Please do limit that to just a few questions about key terms; I'll take broader questions at the end, but I do want to make sure that nobody is getting lost right off the bat. If there is something you're unsure of, there's almost definitely somebody else who's also unsure of it, so please don't hesitate to ask questions during that section if you have any. Then we are going to talk about business continuity when it's going well. I call this business continuity as usual, and we're going to start with an idealized version. Then we will discuss business continuity for your business continuity, before going a little bit into how you can plan for unplanned plan failures. First off, when we're talking about
backup plans in a security context, there tend to be a lot of acronyms in play. Hopefully many of you are already familiar with these, but just in case you're not, we're going to start with a little review of some of the terminology that might come up either during this talk or during the questions and answers after. First off, some terms that I've already used: business continuity, and business continuity plan or planning. Business continuity is almost exactly what it sounds like: ensuring that the business can continue to go about its daily operations. When people talk about business continuity, they usually mean continuing to go about your normal operations in the face of something having gone wrong. Your business continuity plan, or the business continuity planning process, is how you come up with a strategy or a plan in advance, so that when something goes wrong you know how you will ensure business continuity. Disaster recovery is a slightly different but closely related concept. Here we're specifically talking about disasters: a large thing going wrong. This often refers to things like recovering from a fire, a flood, a data center outage, or a broken air conditioning system in a room that had a bunch of your on-prem servers in it. A disaster recovery plan, or planning, is again what you've written in advance as a strategic document, hopefully with some detailed procedures, plus the process of coming up with those, so that you can be ready when those things happen. You may also see BCDR with or without a slash; these are exactly the same thing. Here we have an image of Hubble viewing two galaxies merging, because, very much like the two galaxies merging in this space photo, BCDR is just the combination of business continuity and disaster recovery all in one. We're talking here about the combined, holistic approach to ensuring that business continues in the event of an emergency. Finally, we've got a business impact analysis. A business impact
analysis is a way of calculating in advance what the likely effect on the continuance of business operations is going to be if something goes wrong. That can be an important part of your risk planning. Some other common terms that might come up include crisis, event, and incident. These are all slightly different, but we're going to talk about them in conjunction with each other a lot today, because really anything along the crisis, event, or incident scale is going to be a business continuity event and a situation in which you may want to activate your BCP. We're also going to talk a little bit about incident response. Activating an incident response process is a certain type of business continuity activity, but it's usually specifically for when something your organization has determined to be an incident is happening. A crisis is often something affecting many different organizations, though you may also encounter the term in the context of crisis communications, especially if you work with your PR team at all. An event doesn't necessarily rise to the level of a crisis or an incident; instead, it's something that has happened that was either unexpected, or that was expected but maybe a little bit unusual. We also might talk about recovery point, or recovery point objective, often abbreviated as RP or RPO, as well as RT or RTO: recovery time, or recovery time objective. These interlock with each other. Your RPO is the state of the system you hope to return to, usually expressed in terms of data loss: the point in time, measured back from when things went wrong, that you're going to get back to. In some cases you may be able to tolerate up to several minutes of lost data, or even longer, while in others, especially finance or similar, any data loss may be unacceptable. So it's important to set out a shared recovery point objective in advance. The recovery point is
what you actually end up getting to; the recovery point objective is where you want to go. Then the recovery time and recovery time objective are the same general type of concept, but instead it's how long it takes you to restore the system to that state once the recovery activities start. I'm going to talk very generally here, and it's going to be fast and loose and a little sloppy about risk and risk assessments. If you want more specific discussions on risk, we can absolutely do that in the hallway after, but here I'm going to lump all of business and technical risk together, just so that we don't have to get too specific about the types of risk that you might be handling in a business continuity scenario. With that said, before we move on to business continuity as usual, I'm going to stay on this slide so that I can see you in the audience in case anyone is raising hands. Does anyone have any questions about this terminology before we jump straight into the content? Excellent, let's continue then.

Business continuity as usual: we are going to talk about the ideal business continuity state that you could be in. Here you've got a vast library of resources ready to activate for any of the common scenarios you might encounter. You are ready for voluntary departures of key people, you are ready for layoffs and firings, you are ready for fires, floods, anything that could possibly disrupt your business. We're going to start by talking about the dream BCP that you could have in this ideal situation, with the full knowledge that nobody actually has all of this lined up unless they are very skilled, very lucky, and very well supported by the rest of their business. Even the largest and most mature companies and government organizations aren't actually going to have all of this, but first we're going to talk about what you might strive for if you could have literally anything you wanted. So these are some of the covered scenarios that your ideal business continuity plan
would address. Fire and flood are pretty self-explanatory. A utility outage could be electricity or water to an office or to a data center. A core service outage is where everybody's favorites come in: that's where we're talking about DNS outages, AWS us-east-1, all those types of things. A disease outbreak is the one everybody used to joke about (even the CDC had a really cool zombie plan), but it's probably one of the most frequent ones that you've handled lately if you're working on a BCP at all in a company right now. Temporary loss of a key person could be where you lose internet, but only to one person's house, or there's not a whole outbreak of a disease, but the one person who does get sick is the CFO and they need to sign things. Departure of a key person you hopefully get a little bit more notice for; there should be a transition plan there, and ideally that succession plan is really a part of your continuity planning as well, but you need to solve the problem longer term. Maybe a key holder moved on to a new role in your organization, or you are otherwise losing some institutional knowledge. A malware infestation could include things like ransomware, as well as other infections where you maybe can't trust key systems anymore. And region-specific disasters are going to be things like the power going out because it stayed really cold for a long time in Texas, or an earthquake in an area with a fault line, a hurricane, a tornado, a tsunami: things that you don't expect to happen everywhere you have personnel, in a remote environment or in a larger corporate environment with multiple offices, but that you do expect with some regularity, and at varying levels of severity, in the areas that are affected by them. There are also a couple more scenarios you'll want to cover, like various violent threats to the building, but this is an ideal BCP, so we're going to assume that all of the scenarios that are not covered on this
slide are also covered and you've got resources ready. To address each of these scenarios, you will use some combination of these fully tested tools, and probably some others, but these ones are super common to see in a BCP: expert resources, data backups, backup office locations, pre-written support macros, IR runbooks, policies and procedures, backup services, cross-trained backup people, and PR training. For your expert resources, you're going to have people like your legal team, maybe some medical professionals on call for the disease outbreak situation, your communications staff, and anyone else you need specialized advice from on how to act in this specific type of situation. Your policies and procedures should tell the vast majority of the business how they should proceed in these situations, and hopefully everyone is familiar with those before you need them, because if nobody knows beforehand that you have a policy, let alone what it is, you have approximately a zero percent chance of it getting used in an actual disaster. Hopefully you have some data backups, as well as backup services like a power generator or another AWS region ready to go where applicable. If you typically have people working out of offices, a backup office location, even if that's "everybody go home and pick back up there," is really important for any event that would shut your office down. For all of those key person risks, you hopefully have some cross-trained backup people who can take over a task if the initial person becomes unavailable. And for corporate communications about the kinds of incidents that make the news, ideally there are some pre-written support macros to enable that team to do their jobs, as well as PR training for anybody who might have to give interviews. For people who are involved in IR for technical incidents, you hopefully have your runbooks or
your playbooks, or whatever your organization likes to call that big thick packet of detailed procedures and supplementary information calling out exactly what should be done in the event of an incident and who needs to do each bit of it.

So let's say you're in one of those covered scenarios: you are experiencing the temporary loss of a key person. Your CEO has a really bad cold, the worst cold, and cannot get work done at all. Maybe they lost their voice and their eyes are so runny they can't even see the computer screen well enough to work if they wanted to. The first step your BCP has covered for you is that a couple of other people have PR training. Any time the CEO has to skip an event because of their cold, you've got somebody ready to show up and give an excellent speech about how the business is doing great, and they can cover any public-facing questions about canceled appearances. Next, you're going to use some of those pre-written support macros. Maybe the CEO is a figurehead, and your users are emailing you, disappointed because they missed an appearance that people had been really looking forward to. But your support team is already prepared with some heartfelt words from the CEO about how valuable the users are, because your CEO getting sick is a thing that's going to happen eventually, so you can anticipate it and have them write this in advance so the support team has it on hand when it happens. This allows the CEO to tell the users directly that they regret they couldn't make this appearance, even if they aren't in any state to write an email at the time. Finally, cross-trained backup people: if your CEO is out sick, there are some decisions you should probably delay, but hopefully other executives have the background information and the trusting relationship with the CEO that they can still keep key initiatives unblocked and moving while your CEO is out. This allows the CEO to wait until they actually feel better to come back, and when they come
back, they are not coming back to a pile of everything that happened while they were gone being stuck, which is never the thing you want on the first day back at work. Having successfully implemented your business continuity plan in this perfect world where you have unlimited resources, you are now feeling great about yourself. You were well prepared and executed your plan successfully, allowing your CEO time to rest and recuperate. Maybe other businesses would have forced them back to work early, but not you. This is the dream.

However, this is not typically how things go. You may not have everything you need in place before an event happens, and unfortunately incidents and crises and anything along that spectrum are not nearly so polite as to take turns. So you might find, especially in the last couple of years, that you are not handling one adverse event with three well-formed mitigations prepared in advance, but instead several events with maybe one of your mitigations working as expected, if you're lucky. This is where you need business continuity for your business continuity.

In early 1966, the crew of the Gemini 8 mission were the first to link two spacecraft together in Earth orbit. Various technical controls were in place to ensure successful docking, and if they ran into trouble, the Gemini had a command to take over control of the second spacecraft, the Agena, in the case of attitude control failures. That was a particular area they seem to have been worried about, because there were a ton of extra controls around it. The Agena was supposed to obey orders from both Gemini and Ground Control; note that redundancy coming in there. Mission Control was also available for live assistance if further troubleshooting procedures needed to be enacted. However, just after the two spacecraft successfully docked, completing the core portion of their mission, a yaw thruster in the orbital attitude and maneuvering system, or OAMS, misfired, as it turned out later, probably because of a short circuit. This unfortunately
happened just after they left the communications range of Mission Control: one of those layers down, and even though that was planned, it still reduced the redundancy available. The spacecraft began to tumble, and using the OAMS thrusters to stop the tumbling didn't stop the roll for long. The crew initially suspected that something had gone wrong with the Agena, the spacecraft they had docked to, so they undocked and backed away. But without the Agena's mass, the rate of spin accelerated quickly, and by the time they re-entered communications range, Neil Armstrong, who you may recognize from his later work like being the first person to walk on the moon, reported, "We're rolling up and we can't turn anything off." As the spin rate became faster and faster, the astronauts' vision got blurry so they couldn't read their instruments, and Armstrong turned off the entire system that seemed to be causing the problem, instead using the re-entry control system thrusters to regain command of Gemini 8 and stop the spinning. Because the new flight director, John Hodge, knew that Armstrong had used nearly 75 percent of the re-entry maneuvering propellant stopping the spin, he had to make the hard decision to bring the crew home without pursuing the remaining mission objectives. Too many backup plans had been used up at this point for adequate resiliency to continue work on that mission. Even though the mission objectives weren't entirely achieved, I do consider this an overall successful mission, because they achieved their primary objective and, importantly, the entire team made it home safely. We have later work from them that we can talk about. The Gemini 10 mission later that year was able to rendezvous with the Agena and complete some of the remaining objectives as well. Because of this issue, astronauts were given an additional switch in future versions of the Gemini spacecraft so they could turn off the individual elements of systems that weren't working properly, thereby adding that
extra layer of resilience that they would have needed. This team needed business continuity for their business continuity. If NASA had settled for one or two or maybe even three layers of backup plans and considered things handled, this incident could have prevented Gemini 8 from successfully confirming their docking, and could even have resulted in loss of life. Fortunately, they were looking at the systems level, and also at the people who needed to be involved in the processes, and at the effects of those people as humans and their tools as technologies. Of course there were unexpected behaviors that could, without adequate planning and redundancy, create significant risks to the mission and the astronauts involved in it, and the use of some of those backup systems, like the thrusters, caused others to be more likely to fail. But they had plans for this.

I'm going to ask everybody to reflect right now on where your organization might need more support to ensure that your business continuity plan can be followed when it needs to be. How many people can be out of the office at a single time before some of your core processes become unstable or impossible? How many of those people live on the same fault line, use the same internet provider, or are on the same electrical grid? Have you adjusted the base rates for the likelihood of disasters in your current environment, while living in a pandemic and working probably more remotely than you're used to? Is this still you? I see entirely too many of these. Note that date. This is a pretty nice template; however, templates don't need to be updated quite as often as actual policies. So while it's fine that this template version of the document hasn't been updated since March 2020, it is not fine if yours hasn't. We have learned so much, and hopefully your business processes have been adjusted accordingly, but many organizations haven't taken advantage of that new knowledge to improve policies and processes. You've got to adapt those to your current business
environment. Off my soapbox.
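One rough way to start answering the staffing questions from that reflection (how many people can be out at once, and how many share a fault line, internet provider, or electrical grid) is to write your cross-training down as data and check it. The following is a minimal Python sketch under stated assumptions: the people, processes, and dependency fields are hypothetical placeholders, not a real tool or anyone's actual roster, and it assumes you already know who is cross-trained on each core process.

```python
from collections import defaultdict

# Hypothetical roster: the shared dependencies each person relies on.
PEOPLE = {
    "ana":  {"region": "gulf-coast", "isp": "isp-a", "grid": "grid-1"},
    "ben":  {"region": "gulf-coast", "isp": "isp-b", "grid": "grid-1"},
    "chen": {"region": "pacific-nw", "isp": "isp-a", "grid": "grid-2"},
    "dee":  {"region": "pacific-nw", "isp": "isp-c", "grid": "grid-2"},
}

# Hypothetical core processes and the people cross-trained to run each one.
PROCESSES = {
    "payroll-approval": {"ana", "ben"},
    "prod-deploys": {"ana", "chen", "dee"},
    "vendor-payments": {"chen"},
}


def absences_tolerated(processes):
    """How many capable people each process can lose and still have someone left."""
    return {name: len(capable) - 1 for name, capable in processes.items()}


def correlated_failures(people, processes):
    """Return ((dependency, value), process) pairs where one shared outage
    would take out every person able to run that process."""
    # Group people by each shared dependency value, e.g. ("isp", "isp-a").
    groups = defaultdict(set)
    for person, deps in people.items():
        for dep, value in deps.items():
            groups[(dep, value)].add(person)

    findings = []
    for key, affected in sorted(groups.items()):
        for proc, capable in sorted(processes.items()):
            # Subset check: if all capable people share this dependency,
            # a single outage of it leaves the process uncovered.
            if capable <= affected:
                findings.append((key, proc))
    return findings


if __name__ == "__main__":
    for proc, n in absences_tolerated(PROCESSES).items():
        print(f"{proc}: can lose {n} people")
    for (dep, value), proc in correlated_failures(PEOPLE, PROCESSES):
        print(f"{dep}={value} outage uncovers {proc}")
```

With this sample data, vendor-payments cannot tolerate a single absence, and payroll-approval looks safe with two cross-trained people until you notice they share a region and a power grid. The subset check is deliberately simple; it says nothing about how likely each shared dependency is to fail, which is where your adjusted base rates would come in.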