
It's my pleasure to introduce Joel Cardella. He has over 24 years of experience in IT, ranging from network operations and sales engagement to information security. Before working at Rapid7, he worked in multiple verticals including telecom, healthcare, and manufacturing. Joel will be speaking to us about "Welcome to the World of Yesterday, Tomorrow." Thanks, Joel. And none of that matters. Thank you very much. I do not purport to be an authority on anything, so let's frame the discussion there. My name is Joel Cardella. I work for Rapid7 Global Services. I want to talk to you today about some things that I'm calling part of a storytelling series, and our
story begins in 1986. January 28, 1986, at 11:38 a.m., the space shuttle Challenger lifted off the launch pad at Cape Canaveral. This was the 25th mission of the shuttle series, and it was a very special mission for some very special reasons that we're going to talk about and find out about. 73 seconds after liftoff, the right solid rocket booster dislodged, there was an explosion, and three minutes later all hands were lost. What went wrong? NASA had a very specific plan to create a program of reusability, and that program included managing the risk of human lives in space. This is the 30th anniversary of STS-51-L, which is the space shuttle
Challenger. This is all about managing risk, and that's what the topic is today. So I'm going to illustrate for you the shuttle program, what happened in the shuttle program, and how NASA, who has a very rigid risk management program, actually failed themselves by ignoring some of the rules and things that they had set in place specifically to deal with human lives. We'll have to talk about the shuttle program, we'll have to kind of go back in time and look at the things they did, and we're going to learn some names and things. I'm going to throw a lot of names at you, but there's really only a couple of names I want you to focus on,
and we'll talk about that when we get there. But really, again, this is a talk about managing risk. We want to learn from the lessons of our past so we don't repeat them. So again, we don't want the world of yesterday tomorrow, we'd really like that the other way around. So, the shuttle program. When the shuttle program was officially launched, it was known as the Space Transportation System, STS. It was a US government manned launch program. It ran from 1981 to 2011. It was designed to be reusable, and that's really, really important, because prior to that we did not have any kind of reusability. We had rockets that we would build, we would send them into
space, the rockets would explode, and the money would be lost. That would be it. So we started talking about how can we get something that's recoverable, so we can actually make space travel economical and affordable. We can send things up, we can bring them down. That's what SpaceX is doing today when they're launching their rockets out into space and then bringing them back down on that amazing platform on rough seas, right? The Falcon 9, that's the most amazing video I think I've ever seen in my lifetime, and I've seen all this stuff. I was alive when they were launching the space shuttles, which was the most amazing thing, right? When we can get to the point where we can recycle
this stuff, we get to this point where, wow, economics doesn't become such a huge factor, and we can actually do something with this technology. We can make it into something that's really exciting, and that was the driver for the space shuttle program. There were 135 missions flown between 1981 and 2011, and those missions were things like carrying payloads, doing crew rotations for space stations, recovery of satellites. We would not have the Hubble today were it not for the shuttle missions. We would not have the docking ports on the space station Mir or the International Space Station if it wasn't for the shuttle program. The shuttle program enabled all of these
other programs to happen, so it was a very important part of space travel history. To date, it is the only manned spacecraft that has orbited and landed, which is pretty exciting. Now, the shuttle program itself: NASA asked for up to 14 billion dollars for STS. Congress approved 5.5 billion. What do you think they did? Well, they took the money, right? But by taking the money, they had to cut corners. Absolutely. They had to look at what they had laid out as a forecast and say, this is a 14 billion dollar program, we're only getting five and a half, how can we manage this to make it work? Because they needed to make it work. Okay, so let's kind of go back in
time a little bit, to look at why we have to understand history. The Apollo program was the manned space flight program, 1967. It was conceived during the Eisenhower administration. It was a follow-up to Project Mercury. Little tidbit for you there: the Mercury capsule could only hold one astronaut, and that was a problem, because with only one astronaut, that astronaut has to be everything. He has to be the pilot, he's got to be the scientist, he's got to be the whatever, right? All these things in one. So the Apollo program then was created to allow three people to fly into space. Now we've got some options. We've got one person who's the pilot, and then we can have
mission specialists in the other parts of the program. Now we can start doing some things. We can do science, right? We can do experimentation, what have you. The goal that John F. Kennedy set out was to eventually put a man on the moon, okay? So, landing men on the moon by the end of 1969. It required really the most sudden burst of technological creativity we've ever seen. 24 billion dollars was committed to the Apollo program. 400,000 people were employed directly or indirectly as a result of the Apollo program, supported by 20,000 financial institutions and universities. A massive undertaking, all under the auspices of putting a person on the moon. So the Apollo program, especially for the time,
24 billion dollars in 1967, 1966, this is a tremendous amount of money, and so people are not sure if we should be doing this. What's the goal? What's the point? Why would we do this? All these questions are being asked, but we're pursuing it. In January 1967, we had the Apollo 1 disaster, where Grissom, White, and Chaffee died on the launch pad during a test, when the capsule caught fire and they did not have a proper escape route from the capsule. In March of 1969, Apollo 9, which is the first manned space flight test of the lunar module, happens. And then on July 20th, four months later, Apollo 11 lands men on the moon. So between 1967
and 1969, we learned enough about what we needed to do to never lose human life again. Three astronauts lost their lives in '67, remember that, that's an important point. But then two years later we made the moon landing. We did what we needed to do, right? So, tremendous success. This was an amazing accomplishment for the space program, and they completed it in a compressed time frame, which means, as far as risk management goes, they had to make sure that they incorporated all of the elements of risk to be able to do this, especially in a compressed time frame. So the Apollo program: massively successful. However, massive success sometimes sets you up for failure, because now you have to live up to your
past. This is what happens to us today when we get caught in these same kinds of cycles in infosec. We have things that are massively successful, products are deployed, companies grow, and they do these things, and they go, it's never happened before, why would it happen again? Right? We sort of become victims of the success of our past. So it's really important to understand Apollo was really successful. So when the shuttle program comes in, when STS starts up, they have to build on the successes of the past, which immediately means what? Political pressure, right? We've got massive amounts of political pressure. But NASA's strong, right? NASA will persevere. One of the biggest pieces of
political pressure they had on shuttle mission 25 was this woman. Anybody? Who is this? That's Christa McAuliffe. Christa McAuliffe is significant because she was to be the first civilian in space. Not just the first civilian in space, the first teacher in space. She was chosen from more than 11,000 applicants to go through astronaut training, train with astronauts, fly in the space shuttle program, and deliver a lesson from space, which was an amazing thing to think about. Christa McAuliffe, though, especially if you're younger and you weren't around, she was a media darling. She was genuinely charismatic, she was real, she was somebody people could relate to. Now with Christa McAuliffe going into space, we can all go
into space. This is like the coolest thing ever. So there was a tremendous amount of media coverage and attention around Christa McAuliffe, because she was such a media darling. She was on all the major media programs: Good Morning America, CBS Morning News, The Today Show. That's when we only had three channels, guys, all right? The Tonight Show, I mean, all this stuff. She was out there, and everybody loved Christa McAuliffe. So let's talk about some of the technical things that actually happened that led to the failure, remembering we've got these other things in play, and they all kind of lead together. So what really happened was, there's these O-rings that sit
inside the solid rocket booster. These O-rings are about 12 feet in diameter, and what they do is they sit in the solid rocket booster motor joints to allow flexing when the shuttle is being launched, okay? Because things like air pressure and gravity are causing these stresses on this vehicle as it's going into orbit, right? There's this flexing that needs to happen. As heat causes the gaps to widen, the O-rings have to pop back into the gaps and make the seal in milliseconds. So as a flex happens, a gap occurs, and the rubber in the O-ring has to expand to fill the gap. If it doesn't, hot gases will escape. If those hot gases escape, it's very, very
likely those hot gases catch fire. If they catch fire, that means an explosion, especially because the solid rocket booster is right up against that main fuel tank, right? Does that make sense? Everybody got that? Okay. So here's the issue. When they launched from Cape Canaveral, Florida, that day, it was 29 degrees in the morning. Here are some photographs from the actual launch pad that morning. These are icicles, and this is southern Florida. This is very unusual. This is a year that the orange crops were completely decimated because the cold was so severe and so bad. All right, so we have this issue with cold. Do you know what happens
to rubber when it's very cold? It stiffens, it doesn't flex, and problems occur. Okay, now, we are not a group of NASA engineers. Any NASA engineers in here? No offense. Okay, well, at least one. We are not a group of NASA engineers, and we understand cold affects the properties of rubber. The question is, if we know this, and it's sort of common knowledge, why do you think this wouldn't occur to the people running this program? So what happened is, the O-ring failed, there was a small escape of hot gas, the hot gases caught fire, there was an explosion. The explosion is not, by the way, what killed the astronauts. A lot of people think that that's true. The explosion, what it
did is it dislodged the main cabin that housed the astronauts, and they fell 30,000, 40,000 feet, hitting the ocean at a speed of about 300 miles an hour. That took about three minutes of free fall, and that's what they think actually killed the crew, right? So it wasn't the explosion. So why didn't anybody anticipate the scenario? Why do you think, when we know that rubber doesn't work under pressure, why do you think nobody anticipated this? Oh, somebody just said it. Who said "they did"? Because you know what? They did. They absolutely did. They knew what would happen. Now, I'm telling you this in retrospect, right? They knew it would happen, because we now
know they knew it would happen. But the steps that it took for us to figure out why they knew it would happen are all the failures that I'm talking about in managing risk. That's what's going to come out here, that's what's part of this story. Very specifically, written documentation showed that they had proof that these problems happened with their O-rings, which we'll talk about in a little bit. So this man up here is George Hardy. You don't have to remember these names, I'm just pointing them out. George Hardy, and this is Larry Mulloy. These were people in NASA who were part of what was going on when the engineers at a contractor called Thiokol went to their superiors and said, we are very,
very concerned about the shuttle launch. We think that it's too cold and the O-rings will not perform if it's under 53 degrees. We think you should scrub the launch. Okay, this is the contractor for the solid fuel rocket booster, right? They go to their management and say, we have a problem. The management goes, well, what's the problem? It's these tolerances, these temperatures. We have an issue, we have to make sure that they don't launch. So they convene about 6 p.m. the night before the launch. The management calls up NASA. They go, NASA, we don't think you should launch. NASA's like, what? And these are some quotes: I'm appalled. I'm appalled by your recommendation not
to launch. When do you want me to launch, next April? Here's what you have to understand. This particular shuttle mission, which all eyes are on because Christa McAuliffe is the first civilian in space, has already gone through three scrubs. They've had three times where they were going to go to launch and they didn't. It also is the exact same day as the State of the Union speech by the president. Now, there was some research done to see if any of these things are related, and there wasn't any concrete evidence found that these things are related, that the political pressure was so great that they had to launch on that day so the president could talk to
Christa McAuliffe from space. But anecdotally, it's probably there, and there are issues with that that we'll look at here in a minute, right? So all this comes out later, by the way, through this investigative body that we call the Rogers Commission, chaired by Chairman Rogers. Okay, so this is the Rogers Commission, made up of some pretty significant people. We've got Sally Ride, the first American woman in space. We've got Neil Armstrong, the first man on the moon. We've got Chuck Yeager, the first man to break the sound barrier. We've got Richard Feynman, a Nobel Prize-winning physicist. Richard Feynman is going to be the focus of the rest of what I talk about, because he's a fascinating
individual who really went outside the realm of what he was tasked to do to figure out what these problems were, and I credit him for finding these issues of risk management and really bringing them to light. But suffice to say, these are important people. The two people that I want you to remember, though, are Richard Feynman, who is a Nobel Prize-winning physicist on the panel, and this is Donald Kutyna. He's a general, well entrenched in NASA, really knows all about the space program. He was a fighter pilot, he was involved in many, many, many aspects. It also turns out that he was super politically savvy and a puppet master, and he is the reason that we know today what went on, why it
went wrong. Okay, we also have a special guest that I'll talk about later. Now, one thing you have to understand is I'm a thespian, and so we have this thing in theater where we draw the curtain back slowly. So I'm not going to tell you who this is until the end, because that's all suspense and drama that I get to build up, and that's good for me, right? So Feynman, or Fineman, however you want to say it, he gets this job because he has this inquisitive nature. They call him up and they say, we want you to do this job, and he goes, I'm a physicist, I'm not a politician, I don't
want to deal with this, this is ridiculous. His wife, Gweneth, says, all right, look, if you don't do this, here's what's going to happen. They're going to get 12 people. These 12 people are all going to walk around in a group, doing things and seeing the same things and coming to the same conclusions. If they pick you, they're going to have 11 people going around doing the same things, coming to the same conclusions, and one guy running around figuring everything out. He's like, yeah, you know what, you're right. And that was Richard Feynman. He was the guy running around figuring everything out. Chairman Rogers even told him more than once, you're a pain in the ass, stop it, you're
causing problems. He actually wanted him to stop the investigative efforts he was doing. Feynman was never really sure where the lines were, so he just started crossing them. He's like, all right, well, if you're not going to officially tell me I can't do this, I'm just going to go do it, right? So he had somewhat of a shaky start. People were upset with Richard Feynman because he was asking questions, right? And people don't like to be asked questions, especially when they're put on the spot. Now think about it. We've had a shuttle disaster, it's a national disaster, we've had all this media attention around Christa McAuliffe, it's a big deal, all eyes are on what's
happening at the Rogers Commission. People are very, very, very nervous, and some crazy guy's asking them a bunch of questions, right? What's their first inclination? Right, as little information fed as possible. Kind of like working with an auditor, right? When you work with an auditor, you answer the questions the auditor asks and you don't volunteer more information. Now, I love auditors. I volunteer more information. I am a completely different individual, because I understand what that gets us in terms of helping us go forward. So did Richard Feynman, so I like to think I'm a little bit like him, you know, without the brains and stuff, right? But really, Feynman's big disappointment here is he doesn't like
the sterile conditions of the testimony. Because what's happening in these congressional hearings is they're pulling people up, they're sitting in front of a congressional inquiry, it has the Rogers Commission there, and senators and congressmen, and they're testifying in front of a microphone, you know, similar to some things we've seen in this political season, right? That's exactly the way it's going, and Feynman's going nuts, because he's not a politician. He's like, no, I want to talk to the engineers. And he gets to talk to the engineers once or twice, and they start telling him stuff. He's like, okay, I'm really excited by this, because he's a technical guy. Even though he doesn't understand, you know, space travel and the space
program, he gets to talk to them about their technical problems, and he says, yeah, tell me about your problems. And he really starts to figure out what these root indicators are that wind up being the total problem at the end, but only when they let him talk to the subject matter experts. And that's one of our lessons here: we have to listen to our subject matter experts, especially when we're dealing with risk. You want to go to the people who understand most about what that risk is to get the information to make a decision. That's exactly what Feynman's doing, right? So he starts meeting with engineers, he's making these small discoveries, and he's figuring out
that these are smart enough people that they should have known that these rubber seals were a problem. Turns out they did, and they have written documentation to show it, but this is what he's suspecting at this time. He's running around, he's like, yes, I can see you know what you're talking about, you have all this great knowledge, so really, what's the deal? He makes the first of several critical discoveries: the technical people know what's going on, but there was no communication. No one has discussed the problems between their flights. Every single shuttle mission, the 24 prior shuttle missions, showed problems with the O-rings. They have a thing in space travel where they do a flight
safety prep check. They do a check before the launch, they do a check after the launch, and they followed their procedures to the letter. Every single one of these checks showed problems with the O-rings, for 24 missions, but they lucked out 24 times and it never failed. It was never talked about between flights. And he went, why? Why did you not talk about this between flights? And the answer he got back was, because there wasn't a failure, what was there to talk about? Right? Has this happened to you? Have you had issues where things go wrong but they're not talked about until something goes wrong and causes you to talk about it? Why would we do that? This is 30 years ago.
Why are we still doing that? That makes no sense, right? We want to be able to have these conversations. If you're managing risk, risk is all about communication. You have to establish those lines of communication and talk about those things, especially when problems appear, so you can discuss why the problems are problems, which is what they should have been doing here. The reports that they dug up actually mention the joint seal as being most critical to operations. So what that O-ring was sealing up, it's the most critical piece of the flight. But then the report also says that safe flying can continue if they pass their checklist. And Feynman goes, wait, if we have something that we rate as
absolutely critical, why would it not be absolutely critical? Why would we say it's okay to fly? It shouldn't be. Every one of these other 24 reports said we had failures. If it's critical and it's a failure, you don't fly. That's his conclusion. Why are we flying? It's a good question. He also makes another critical discovery when he looks into the computer simulations: they had poor risk tolerance. So here's what happens. There's a few people who are making decisions, and they're making decisions based on their available information, and what they did is they said, okay, if we have a set of conditions that executes under these conditions and it causes us not to fly, how can we reduce the
expectation of the conditions to get us to fly? So effectively what they did is they took the risk tolerance, which they already had here, and which said, we will not lose a human life, which came directly out of the Apollo program. Remember, we've lost three astronauts. We will not do anything to lose a human life. The actual quote is something like, if any party disagrees, and this is a problem where human life is at risk, we don't fly. But at NASA they decided to shrink their criteria, so their risk tolerance got lower and lower until they allowed something to happen. Have you encountered this before, perhaps in your jobs, where we know we have a set of criteria and
we say, these criteria exist for this reason, and if we work within these criteria, we know this happens, and when the criteria change, all of a sudden we get unpredictable results? That's risk management. The same thing is happening at NASA, but it's a little bit more critical because they're dealing with human lives. Maybe in infosec we're not dealing with human lives all the time, but we've got medical IoT now. We should be paying attention to these same lessons. We should be learning from this. So Feynman, he's aghast at these discoveries. Like, his mind is blown. The other people, who are politicians, are like, ah, that happens all the time, right? That's just the way it works,
right? So he's going, no, no. So he's looking at, like, how can we assess this, how can we do this? So he goes to the National Air and Space Museum, and at this point he's really down on NASA. He's like, I can't believe these guys let this happen. And that's kind of a quote: they let this happen. He goes to the National Air and Space Museum, and the director of NASA brings him through there and shows him a film on what it took to actually get the space program going. And something clicked with him, and he went, wow, I can't believe this many people were involved in this massive effort, put all this time, money, and energy into it,
and it failed. He was like, I can't believe that. And he changed his mindset from being anti-NASA to being pro-NASA, to go, you guys are awesome and you do awesome things, and let's find out why you had this failure. And that's what I'm suggesting to you. If you're in a situation where you're facing some real negativity, and you're really sort of against these forces that are causing you this negativity, understand what the drivers are behind those forces. Try to change your mindset. It will help you when you're assessing that risk, because it will help you with things like that risk tolerance. Because maybe you can reduce the tolerance a little bit, just not to the level that you're
being requested. That's working, right? That's moving room, that's you negotiating, and that's useful, and that helps, and it helped Feynman a lot. But this is the cool part. So there's a key moment. So General Kutyna, remember I told you about General Kutyna, he has Richard Feynman over for dinner, and he's talking about stuff, and he brings him to the garage, and he's showing him his 1973 Opel. And Feynman knows nothing about cars and could care less. Kutyna's going, yeah, this is my car, been working on it, and oh, that's the carburetor over there, and you know what's funny, I've been working on the carburetor and I noticed something, that when it's really, really
cold, the seals in the carburetor, they don't work right. What do you think happens to seals when it gets really cold? Feynman goes, well, they don't work. Aha. And he has an aha moment. This was absolutely orchestrated by Kutyna. Feynman later says in his memoirs, I'm pretty sure somebody told him that this was the problem and he directed me toward it in his way. In 2012 we find out he's absolutely correct, that is exactly what happened, right? And I will talk about how that happened, because it's brilliant. So the engineers' concerns are starting to come to light. So here's the thing. This is Allan McDonald. He was one of the chief engineers on the NASA side, I believe. He
comes to a public meeting uninvited. So they're having these hearings, these Rogers Commission hearings, and this guy walks out with his engineers and sits down. No, I'm sorry, Allan McDonald worked for Thiokol. He wasn't NASA, he was Thiokol, the solid rocket booster makers. And he sits down, and they're like, well, who are you? And he's like, I'm with Thiokol. And they're like, well, why are you here? He goes, because I'd like to offer testimony. And here's what he said: we recommended to NASA that they do not fly under 53 degrees. This is shocking to the Commission. They had not heard this before. They'd heard all this testimony, and nobody had said that there was a recommendation not to
launch. They said, well, is that true? And they said, yes, but we reversed ourselves under pressure from NASA. Which is exactly what happened. The Thiokol management went to NASA, they said, guys, don't launch. NASA said, when are we supposed to launch? Thiokol went, okay, you're right. And internally they had some problems, where Thiokol said to the engineers, take off your engineering hat for a minute and put on your management hat so we can figure this out. Which means we disregarded our SMEs. We disregarded the knowledge that they were giving us, right? That's effectively what happened. But the Commission had never heard this. This is the first time they're hearing this, and they are genuinely shocked. Feynman is pretty
convinced that they knew that they had a temperature problem with the O-rings by this time. He's like, all evidence points to the fact that they knew. But he has to figure out how to show that they knew without having documentation that officially proves that. So, so far we know that there's problems with the seals that were not properly communicated out, right? That's the suspicion. So we're trying to prove our theory. We know we had problems with Morton Thiokol management bowing to pressure, and we know NASA is accepting risk beyond their tolerance. It's a recipe for disaster. At this point we know this, but again, we only know this in hindsight. That's the
problem with risk management. When we manage risk, we manage risk in the moment, and it's difficult for us to manage risk for the future, because the conditions of the future change so much. Which means when you're managing risk and you have constraints, you need to stick to those constraints as closely as you possibly can, so the outcome becomes what you want the outcome to be. Does that make sense? Okay. NASA is not doing that. All right, so Feynman's looking for better answers. So Feynman does something that is a pivotal moment in the discovery. He stages an experiment. What he does is he gets a glass of ice water, and he's got an O-ring that he's pried off one of the shuttle
models that they're using in the Commission hearing as people are talking about what's where, and he drops this O-ring in the glass of ice water. Kutyna is sitting next to him and sees what he's doing. After a minute, Feynman reaches for his mic, and Kutyna goes, no, not yet. Feynman's like, okay, right. Lets another minute go by, reaches for his mic, Kutyna goes, hold on, not yet. Okay. Another minute passes, he reaches, and Kutyna goes, hold on. He goes, when he gets to the testimony where he's saying this, that's when you do it. Feynman goes, okay. So Larry Mulloy is testifying. He gets to this point where he says, and we had no
indication. Feynman goes, excuse me. And he pulls the O-ring seal out of the water, and he'd had a C-clamp on it, pulls the C-clamp off, and the thing very, very, very slowly starts going back into shape. And he has a famous quote where he says, I believe this has some significance to our problem. Because remember, they were supposed to pop back in milliseconds. The press went crazy. They saw the experiment. I mean, this was drama, right? I tell you, as a thespian, this is what we live for, right? This is the critical moment. Wow, fantastic, this is amazing. And he makes this sort of critical discovery that should have been obvious to everybody, which it turns out it was
obvious to everybody, but we ignored the obviousness of it because of what was happening, and that's the politics in play, right? So the press is reporting that NASA is under great political pressure to launch. After this, this actually turns the heat up on NASA. Kutyna is a really keen political observer. He points out that the Commission has many weaknesses in its membership. Basically everybody's tied to NASA. Feynman's the only outsider. And so he's telling Feynman, you're the only one that can really get to the truth of what happened here, because you are the person who comes from the outside and doesn't have any emotional attachment to what's going on. He was just a fact-finder,
somebody from the outside. And this is why I say I love auditors, because really that's what auditors are. They're fact-finders, especially when they come from the outside. Maybe internal auditors are a little too close to it, right? But external auditors, they're fact-finders. That's how they should be viewed: somebody who can help us determine what these root causes are. And when we're managing risk, sometimes we need an extra pair of eyes to give us that fact-finding opportunity, right? We're just too blinded by our own pressures, by our own politics, by everything else that's happening, to actually see what we need to see. So Sally Ride, for instance, she still had a job with NASA.
Feynman was the invincible man. Neil Armstrong was a consultant for NASA. So even though these are amazing people, they've done amazing things, they still have these very strong ties. They're blinded by what they're involved in, right? Some of the other political forces: Reagan had announced in 1984 that the shuttle program would put a teacher in space, called the Teacher in Space program. So we are focused on getting a teacher in space. There's a lot of politics behind this, because the president makes this announcement, there are people who want to please the president, and they're going to do these things. So maybe there wasn't direct pressure on getting this done, but certainly if you work in
the government and the president issues a dictum you're going to do the best you can do to make that happen right the shuttle launch is the same day as the State of the Union now Feynman checked this out and he said I don't believe this is true but here's what I'll tell you I believe that there were probably some people who were ambitious who wanted to make this happen so the president looked good so it was an awesomely staged event again there is no direct proof of that but we can speculate we can say probably there were people with some ambition and maybe some ego that caused that to occur we've got this frenzied media coverage
around Christa McAuliffe and up to now we've had these significant launch delays these are all problems that NASA's facing when they decide that they're not going to scrub this fourth time right so I'm not saying it's right or wrong I'm saying that based on what was happening in the focus of what they had in the sphere of what they could control there are forces that are causing them to make decisions and that's what's happening to you when you are having problems managing risk or when you're having problems getting management to understand probably what you don't see is some of these other forces that are happening you might have a CEO that issues a dictum and ego and
ambition are getting in the way right you might have some frenzy around a new product or service that's being released and you're going crazy going but we haven't even you know we haven't even looked at security architecture for this yet right if ever all right all these things will play to the same things that are happening to you remember this is 30 years ago so the same things are happening today right well the engineers spoke out and because of the Allan McDonald testimony Thiokol is called in for a more probative inquiry and they actually asked the engineers raise your hand if you were in favor of the launch and not a single hand went up and they're like
okay so Feynman goes let me ask you this who's your most important engineer who understands O-rings and they name names they named Roger Boisjoly Thompson Kapp and Burns he's like okay three of those people are here what did you think Mr. Roger Boisjoly he said I recommended we didn't launch and what did you think I recommended we didn't launch what did you think I recommended we didn't launch yet the testimony is that it was roughly evenly divided in the decision to launch or not to launch Feynman's going that doesn't make sense that can't be true you know you're playing politics and we're trying to get to the root of this problem why are we having these issues
this doesn't make sense so he talks to the managers who say the workers aren't as disciplined as they used to be so at Thiokol they're like yeah they used to follow all the instructions now they don't when he talks to the workers he finds they're doing everything they were supposed to do they're doing it by the numbers when he talks to the workers alone with one manager present the manager is surprised to find out that the workers wanted to talk about what they did but the managers wanted to shield them from having to go through this inquiry because they saw this congressional inquiry process as being something that was super scary that they might not want to deal with right so he
talks to the workers and he finds that there's frustration in dealing with change these are all communication issues these are all breakdowns in communication management thinks that there's a problem that maybe isn't a problem the workers think that there's a problem that isn't a problem stuff is being communicated up that stops being communicated stuff is not being communicated down these are the same problems we're facing today it's communication based it's the most important thing we can do right the safety officers define failure as one in a hundred the NASA estimate is one in a hundred thousand for a failure the difference between one percent and point zero zero one percent of failure why would the safety people think it's
one percent and the managers be so far off this is driving Feynman crazy why would we have this disparity because of these communication issues and they chose at that level to accept a higher tolerance for risk than the safety engineers who were arguably the subject matter experts were giving right does that sound familiar do you hear these things happening and we need to identify what these things are and we need to point them out when they happen so the big findings the biggest contributor to the accident was poor communication we had critical safety concerns they weren't reaching those who needed to hear them they just were not they were not going up to management in one case the flight
commander at NASA was never made aware that anybody at Thiokol had objected to the launch and that is a critical communication failure nobody at NASA passed that on to the flight commander who would have immediately scrubbed the launch because of that rule the Apollo rule which says if anyone disagrees we do not launch because human lives are at stake right NASA didn't accept the judgments of its engineers who actually agreed with Thiokol that there were design flaws in what they had and the testing showed that and then NASA wanted proof of their stated problem when they were on that call and they said we recommend you scrub the launch NASA said well prove to us that they
don't work under 53 degrees and there's less than 24 hours to launch what are they supposed to do here's data the data pretty much proves it and they're not believing the data that's put in front of them have you ever heard that before right that's pretty obvious management had faith in the machine Feynman has this great quote what is the cause of management's fantastic faith in the machinery I will turn that on you a little bit and ask you what is management's great faith in technology today why do our managers believe that there's a big red button for security that says we're secure why why do you think why do you think what is it they're not tapped into each
other what else what other kinds of things can you think of they want to believe it money money right what you say easy answer right messenger good shot trust if you haven't seen a failure everything is working that's exactly right if we have never seen a failure before why would we see a failure in the future here's what I'll counter with that though to you it's a valid question to ask you have to prove it you have to prove why you think it will fail in the future what did you say yeah bonuses incentives right all of these things but really what don't they understand about the fact that we need to have the secure architecture and the
things these are the things you need to think about right complexity is definitely an issue how do you explain it how do you explain it right a big red button is easy what you're talking about it has all these moving parts I don't like it right
absolutely so there's the knee-jerk right wow we don't have anything and we have to have all this stuff and now all of a sudden dollar signs start going off and people have birds and stars flying around their heads because they're knocked out with how crazy this is right just a lack of understanding so again it comes back to communication you need to communicate this is the most important thing so the decision to launch contributors we've got the 1958 Project Mercury operational ground rule which I've already told you no manned flight would be undertaken until all parties responsible felt perfectly assured everything was ready they ignored it this had been in place since nineteen fifty-eight right
the Apollo 1 disaster happened because of a failure of equipment it did not happen because parties were not in agreement and there were other launches that were scrubbed because parties were not in agreement but they ignored the 1958 operational ground rule we had engineers who were expressing concern about the safety of the O-rings they had presented a convincing argument to their management at Thiokol at NASA Larry Mulloy said the data is inconclusive and at Thiokol Gerald Mason told one of the managers take off your engineering hat and put on your management hat right all of these things are happening that are contributing to this failure it's not one thing that's the lesson here it's never about one thing it's about all these things and when we start to sum
them up and they aggregate it becomes awful the final launch decision belonged to Jesse Moore he was informed of concerns but was told they had approved the launch he was never informed that they had objected that's a problem that's a huge problem that was the person who could have saved the lives but he did not have the information that was required to save those lives uninformed NASA management they had high-level managers they insisted that they were unaware of things like the recent problems with O-rings that they didn't have a clear understanding of the concern and the Marshall Space Flight Center project managers failed to provide information fully to their project managers all communication failures guess what happens February
first of 2003 fast-forwarding in time the same thing happens this is the shuttle Columbia the shuttle Columbia disintegrated on reentry and what they found when they investigated the shuttle Columbia was the same failures in risk management that had happened with Challenger happened with Columbia why why did those same failures happen that's a whole nother talk right there were political pressures that were happening at the time but how can we make it better which is really the core of why we're here so recommendations of the Rogers Commission that we're going to apply to what we do today one of their first recommendations was promote astronauts to management positions basically what they're saying is make your subject matter experts the
managers of the operational activities this is what should be happening right we should not have managers managing operational activities who don't understand the operation because management and operations are two different disciplines they're two different philosophies and when you're involved in an organization that is so operationally entrenched like space flight you have to have subject matter experts at the operational level being in management they recommended that they redefine the responsibilities and maybe that's something you have to do too maybe you have to redefine responsibilities so the people who are right for those roles as we say aces in their places right that they have that voice that they need to have that they get there and they get
to where they need to be the Commission recommended they establish an advisory panel with representation from many different areas and organizations absolutely that's what a review board is like a security review board right that's a really great thing establish an office responsible for reporting documentation of problems problem resolution and trends they ignored their problem trends they just ignored them but you need to have an office that reports them changes of personnel organization indoctrination or all three to eliminate the management isolation which happened you might not have control over that but sometimes that shake-up is good develop policies which govern the imposition and the removal of constraints establish a flight rate consistent with resources if
you're in DevOps maybe you've got way too much in your queue right now to deal with the output your flight rate is now being impacted right what you need to deliver is being impacted by the queue itself it's got to be managed it's got to be focused it's got to be able to be within constraints that you can manage with risk and then are you part of the problem so Mulloy explains how the seals are supposed to work and Feynman says in his usual way he's using acronyms it's hard for anybody else to understand we're all guilty of this our industry can have entire conversations using 12 letters that's insane to an outsider they have no idea what that means right
we need to work on this this one's on us so I'm gonna jump forward a little bit here the forces working against you when you're doing this are political forces economic forces ego and ambition these are all things that conspire to work against you and it's difficult to overcome but you need to be aware of them so you can manage them when they happen you may not be aware of the politics so become aware ask questions you may not be aware of the economics ask questions right ego and ambition these are personal problems but these are things we're going to run into and they're going to be issues establishing
a risk frame the first thing in any risk assessment is to establish the frame include assumptions and constraints it's so important because that's what you're going to work under when you're managing risk this is my set of tolerances this is what I have to manage to and be like Feynman Feynman is just a pain in the ass walking around asking questions ask questions don't just assume something is a given it's really really important all right so let me jump to here the human component I've talked to you about Richard Feynman I've talked to you about Kutyna I've said Kutyna was a mastermind Feynman in his memoir said I think Kutyna heard it from somebody at NASA
probably an astronaut that these O-rings had a problem but I can't prove that Kutyna never said a word until 2012 in 2012 he said I was walking down the hall next to an astronaut at NASA they pulled a piece of paper out of the notebook and handed it to me without looking at it and on that piece of paper were two columns on one side was temperature and on the other side was resilience of the O-rings this was a NASA internal memo so they knew at NASA that they had problems with O-rings and temperature now who do you think gave it to Kutyna who do you think I've actually talked about that person it was Sally Ride who was on the
Rogers Commission who had ties at NASA and with those ties used it to pass the information to Kutyna in a covert way he then took that information puppet mastered with Feynman and got the information out right she still worked at NASA she risked her career he still worked at NASA he risked his career how could they get the information to where it needed to get to where it needed to happen pretty brilliant and maybe that's what you can do too pretty diabolical so you got to be thinking pretty diabolically it's pretty tough so for a successful technology reality must take precedence over public relations for nature cannot be fooled so I'm being asked to stop
which is fine do we have any questions for any of this do you understand a little bit more about managing risk about how it has to be framed and how we have to contextualize it it's incredibly important to understand as we go through before I close I want to say one more thing I want to talk about this man Bob Ebeling Bob Ebeling was one of the engineers at Thiokol Bob Ebeling personally felt responsible for the deaths of seven people so personally responsible that on the day of the launch he told his wife I'm going to take a gun into Mission Control and I'm going to stop them from launching if I have to kill everybody there that's how
personally he took it after the launch he spiraled into a deep depression for 30 years before he died this year in January he took it personally until some people at NASA and other people who had heard his story said it wasn't your fault my message here is this you can be emotionally attached to things and they can really really really bother you but what you have to understand is that sometimes they are out of your control and if they're out of your control deal with them as best you can don't internalize them right Bob Ebeling died with a clear conscience but I don't want you guys to walk away from something with a bunch of negativity just because
there were things you couldn't control right things out of your control they're out of your control manage risk the best you can understand your constraints that's the big lesson there's a bunch of references if you're interested I recommend reading any of Richard Feynman's books they're fantastic thank you very much