
BSides Vancouver 2022 Keynote: Systems Thinking for Security with Eleanor Saitta

BSides Vancouver · 53:00 · 93 views · Published 2022-07 · Watch on YouTube ↗
About this talk
Eleanor Saitta presents a keynote on "Systems Thinking for Security" at this year's BSides Vancouver security conference.
Transcript [en]

Day two of BSides Vancouver! I'm really excited to have Eleanor Saitta here with us. She is the principal security consultant for Systems Structure Limited, a boutique security consultancy that deals almost exclusively in fractional CSO services for Series A, B, and C internet companies. She has been in the industry for 20 years with consultancies such as IOActive and Bishop Fox, did a stint as the security architect for Etsy, and has spent five years supporting NGOs that were targeted by nation states, so she has a wealth of experience to share with us. I'm just going to hand it off to her. Thank you very much.

Thanks, Cindy. I'm super happy to be here. Today I'm mostly going to talk about systems thinking for security, and we'll get into what that means in a second. But because it's been an interesting news day, at least for the Americans in the room, I've got some other stuff I want to talk through as well; I'm not really in the headspace to just give a straight systems-theory talk today.

When I say "system," in the context of most conversations but especially when talking to security folks, I mean a system that exists to do something in the world. That means a set of applications and a set of computers, yes, but it's also a set of business processes: all of the things around that system that actually make it deliver outcomes in the real world. It's not just a thing off to the side doing a technical thing; it's about the thing in the world.

Now, to be useful, a system needs to have certain properties, and these are properties that come from the entire system operating together in a specific context. These are properties like correctness: if the system doesn't do what you think it's going to do, it's probably not going to have the function in the world that you expect. Performance: if a system isn't sufficiently performant for the problem it's trying to solve, for the money you can afford to run it, then you have to fix that; you need enough performance and enough efficiency to actually meet the problem domain and accomplish those goals. You probably need to know what the system is doing. And from our perspective, you need it to be secure, and also resilient; we'll talk about resilience more later, because that one's a bit more complicated.

All of these are properties that emerge from the entire system running in a context. The correctness of one component is not the correctness of the whole system, and the correctness of the technical system is not the correctness of the system operating in the world; that requires the human parts of the system as well as the digital parts. And all of these things require unified effort across a bunch of teams to deliver.

No one team can deliver efficiency; no one team can deliver performance; and so on. All of this is hopefully fairly obvious.

So now let's talk about what security is. A lot of us come from a mindset where security is a specific property of technical systems: you can say, oh, this web front end is secure, it has no cross-site scripting vulnerabilities. I think that's actually not a very useful understanding of security when we look at the work we're actually doing, because what we're actually trying to do with the system is enable some set of people to predictably accomplish their goals in the world. That's what we actually care about. We don't care about cross-site scripting; we care about what this thing actually does. And we expect those people to be able to do so in the face of a chosen set of adversaries. I say a chosen set because in most cases you're not going to build a system where no one in the world can possibly interact with it in a way you don't want; you don't have the budget for that. You're looking at the set of adversaries you think are relevant to the work you're doing and likely to actually attack you, and also the set of adversaries you can afford to defend against, because you can't afford to defend against everyone.

And (oh, did that font get cut off? Yeah, it did) the system should also predictably prevent that chosen set of adversaries from accomplishing a separate set of goals. An adversary doesn't care about stopping your users from doing anything; they want to use your system to mine Bitcoin, or whatever it is they're doing this week. So there's both a positive, that the chosen users can accomplish their goals, and a negative, that other people can't accomplish their goals using your system.

This is a definition of security that we can actually reason about more directly. We can ask whether a system is secure with respect to a set of goals even if individual components are compromised. Reliability and predictability and correctness become interesting here, where we know that all components eventually have some set of vulnerabilities in them, and that's fine; well, it has to be fine, because we don't have another option. Another way of saying this is that security is about reliability and correctness of outcomes in the presence of an adversary, and it's also about defending those outcomes.

Security is a process, as the saying goes; there are two components here again, and this gets interesting when we start talking about resilience. If you find this next bit interesting, I'd recommend looking up some of the work of the Resilience Engineering Association. Resilience is basically the ability of a system to deal with failures that no one was expecting. At this point we're relatively good, in engineering, at saying: yes, we have fault tolerance, we have failover, we have backups. There are a bunch of categories of failure modes that are foreseen, that are expected. We expect that we might get data loss in some context, because we know it happens often enough, so we have backups. We expect that individual nodes will crash, so we have multiple of them. We expect that DDoS might happen, so we have load balancers and upstream filters. We have these predictable failure modes. Resilience is what you need when you have an unpredictable failure mode, and any real compromise generally, or at least often, becomes a resilience exercise, unless you're on a team big enough that dealing with the flow of individual compromises becomes predictable. That's honestly pretty rare, because if you can predict a compromise mode, you probably fix it instead of responding to the same incident again and again.

This is interesting because resilience is not a property of code. Resilience comes from the human team that is managing the system. When we look at designing for a kind of resilient security, a security that lets us deal with more of the unforeseen, we can take a principle-centric approach. This is something I've been thinking about for probably 15 or 20 years now, in some ways. It started from looking at the CIA triad and some of those existing models for how we think about modeling the security of an application, and realizing that they fail to capture as much as they do capture, and they're not very good for reasoning ahead of time about the security properties of a system. Even at the requirements level: how do you specify that a system must be secure? Often you have the technical design doc, and there's a section at the bottom that everybody leaves blank, which is security.

Everybody leaves it blank in part because they don't really know what to put in there, and they don't really know how to think about this structure. So this principle-centric approach is my approach, or the start of an approach, to getting better at thinking about security ahead of time: getting better at doing the design and architectural work to understand whether the system we're about to build is going to end up being fit for purpose.

I'm going to stop here for a bit, because it has been an interesting political day. As security engineers, we're all responsible for the impact of our work on people's lives. This is especially important when the security environment of the world we live in is changing pretty quickly. A lot of companies will decide, well, we're willing to overlook this harm; we're not willing to spend more than X effort to protect our users; or, it's more important to reduce the friction of people signing up to our system than to ensure that our authentication is sufficiently strong for the purposes we know people are using it for. I would argue pretty strongly that it's maybe time to start thinking about some of those things a bit differently, and to push back a lot more. We're looking at a situation (and I would say this is probably going to be as true in Canada as it is in the US sooner than any of us would like) where we've got a lot of systems that were designed for an environment where they were fundamentally operating under the rule of law, and they're now going to be operated outside that scope. We've made a lot of assumptions about lawful intercept being lawful; what happens when it isn't? Our responsibility to think about the impact of our actions is not just about mitigating known harms; it's about looking at harms that are probable, and at the structure of our work in that larger context. This is one of the places where I find that thinking about things from first principles clarifies a lot.

So I want to talk a little about what it means when we say you've chosen a set of users. What are the goals of your chosen users? Not their goals in using your system, but the things they're trying to do in the world.

Let's say you run a chat service which is sometimes used for social stuff and sometimes for work stuff. Here are a few personas you might think about examining in the context of designing that chat system: say you're a domestic violence victim seeking an abortion; say you're a queer teen; say you're a union organizer. Each of these is interesting because they break some of the expectations we have when we're designing systems. If you're designing a system that uses authentication, I can guarantee you are designing a system that a domestic violence victim will be using to authenticate, simply because these people exist in lots of contexts. You can't say, oh, we're not social, we're a web shop, we don't need to think about this. You probably do. And this persona is interesting because it means you can't, for instance, assume that the user has control over their devices. You can't assume the user has reliable control over their email address: somebody else might be able to send mail from that address, and somebody else can almost certainly read mail sent to it.

I know in a lot of cases that sounds like, well, this is impossible, we can't possibly do anything here. Except you can. Say you're this chat system and you support deleting messages. Great: do you leave a tombstone for those messages? If you look at a chat log after the fact, do you see "so-and-so deleted a message, so-and-so deleted a message"? Do you need those tombstones? Maybe you make them vanish: they show up for 30 seconds, just so the user knows what happened, to smooth over the UX, and then they go away, so if you look back at that chat log in two hours you can't see anything. That matters because at the end of the day that user is going to be trying to have some conversations moment to moment, and they may know: okay, I've got two hours where the person I'm trying to stay safe from won't be able to read this chat log; I can use this tool to have a quick conversation, to arrange an exit, to get support, any number of things.

Things like account recovery matter too, and understanding whether there are other sessions currently logged into my account. Facebook has actually gotten pretty good about this: you can see all of the sessions that are currently accessing the account, and there's a nice button to log out all the other sessions, which makes it pretty fast to do an account recovery. Log out all the other sessions, change the password, change the recovery tokens, change which email address the account is associated with. Obviously this goes both ways: if you make that easy, it's easy for anybody who does have access to that account. But this is where we look at the kinds of interactions where that abuser is probably not trying to eliminate access, because they want to retain long-term access.

Queer teens are interesting when we think about parental controls. If you're designing apps for kids, a lot of concern goes into making sure that parents can see things, but that's not necessarily always going to be useful, and you have to figure out how to balance those different needs. Take the first round of the Apple CSAM-reporting system, where they were going to do client-side scanning for anything that might be CSAM on devices owned by kids, and the first version was "and then we'll send it to your parents," on the assumption that the parents are a trusted party who should know what's going on. Except now you've built a machine that just outs people to their parents. Now they do something like blurring the content by default and suggesting access to resources, so folks can still be protected against some of this kind of stuff, but there isn't a structure that magnifies, and literally creates, instances of abuse.

A union organizer, now. I think most folks doing union organizing at this point know: you don't use company Slack, because the company Slack has all of us in it, and somebody in corporate may have been told to take a look at things, or to hand certain things over to legal, et cetera. Do you want everything internal logged in all contexts? Slack has done this one right: the default is that DMs are actually private, and you have to tell people if logging is going to get turned on. Slack hasn't made all great choices everywhere (it'd be nice if they had some anti-abuse tooling in there), but that is a great choice, because by default not everybody has that access. So as we think about the tools we're building and what they actually mean, it changes our understanding of the products we build and of the way we respond to different kinds of security situations. If you have a customer-service context, or an abuse line, you probably do need to be in a position to deal with abuse victims from a bunch of these kinds of personas, who may end up in some situation where they're stuck and, because of their situation, don't have recourse to the usual channels.

And this is just one little corner. These are illustrative personas because they show us that a lot of the assumptions we make are not necessarily going to hold true for all of our users. This is not the total set of contexts you need to consider, but it's a start, and it's a start at getting away from some of those baseline assumptions and looking at what our tools mean when they're used in contexts we don't expect. As we start looking at more and more authoritarian countries, and possibly much more of the internet becoming more authoritarian and policed, with many more people needing recourse to private communications, to being able to remove data about themselves, to not being caught up in surveillance, a lot of our jobs are going to get a lot more complicated, because the harms we're called upon to balance are not going to be as straightforward. We may also be in positions of: well, there's a lawful-access request for data that you hold; it's a system we've designed so that there aren't logs of internal data deletions; which lawful-access requests do you respond to, and which do you not? Understanding what the direct cost is going to be to the humans who've trusted you to enable them to do the things they're trying to do in the world is going to become more and more important, unfortunately probably pretty rapidly. So, sermon over.

I would really like to live in a world where we needed to think about some of this stuff less, but unfortunately that's not the world we live in. Let's talk about how we do some of these things around designing for resilient security. The first four of these principles are heavily tied together; we'll run through them, and hopefully that will give you a feel for how this kind of principle-centric thinking works in the context of this kind of systems analysis.

First, state and logic: services should either do computation or hold state, not both. This is probably familiar to those of you who've looked at twelve-factor apps and that kind of modern dev principle, but from a security perspective, complexity is basically enemy zero. With every increment of complexity you're on an exponential curve of how difficult it is to secure a system, because all of your components interact with all of your other components. The simpler those individual components are, and the simpler the interactions between them, the easier it is to understand what your system is actually doing. Some of this gets into "weird machines" territory; show of hands, how many people are familiar with that phrase? (I know there's a bit of a delay, so I'll keep talking.) This principle is basically saying: if you have a service that does computation, a web service or whatever, it should be completely stateless. If you have to have back-end state, put it in a separate cache layer; it could be a sidecar, it can be whatever. Just make sure you can decouple these things and that they're inspectable. If you have a separate service holding your state, you're more likely to be able to go in and inspect it than if it's just a big blob in memory holding all of the service's state.
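As a minimal sketch of that state/logic split (the handler and store here are hypothetical examples, not anything from the talk): the computation layer keeps nothing between requests, and all persistent state lives behind a small, separately inspectable store interface.

```python
from typing import Protocol


class StateStore(Protocol):
    """State lives here, not in the service: a cache layer, sidecar, etc."""
    def get(self, key: str) -> int: ...
    def put(self, key: str, value: int) -> None: ...


class DictStore:
    """Toy in-memory stand-in for a real external store (e.g. Redis)."""
    def __init__(self) -> None:
        self._data: dict[str, int] = {}

    def get(self, key: str) -> int:
        return self._data.get(key, 0)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


def handle_visit(store: StateStore, user: str) -> int:
    """Stateless computation: nothing persists inside this function between
    calls; everything durable goes through the store, where it can be
    decoupled, swapped out, and inspected on its own."""
    count = store.get(user) + 1
    store.put(user, count)
    return count


store = DictStore()
print(handle_visit(store, "alice"))  # 1
print(handle_visit(store, "alice"))  # 2
```

The point of the shape, under this reading of the principle, is that the computation service can be killed, respun, or scaled freely, and the one place holding state is small enough to reason about.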

I'll explain the weird machines thing; Len Sassaman, I think, gets the original credit for it. It's basically the idea that every piece of software has, beyond its intended inputs and intended outputs, some weird functions: if you send it some unintended input, you get unintended outputs. That's just another way of describing a vulnerability, but think about it this way: you've got a bunch of APIs stitched together, all mostly correct, but some of them will also do other stuff sometimes. That makes it more likely that an adversary can stitch together, out of those normal machines and their weird functionality, a weird machine that does something useful to them, that lets them accomplish a goal in the world. Getting away from this kind of complexity makes it less likely that you accidentally build useful weird machines.

Now, immutability and ephemerality, also relatively familiar ways of thinking about systems at this point. The way I like to think about this: obviously data is state and caches are state, but configuration is also state, and memory is also state. If your systems are immutable, you eliminate state that is not necessary, because if I can't change a thing, it's not actually state; it's just the structure of the thing. You can look at limiting the scope of editability of state: is configuration editable inside the container, or is it completely read-only, or can I change things? That's one way of eliminating state. You can also eliminate state on a time scale: every time you respin a cluster, you've reset the state of that cluster, which is another way of limiting the scope in which an adversary could potentially edit and change the state of the system.

We also want minimal canonical state: ideally, every single piece of state should exist canonically in exactly one place. This might mean, for instance, that in an ideal world the entire configuration of your AWS environment lives in GitHub. Great: now I can validate and verify that, or I can assume that AWS always has that state, assuming my deployment system only goes in one direction and I never make changes in production. So in theory, GitHub is the master and AWS always matches it. In practice you actually do have duplicated state, because you can't ensure AWS can't be edited, so you have to validate it. But you at least have an authoritative copy of what that state is supposed to be, which means it's possible to validate. If you only have your non-canonical, non-versioned copy sitting in AWS, validation is manual: look through this thing, hopefully find all the pieces of state, and then try to check them. So as few places as possible should be stores of state.
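A toy sketch of that validation step (the config shapes below are invented for illustration, not real AWS structures): diff the declared, version-controlled state against what the live environment reports, and flag any drift.

```python
def find_drift(declared: dict, live: dict, path: str = "") -> list[str]:
    """Compare declared (canonical, versioned) config against live config,
    returning a human-readable list of divergences."""
    drift = []
    for key in sorted(set(declared) | set(live)):
        here = f"{path}.{key}" if path else key
        if key not in live:
            drift.append(f"missing in live: {here}")
        elif key not in declared:
            drift.append(f"unmanaged in live: {here}")
        elif isinstance(declared[key], dict) and isinstance(live[key], dict):
            drift.extend(find_drift(declared[key], live[key], here))
        elif declared[key] != live[key]:
            drift.append(f"changed: {here} = {live[key]!r} (declared {declared[key]!r})")
    return drift


# Hypothetical declared state (from version control) vs. live state (from a cloud API).
declared = {"s3": {"bucket-a": {"versioning": True}},
            "iam": {"role-x": {"admin": False}}}
live = {"s3": {"bucket-a": {"versioning": True}},
        "iam": {"role-x": {"admin": True}},
        "ec2": {"mystery-box": {}}}

for finding in find_drift(declared, live):
    print(finding)
```

The useful property is exactly the one from the talk: because there is one authoritative copy, validation becomes a mechanical comparison rather than a manual hunt through the live environment.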

That's the underlying reason for immutability: you don't want those additional state stores; you don't want to have to think about configuration drift; all of these are opportunities to introduce unintentional vulnerabilities. What this fundamentally gets back to is a version of least privilege: if mutating state in production isn't required for the system to do the things it's trying to do in the world, then it shouldn't be possible. This is true even if there's no intentional interface that allows users to do it, because attackers may be using unintentional interfaces. It's an iterated version of least privilege, where you're looking at privilege even through interfaces that weren't intentionally created. So this is a chain of things about state in systems, a few little principles off in a corner, but it has implications for the larger way we think and reason about systems. The whole goal of the chain is to reduce structural complexity, to simplify the system, to make it more predictable and easier to reason about.

Now one other principle at the component level, a more technical one: unlinkability. Obviously a lot of folks talk about privacy, and systems like Tor are trying to create anonymity. One of the things I've found, over a while of looking at this in contexts where it mattered, is that it's often very difficult for engineers or teams to define what they mean by anonymity. So I like to think about it as unlinkability instead, because that's something we can actually analyze in a meaningful way. Take a piece of data like "this user clicked on this link": you've got a user identity and an action. You can say that this IP address (we're using IP addresses as a proxy for identity here) is unlinkable to this link click under some set of assumptions: that the adversary isn't on the local network, that less than X percent of the nodes in this onion network are compromised. This is a way of reasoning about where you do and don't want specific kinds of structural leakages and linkages, and in general I'd suggest that in a lot of contexts it's useful to actively define which things you are and are not willing to have linkable, and to whom.

For instance, say you have some random web application with a bunch of SaaS services it interacts with, and for some of those you call out with a user ID and some piece of data that they'll look up, send somewhere, or process in some way. Does that user ID need to be the same user ID you use internally and with other vendors? Say you're comfortable with the privacy impact of using this vendor and that vendor separately: each of those pieces of data is acceptable out in the world on its own, but maybe it's not acceptable to have both pieces of data together, cross-linked. If you use the same user ID for both, then if those two sites get breached, a third party can combine the two data streams. That's what happens when you use a non-opaque identifier like an email address as the key you send to third parties, or the same GUID for all of them. Whereas if every vendor gets a separate GUID for the same user, you've got one GUID stream for vendor Y and a different GUID stream for vendor Z.
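One common way to implement per-vendor identifiers (a sketch, not something prescribed in the talk, and with key handling deliberately simplified) is to derive a pseudonym per vendor from the internal user ID with a keyed hash, so that two breached vendor datasets can't be joined on the identifier.

```python
import hashlib
import hmac

# Hypothetical service-side secret; in practice this would live in a KMS or secret store.
PSEUDONYM_KEY = b"replace-with-a-real-secret"


def vendor_id(internal_user_id: str, vendor: str) -> str:
    """Derive a stable, per-vendor pseudonymous ID for a user.

    The same (user, vendor) pair always yields the same ID, so the vendor
    can key its records on it, but IDs for different vendors don't match
    and can't be linked without the secret key."""
    msg = f"{vendor}:{internal_user_id}".encode()
    return hmac.new(PSEUDONYM_KEY, msg, hashlib.sha256).hexdigest()


id_for_y = vendor_id("user-1234", "vendor-y")
id_for_z = vendor_id("user-1234", "vendor-z")
assert id_for_y != id_for_z                            # streams can't be cross-linked by ID
assert id_for_y == vendor_id("user-1234", "vendor-y")  # but each stream is stable
```

Because the derivation is keyed rather than a plain hash, a third party holding both datasets can't brute-force the mapping from email addresses or internal IDs without also holding the key.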

Then it doesn't matter if those breaches happen; they don't give the adversary the ability to cross-link those pieces of data. And I'm not claiming that the set of principles I'm talking through here is a complete set for designing from a systems-thinking perspective, more that these are a few interesting ones; I'm trying not to keep you all here all day. So let's talk a little more now about process principles.

These are more about how we do the work, how we interact with the system, and how we work as teams than about specific technical components. The first is declarative versus procedural logic. In general, if you can configure something declaratively, it's going to be much easier for humans (and sometimes also for the computers, but especially for the humans) to understand the end state. You have state A, you apply a configuration, you end up in state B. If you declare that configuration, you can generally look at state A and the declaration of the transition and understand what state B is going to look like. If you have a bunch of for-loops iterating over various things, figuring out the end configuration is often very, very difficult.

This concept applies in a bunch of different places. For instance, consider a parser. You're receiving something off the network, you need to make sure it's well-formed, and you understand the set of elements you're trying to get out of it; you've probably got example data, the thing you're expecting to receive. But understanding whether the parser you wrote actually matches the spec you have is incredibly difficult. That's a very hard comparison question for both computers and humans; it's not just hard for humans. If instead you write the spec declaratively and then generate, with a tested parser generator, a parser that will accept exactly those things, you're much more likely to end up with a parser that doesn't accidentally accept something weird. This goes back to weird machines. Second only to memory issues, parser issues are one of the most common classes of vulnerability; XSS and all these things are fundamentally parser issues, and we still haven't gotten past letting people write parsers by hand.

And when you say, oh, we don't need to worry about parsers because we've got JSON now and there's a nice validated JSON parser built in: well, you write JavaScript, so you accept an object, and that object has some structure. Does it have the structure you expect? If you use a strongly typed language, you can actually make assertions about declared structure.
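A toy illustration of declaring allowable structure rather than hand-checking it (the schema format here is invented for the example; a real project would reach for a typed language or a schema library): the expected shape is written down once as data, and a generic checker enforces it.

```python
import json

# Declared structure: field name -> expected type. This is the "spec";
# nothing about any particular message is hard-coded in the checker below.
MESSAGE_SCHEMA = {"user": str, "action": str, "count": int}


def validate(obj: dict, schema: dict) -> dict:
    """Accept obj only if it has exactly the declared fields and types."""
    if set(obj) != set(schema):
        raise TypeError(f"unexpected or missing fields: {set(obj) ^ set(schema)}")
    for field, expected in schema.items():
        if not isinstance(obj[field], expected):
            raise TypeError(f"{field}: expected {expected.__name__}")
    return obj


ok = validate(json.loads('{"user": "a", "action": "click", "count": 3}'), MESSAGE_SCHEMA)

try:
    # The count arrives as a string: the declaration rejects it, rather than
    # the mismatch surfacing later, somewhere deep in the business logic.
    validate(json.loads('{"user": "a", "action": "click", "count": "3"}'), MESSAGE_SCHEMA)
except TypeError as e:
    print("rejected:", e)
```

The comparison the talk calls hard ("does my hand-written checking match my spec?") disappears, because the spec is the only artifact and the checker is generic.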

of this object right we've declared the allowable structure and now you know throw the json in there throw it okay type error right um state machines are another context you know the number of times both like looking at all the all the platforms that have had uh ip fragmentation issues and then the number of times that we've reinvented that problem in different places in the stack like um you know sip fragmentation you know okay now you know we've we've we've built a messaging system on top of ip and now it needs to frac oh and we have the same you know 90s fragmentation issues um you know if you have a state machine generator that

simply drives that structure it it lets you get an outcome where you actually can say things strongly about no no you can't enter in a legal state the system won't allow you to skip this step um and this doesn't have to be a formal methods world thing right this doesn't have to be something that's like it's not just an aerospace world thing um using a parser generator is faster than writing a parser using a state machine generator is faster than writing a state machine by hand um you know obviously you know there's a little bit of overhead you have to learn the library the first one might be slower the first five might be slower
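A minimal sketch of that table-driven style in Python, using a hypothetical order workflow (the states and events are illustrative, not from the talk): the allowed transitions are declared as data, so an illegal state or a skipped step is rejected by construction rather than by scattered if-checks.

```python
# Hypothetical workflow: (current state, event) -> next state.
# Anything not in this table is an illegal transition.
TRANSITIONS = {
    ("created", "pay"): "paid",
    ("paid", "ship"): "shipped",
    ("shipped", "deliver"): "delivered",
}

class StateMachine:
    def __init__(self, state: str = "created"):
        self.state = state

    def fire(self, event: str) -> str:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal transition: {event!r} from {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state

m = StateMachine()
m.fire("pay")         # created -> paid
# m.fire("deliver")   # would raise: you cannot skip shipping
```

Reviewing the `TRANSITIONS` table against the intended process is the easy declarative comparison; reviewing an equivalent tangle of conditionals is the hard one.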

Once you get past that point, it is a lot faster. But this is also Terraform, this is also infrastructure as code: saying, I declare that I want this to be the state of my ecosystem, figure out how to make it that, rather than, run this set of commands and hopefully it ends up in the state that you want. A lot of this is basically: let's build systems in a way that allows humans to interact with them reliably. Let's make things as easy as possible for the computers, but also for the humans. Next, designing for failure. Especially if you're looking at a relatively greenfield security environment where you're standing up a lot of stuff,

we generally do a little bit of designing the system we'd like to have, and then it's, oh, but we really did need a forensics environment, we needed a whatever. Instead, you want to be able to say: for at least the kinds of compromise we can foresee, we have these tools in place. Not just, well, we've got a control for authentication and a control for data backup, but we don't have a control for

data backup in the context where the authentication system fails and allows someone to go delete stuff. So design with the assumption that any given layer is probably going to fail. Sometimes the answer may be, we can't completely duplicate this layer: we don't have a way of saying no one can delete data even when the auth system fails, that's not viable. Fine. But then you say: okay, we keep backups for 30 days even after deletion, and they're only accessible through some other process. And then you have to balance that against lawful intercept, if somebody really needed that

data gone. There are trade-offs in all of these directions, but just assume that compromise is inevitable, the same way we assume that a given component, a given node, is probably going to crash at some point. So now we're getting more into the actual resilience side of things. One of the things we've seen when looking at resilience in teams operating in high-risk, high-consequence situations, and this is yes, ops teams, but also hospital operating rooms,

flight decks, these kinds of places, is that it's often critical for safe outcomes, and I think also for secure outcomes with some caveats, that decisions are happening closer to the edge: that the person deciding whether a thing is safe, whether a thing is functional, is the person actually engaged in the full context of the work. And that means you need to focus more on coordination and communication and less on central control. In some cases this is saying to a development team: this is the set of things that you can deploy without any oversight, this is the

structure. Or, say we're thinking about reliability or efficiency: you can spike 2x or 3x or 10x over your normal compute budget to keep the system up, we're fine taking that cost; at some point we need to rein it in. Getting those decisions closer to the place where there's context is really useful. What you need for this is thick horizontal relationships. This is what I was talking about at the beginning: these emergent properties

require the entire set of people working on a system to operate collectively. You can't ship a secure system if the dev team isn't also trying to ship a secure system, and that means the dev team needs to know who security is, because they're going to have questions. Yes, you can have your security champions program and your this and your that, but, for instance, at Etsy there were a bunch of healthy snacks in the building, and then security had candy. A whole lot of people from a whole lot of different places would show up and come get candy

from security, and that meant we knew a bunch of people in a bunch of different contexts that we wouldn't necessarily have known otherwise, just for those social reasons. Which meant that when somebody had a security question, they probably knew somebody on the security team well enough to lower the bar to just pinging them on Slack. We could get that kind of thick communication and decentralized coordination, not top-down control, which allowed for better response when things did go outside the bounds of normal operation.

Prioritize replanning. This is something for resilient systems in general, and I think it's especially true for things like incident response, where we know we end up doing this a bunch, but also for major security initiatives. You make a plan mostly to have something to destroy as soon as you start work; you can basically guarantee that the plan is going to change. That means you need to focus on replanning, instead of focusing on making that initial plan as heavy, as correct, as possible.

At the sprint level, I think we've gotten better at this; we're used to replanning in that context. We're not necessarily used to managing resynchronization as stuff changes across teams on a sprint-by-sprint basis, and we're definitely not good at managing this kind of replanning at the structural level. I have yet to see a large dev org that could reliably do reorgs without someone quitting, and given that we know reorgs are inevitable, replanning is inevitable, shuffling goals is inevitable,

that seems like a pretty big failure, given the way we know we have to work in a resilient scenario. And as our threat environments change, and as the meaning of our systems changes fairly rapidly, this is going to have an impact on security teams more and more directly. To answer the question on balancing decentralized decision making versus having a single source of truth: what I'm talking about here is decentralized decision making in the context of the engineers doing the work. I'm not saying

there shouldn't still be a single source of truth. This is one of those things where, if you have an accepted principle that we will always ship systems with a single source of truth, then it's basically a forcing function. That principle exists as a forcing function to say to those two teams: look, you can figure this out directly, you're empowered to make that decision between the two of you. You're probably also required, as the caveat to that power, to go find the other stakeholders. But that doesn't mean you get to bend the

principle: one of you two still has to win. Ideally you're in a context where you're on the same side, and you're not dealing with inter-team or inter-dev-org infighting about territory and power; but if you are, you should probably solve that instead of bending the principles of your system design. This gets into how the incentive structure you design in your organization determines the applications that you build. The reason AWS has so many services is that above a certain point you have to launch a new service

to get promoted. This is a very dumb way to design a cloud ecosystem; we should perhaps not do that. So, one last one: slack. You cannot resiliently respond to out-of-context problems when you have a team that hasn't taken vacations in three years and everyone's working 70-hour weeks. This means you need big enough on-call rotations, you need to make sure people actually take all of their vacation, you should be tracking out-of-hours work, especially for secops and ops teams, and you should have actual caps on how fast you're going to try to change stuff. Any time you're trying to improve these

kinds of emergent properties across whole systems, it is going to take more time than the bare minimum, and any time you get outside of planned outcomes, you're going to end up asking your team to put in a bunch of additional work. In both of those contexts, you need to make sure the teams actually have the time to do the work these properties require, not just bare-minimum correctness, and also that when you hit a wall, and all of a sudden it's, okay, we've got a bunch of stuff in the logs that shouldn't be there, this looks serious,

everybody cancel your weekends, people aren't thinking: I'm just going to ignore that message, I don't really care, they can fire me. That is not a state you want to discover your teams are in during an incident. At the very least you need to know where they are, and you probably need to actively manage this stuff. I know a lot of folks in security, and I've been in this position myself, end up with a superhero mentality: this is important, we're going to save the world. It doesn't actually work. All you do is get burnt out and not save

anything. So take your vacations. Anyway, if this stuff is interesting and you aren't in security but have a startup, I'd be happy to talk to you about how you build a security team from zero, and I'm also happy to talk through questions.

All right, thank you very much, Eleanor. There's one question here: how do you deal with unlinkability over multiple degrees, so two pieces of data where a link can be inferred through a third piece of data? Yeah, I think it's going to depend on the context of how specifically you want to tease those apart. In general: switching from shared identifiers to opaque identifiers, moving to systems where you send transformed information instead of raw information, all of these kinds of things let you start teasing it apart. But it's really going to depend on the specific context.
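A small sketch of the opaque-identifier idea in Python (the key handling and names here are hypothetical, not from the talk): deriving a distinct per-recipient pseudonym with an HMAC means two third parties cannot join their records on a shared user ID, which removes one of the easy linkage paths.

```python
import hashlib
import hmac

# In practice this key would live in a KMS or secrets manager, not in code.
SECRET = b"example-key-stored-in-a-kms-in-practice"

def opaque_id(user_id: str, recipient: str) -> str:
    """Derive a stable, recipient-scoped pseudonym for a user.

    The same (user, recipient) pair always maps to the same value, so the
    recipient can still do its own matching, but pseudonyms for different
    recipients are unlinkable without the secret key.
    """
    msg = f"{recipient}:{user_id}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:16]

a = opaque_id("user-42", "analytics-vendor")
b = opaque_id("user-42", "email-vendor")
assert a != b  # same user, different recipients, no shared join key
```

This only addresses direct identifier linkage; as the answer notes, links inferable through other attributes need context-specific treatment.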

There's been a bunch of work on some of the privacy-preserving matching systems where you have a certain privacy budget, et cetera; some of that work turns out to maybe not work as well as we thought it did, and there's a whole evolving field there. But honestly, even if all you look at is the identifiers you use, where they go, which third parties, what data you actually need to send to those third parties and what data you can avoid, and in general treat information as a cost, where every piece of data you store, send, or share has a specific cost, and you try to

minimize that cost, you end up minimizing that risk. If you have more complicated situations where you need to start looking at interesting crypto, I'm happy to talk about that offline, because those problems get super fascinating, but there aren't general answers there. Thank you. How do you balance decentralized decision making with having single sources of truth for state, for instance between two teams? I actually covered this a little earlier. Yeah, I think this is basically a question of making sure that you're doing your design intentionally, making sure the teams are actually talking to each

other, and also making sure that your system architecture isn't determined by politics, which I know is incredibly difficult in a lot of contexts. I'll pick on AWS some more, but I think a lot of cloud providers are doing this right now: complexity is lock-in. There is no drive to simplify our cloud stacks, because giving people a lot of individual canned features that map very directly to user problems is faster; hey, talk to your AWS rep and they'll tell you how to fix this. In a lot of cases I

could say the same thing about GCP and Azure. Whereas if we think about this from an emergent-complexity perspective, we actually need to be flattening stuff a lot: every layer we stack on top of another has some set of unpredictable failure modes. Obviously it's not trivial to just go redesign the world, boil the entire ocean in one go, but at a certain point we're going to have to start knocking out some of

those layers and flattening stuff out, because we can't just keep adding cruft on cruft. Thank you very much. I think that's it for the questions I've seen pop up here; I'll give anyone a last second. If not, thank you very much for the presentation. You covered a lot of really important topics and gave a lot to think about as well; I know that I at least have a few new questions to add to my repertoire when dealing with clients, so I really appreciate that. Thank you very much for kicking off day two of BSides Vancouver, and to everyone who's watching, please enjoy

the conference. We've got a number of fantastic speakers lined up for the rest of the day. Big thanks again to our sponsors who helped us put this on; we just can't do it without you, so we really appreciate your support. Thank you very much for having me. Thank you. Bye.
