
BSidesCharm 2024 - Scaling the Security Wall: Agile Threat Modeling for Complex Systems

BSides Charm · 35:05 · 111 views · Published 2024-06
About this talk
This talk advocates for a paradigm shift in threat modeling to tackle complexities in large-scale systems. It emphasizes the drawbacks of traditional security measures and proposes threat modeling as a cost-effective solution. Challenges posed by cloud architectures and rapid development are discussed, alongside strategies for integrating scalable threat modeling into the SDLC. Presenter: Vineeth Sai Narajala. As an Application Security Engineer at Amazon Web Services (AWS), I specialize in core Data Analytics services such as EMR, Athena, and Lake Formation. Prior to my current role, I held positions in Pentesting and Threat Intelligence. Additionally, I gained valuable experience in Business Recovery and Disaster Recovery, particularly in mitigating ransomware attacks during my tenure at Nordstrom.

[Music] Good afternoon everyone, my name is Vineeth. Hope you guys had a good lunch. I'm an application security engineer working at Amazon Web Services in New York City. I mostly work on AWS services that are part of the Hadoop big data ecosystem, like EMR (Elastic MapReduce), Athena, Lake Formation, and Glue. Before this I worked as a pentest engineer and threat hunter for Nordstrom, back on the West Coast. I was also an adjunct instructor for the University of Nevada for some cybersecurity courses. Outside of work I really enjoy skiing, but I haven't found any good snow on the East Coast; it's just all ice, unfortunately.

So yeah, in the past couple of years at AWS I've threat modeled at least 130 to 150 features for tier-one AWS services, and my goal at the end of this presentation is to convince each and every one of you that threat modeling can be done effectively at scale, and that it's probably the most effective proactive security measure you can implement in your software development life cycle. Cool. So let's start by acknowledging the elephant in the room: everyone's moving to microservice-based architecture, and the complexity is increasing exponentially. With so many of us moving to this new microservice-based architecture, we have a

growing web of interactions that gets really hard to track down over time. Having this really large, complex architecture means our threat surface and our security posture are also harder to secure. We have a lot of moving parts, which means more opportunities for things to go wrong, and each new interaction could possibly lead to a malicious actor gaining access to our system. So think of this as a really big city: with each new building you have to build a new road, and the more streets you have, the more surface you have to cover for your risks. It is crucial to understand and mitigate these risks, because a

single bad microservice can lead to cascading issues across the entire system. I know we've recently started moving to zero trust architectures, but a lot of us, where companies are much older, haven't moved to a zero trust system yet. And that image there is actually amazon.com, just amazon.com, back in 2008. Each node is a microservice and each edge is an interaction between those microservices. You can only imagine how much more it's grown since then. Okay, so let's look at how these security issues get introduced in the first place. This is our SDLC. We have different phases where we get our

requirements, we design, we build, test, and deploy, and we do this quite often. We can introduce these security issues at many points, but mostly it's in code and configs, and it happens a lot in the design and build phases, because sometimes you might design things that are inherently vulnerable, and sometimes the code itself is vulnerable. We also have a lot of different opportunities to find out where our issues are. We can identify our risks working backwards, all the way from live production: bug bounties, security researchers, hackers, white hat or black hat, can all help us identify risks in our system. But you know, it's

always better to do so before it hits customers. We can find issues in our deploy stage. We can find issues in our test phase, which is the most popular place and also the place where we invest the most money in finding issues. This is where the pentesting happens, and there's an entire industry built on this phase. This is where dynamic analysis happens, and all of the good stuff there. In our build phase we can do code reviews and static code analysis. But I'm specifically interested in the design phase, because irrespective of all of the other places where you could find issues, they are mostly bolted on; they're

not security issues that we can identify with the system itself. When we build physical infrastructure, we design with security in mind, but for software, for some reason, we try to bolt on security at the last minute, and it gets really expensive really fast. Fixing an architectural issue in production gets really expensive, because not only do you have to redesign it, you also have to build it again, test it, and deploy it. That's why the shift-left mantra has been getting really popular these days. Threat modeling is, I would say, the first phase where we can actually find our security bugs, and I

think it is the easiest place to find them, because you don't have to do a lot of code review and you don't have to go deep into it. All you need is a good understanding of your system. So again, why use a threat model? The combination of early insights that can be derived puts you in the best position to build and protect your product or service securely at the lowest cost, effort, and time. The cost, as I said, increases as you move across the SDLC, so threat modeling is the cheapest way to add security to your pipeline. Cool. So

when do you threat model? I like to say the earlier the better, the more often the better, and it's never too late. A good time to trigger a new threat model is when you introduce a change or a new feature to a system. Threat modeling should be done in feature sprints, because any mitigations or things that you discover can always be added to your backlog, prioritized at a later date, and implemented. Another good time to do threat modeling is when previous assumptions that you've held true no longer hold, either because you've added a new feature, or new research has figured out a

way to do something really bad, or the threat model becomes invalidated by a change in the development environment. For example, Log4j: that issue caused a lot of us to rethink how our systems are built. So a threat model can be triggered at any point. Even if your system is in production, there's no harm in threat modeling a system in production. In fact, I would urge everyone to do so, because you can always miss things the first time, and it's always good to threat model. So again, do it as early as possible, as often as possible, and it's

never too late to threat model. Cool. So let's really quickly talk about the basics of threat modeling. This is Adam Shostack's four-question framework; most of you are probably familiar with it. You have four big questions: what are we working on, what can go wrong, what are we going to do about it, and did we do a good job? The first question is what are we working on. The intent of this question is to help you set a baseline understanding of the system that you're building and the details that are relevant to security. Creating a threat model diagram is the most popular way

people do this. The second question is what can go wrong. The intent here is to analyze the output from the first question and come up with a list of issues, risks, or threats that can affect your system. Next up is what are we going to do about it. Here your goal is to come up with response strategies for the different risks and threats that you've discussed in the previous step. And then did we do a good job is a question that's designed to make you reflect on the process and see how you can improve, because there's always some room for improvement. The result of the threat

modeling process is the creation of a living, dynamic threat model that can be updated over time. There are a lot of tools that exist, and there are a bunch of different methodologies as well. Some of you must be familiar with a few of them; some of you, I know, hate a few of them as well. There are also a bunch of open source tools. But all of them boil down to the same few steps we discussed. First is to have a complete design document and an architecture diagram. These serve as a blueprint, providing a comprehensive overview of the system components. Next, which is the bulk of

the threat modeling, is to list out the threats and mitigations. We systematically brainstorm potential threats, ranging from common vulnerabilities all the way to really complex, sophisticated attack scenarios. For each threat identified, we develop a corresponding risk response strategy, to neutralize, minimize, reduce, or transfer the risk. Together these practices form the backbone of a threat modeling process, and they lay the groundwork for all future security testing as part of your SDLC. While traditional threat modeling approaches have their merits, they also come with a set of challenges. I'm sure most of you have faced this before: it takes an immense amount of time to get 10 people from different verticals into

one room and actually threat model. It's also very rigid; following STRIDE or PASTA gets really rigid. It's not very flexible, and it really slows down the development process. Security is already seen as a bottleneck, and we don't want to add to the notion that we are slowing people down. Another challenge is the shortage of security engineers. There's a huge demand for security engineers and it's really hard to find them, so being able to scale this without having as many security engineers is also really important. And lastly, traditional threat modeling suffers from a lack of standardization. Two people can threat model the same exact

application and come up with totally different threats. This variability can lead to inconsistencies and inefficiencies in the threat model, makes it hard to agree upon one model, and makes it really challenging for developers to decide what to do. But again, despite these challenges, there's always a place for traditional threat modeling. If you're only launching four or five things in a year, or you have a really small set of things you're launching, doing the entire slow process works. With just a raise of hands, how many of you here threat model at work or have done threat modeling before? Cool. How many of you do it for

every single product, feature, or system you guys release? None, right? Because it just doesn't work at scale. It's super hard to do, and I've only seen a few places that have done this relatively successfully. So my approach is: okay, we're in 2024, we can't be wasting time looking at XSS, CSRF, and these low-level threats. Delegate them to the machine, and let's focus on the higher-level, bigger-picture business logic issues. Our goal here is twofold. One is to prevent developer fatigue, which happens quite often. Security tells development, hey, here's 200 things you need to do, and here's a thousand findings

you have, go fix them. At some point they're just going to give up and not look at them. And also, we should avoid making security a bottleneck, like I discussed before. So okay, now we've done away with the easy stuff; we've automated the things that we can. So who's heard of the OWASP ASVS? Who's actually implemented the entire OWASP ASVS? Okay, right. So one step further, after getting rid of the low-level stuff, is to build upon the standards the security community has already come up with. Here are a few examples from the ASVS, version 4. I

just picked out a few random ones. The first one here is: verify that cookie-based session tokens have the Secure attribute set. If you think about this, although it is a requirement, it is basically a shortcut to a threat we already had. There's a threat implied in the statement. The threat here would be that a bad actor is able to view the contents of a cookie in plain text. So the OWASP ASVS does a good job of providing requirements to make your system secure, but it abstracts away the threat layer. Imagine giving a 15-page document to your developers and being like,

hey, implement everything, I don't know which ones apply, but go ahead and implement it. They're not going to listen; they're probably going to throw it in the bin as soon as you leave the room. So we can templatize this, but this generic one-size-fits-all just doesn't work. What we can do is take this one step further: we can decompose the application into multiple reusable components and then threat model them, or use templates for them specifically. Just like a modular house that's built offsite, we can bring parts, because no one's creating

brand new frameworks every day. You're using the same 10 tools that your company uses over and over again in different places. So the goal here is to create reusable components and predefined threats that developers can leverage to streamline the threat modeling process. By adopting this modular approach, developers can inherit predefined threats and don't have to spend a lot of time thinking about new ones, new mitigations, and so on. This also reduces the time it takes to develop and reduces the bottleneck. It also gives developers the option to pick and choose what they want, because prescribing them, hey, you have to do this, is not going to work.
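A minimal sketch of the reusable-component idea, assuming a small in-house catalog of templates. The component names, threat IDs, and threat wording below are hypothetical illustrations, not taken from the talk's slides or from any specific tool; a real catalog would likely be derived from a standard such as the ASVS.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Threat:
    """One predefined threat with the mitigations linked to it."""
    threat_id: str
    statement: str
    mitigations: tuple[str, ...] = ()


@dataclass
class ComponentTemplate:
    """A reusable component and the threats every user of it inherits."""
    name: str
    threats: list[Threat] = field(default_factory=list)


# Hypothetical catalog for illustration only.
CATALOG = {
    "web_service": ComponentTemplate("web_service", [
        Threat("WEB-1",
               "A bad actor can read a session cookie sent in plain text.",
               ("Set the Secure attribute on cookie-based session tokens.",)),
    ]),
    "api_gateway": ComponentTemplate("api_gateway", [
        Threat("API-1",
               "A user can read another user's record by changing an object ID.",
               ("Check object ownership on every request.",)),
    ]),
    "database": ComponentTemplate("database", [
        Threat("DB-1",
               "Unvalidated user input reaches a query and alters it.",
               ("Use parameterized queries.",)),
    ]),
}


def build_threat_model(
    selected: list[str],
    catalog: dict[str, ComponentTemplate] = CATALOG,
) -> dict[str, list[Threat]]:
    """Return the inherited threats for each component the developers picked."""
    unknown = [name for name in selected if name not in catalog]
    if unknown:
        raise KeyError(f"No template for: {', '.join(unknown)}")
    return {name: list(catalog[name].threats) for name in selected}


if __name__ == "__main__":
    model = build_threat_model(["web_service", "database"])
    for component, threats in model.items():
        for threat in threats:
            print(component, threat.threat_id, threat.statement)
```

Developers only name the components they use; the threats and mitigations come along with the template, which is the "inherit predefined threats" step described above.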

And sometimes the business needs come before security needs, so letting them pick and choose which components they want gives them the flexibility to develop at the fastest pace. We'll come back to the issues with this. Here's a really simple diagram. To give some context, let's say you all work for a transit company and you're the sole security engineer for the company. You're working on a new website, because it's 2024 and the transit company finally wants a new website. Your developers come up with this really high-level design for the website. Can any of you just yell out a few

threats that could potentially affect the system? IDOR, good one. Anyone else? Yeah, that's just a random profile number. Injection, because you have user input. Anything else? Yep. You guys are all right. So these are all potential places where a bad actor could impact our system. Like you said, you have IDOR, you have user input, you have file upload, and then on the component side you have a database, a web service, and an API gateway. I would say 70% of applications are the same REST API web services; it's just API after API. And so, like I said in the previous

slide, you could build threat models specific to components and then reuse them across various applications. So we can ask our developers: what compute service are you using, what framework are you using, what API endpoint service are you using? They can pick and choose what they want. We can take the ASVS, or some derivation of it, and create our own component-specific threat models, and once the developers pick what they want from the initial questionnaire, they can inherit all of these threats and mitigations. This is similar to object-oriented programming: we can overload, use polymorphism, all of the

good stuff. So this is step one, and the easiest way to get your company or your organization to threat model at scale. Give the developers the options they want, build these predefined threats, and then they're happy, you're happy, and at least you're getting a good 80% of your threats covered. We can take this one step further and make the threat model machine readable. A lot of times when people do threat modeling, it's in PDF docs. How many of you have finished deploying something, and one year later went back and read the design documents? Anyone? No, right? It's just not going to happen, because it's so much work, and we

never go back to see old documents. Just like old logs, we never look at them unless something really went wrong. So using a threat model in JSON, YAML, or any machine-readable format that can be iterated on over time, rather than a static document, can really help you scale threat modeling across your entire company. There's an added benefit of doing this: the machine-readable document also facilitates automation and integration with other development tools, and can also act as an asset inventory of all the components in your system. If you have a threat model for various components and you know how many times each component was used, you

can create a really detailed asset inventory of everything that's in your environment. Another thing you can do is have a threat catalog of all the threats that impact your entire organization, so leadership can have a high-level view of, hey, these are the components we have and this is the largest threat surface we have. It gives them a way to prioritize different things like mitigations, hiring, and so on. And again, like I said, you can take your machine-readable format and deploy it as part of your CI/CD pipeline, and you can integrate it with other tools as well. And with LLMs

and AI coming, I don't want to say this, but everyone's saying it: we can dump this into one of the LLMs and see what it can get. I've not tried it because I'm not allowed to; if any of you are allowed to, feel free to go ahead. So this is just an example of a threat model in JSON format. I don't know how much you guys can see, but it's basically a bunch of threats and linked mitigations. So we can stop here and be done, right? You've put in 20% of the effort; you've just taken the pre-existing standards, templatized them

for various systems, and let your developers pick. We can stop here completely and be done, and you've covered 80% of your security outcomes. But a lot of us need the 100%, because you're probably in an industry where you can't afford to lose something. This is where it becomes a little more tricky: the remaining 80% of your effort will go into just 20% of your security outcomes. But if you've seen the recent news, a lot of the recent incidents have been at the business logic level. Not often do we see XSS on Google's front page; it's much rarer for those issues to come out, and a lot of

the issues now are mostly business logic focused. One issue with the previous method is that components are feature agnostic. You can have the same API gateway used across 20 different components but used in different ways; same thing for your databases, your web service, or your compute environment. So what we can do here is take our components and actually threat model the features that are built on those components. As we add more features and more components, in components underneath that haven't been touched in years you could possibly introduce new data flow paths or new

vulnerabilities that you didn't intend to, or a previous mitigation for a threat that you thought was fixed can now be exploited. So threat modeling features is the remaining 80% of the effort that you'll be putting in, and this is also the hardest part, because each feature is different, and this is what takes most of your time. Adding new features can fundamentally change the structure of your application even if you haven't changed a single component. I have a few examples, and I'll show you how this can impact features. In our previous example, you guys were all working for a transit company. Now, after a few

months, your product team comes back and says, hey, our users want to check their trip history. Can any of you yell out a few threats that could impact this new trip history feature that we are implementing? Anyone? SQL injection, nice. Anything else? Yep, brute force. Yep, encrypting addresses, because now you have credit card information, right? You have a beast called PCI DSS compliance sitting on your head right now. So just a small new feature, where you haven't introduced any new component, can introduce a lot more requirements for you. So again, you guys are right: brute force, injection, and so on. Let me quickly switch. Can you guys see the screen, or should I

zoom in a little? Give me one second. [Music] Okay, is it better now? Okay, cool. So I've just taken our existing system and drawn it out in a data flow diagram. You have an actor, a login page, a register page, the account details page, and a database that just has a bunch of information about the account details. This is what we had until the trip history feature was created. So let's maybe take a minute to just add... give me one second, I can't see much from here. Okay, let's add a new database; that's your trip history database. So we have a new database, your

trip history database, and then your actor can now...
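The diagram being drawn here can also be kept as data, which makes it possible to check each new flow mechanically. The sketch below is only an illustration under stated assumptions: the element names are hypothetical, the login and register pages are assumed to be the only places where authentication happens, and the check simply lists any path from the actor to a data store that skips both.

```python
from collections import deque

# Hypothetical element names following the diagram described in the talk.
FLOWS = {
    "actor": ["login_page", "register_page"],
    "login_page": ["account_details_page"],
    "register_page": ["account_details_page"],
    "account_details_page": ["account_db"],
}
# Assumption: these are the only elements that authenticate the actor.
AUTH_NODES = {"login_page", "register_page"}
DATA_STORES = {"account_db"}


def unauthenticated_paths(flows, start, data_stores, auth_nodes):
    """Return every path from start to a data store that skips all auth nodes."""
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in data_stores:
            found.append(path)
            continue
        for nxt in flows.get(node, []):
            # Paths through an authenticating element are fine, so stop
            # following them; also skip nodes already on the path (cycles).
            if nxt in auth_nodes or nxt in path:
                continue
            queue.append(path + [nxt])
    return found


if __name__ == "__main__":
    print(unauthenticated_paths(FLOWS, "actor", DATA_STORES, AUTH_NODES))
```

For the baseline diagram the check finds nothing, since every route to the account database goes through login or registration. If a later change added a flow from the actor straight to a data store, that path would be reported without any new component having been introduced.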

Actually, before that: we've had a trip history database, and you had to be authenticated to see the trip history of our users. The way users would get there is via the account details page, so they can come in and then see the trip history from here. So this is just a new flow that we added to our system. With this flow we can check whether this system is using any authentication, so let's go and add this to the threat model. We have database authentication; let's assume it's a MySQL DB and has some simple

authentication. It does implement some secrets, and let's see, it uses... is that CSV? Yeah, it uses CSV. So here's basically what we added to our system. Now, a few months later, your product team comes back and says, hey, our customers really like this feature, but they want to see the trip history without logging in. Give me one second. Okay, it looks something like this: you don't have to log in, you can check your trip history with just a credit card and expiration date. Do you guys see any big issue with

this? Exactly, right. How many of you have had fraudulent transactions happen on a credit card? The number of times credit card information is leaked on any of these dark web sites is really high. So with just a picture of a credit card you can get the trip history of a person, especially if they use this transit system. So going back to our website threat model here, as you can see, we've added a new interaction from the user directly to the trip history without any login. So you're not going through either the login route or the

account registration route. So this is basically, again, no new components, just a new interaction, and we just removed the authentication. If you think this was so obvious, I'm sorry to say, but the New York City metro did that. They implemented this in 2019, and it's caused a lot of issues, because you can stalk someone: you can see the time at which they took the subway and at which station, so you can easily get the information or the habits of a user. Stalking and other issues start coming up. So again, to us it seems so simple, because we drew it out, we had a base level of threats, and we

discussed it, but an entire agency implemented this in 2019 and discovered the issue in 2022. So again, threat modeling is not as common as you think. We've all heard about it, but not enough people do it. And right now they've disabled the entire feature, so none of us can see our trip history. So let's go back to the threat modeling tool and add a quick threat in our threat section. This tool is called Threat Composer. Initially it was developed by AWS, but the Canadian Digital Service took a fork of this. I

like their version more than AWS's version, because it lets you draw things out, which the original threat modeling tool didn't have. So let's just add a quick threat statement. I can't see it properly, but we have a threat actor who is unauthenticated and unauthorized... I'm sorry, I can't read that, let me zoom in. Okay, and they are able to view trip history, which leads to stalking, for example, and it impacts our riders. So this website gives you a framework or a format: you have your impacted assets, you have the threats you're

talking about, you have the prerequisites. So you have a standardized template for all of your threats, and you can compile this information over time into different databases, so you can have a list of all your assets, all your threats, all of your prerequisites, and so on. And then, once we add this threat to our workspace, let me go back here, we can take a threat and set its priority. So we can say, hey, this is a high-priority threat, and then over time we can

incorporate this into our backlog and give it to our development teams, or we can just choose other risk response strategies here. We can always discuss mitigations and controls later, but the important thing is to come up with a list of threats and make that the starting point for your build process, because once you know what threats you're building against, you incorporate security into your build process, so you're not bolting it on later. Cool. So now we've completed our threat model. Now what? We have a bunch of threat statements, we do the analysis and the priori... sorry, go

ahead. No, it gives you JSON, yeah. So once you have all your threats, you analyze them, you prioritize them, and then you have a bunch of risk response strategies: avoid, mitigate, transfer, accept. And the most important thing, in my opinion, is that your threat model should be the input to your pentest scope. A lot of you and a lot of your companies do pentesting for all of your services, and you don't know where to start. You tell the pentesters, hey, here's our system, here's the boundary in which you're allowed to test, do it.

That can be very unproductive: they don't know what they're looking at, and they don't have the details. And if they're internal pentesters, they should know what they're testing against. So going back to the same tool, as an example: you give them all of your threats, all of your mitigations, all of your assumptions, all of your assets, and so on, and you make this an input to your pentest scope, which makes life easier for you, the designers, and the pentesters. I'm sure your pentesters will like you more when you give them a better scope. So yeah, going back to the

presentation. Okay, so just to summarize: we've automated the easy stuff, we've threat modeled reusable components, we've threat modeled the actual features and flows, we have risk response strategies, and we have an input to our pentest. At this point you've probably done 99% of what you could do, you're able to do this at scale, and you're able to derive insights that you previously couldn't, because of machine-readable formats and having a bird's-eye view of all of your threats, assets, and mitigations. So yeah, that's my presentation, and if you only take two things away from this talk, it is that threat modeling can be

done at scale, and that you should threat model both features and components. If you just take two things away, those are the two I would recommend taking away from this talk. Cool, thank you very much. Do you guys have any

questions? Yep, go ahead. It's called Threat Composer. The one I'm using is the fork; if you just Google Threat Composer Canadian Digital Service, or CDS, that's the first thing that pops up. Cool, any other questions? Yeah, go ahead. Yeah, this one supports STRIDE natively. Again, I do not like STRIDE, because it boxes you in. STRIDE might give you things to think about, and this one specifically supports STRIDE, but I think they have plans to add other ones as well. And again, this is open source, so you can always add it. Even in the data flow diagrams, you know,

you could customize this for your... sorry. Yeah, for example, if your company uses a specific type of authentication, then for all of your data flows you can set what kind of authentication you're using. So you can set your authentication, you can set all the custom stuff. Everything is open source here, so you can build it for your company and customize it for yourself. Yeah, go

ahead. Without going into too much detail, it depends on the size and complexity of the service. Small ones take a few hours; really large, really complex ones take a few days. But in those few days you do everything: you take the design documents, you do the entire threat modeling, and you have the scope for the pentest ready. So it adds a little time to your development process, but you don't have to bolt security onto your product later. Cool, any other questions? Yeah.
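Having "the scope for the pentest ready" can be as simple as reshaping the threat model that already exists. The following is a rough sketch under assumptions: the JSON schema is invented for illustration and is not the export format of Threat Composer or any other tool, and `pentest_scope` is a hypothetical helper name.

```python
import json

# Hypothetical schema for illustration; not the export format of any specific tool.
THREAT_MODEL_JSON = """
{
  "assets": ["trip history", "account details"],
  "assumptions": ["Trip history is only reachable after login."],
  "threats": [
    {"id": "T-2", "priority": "medium",
     "statement": "User input on the account details page alters a database query.",
     "mitigations": ["Parameterized queries"]},
    {"id": "T-1", "priority": "high",
     "statement": "An unauthenticated actor can view a rider's trip history.",
     "mitigations": ["Require login before showing trip history"]}
  ]
}
"""

PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def pentest_scope(model: dict) -> dict:
    """Build a scope document: highest-priority threats first, each with
    the mitigations the testers should try to verify or bypass."""
    ordered = sorted(
        model["threats"],
        # Unknown or missing priorities sort after the known ones.
        key=lambda t: PRIORITY_RANK.get(t.get("priority", "low"), len(PRIORITY_RANK)),
    )
    return {
        "assets": model.get("assets", []),
        "assumptions_to_challenge": model.get("assumptions", []),
        "test_cases": [
            {
                "threat_id": t["id"],
                "attempt": t["statement"],
                "verify": t.get("mitigations", []),
            }
            for t in ordered
        ],
    }


if __name__ == "__main__":
    scope = pentest_scope(json.loads(THREAT_MODEL_JSON))
    print(json.dumps(scope, indent=2))
```

The testers receive the assets, the assumptions, and a prioritized list of threats with their mitigations, rather than only a system boundary.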

Oh, this is open source. You can just go to the GitHub and run it on your own laptop. It's also hosted for free if you want to play around with it, but again, it's open source: download it and run it.

Yeah, we actually do, because during the requirements process we inherit our previous threats. Once you do a threat model, you can see what threats affected you before, and that drives the requirements process. In the SDLC, you remember the cycle: all the threats from the previous threat models can help you design around them, knowing what you will already be affected by. Cool, thank you so much everyone. [Applause]
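Feeding earlier threat models back into requirements is easier when they are machine readable, as the talk recommends. The sketch below assumes a hypothetical record format (one entry per past feature, each threat tagged with the component it affects) and shows two things mentioned in the talk: a component usage count that can serve as a rough asset inventory, and the list of known threats a new design inherits.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical records of earlier threat models; the format is an assumption.
PREVIOUS_MODELS = [
    {"feature": "account details",
     "threats": [
         {"component": "api_gateway",
          "statement": "A user reads another user's record by changing an ID."},
         {"component": "database",
          "statement": "User input alters a database query."},
     ]},
    {"feature": "trip history",
     "threats": [
         {"component": "database",
          "statement": "User input alters a database query."},
         {"component": "database",
          "statement": "An unauthenticated actor can view trip history."},
     ]},
]


def component_usage(models: list[dict]) -> Counter:
    """Count how many earlier threat models touched each component."""
    usage: Counter = Counter()
    for model in models:
        # A set ensures a component is counted once per model.
        usage.update({threat["component"] for threat in model["threats"]})
    return usage


def inherited_threats(models: list[dict], planned: list[str]) -> dict[str, list[str]]:
    """Collect, per planned component, the distinct threats seen before."""
    result: dict[str, list[str]] = {name: [] for name in planned}
    for model in models:
        for threat in model["threats"]:
            seen = result.get(threat["component"])
            if seen is not None and threat["statement"] not in seen:
                seen.append(threat["statement"])
    return result


if __name__ == "__main__":
    print(component_usage(PREVIOUS_MODELS))
    print(inherited_threats(PREVIOUS_MODELS, ["database"]))
```

A new design that reuses the database would start its requirements with the two database threats already on the list, instead of rediscovering them.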