
Identity At Risk: Identity-Centric Threat Modeling - Apostolos Giannakidis

BSides Dublin · 2024 · 37:26 · 81 views · Published 2024-06
About this talk
A principal security engineer at Microsoft examines identity-centric threat modeling and why identity security is critical in cloud environments. The talk covers fundamental threat modeling principles, real security incidents at Microsoft (including test environment compromise and Cosmos DB notebook vulnerabilities), and lessons learned about overprivileged access, defense in depth, and continuous verification.

All right, thanks for coming. It's really exciting to be here at BSides in Dublin. Actually, this is the second time I'm speaking at the BSides conference, and it might have been the third one: similar to the previous speaker, I was meant to speak in 2020, before COVID. Things changed, the plan changed. You can consider this talk to be a follow-up to the previous talk. Coincidentally, I was not aware that such a talk would be just before mine. But there's a caveat: this talk is about building secure systems. I'm a principal security engineer at Microsoft here in Dublin, and I'm working with the

software engineers in the identity organization to build secure software. So all the mistakes that you saw previously, don't blame me. You can blame my colleague, though. All right, so as I said, I work as a principal product security engineer at Microsoft Identity. Fun fact: I believe it's the only product security team across Microsoft, excluding GitHub. It's a new initiative in Microsoft, and if it goes well, hopefully they will establish more product security teams. M365 has something they call a product security team, but they're focused very differently from us. Different, exactly. So, a few things about myself: before I joined Microsoft, I worked as a vice president

of application security at JP Morgan, and as a security architect at a Dublin-based cybersecurity startup called Waratek. Overall I have 20 years of experience, and recently, last year, I obtained my second master's degree in the computer science field. All right, quick disclaimer: even though I work at Microsoft, I do not represent Microsoft today. These are my views, and they do not represent Microsoft. Some of the topics I will talk about are a bit juicy, but I've made an attempt to remove anything that is confidential, so most of this information, if not everything, is already publicly

available. And if there's anything that is confidential, my colleague Tom will interrupt me. So what are we going to talk about? First, we will understand why we should care about identity security. I guess that's already apparent from the previous talk, but I will make an attempt to explain further why we should care about identity security. Then I will talk about what we do in Microsoft Identity: essentially, we do threat modeling. And my question to you before I start: do you know what threat modeling is? Anyone here who doesn't know what threat modeling is? What's threat modeling? Okay, I'm not going to explain what threat modeling is to Tom. Tom is

the best threat modeler I've ever met. All right, so threat modeling is an essential process for building secure systems. We will talk about some of the most fundamental security principles in threat modeling that we follow to build secure identity systems. Then, the juicy part: we will talk about real security incidents that we had in Microsoft Identity, some good parts and some not-so-good parts, the lessons learned, and what we learned from these incidents. And at the end, I have some exciting news for you. So why should we care about identity security? All the previous things that we saw in the previous talk,

but also, in Microsoft we have this. It's from last year's report, but it's indicative of the extent of what we have to face in Microsoft. In just a single month we had 50 billion identity attacks. This translates to 4,000 identity attacks blocked per second. That's insane. Out of those, 42% of these attacks were successful. All right, that's not good news. And identity-related breaches are constantly in the news as headlines. The interesting thing is that attackers do not hack their way in, they log in. They actually use our credentials to log in. And when I say credentials here, I mean it broadly speaking: it doesn't have to be user credentials,

it might be machine credentials. We'll elaborate on this further. Okay, so identity, especially in the cloud, is the new perimeter. It used to be firewalls back in the on-premises era, but in the cloud era identity is the perimeter; it's the front door to all applications and services. Essentially, a compromise at the identity layer gives an attacker the keys to the kingdom: they can compromise your whole infrastructure. For these reasons, identity security is a top concern for Microsoft, regardless of what you just saw earlier. And the fact that the identity bug bounty in Microsoft is the highest-paying cloud program is a testament to how

critical we consider identity security in Microsoft. And do you know why it's the highest-paying cloud program? Probably you already understand that: because it is the primary attack vector for attackers. Maybe I shouldn't be saying this, but it's already in the slides: if you're a bug bounty hunter and you want to earn money, identity is where you need to focus your efforts. It's paying the highest dollar in the market. All right, these are a few other reasons why we should care about identity security. These are obvious ones, so I'm not going to elaborate. All right, so security principles. What are the principles upon which we build secure systems in

Microsoft and everywhere else? All of those. Obviously I don't have time, this is not a lecture, to go through all of them, but I will focus on the ones that are more important for identity: least privilege, defense in depth, and assume breach. Secure by default as well: we saw previously a few examples from other teams in Microsoft where secure by default was not followed, and that's also very important in identity. So let's proceed with the first one, the most important one in identity security: the principle of least privilege. Essentially, that's the heart of identity security, right? Fewer permissions means more security; the more fine-grained

your permissions, the more secure you are. At its core, it's enforcing minimal required access. And what do we gain by that? What's the benefit for us? The benefit, essentially, is that we're shrinking the attack surface, so in case of unauthorized access we limit the blast radius. We should keep in mind to assign permissions to users, processes, and systems for the task that they need to perform, nothing more and nothing less. All right, so how do we achieve the principle of least privilege? Common ways are role-based access control and just-in-time access. And we need to have a specific

mindset when we apply the principle of least privilege: deny by default. You start with a clean slate, you have no privileges, and then you assign explicit fine-grained permissions based on the business need. Another one is segmentation: we need to segment our applications and services both at the network level and at the application level. And because most applications nowadays run on clusters, we need to micro-segment the workloads in an identity-aware manner. All right, equally important is security hygiene. Without that, everything else doesn't

matter. So regularly review authorization policies. And do you know when is the time to review authorization policies? Here, in threat modeling. That's the essence of threat modeling. And something that we've seen in Microsoft: there might be authorization policies in place, but sometimes there are no unit tests or end-to-end tests to verify that these authorization policies actually work. Unless you test something, there's no guarantee that it will work. It's like backups, and I have this experience: even if you regularly take backups, unless you validate that these backups can be restored, you don't have backups. It's like Schrödinger's backup. Same

thing with authorization policies. All right, so it's one thing to know what to do to achieve the principle of least privilege, but it's equally important to know how the principle of least privilege gets violated. These are some of the root causes of overprivileged cases or incidents we've seen in Microsoft. One of the most common issues we've seen is the absence of authorization checks. Sometimes API endpoints, for example, do not validate whether an identity (typically a machine identity; I'm talking mostly about machine identities in this talk) has the right permission to perform a task. I've seen in the past, in another

company I used to work for (and actually I have a colleague from that past company in the audience, so again I'm not going to say anything confidential), cloud services that said, during threat modeling or a security assessment, that they do perform authorization checks, and that the checks are in a JWT. Think about JWTs and the claims in a JWT. So during threat modeling we ask the question, "How do you perform authorization?", and the typical response is, "Oh, we have this claim in the JWT." All right, do we stop there? Do we accept it as a valid

answer? Do we need proof that this actually works? Sometimes yes; it depends on how much time we have for our review. But in that case there was an incident at a previous company where the check was supposed to be done at the gateway. So when a software engineer says, "Oh yeah, we do authorization checks at the gateway level, so it's outside of our control," do we stop threat modeling there, or do we increase the scope of threat modeling to also make sure that the gateway does the authorization check that it is supposed to do? One common bit around that, a trick that you do sometimes: we

go back and forth with the developer, who says, "Oh, the check happens in the gateway, right? The gateway is secure, so it's fine." "Okay, so you're doing network blocks so that only the gateway can connect to your service?" "Oh, no, no." "Okay, well then I guess you need to be doing the check there as well, on your side, or you assume that the gateway is compromised." Right? So this is [inaudible]. Yeah, yeah, exactly, and that's in my next slides, actually. Sorry. Yeah, no worries, no worries. Thanks, Tom. All right, so even if there are authorization checks in place, sometimes

these authorization checks are improper. You could be doing everything right apart from one thing, and guess what attackers care about? Attackers care about a single overprivileged identity. You might be doing everything right apart from one claim; one claim might not be checked properly at the gateway level or at the API endpoint level. Attackers care about that one missing check. So one overprivileged identity could, essentially, be damaging to your whole infrastructure. All right, so story time. There is, or there was, a cloud service in Microsoft (I'm not going to name the service) that had standing access to manage service principal credentials. Standing access, like app-level access. All right, that was the

business need; there was a valid business reason to have standing access to manage service principal credentials. For those who don't know, a service principal is the identity of an application. And we're talking about multiple service principals, meaning multiple applications. So whoever manages to authenticate to that service can manage the credentials of multiple applications. And here we're talking about first-party and third-party applications, so pretty severe. What's the difference between a first-party and a third-party application? Roughly speaking, first party is Microsoft-owned and third party is external partners. Not always true, but that's the high level. So both

internal, like our own, and then also everybody else. Everybody else. All right, so far so good. The only problem was that access to that application, or service as we call them, was done through multiple authentication mechanisms. Is that a good thing or a bad thing? Well, if there's a valid business reason, maybe it's a good thing. But the reality is that the other authentication mechanisms were legacy authentication mechanisms, no longer in use. The other bad thing is that these legacy authentication mechanisms were overprivileged. All right, so what's the security risk here? The overprivileged access could allow lateral movement, meaning you compromise one application's service

principal and then you gain access to the other application, and so on and so forth. All right, so lessons learned, or as we call them, postmortems. Multiple authentication mechanisms create complexity, and something we hate in threat modeling is complexity. So try to keep it simple; I think that was one of the security principles at the start of this talk. Especially during threat modeling, we do not want unnecessary complexity. And during threat modeling we need to identify all authorization paths. That's the time to do that. During the operational period of the application or service, you don't

need to care about that, that's operational. But during the design phase, and this is supposed to be a design phase, you need to review all your authorization paths, and during that review you need to find whether there are any legacy authorization mechanisms. Remember the example from the previous speaker, asking whether you have a 12-year-old car? Hope not. Well, there are cases where applications use 12-year-old authentication mechanisms. There's a 40-year-old authentication mechanism at this point. Exactly, so that's the sad reality. So review and identify all these legacy authentication mechanisms and API endpoints, identify whether there's any real

business need, and swiftly migrate to newer ones. If there are any high-impact permissions assigned, they need to be documented and justified. And of course remove any unused permissions. All right, so another security principle I want to talk about today, and Tom already made a quick introduction to it, is defense in depth, and another one is assume breach. I consider them to be like siblings: one complements the other. Okay, so what does this mean? Never trust, always verify. In previous ages it was "trust but verify". That's not the way this should work. It should work like this: verify first,

and continuously, and then trust. Do not verify once; verify throughout the data flow. In threat modeling we talk about data flows, right? We have a source and a sink: where the data originates and where it ends, or the operation, the transaction. So throughout this data flow, make sure that you validate the identity continuously. Assume everything is hostile. What does this mean? Essentially, it means that every part of the organization or of the environment is potentially hostile. Everything could be compromised, even if it is within your trust boundary. And why should we think like that? Because a compromise in one

trust boundary could allow movement to another trust boundary. So everything is hostile, and we will see examples of that. And defense in depth essentially complements that idea and adds security controls throughout all architectural layers. This avoids a single point of failure and minimizes the blast radius in case of a breach. It's like having multiple locks on your door, an alarm system, and a guard dog to protect you. You need all of these to have defense in depth. All right, so how do we achieve it? I think I already covered some of those. We need to assess threats in all architectural layers, validate data across untrusted

sources. This is why trust boundaries are very important in threat modeling: you need to know what is under your control and what is outside of your control. Everything that is outside of your control should not be trusted and should be validated: the data integrity, the confidentiality, and the identity of who's making the call. Multiple layers of security controls, diverse security controls, and strong authentication and authorization across all layers. Again, security hygiene is super important, and I'd like to highlight data classification and dependency analysis here. If you have data classification, this allows you to know whether some of the data needs special attention, more

security controls in place compared to other data. Like credit cards, right? Credit cards and credentials are sensitive information, and you need to handle them more tightly than other, non-sensitive data. Dependency analysis is equally important in threat modeling, and not everybody does that. We in Microsoft try to do that: we have a process called security ports, and we try to evaluate the service with all its dependencies. And what do we achieve with that? Essentially, we achieve a blast radius analysis. If there's a compromise of one service, and we know the services that this service depends on (think of it like a tree of dependencies), you know immediately what the subsequent

compromise would be in case of a breach. You know how the attacker could move laterally in the system, so you put security controls across all these dependent services. All right, so story time again, and I really like this one. There was a first-party service in Microsoft that used a cookie to store the authentication state of a user during the authentication process. The cookie is outside of the trust boundary, right? We do not control the cookie. The backend service does not control the

cookie; it's on the client side, it's not trusted. So how do we make it secure? That was the question for the software engineers. The obvious response is: let's encrypt it. Let's encrypt the contents of the cookie, and this way we can securely store the authentication state. Think of the authentication state as the multiple steps until you finally log in. All right, so let's encrypt it, and then in the cookie data let's have the identity of the user, and validate the identity of the user when we receive the cookie on the backend. All right, sounds good. Any problems with that

approach? Do you see any problems with just encrypting the contents of the cookie? Anyone? Integrity. Integrity, all right. So the cookie data was encrypted and considered secure, and the identity was validated, but what if an adversary changes the cookie data? Encryption does not protect against data tampering. Some people believe that just by encrypting data you make your data secure, but that's not how it works. In cryptography, encryption protects confidentiality, not integrity. So what the adversary did is they found out the key that was performing the encryption, and the key in this case was a user key, the tenant key. So they used the same key to

re-encrypt the data in the cookie, substitute the value of the cookie with their own value, change the authentication state, and bypass authentication. As simple as that. All right, so what did we learn? Encryption is not data integrity; you need signatures for that. And challenge assumptions about data security during threat modeling. Your data is your assets; that's how we start threat modeling: identify assets, and then challenge your assumptions about whether your data is secure. Assume that untrusted data is compromised. They did that: the software engineers of that service considered the cookie to be untrusted and compromised, and that's why they encrypted the cookie. However, that was not enough. And always validate data coming

from untrusted sources. And continuously update threat models as the app evolves. I have this last bullet because this flaw in that service was caused by a change in the service. At some point the service was threat modeled and it was secure; then someone made a change to the cookie, and that was not reviewed at the time. There's one additional thing on there, sort of a meta point from "encryption is not data integrity": cryptography is this very sharp and pointy sort of thing. Find a subject matter expert if you are doing anything even moderately interesting with it. The engineer who made this mistake felt really

bad about it. This was a principal engineer, someone I personally respect quite a bit, who messed up this one minor detail in the cryptography, and it was an incident. Not everyone is an expert on everything; just because you're a wonderful security engineer doesn't mean you're a cryptographer. For sure. As I said, cryptography is complex, and not everyone understands every detail. And again, the sad reality is that threat modeling can be a time-consuming exercise. If you have 250 services, like my team handles, there's no time to go exhaustively through all the details of

every service, and this has to be done every six months according to compliance regulations. All right, so an interesting case study, another juicy story, that happened a few years ago. I was torn; I wasn't sure if I should talk about Midnight Blizzard or the Cosmos DB one. I decided not to go with Midnight Blizzard because this is still work in progress in Microsoft. The previous speaker did mention Midnight Blizzard, and actually, before I go through this case study, I want to talk briefly about Midnight Blizzard. I think that's a good bridge between what I just

said and what the previous speaker said about Midnight Blizzard. So, one of the main root causes, if not the main root cause, of Midnight Blizzard... I don't know, who knows Midnight Blizzard? All right, so Midnight Blizzard is a Russian hacking group. They managed to exfiltrate some important data from Microsoft a couple of months ago, and this is still work in progress. As the previous speaker said, the main entry point for their attack was a test environment that was created by a software engineer for testing reasons, and that test environment was abandoned. No one cared about the test environment. Quick parenthesis: there are thousands, if not millions, of test environments at the

moment. We make an active effort to shut them down, but that group identified one of those test environments, one that didn't have any other access apart from its own tenant. It's a test tenant. All right, so because it's a test tenant, we do not threat model it. That's the sad reality: we barely have time to threat model production services, and we have millions of test tenants. There's no way we can threat model test tenants; that's out of scope. All right, so in a test tenant, test engineers can do pretty much whatever they want. So the test engineer created or used an overprivileged

permission. I don't want to go into details, but that overprivileged permission gave the attackers access to pivot to the corp network and exfiltrate data. All right, so another lesson learned: if you have the resources, threat model your test environments. That might not be practical, but keep it in mind, or shut them down immediately when you don't need them. Test environments might be as important as production environments. And if you don't have time to threat model each individual test environment, have guidelines, have rules for your engineers: if you want to set up a test environment, follow these rules. And don't forget to decommission the test environment when you're done. And also remember the Directory.ReadWrite.All

permission: even in test environments, do you really need such a powerful permission? All right, let's look at an interesting one that happened about three years ago. It was presented at Black Hat, so I do have the freedom to share a few more technical details about this, because most of what I will say today is already on YouTube. All right, so Cosmos DB. Not the actual Cosmos DB service that everyone uses and loves, which is super popular, but a feature in Cosmos DB called notebooks. Notebooks in Cosmos DB allow users to add live code in a notebook that queries the data in Cosmos DB and then

visualizes the data in a pretty way that data analysts like, and they're happy with that. All right, sounds cool. So when we do a threat modeling review, as I said, we need to identify the assets of a service. These are some of the assets: API keys, primary and secondary encryption keys, and fabric cluster certificates. This service was running on a cluster, like a Kubernetes cluster; we call it a fabric cluster, a Service Fabric cluster. All right, and the components: of course we have the Service Fabric, we have the containers, the Docker containers, we have an endpoint for authentication, we have the nested network firewall, which we'll

see in a moment, and we have the WireServer. The WireServer is very important here: it's essentially an internal server that manages metadata, secrets, and certificates. All right, security controls in place. We should treat live code as malicious, so we encapsulated it in a Docker container, so we have compute isolation here. We have all code running as an unprivileged user, and access to the WireServer, which holds the keys to the kingdom, is restricted by a nested firewall. All right, so this is the data flow diagram. In threat modeling we use data flow diagrams; this one is oversimplified, and I realize something is missing

from here: the Cosmos DB data store that holds the data you want to query. All right, so what's the expected path? When we do a threat model review, we are presented with a data flow diagram similar to this, and we have the untrusted boundary. The nested virtual machine here is where the notebook code lives. In order for the notebook to be able to fetch the data from Cosmos DB, it needs to authenticate to Cosmos DB. How does it authenticate to Cosmos DB? It needs secrets that live in the WireServer. All right, so how is it going to get the secrets from the WireServer? Through the nested VM firewall, and then

there's a guest VM firewall in the VM that is specific to every customer; every customer has their own VM. All right, perfect, this sounds good, right? Does this sound good? Well, everything was great until, due to a misconfiguration, the adversaries managed to get through. We assume breach, right? The Docker container should be considered compromised right from the start. So that's by design; we should consider this to be by design. But they got root, and the first thing they did was change the firewall rules of the nested VM. The moment they did that, they didn't have to go through the guest firewall anymore; they could simply go to Hyper-V directly,

and through Hyper-V they went to the WireServer. The moment they changed the path of execution, the WireServer, because of another problem that I don't want to go into details about, considered the user to be trusted. So through Hyper-V, the WireServer gave them not only that tenant's secrets but the secrets of all tenants. When I say secrets, I mean certificates. Now, to make things worse, the certificates hosted in the WireServer were not even local certificates; they were global certificates across regions. So that was really painful. That happened before I joined Microsoft, so don't blame me. All right, so lessons learned. Again, challenge access requirements, even if they

are by design. Assume root for every environment that runs untrusted code. Security controls should be behind trusted boundaries, so that nested firewall was doing almost nothing there. It was a security control to keep things in place, but overall the security of the system should not be based on that one single firewall. Powerful certificates that grant overprivileged access: make sure that you do not have such certificates, and if you do, perform a risk analysis. During threat modeling, identify every certificate that you use, check whether it contains only the privileges that it requires for the task,

and if it is considered overprivileged, maybe it's a good idea to segment it: have separate certificates, one per region, or one per cluster maybe. Do not allow a compromise to increase the blast radius and jump from one tenant to another, or from one region to another, and so on and so forth, because that's what you don't want during an incident. All right, so finally, and I believe I'm almost on time, I have some good news. I believe I did not mention MFA, multi-factor authentication, at all, and I don't believe there should be any

identity security talk that doesn't mention MFA; that would be a big miss. So I want to address MFA quickly. Almost every compromised account in Microsoft did not use MFA. That's a fact. And if MFA had been enabled, it could have blocked more than 99% of the account takeover attacks. So what do we do about that? And Midnight Blizzard, by the way, could have been avoided if MFA had been enforced in that test environment. All right, so Microsoft decided to enforce MFA across the board, starting a month and a half from now, with almost no exceptions. That's straight from the oven, straight from the press. With just this

change, Microsoft aims to eliminate 99% of account takeover attacks going forward, and I think that's a really good move. That's me. Thank you for coming.
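
The talk's deny-by-default mindset, and its point that an authorization policy nobody tests is like a backup nobody restores, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Policy` class, the role names, and the resource names are hypothetical and do not describe how any Microsoft service implements authorization.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Deny by default: only explicitly listed grants are allowed."""
    # Each grant is a (role, action, resource_type) triple. Hypothetical names.
    grants: frozenset = field(default_factory=frozenset)

    def is_allowed(self, roles, action, resource_type):
        # Anything that is not an explicit grant falls through to "deny".
        return any((role, action, resource_type) in self.grants for role in roles)


# Start from a clean slate and add fine-grained grants per business need.
POLICY = Policy(grants=frozenset({
    ("reader", "read", "report"),
    ("credential-admin", "rotate", "service-principal-credential"),
}))


def authorize(roles, action, resource_type, policy=POLICY):
    """Raise instead of returning False so a forgotten check fails loudly."""
    if not policy.is_allowed(roles, action, resource_type):
        raise PermissionError(f"denied: {action} on {resource_type}")


# The policy only counts if tests prove it denies what it should deny.
assert POLICY.is_allowed({"reader"}, "read", "report")
assert not POLICY.is_allowed({"reader"}, "rotate", "service-principal-credential")
assert not POLICY.is_allowed(set(), "read", "report")  # no roles, no access
```

The negative assertions are the part that tends to be missing in practice: they verify that the one check an attacker would look for is actually enforced.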
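
The cookie story comes down to "encryption is not data integrity". A minimal sketch of the integrity half, using only the Python standard library, is shown here. It assumes a hypothetical key `SERVER_KEY` that never leaves the backend, and it uses an HMAC tag (a symmetric message authentication code) rather than a public-key signature. It deliberately leaves confidentiality out; a real design would more likely use authenticated encryption reviewed by a cryptography expert, as the talk recommends.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical demo key. The assumption is that it is known only to the
# backend, unlike the tenant key in the incident described in the talk.
SERVER_KEY = b"demo-only-secret-held-by-the-backend"


def seal(state, key=SERVER_KEY):
    """Serialize the auth state and append an HMAC-SHA256 tag."""
    payload = base64.urlsafe_b64encode(
        json.dumps(state, sort_keys=True).encode("utf-8")
    )
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + tag


def unseal(cookie, key=SERVER_KEY):
    """Return the state only if the tag matches; otherwise reject the cookie."""
    payload, sep, tag = cookie.rpartition(".")
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    if not sep or not hmac.compare_digest(tag.encode("utf-8"), expected.encode("ascii")):
        raise ValueError("cookie failed integrity check")
    return json.loads(base64.urlsafe_b64decode(payload))


cookie = seal({"user": "alice", "step": "mfa_pending"})
print(unseal(cookie))
```

A client that swaps in its own payload (for example, changing `step` to a logged-in state) cannot produce a matching tag without the server-side key, so `unseal` raises instead of trusting the altered state.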
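
The dependency analysis described in the talk (know the tree of dependencies, then read off what a compromise of one service exposes) is essentially a graph reachability problem. The sketch below assumes a hypothetical `ACCESS` map where each service lists the services it holds credentials for; the service names are invented for illustration and do not describe any real system.

```python
from collections import deque

# Hypothetical map: service -> services it can authenticate to.
ACCESS = {
    "notebook-sandbox": ["metadata-server"],
    "metadata-server": ["tenant-a-db", "tenant-b-db"],
    "test-tenant-app": ["corp-mail"],
    "tenant-a-db": [],
    "tenant-b-db": [],
    "corp-mail": [],
}


def blast_radius(access, compromised):
    """Breadth-first search: everything reachable from the compromised node."""
    seen = {compromised}
    queue = deque([compromised])
    while queue:
        current = queue.popleft()
        for nxt in access.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(compromised)  # report only what the attacker gains on top
    return seen


print(sorted(blast_radius(ACCESS, "notebook-sandbox")))
```

Running the same function after splitting a shared credential into per-tenant or per-region ones shows directly how segmentation shrinks the reachable set.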