BSidesSF 2019 - Building Identity for an Open Perimeter (Tejas Dharamshi)

Name: BSidesSF 2019 - Building Identity for an Open Perimeter (Tejas Dharamshi)
Uploaded: 2019-03-18
Duration: 28 min 14 s
Description: Netflix is a 100% cloud first company. The traditional corporate network security perimeter no longer meets our needs. In this talk, I will be covering the core building blocks comprising of identity, single sign-on using standards like SAML, OIDC and OAuth, multi-factor authentication, adaptive aut

BSidesSF · 2019 · 28:14 · 217 views · Published 2019-03

Speakers

Tejas Dharamshi

Tags

Category: Technical

Topic: Cloud IAM

Style: Talk

Mentioned in this talk

Tools used

Apache, Meechum, Stethoscope

Platforms

AWS, AWS ALB, AWS Cognito

Frameworks

Spring Boot

Protocols

OAuth 2.0, OpenID Connect, SAML

About this talk

Netflix is a 100% cloud-first company. The traditional corporate network security perimeter no longer meets our needs. In this talk, I will cover the core building blocks we have invested in to make identity the new security perimeter: identity, single sign-on using standards like SAML, OIDC, and OAuth, multi-factor authentication, adaptive authentication, device health, and authorization.

Transcript [en]

So I'm going to be talking about building identity for an open perimeter. Before that: my name is Tejas Dharamshi. I am a senior security software engineer working on Netflix's identity and access engineering team. Netflix went global in 2016: we started streaming content in more than 190 countries. But think about what that actually meant from an enterprise security perspective. Not only were we going to stream content in more than 190 countries; our local presence, in terms of our employee and contractor population, was going to see an equal amount of expansion, which meant we would be having

new office locations, and people, especially our employee and contractor population, would either be working from those office locations or working remotely, which meant they might be connecting through any public Wi-Fi. We started making more original content than ever before, both here in the US and all across the globe, which meant the set of applications being built specifically for production use cases was going to expand. People working on our productions would need access to these applications, and they are going to be on the go, which means they will be connecting through any public Wi-Fi. Our other facets of business, which include our pre- and post-

production lifecycle, our localization partners all across the globe, and our device, ISP, and media partners: every facet of the business was going to see a dramatic shift in how we think about security. To add to the mix, the applications that had been built, or were going to be built, needed to go through a cultural mind shift: their scope was no longer just employees and contractors. The scope had expanded, meaning all the B2B users from every other facet were going to be accessing the same applications, and those applications had to be made available outside of the corporate network. So we clearly had to rethink in

terms of how we think about perimeter-based trust and how we detach ourselves from it. There are a lot of reasons why we would want to do that. The cost and complexity involved in setting up and maintaining infrastructure all across the globe is going to be very expensive and not very efficient, not to mention the risk exposed by relying on just a network-based perimeter. So we clearly had to shift our direction, and we shifted towards more of a zero trust model. Some of you may have seen last year's talk at Enigma around LISA, which gives a great

insight into Netflix's approach to zero trust. There are a lot of other examples; BeyondCorp is a very good example of how you think about moving away from a network-based perimeter. So what exactly is zero trust? The core idea is that access to any resource, internal or external, coming from a corporate network should not imply any trust. It should not give you any privileged access. You should assume that an attacker is already present inside the perimeter, has enough access, and can operate just as you would. You should start thinking about your corporate network as just any other public Wi-Fi at a coffee shop. So

this basically led to the inception of our identity platform, which we call Meechum. The core idea behind the platform is federation: we want applications to delegate the authentication and authorization process to us, so application developers can just focus on building applications. Once we strongly authenticate and authorize a user, we give applications a signed piece of information that they can actually verify. There are a lot of advantages to this model. From our team's perspective, we can start adopting standards, and standards are a very crucial aspect, specifically SAML and OpenID Connect. We can use the same

standards to gate our applications, and we can use the same standards to gate third-party applications as well. We can layer our security in terms of what we think is the right level of assurance we need about who the logged-in user is, and we can provide a best-of-breed end-user experience as well. A team that is building an application can now purely focus on that and leave the authentication and authorization process to us. They get pluggable security, so there is zero room for error, especially when it comes to security, and they can do all of this through self-service. What about the end user

itself? We can give a best-of-breed experience with single sign-on. So there are a lot of advantages to thinking in this direction. The core of the platform is the identity itself: what do we know about who the user is, and who is trying to gain access to our applications? When identity is at the epicenter of our platform, we don't want to be in a situation where we have silos, meaning identities scattered across our ecosystem and different identities used to access different applications. That exposes a huge risk to any organization: different identities, as I mentioned, to access different applications, different security practices followed by the silos, and we cannot have a consistent authorization

model either. What about the end user? It produces the worst experience: people have to think about which identity to use to access which app, and again we cannot have single sign-on. So we have streamlined how we get away from these silos and have a consistent view of a person or an identity. From an employee and contractor perspective, we've streamlined where the identities reside: we manage them completely inside Google's Cloud Identity. For all the other B2B use cases I mentioned, which include the pre- and post-production lifecycle, our studio partners, and our content, device, ISP, and media partners, for all of that

other ecosystem we have built our own IdP, and we manage all those identities inside it. One aspect is how we streamline where identities reside; the other is how we make sure that identities come into and go out of our ecosystem at the right time. Every identity that we issue goes through a strong provisioning and de-provisioning process. This is very crucial, not just for our organization but for any organization that wants to move trust towards identity. The core building blocks of the provisioning and de-provisioning process tap into external events. When we receive those event streams, we process

those event streams and provision the right identity that is needed. Once we are done provisioning or de-provisioning those identities, we emit notifications so that other applications and services in our ecosystem can tap into the same events to carry out post-processing. But what type of events? We tap into a whole bunch of HR event sources: Workday for employees and contractors, ESN for all of our production use cases. What about all the other B2B use cases, which might emit other types of events that we want to tap into? For example, production-related events: what are the start and end dates of a production,

and when is the right time to get people into and out of our ecosystem? For other B2B use cases there are vendor contracts: can we use those as events? What about the activity of the user itself? Just because we provisioned identities doesn't mean that we let them stay in our ecosystem forever. If there has been no activity, it might be better to move them out of our ecosystem in a staggered approach. So provisioning, de-provisioning, and making sure identity silos are not there are among the very core building blocks of our platform. So now, how do we start establishing stronger authentication, and detecting and

tagging who the user is, and how can we start establishing some assurance for that? Multi-factor auth is at the core of the platform. Users go through the first factor using their credentials, and they go through the regular step-up process as well. As we have a wide and diverse population, it is important to support a range of options based on various use cases, so we support push notifications, TOTP, security keys, and any other method that's available out there. But think about it: should we multi-factor auth every user every single time they try to access applications? It would create such a bad

experience. Is the risk level associated with a user trying to access a lunch menu application the same as for a user trying to access an application that exposes data? What about financial applications? Each one has a different level of risk associated with it. So this is where we start thinking like an attacker: what might an attacker be doing when they compromise some of these credentials and try to get access to our ecosystem? We have taken this model and built an adaptive authentication system, which is based on identifying the riskiness associated with an authentication request. There are various core building blocks that go as

far as discovering that. The first core ingredient is the likelihood of a user trying to access an application. We have a lot of interesting data elements available about the user: we have the user's previous authentication history, in terms of what applications they have gone through, and we have other attributes, such as what organization, team, and business unit the user is in and what production the user is part of. Various combinations of data are available, and we can come up with a nice heuristic for whether a given user is likely to go and visit some engineering application versus a studio app versus a financial application. So that's one of the core

fundamental ingredients of our adaptive authentication. What about the previous authentication activity itself? If an attacker is trying to compromise my account and gain access to Netflix resources, they might be interrogated by a lot of step-ups all along the way, and in such cases those requests would either have failed or been left abandoned. Can we feed that into our adaptive authentication to bump the risk score up to a more moderate level, which means we need to do more step-ups for this user? We might also time-box certain applications and make sure that users go through regular step-ups. For example, if

I'm trying to access the AWS console, I might want to time-box that so that every six hours I have to go through a regular verification. So time-boxing applications is another key aspect of our adaptive authentication. What about the applications themselves? Each app built in our ecosystem has a different level of risk, depending on what kind of data it exposes and on its availability, meaning whether it is internal or external, and on other interesting aspects such as whether previous vulnerabilities have been discovered in it. Various data points can be fed into a risk level associated with the app itself.
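The signals described so far (how likely the user is to visit the app, failed or abandoned step-ups, the risk level of the app, and time-boxing) could be combined roughly as follows. This is a minimal sketch, not Netflix's implementation: the `AuthRequest` fields, the weights, and the 0.5 threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AuthRequest:
    """Hypothetical signals attached to one authentication request."""
    access_likelihood: float           # 0.0 (unusual for this user) .. 1.0 (expected)
    recent_failed_step_ups: int        # failed or abandoned step-ups in recent history
    app_risk: float                    # 0.0 (lunch menu) .. 1.0 (financial data)
    last_step_up: Optional[datetime]   # when the user last completed a step-up
    time_box: Optional[timedelta]      # e.g. 6 hours for a cloud console, None if unused


def risk_score(req: AuthRequest) -> float:
    """Combine the signals into a score in [0, 1]; the weights are illustrative."""
    score = 0.4 * (1.0 - req.access_likelihood)
    score += 0.2 * min(req.recent_failed_step_ups, 3) / 3
    score += 0.4 * req.app_risk
    return round(score, 3)


def needs_step_up(req: AuthRequest, now: datetime, threshold: float = 0.5) -> bool:
    """Step up when the app's time box has expired or the score is high enough."""
    if req.time_box is not None:
        if req.last_step_up is None or now - req.last_step_up >= req.time_box:
            return True
    return risk_score(req) >= threshold
```

With these made-up weights, an expected visit to a low-risk app stays below the threshold, while an unusual visit to a high-risk app after failed step-ups, or any visit to a time-boxed app whose six-hour window has passed, triggers a step-up.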

If a user has been accessing our ecosystem from a known device, then we might want to reduce the riskiness associated with the authentication request itself. What about the location from which the user is trying to access our application? If the request is coming from certain locations that we deem risky, we might want to outright deny the authentication request; in other cases we might want to step up, but in such a case we might want to just deny. So there are a lot of interesting ingredients. These are some of them, apart from a bunch of other things, but these are the interesting ingredients that go into our

adaptive authentication. Adaptive authentication is one of the core ingredients of layered security, and everything I just mentioned happens in real time, at runtime, as and when we see new requests coming in. What about the health of the device from which you're trying to access an application? What do we know about that? That is another interesting signal that we can feed into our adaptive authentication. There is a ton of endpoint security software that works really well, especially if you're in a completely managed environment, for defining policies and, based on those, making sure devices are in a particular healthy state.

But at Netflix things are very different, especially given where our B2B business model is going and how our culture plays into it: people have complete freedom to walk into any store, buy a device, and use it for work. Given that culture and where the business is going, what can we do to still have stronger assurance about the health of the device itself? This is where we built a native application known as Stethoscope which, based on the policies that we set, recommends best practices to people for the health of their device. So I, as a

user, get educated, can self-correct, and can get my device into a healthy state. But this information only goes so far if you cannot put it to use, especially to drive our adaptive authentication. So during our auth process we run queries against the app itself. Once we query device-related information, if it detects things that do not comply with the policies we have set, we recommend those corrections to the end user. People can then self-correct and take appropriate action. It's kind of a win-win situation: we were able to get devices into a recommended state, and users got educated as well.
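The recommend-and-self-correct loop described above can be pictured as a policy evaluated against a device report. The sketch below is an assumption-laden illustration: the talk does not list the checks, so the `DeviceReport` fields and the wording of the advice are hypothetical and are not the Stethoscope app's real data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DeviceReport:
    """Hypothetical device attributes readable without admin rights."""
    disk_encrypted: bool
    firewall_enabled: bool
    screen_lock_enabled: bool
    os_up_to_date: bool


# Hypothetical policy: each expected setting maps to the advice shown to the user.
POLICY = {
    "disk_encrypted": "Turn on full-disk encryption.",
    "firewall_enabled": "Enable the built-in firewall.",
    "screen_lock_enabled": "Require a password after the screen locks.",
    "os_up_to_date": "Install the latest operating system updates.",
}


def recommendations(report: DeviceReport) -> List[str]:
    """Return advice for every policy item the device does not currently meet."""
    return [advice for field, advice in POLICY.items() if not getattr(report, field)]
```

An empty list means the device is in the recommended state; a non-empty list is what the user would be asked to correct, and it could also be fed into the authentication decision as one more signal.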

So it is an interesting approach to moving away from a managed-device model and to how you think about BYOD specifically. Device health is yet another core ingredient of our adaptive authentication, and another core ingredient of our layered security as well. So now we have strongly authenticated our users based on various factors. How do applications actually integrate with the platform? One common approach you see in the industry is a mega-proxy, where every authentication request goes through the mega-proxy and that's where you apply all the things I just mentioned. But at Netflix we want to keep a few things away.

One of them is the cost and investment of setting up and maintaining infrastructure; we don't want to go in that direction, and we don't want to introduce single points of failure. So we want to push the integration right into the application layer itself. When we go in this direction, we don't want to be in a business where a centralized DevOps team manages these integrations and becomes another choke point in our ecosystem. So self-service is very crucial to how we empower our engineers to drive these integrations themselves, but in a way that is simple and intuitive, so that people don't have to think

about the complexities associated with the things I just mentioned. This is where we have built an OpenID Connect-compliant self-service that guides engineers, with a single click, to the best practices you get by default, and you can go through the advanced configuration to adjust things accordingly. This is great and helps reduce the support burden of new apps coming out and new integrations being needed; it makes it a distributed problem. But a UI goes only as far as people actually use it, and we want to drive these integrations far into our ecosystem. So being API-first is what will

get us there, and at Netflix everything is API-first, so this is a classic model to use. But think about the culture again, in terms of freedom and responsibility: developers are free to use any language and any tool to build applications, so we are in a polyglot environment. How do we go about making sure these integrations are done the right way? One approach would be to recommend a lot of open source libraries and leave the burden on the developers to configure those integrations. But this is a problem when new languages come out and new libraries are needed, or existing libraries

have vulnerabilities: how much of a chasing game are we going to play? One thing you commonly hear at Netflix is the notion of the paved road. When you're on the paved road, you get the best of everything: the best tools, the best support, the best documentation. When you're off it, you're on your own. So how can we use that to power our integrations? One approach we take: if you're on the paved road, you get Apache by default. So how can we use that? We use the mod_auth_openidc open source library to wire up these integrations. That works for a large chunk of

applications, but can we push authentication even further upstream? Netflix is largely on AWS, so what can we use from AWS in general? Over the last year and a half we worked very closely with Amazon's ALB team, and now ALB supports authentication directly at the load balancer layer itself. It does support Amazon Cognito, but the cool thing is that you can wire up your own IdP, and this is where we wire up our own IdP as well. As mentioned, APIs are crucial for us to get our integrations far into our ecosystem. Some of you may be using our CI/CD platform, which is Spinnaker. We

have integrations configured from there as well; it talks to Amazon to set up the load balancer, hides all the complexities of what you get from AWS, and instead provides a single-click configuration setup. Similarly, Spring Boot is another classic example: when you generate a Java project with the project generators, you can set up those integrations by default. So we have strongly authenticated our users, and we have an interesting way of setting up integrations that requires zero configuration from any developer. But does that mean that every person who tries to access our applications should get access just because they are strongly authenticated? That might not be a good idea.
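For the load-balancer integration mentioned above, an ALB listener takes an ordered list of actions, and an `authenticate-oidc` action placed before the `forward` action makes the load balancer run the OpenID Connect flow against an IdP of your choosing. The helper below is a sketch of what an automated setup might generate. The action keys follow the Elastic Load Balancing v2 API, but the function name, the example issuer, and the `/authorize`, `/token`, and `/userinfo` paths are assumptions; a real IdP publishes its own endpoints.

```python
from typing import Any, Dict, List


def oidc_listener_actions(
    issuer: str, client_id: str, client_secret: str, target_group_arn: str
) -> List[Dict[str, Any]]:
    """Build ALB listener actions: authenticate with OIDC first, then forward."""
    base = issuer.rstrip("/")
    return [
        {
            "Type": "authenticate-oidc",
            "Order": 1,
            "AuthenticateOidcConfig": {
                "Issuer": base,
                # Endpoint paths are hypothetical; take them from the IdP's metadata.
                "AuthorizationEndpoint": f"{base}/authorize",
                "TokenEndpoint": f"{base}/token",
                "UserInfoEndpoint": f"{base}/userinfo",
                "ClientId": client_id,
                "ClientSecret": client_secret,
                "Scope": "openid",
                # Redirect unauthenticated requests to the IdP instead of denying them.
                "OnUnauthenticatedRequest": "authenticate",
            },
        },
        {"Type": "forward", "Order": 2, "TargetGroupArn": target_group_arn},
    ]
```

The resulting list has the shape expected for `DefaultActions` when creating an HTTPS listener through the AWS SDK, which is the kind of call a deployment pipeline could issue on a developer's behalf so that the application itself needs no authentication code.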

So, from the same self-service, we provide a base set of access control rules that you can apply, in terms of which user base or which group of the population you want to allow access to your application, and you use the same service to wire up those rules. But if this does not work for a certain portion of our developers, then we package up the attributes that you think you need, from those we are willing to give you, and hand them to the application as access tokens, so they

can verify them and apply other fine-grained authorization controls there. So access control, or authorization, is another core ingredient. Another interesting aspect is that we provide base-level access control, and developers can use access tokens to get information and apply authorization rules, but the same piece of information can be used in other authorization systems that we are building within Netflix as well. The same data, in different formats, can be used to apply access control. So we have a large population of applications to gate access to, and we want to make sure that identity is at the very epicenter of everything we think about at Netflix,

followed by stronger adaptive authentication, figuring out whether we even need to do any step-up for our end users. When it does come to doing a step-up, a range of options is important, because we have users not just here in the US but all across the globe, from various walks of life and diverse backgrounds. Authorization is another crucial aspect of layering on top of identity. And all of this is done by adopting standards. Standards are super crucial, because we want to use the same thing to protect our own applications as well as third-party SaaS apps. And for everything, we

have self-service, because we don't want to be in the way of how different parts of our business are moving. We would rather empower our engineers to manage those integrations, and we want to spread the integrations far into our ecosystem so that there is zero room for developers to make mistakes. So these are the core ingredients that we think are important for moving towards an identity-based perimeter and detaching trust from the network-based perimeter. I'm happy to take any questions now. Yeah? Sorry, so

the question is: do we have to maintain different states for the endpoint, and how do we feed that into our adaptive authentication? We don't maintain different states associated with the device. At the point in time when device trust actually needs to be figured out, a live query is made to the endpoint, and if the query gives us the right information, that's what is used. So no previous states of the device are needed to make decisions. Any more questions? Sorry, I can't hear

the question. "Thanks for the presentation. Can you talk a little bit more about how you implement device health? Was that a sliding scale across your ecosystem, or was it a hard, brute-force limit, where people who didn't meet the minimum criteria couldn't access any of your apps?" In the state we are in right now, there is no hard blocking rule that we have applied, especially with respect to device trust itself. It is basically a point in time at which the device health data is needed, and if we have not collected a previous state of the device over a period of time, that's when we re-

prompt the user to query the health of the device itself. So it's based on the time period since we last saw the state of the device and on whether we have actually done a step-up. If you do a step-up, then doing a device health check simultaneously might not be a good experience. So it's either of those two, and time is the factor. Yeah?
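The answer above names two inputs for deciding when to query device health again: how long ago the device state was last seen, and whether a step-up is already happening in the same flow. A minimal sketch of that rule follows; the function name and the seven-day default are assumptions, since the talk gives no concrete interval.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_query_device_health(
    last_seen: Optional[datetime],
    step_up_in_progress: bool,
    now: datetime,
    max_age: timedelta = timedelta(days=7),  # assumed interval, not from the talk
) -> bool:
    """Decide whether to prompt for a fresh device health check."""
    if step_up_in_progress:
        # Avoid stacking a device check on top of a step-up in the same flow.
        return False
    # Query when the device was never seen or its last known state is stale.
    return last_seen is None or now - last_seen > max_age
```

Because there is no hard blocking rule, the result only controls whether the user is prompted; it does not by itself deny access.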

So that's an interesting question. The question is: from a GDPR perspective, do we request consent from the end user, especially when we are talking about device-related information, and what do we extract from the device itself? The app runs in non-admin mode, meaning anything that can be read in a non-admin setup is what we try to query. In terms of consent: no, we don't ask for user consent, because right now the application I'm talking about, the Stethoscope app, is purely for internal employees and has been deployed in that specific scope. But yes,

it will become an interesting aspect as we start deploying it to our B2B use cases all across the globe, and some of those things might need to be looked into. Quick one more question, anyone? Right there. So the question is: have we looked into other, stronger endpoint protection software before we started building our own application? We wanted to move away from a managed-device model, as I indicated, because a large portion of our user base is no longer purely employees but rather an external population. So we really want to shift away from a managed-device model. Previously we were using Jamf and LANDESK for setting up those policies, but we

have now detached ourselves from that particular model. Any more questions? We're still good on time. No? Thank you. Thank you. [Applause]
