
Lyft Cartography: Automating Security Visibility and Democratization

BSidesSF · 2019 · 30:17 · 8.3K views · Published 2019-03
Speakers: Sacha Faust
Category: Technical
Style: Talk
About this talk
The Lyft Security Intelligence team's mission is to "empower the company to make informed and automated security decisions." To achieve this mission, we invested in cartography capabilities that keep track of our assets and, most importantly, the relationships and interactions between them. The talk provides insight into an intelligence service implemented by the Lyft Security Intelligence team to consolidate knowledge and improve decision making. Attendees will be introduced to the platform we implemented, along with a broad set of scenarios that allow us to burn down security debt, detect assumption drift, and enable teams to explore their services and environment. Furthermore, Lyft is releasing the platform to the open source community as part of the conference and will provide details on how it can be extended to adapt to each organization's needs.
Transcript [en]

So, good morning — for some of us it's a little bit early, for some of us it's not. My name is Sacha Faust, and I'm a product security manager at Lyft. I presented last year and gave a quick introduction of what my thinking was when I started building the team; this year I'm following up with sort of a v2 of how we're looking at automating some of our security decisions, but also enabling democratization of that data. In terms of the agenda, I'll give a quick background of what we've done and a little bit of a platform overview — we are releasing a platform, or a piece of our platform, today —

what modules and enrichments are currently available, what extensibility points are available for people to add their own context into our systems, as well as upcoming releases — what we're thinking, what's coming next, and so on. If you want a recap of the initial thinking, I'd recommend revisiting last year's presentation; I'll just give a quick recap this year. Most business owners are taking on, or basically accepting, risk without clear information. It's very often hearsay, or we're making a somewhat educated guess, but for the most part we're kind of blind. A lot of the security reviews are done in isolation:

we go see the subject matter expert, they do their security work, things get addressed, but the results aren't globally shared, so there are communication challenges there as well. Dependency views and transitive risk views are also very challenging. We don't have the ability to zoom out and look at the blast radius of a particular asset being compromised, and as we're accepting risks we don't really know the blast radius — the transitive risk — of, for example, a low-value asset getting compromised: what does it really lead to? Think about a single user being spear-phished in a domain environment: is there a path from that spear-phished user to the domain? And then —

we really don't know. So, to try to solve this, when I was building the team over the last year, one of the core things that came to mind was that I didn't really want to fix everything, because that's not scalable as a team. I defined the team mission as empowering Lyft — initially, and now the community — to make informed and automated security decisions. My goal is to enable decision-makers to make good decisions and also find a way to automate our decision loops. The goal is actually to codify our thinking as much as possible, and for that we needed some sort of decision-loop framework. I personally use the OODA loop, for

those of you that are familiar: observe, orient, decide, and act. A lot of what we're releasing today is part of that, mostly on the observe and orient pieces, and moving forward we'll give out more and more of the decide and act pieces as we release more of our components. The idea is to take subject matter experts and get them working together — if they're solving pieces of the puzzle on their own, why don't we enable them to share that as much as possible and make that cohesive view available to all, so that security can be observed but other things can be done as well. So if you have good visibility into

all your infrastructure and you've solved asset attribution — which is a big problem to solve — then yes, security can leverage that and say, OK, if asset X is compromised we can go reach out to person B. But also finance can look at billing information, like who's spending more money on compute, for example. So now it's not a narrow view of just security; we're using it for security, but it can be used for a bunch of other things as well. The idea is to get everybody to work together on a cohesive, consistent language that can be observed, validated, and also augmented over time as we're figuring

things out. The platform we're releasing is called Cartography, and its main objective is to build maps — that's all it does: build maps, refine the maps, take notes on the maps, and then share the maps. It enables map viewing for a bunch of people, but it also enables automation, because we're building up the GPS capability for our decision loop and codifying it as well. We're releasing this — I think the bits were turned public last Friday — so if you want to take a look at it today, it is available now. We will continue updating it and also adding individual projects that leverage it. That's really the core of

our thinking and our system at Lyft, at least in terms of security. I'll credit Mike Johnson for coming up with the name — we had a bunch of other names, but Cartography was the one that stuck, and I thought it really represents what we're trying to do. From a platform perspective it's pretty simple: we're talking about Python code, and we kept it pretty generic. The way it works is that a user calls a CLI — either a human, a cron job, or some sort of automation — and the CLI runs a global sync process that can call multiple intelligence modules. So it is modular, where you can actually stitch parts of

your global map into individual components that are responsible for creating and maintaining their piece of the schema, and all of that schema connects together. We are backed by the Neo4j graph database. I personally started using graph databases to automate attack flows about five or six years ago, and I felt graph databases really solved my problem in terms of transitive analysis — having the ability to compromise one asset and move forward — back when I was heavily thinking about red teaming on Azure. So we are still backed by Neo4j; that could be replaced over time with a different graph database, but Neo4j is a low entry point: it is free, and there are Community and

Desktop editions, and if you want to move to a cluster and so on, you can move to Enterprise and that would work. Once the intelligence modules have gathered their data and written their part of the schema into Neo4j, we run a cleanup job — think of it like taking a snapshot, updating the whole graph, and then deleting stale nodes and edges. The project documentation details how we actually achieve this, but the result is that the graph is constantly updated, as frequently as you want. And then the last piece is the enrichment job, which is basically the ability to run a series of

Neo4j statements to do labeling on nodes and edges, and so on. We have two extensibility points. One is the intelligence modules — we can continue adding new intel modules; it requires a little bit of code change in the core sync service, but that's definitely doable and pretty low entry. The easiest route, and what's available as of today, where you can just drop in a file and it will work, is the enrichment job, and I'll give some detail on how that works. So, what we're releasing today — my mic is not always working — the first module we're releasing today is AWS. I gave an overview last year of what we were thinking

about; we have completed it, or at least a v1, so we're releasing AWS. Lyft is heavily invested in AWS, so we definitely invested there, and the module provides visibility into DynamoDB tables, the whole EC2 compute world, Elasticsearch, load balancers, IAM — I'm going through a mic change; okay, let's do that now before I get started.

All right, hopefully we're good. So, going back to AWS: we have DynamoDB, EC2 instances, load balancers, Elasticsearch, the complex world of IAM — which we've simplified for now, and there's definitely more work there — RDS, Route 53 (or DNS), and S3. Quick overview of IAM: we kept it pretty simple because we weren't really focused on IAM questions that much. The policy world and the IAM language are very complex, so that's an area we'll keep investing in, but at the base you have an AWS account — our platform does support cross-account, so you can have multiple AWS accounts — and we have relationships from the AWS account to the users, the groups, the AWS roles,

the infamous access keys, and AWS policies as well. So we kept it pretty generic on IAM, just the basics. For S3 we have the relationship back to the AWS account, the S3 buckets, and the S3 ACLs, and we do some enrichment on that: we analyze the bucket policy and also the S3 ACL to infer whether the bucket is anonymously accessible or not, as well as which methods are available. That's a very simplistic enrichment that can provide immediate value — you can get some of that information from the console, but it's not necessarily as detailed as what we provide. So, very simplistically, in Neo4j you match from the AWS account, through its resource relationship, to S3 buckets where anonymously accessible equals true, and then return the account name, the bucket name, and the anonymous actions.
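
As a rough sketch of what that query could look like — the node labels (AWSAccount, S3Bucket), the RESOURCE relationship, and the property names here are illustrative and may not match the released schema exactly:

    // Find S3 buckets the enrichment has flagged as anonymously accessible,
    // grouped by the AWS account that owns them (illustrative names).
    MATCH (acct:AWSAccount)-[:RESOURCE]->(bucket:S3Bucket)
    WHERE bucket.anonymous_access = true
    RETURN acct.name AS account,
           bucket.name AS bucket,
           bucket.anonymous_actions AS anonymous_actions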

You get a table view of results which you can consume as an analyst, but which you can also consume to build automation. EC2 is an area where we had a lot of questions initially — what is exposed, who owns what, and so on — so we invested a little bit more; our schema is a little more mature there and has better coverage. We do the same thing: AWS accounts, EC2 reservations, EC2 instances, and the EC2 instances are connected to their

network interface, auto scaling group, subnet, and security group, and we also connect the security group to IP rules — either inbound or outbound — and to the IP range. So you get a quick view of the networking infrastructure as well as the EC2 instance. A very common question people ask is: what is exposed? You can run nmap and so on — you may get some decent results, or you'll get blocked at some point, so you'll get partial results — or you can ask the graph a very simple question: I want to see everything in that IP range that has an

inbound permission. You can craft that question and then automatically infer and enrich your EC2 node — that instance — to say whether it's exposed, yes or no. So, what it looks like in Neo4j — and this is not a primer on the Cypher language, it's just to demonstrate the capabilities and the type of queries you can run — the way we've implemented that enrichment is by matching the inbound rules of 0.0.0.0/0, basically open to the world, and creating that mapping in the schema down to the EC2 instance and down to the AWS account. EC2 instances are not always connected to an auto scaling group, so we do

have an optional match there — if it's connected, we're going to get it as well — and then we return the AWS account name, the auto scaling group names, the rules (what port range is open), as well as the number of instances open to that specific exposure. In addition — and this is an example of how we're leveraging this for automation — we end up building an exposure signature out of that: we connect all these dots and create a signature that represents that exposure, so if we've accepted that exposure, we keep track of it.
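
A sketch of that exposure query, assuming illustrative labels such as IpRange, IpPermissionInbound, EC2SecurityGroup, EC2Instance, and AutoScalingGroup, and illustrative relationship and property names:

    // Walk from 0.0.0.0/0 inbound rules down to the instances (and, optionally,
    // their auto scaling groups) that are reachable from the internet.
    MATCH (range:IpRange {range: '0.0.0.0/0'})-[:MEMBER_OF_IP_RULE]->(rule:IpPermissionInbound)
          -[:MEMBER_OF_EC2_SECURITY_GROUP]->(sg:EC2SecurityGroup)
          <-[:MEMBER_OF_EC2_SECURITY_GROUP]-(instance:EC2Instance)
          <-[:RESOURCE]-(acct:AWSAccount)
    OPTIONAL MATCH (instance)-[:MEMBER_AUTO_SCALE_GROUP]->(asg:AutoScalingGroup)
    RETURN acct.name AS account,
           collect(DISTINCT asg.name) AS auto_scaling_groups,
           rule.fromport AS from_port,
           rule.toport AS to_port,
           count(DISTINCT instance) AS exposed_instances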

Let's say you have port 22 open on certain machines and you say, OK, that's totally normal — instead of keeping this as tribal knowledge, you want to keep track of it: we've actually reviewed this, we think it's OK, let's create a signature. So we end up building drift detection out of that, where every time the graph syncs, we have a baseline signature of all the exposures that were there initially on day one — which we eventually review — and every time something else shows up in our environment we actually get notified. It's a bit late — you should catch these things as they're being pushed — but it's better than nothing, so at least we have that drift detection. What it looks

like — again, it's just a blurred view — is a table: when we run that query we get a table of the AWS account, the security group, and so on, as well as a baseline. You can further extend that to "exposed to something," right? Let's say you've globally identified the assets that are exposed to the internet; well, there are other assets that might be exposed to very specific IPs out there. So you can modify that enrichment — or enhance it, complement it — by excluding non-routable IPs and saying, OK, give me everything that's not internal. As you're doing

that further analysis, you can start saying, oh, OK, we've identified these IPs as belonging to partner X or vendor Y, and then you can start augmenting and labeling that as "exposed to partner X," for example. We cannot solve the problem of figuring out the context for every company, but what we can do is give you a language and a method to augment it: we provide some common enrichments and the schema, and then we enable you to bake your own context into the platform.
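
One way to express that kind of custom labeling — the partner CIDR and the exposed_to_partner_x property below are purely hypothetical, since the partner context is something each organization bakes in itself:

    // Hypothetical example: label security groups reachable from a known partner
    // CIDR block (the range and property name are made up for illustration).
    MATCH (range:IpRange)-[:MEMBER_OF_IP_RULE]->(:IpPermissionInbound)
          -[:MEMBER_OF_EC2_SECURITY_GROUP]->(sg:EC2SecurityGroup)
    WHERE range.range STARTS WITH '203.0.113.'   // documentation prefix standing in for "partner X"
    SET sg.exposed_to_partner_x = true
    RETURN sg.name AS security_group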

If, as an analyst, you're also interested in running the same query but looking at the data visually instead of tables and charts, you can leverage Neo4j: instead of returning node and edge attributes, you can return a path. You can say p equals that path and then return it — that's what we're doing with the p1, as well as an optional match for p2, and we return p1 and p2. Visually, what it looks like is this.
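A sketch of returning whole paths rather than individual attributes, so the Neo4j browser draws the subgraph instead of a table (same illustrative labels as above):

    // Return the full paths (p1, and optionally p2 through the auto scaling group)
    // so the browser renders nodes and edges instead of rows.
    MATCH p1 = (acct:AWSAccount)-[:RESOURCE]->(instance:EC2Instance)
               -[:MEMBER_OF_EC2_SECURITY_GROUP]->(:EC2SecurityGroup)
               <-[:MEMBER_OF_EC2_SECURITY_GROUP]-(:IpPermissionInbound)
               <-[:MEMBER_OF_IP_RULE]-(:IpRange {range: '0.0.0.0/0'})
    OPTIONAL MATCH p2 = (instance)-[:MEMBER_AUTO_SCALE_GROUP]->(:AutoScalingGroup)
    RETURN p1, p2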

This comes baked in as part of your Neo4j stack, and you can still connect it to more advanced visualization — I'm a big fan of Linkurious, and you can also use Tom Sawyer or other APIs — but baked in, as of day one, you get this with our project. Quickly visualizing this: we have three AWS accounts, represented by the purple nodes — I didn't expect the screen to be so big, so I guess you can all see — and we can quickly infer that there's a common pattern in the middle, across those AWS accounts, and those common patterns are related to an IP range and an inbound permission. So as an analyst you can run this query and quickly get a

visualization of, hey, there's a pattern being defined there — maybe I can identify what those IP ranges are and then augment the labeling to "exposed to" whatever that is. And over time — again, going back to drift detection — you can build and bake that knowledge base into the graph instead of relying on hearsay or tribal knowledge. When I joined Lyft there was a humongous amount of tribal knowledge, and this is my attempt to start capturing it. Furthermore — and I'll go fairly quickly because I think the concept is hopefully understood by now, and you can go back to the video — we also have at Lyft a

common naming scheme for our auto scaling groups — hopefully you have this in your environment; if you don't, maybe you have other problems you may want to address first. But if you have a naming convention, which for us is basically service, role, usage, and some sort of deployment identifier, you can do the analysis, do the enrichment, and infer that this EC2 instance is in production or staging, and you can start looking at aggregations per service. So now we're starting to build our own customizable map of our infrastructure, and our platform supports that. Furthermore, you can propagate that information to other nodes as well: if you have an EC2 instance that is

labeled either staging or production, you can propagate that information to your EC2 security groups and EC2 subnets as well. This again goes back to enrichment: propagating that information and customizing the graph for your needs. In terms of enrichment, it's pretty simple — you take Neo4j statements and combine them. The first one here identifies an auto scaling group whose name matches a regex looking for "production" and, if there's a match, labels the group and the EC2 instance as production equal true; the next one says, OK, give me the subnets and the EC2 security groups associated with that instance and label them too.
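
The two statements could look roughly like this — the naming regex, labels, relationship types, and property names are illustrative, and in practice they'd live inside a graph job file rather than being run by hand:

    // Statement 1: flag auto scaling groups whose name matches a "production"
    // convention, and propagate the flag to their EC2 instances.
    MATCH (asg:AutoScalingGroup)<-[:MEMBER_AUTO_SCALE_GROUP]-(instance:EC2Instance)
    WHERE asg.name =~ '.*production.*'
    SET asg.production = true, instance.production = true;

    // Statement 2: propagate the same flag to the subnets and security groups
    // associated with those instances.
    MATCH (instance:EC2Instance {production: true})
    MATCH (instance)-[:MEMBER_OF_EC2_SECURITY_GROUP]->(sg:EC2SecurityGroup)
    MATCH (instance)-[:PART_OF_SUBNET]->(subnet:EC2Subnet)
    SET sg.production = true, subnet.production = true;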

You can combine these statements and represent them as what we refer to as a graph job, which is a JSON-serialized version of our graph job class — available in the project — and you drop it in the analysis folder; every time the graph syncs, these things run. That particular enrichment, for example, enables you, in a very simplistic way, to look at isolation gaps across your environment: you can ask a very simple question — match the EC2 security groups that are both production and staging and return them — and if any come back, you have an isolation gap.
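
That isolation check could be as simple as the query below, assuming the illustrative production and staging properties set by the enrichment above:

    // Security groups tagged as both production and staging suggest an
    // isolation gap between the two environments.
    MATCH (sg:EC2SecurityGroup)
    WHERE sg.production = true AND sg.staging = true
    RETURN sg.name AS security_group, sg.id AS id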

With an isolation gap, your staging environment can basically talk to production; you may accept that risk — and again, go back to creating some sort of signature for it — but at least you have visibility there. Moving briefly forward to other components: we have load balancers, associated with AWS accounts, EC2 instances, DNS records, and load balancer listeners, and we look at the internet-facing scheme to infer internet-exposed true or false and then propagate that to the EC2 instance — we're trying to drive consistency in our labeling. Route 53, again, is your DNS world, or at least part of it: you have your DNS zones and DNS records, and

they are associated with Elasticsearch domains, EC2 instances, and load balancers. This came in particularly useful for a common scenario: we have a vulnerability on service.com — where does it land in our environment, what's actually affected? So we can ask that simple question: we have a bug bounty report on, say, api.lyft.com — where are these things deployed? Let's start analyzing the impact. That came in particularly useful, and I'll have another demo later where we go even further than that.
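
A sketch of that "where does this hostname land" question — the DNSRecord label, the DNS_POINTS_TO relationship, and the hostname are illustrative:

    // Starting from a DNS record, find the load balancers, EC2 instances, or
    // Elasticsearch domains it ultimately points to (illustrative names).
    MATCH (record:DNSRecord {name: 'api.example.com'})
    MATCH p = (record)-[:DNS_POINTS_TO*1..3]->(target)
    WHERE target:LoadBalancer OR target:EC2Instance OR target:ESDomain
    RETURN record.name AS hostname, labels(target) AS kind, target.id AS target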

We have Elasticsearch as well, and for Elasticsearch we analyze the access policy to infer internet-exposed true or false, so you do get that labeling. There's a lot of Elasticsearch out there that gets exposed, and unfortunately many people realize it a little too late — there are a lot of scanners out there looking for these things — so that can become particularly useful. RDS is pretty simple: it's basically the RDS instance, and AWS provides a publicly accessible attribute which we leverage and label as exposed as well. You can go further down and see at the RDS instance which security groups are enabled, and revisit the "exposed to partner X and Y" idea as well, so you can dig down further. The RDS instance is currently not connected to the EC2

security group — I think the person working on that will make it happen, hopefully within the next week; I can't see him, but I wanted to point a finger anyway — so the connection from RDS instance to EC2 security group should come pretty quickly. DynamoDB is very simple: what are your DynamoDB tables. This can come in particularly useful, to some degree, for privacy — looking at table names and so on and digging a little further, maybe at some point. Overall, that's what we're releasing as of today. We've somewhat opened the floodgates; we have several intel modules that we've completed internally, we just need to massage them and transfer them to open source. Some of those include

GitHub — GitHub is kind of an interesting connection of humans, computers, and policy in applications, so there's good visibility there. PagerDuty — a good way of setting up attribution between services and owners; who's on call for a service usually gives you a ballpark of which team owns it. Workday — the HR world: who reports to whom, who's a director, and so on, which becomes very useful for attribution and consolidating that at some point. Envoy — if you're using one of our other open-source projects, Envoy, this becomes very useful when you start going beyond api.lyft.com/something, because then we can actually map that back to the

specific service, because Envoy is a reverse proxy. G Suite is coming, and Google Cloud as well. I'll go very quickly: GitHub — this is what it will look like. We're taking the AWS world and connecting it to GitHub, so you have an EC2 instance and an auto scaling group, and through our enrichment we've connected the ASG with the service in the GitHub world. What we've basically done is map GitHub in terms of the information we're interested in and then connect the project to the service, which is more contextual to us. So now from an EC2 instance you can route back to the actual branch, and you also have — which may become very useful — the branch's

dependencies in terms of Python libraries. We only started with Python libraries — we'll have Go and others as well — but you get the picture in terms of the information that's available. So if you're interested in which service has which dependency, you can ask the graph and get that fairly easily.
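
If the GitHub module links services to the libraries their branches depend on, the "which service uses which dependency" question might look like this — the Service, GitHubRepository, GitHubBranch, and PythonLibrary labels and the relationship names are assumptions for illustration:

    // Find every service whose branch pins a given Python library
    // (hypothetical labels and relationships for the GitHub/service linkage).
    MATCH (svc:Service)-[:HAS_REPOSITORY]->(:GitHubRepository)
          -[:HAS_BRANCH]->(branch:GitHubBranch)
          -[:REQUIRES]->(lib:PythonLibrary)
    WHERE lib.name = 'requests'
    RETURN svc.name AS service, branch.name AS branch, lib.version AS version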

The concept here is that we keep adding nodes to the graph based on what we augment it with. We've connected AWS and GitHub; we're introducing Jira, which basically means a service has a Jira project — so we added a single node, and don't underestimate that: sometimes a single node will take you a long time to achieve at least 80% coverage. But now we have service to Jira. We've added PagerDuty, and now we're diving into the HR world with humans, where a service has an on-call relationship to humans — that's another augmentation of the graph. And then we introduced Workday, where a human can report to other humans; we also have teams, and we connected the teams to the services they own and to the membership of each human. So as you see, it drives forward momentum: every single investment you're making, and every module that we're adding or the community is adding, will augment the

view and the possibilities and the questions that can be answered over time. Nothing is lost; you just keep adding, adding, adding. What does it mean? If we have a graph like this, where we have a service — unfortunately some of you may not be familiar with Envoy, but Envoy is basically a reverse proxy — so if we have a service, you have the reverse proxy telling you which part of the URL actually maps to a given service, you have your Jira project, you have the human who's on call, you have the team, you have the EC2 instances. All this information can become very useful for incident response. Common questions, like OK, we

have an incident, or an alert popped in — who do we bring on board, who do we call? So we've actually leveraged this graph — sorry, this thing goes red sometimes — we built a Slack bot and connected specific commands on the Slack bot to queries. Now an incident responder, or anyone at the company interested in specific services, can ask who the owner of a particular URL is, and the Slack bot understands that question, runs the query, and returns the data. That becomes useful for incident response, it becomes useful for bug bounty handlers as well — they know exactly where to route things — and it also becomes useful for bringing the right people in at the right time.
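
Behind a command like that, the Slack bot can simply parameterize a traversal from the URL, through the Envoy routing and service nodes, to the on-call human — every label and relationship below is hypothetical, since those modules are internal at this point:

    // "Who owns api.example.com/rides?" — walk from the Envoy route to the
    // owning service, then to the team and the current on-call human.
    MATCH (route:EnvoyRoute {domain: 'api.example.com', path_prefix: '/rides'})
          -[:ROUTES_TO]->(svc:Service)
    OPTIONAL MATCH (svc)-[:OWNED_BY]->(team:Team)
    OPTIONAL MATCH (svc)<-[:ON_CALL_FOR]-(person:Human)
    RETURN svc.name AS service, team.name AS team, person.name AS on_call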

Another piece of the graph is basically connecting Jira, Workday, and the service world, where we have teams, we have humans, and we have Jira projects. So if you have Jira projects and you say, OK, across all my Jira projects I know how to find security bugs, but I want to measure how we're doing — you can still leverage this graph, because you've solved Jira project to service and then service to owner. You can now measure: here are all my security bugs, and you can zoom out to the org owner or down to an individual team, so

you can take that and actually start doing some measurements. Furthermore, another example we've had is through our investment in VulnDB, a vendor that provides a dataset of vulnerabilities across many assets. For us, when we ingest assets we try to normalize them through the CPE naming convention, and then we can query that dataset and get the vulnerabilities. That avoids us having to run Nessus, for example, on our infrastructure — it scales a lot better; we don't do network scanning at all, we don't need to. So if we've connected the HR world, GitHub, Jira, and VulnDB together, then at the bottom we have a risk that affects a specific dependency.
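
Once the vulnerability feed is joined to the dependency and ownership data, zooming the debt in and out is one aggregation away — again, the Vulnerability, PythonLibrary, Service, and Team labels and relationships are purely illustrative:

    // Count open vulnerabilities per owning team by walking from the vulnerability,
    // through the affected library, to the services that depend on it and their owners.
    MATCH (vuln:Vulnerability)-[:AFFECTS]->(lib:PythonLibrary)
          <-[:REQUIRES]-(:GitHubBranch)<-[:HAS_BRANCH]-(:GitHubRepository)
          <-[:HAS_REPOSITORY]-(svc:Service)-[:OWNED_BY]->(team:Team)
    RETURN team.name AS team, count(DISTINCT vuln) AS open_vulnerabilities
    ORDER BY open_vulnerabilities DESC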

So now we can start measuring the debt we've accumulated in a way that lets us zoom in and zoom out: zoom in is, OK, what service or what library; zoom out is, who's the owner of that debt. And then you can start building out charts and so on, which is what we've done. We call this basically "intel juice": we take intelligence information from the graph, we have some jobs that request those specific pieces of the map, we store them in a database as historical data, and then we use our standard business intelligence platform, which at Lyft is

Superset. I've done it with Power BI at Microsoft, and you can probably use Tableau as well — I've seen people do it. So you can take that graph dataset, create snapshots of the historical data you want to measure, and then use more powerful charting on a platform people are accustomed to at the exec level to actually go and look at that data. I see you — I know, I'm almost done. Moving forward, we want to move to close-to-real-time graph updates, for example with CloudTrail: instead of taking a snapshot, how can we connect as closely as possible to resource

lifecycles — as resources are created and as they're destroyed — so that the graph becomes much more of a living thing. CloudTrail is an example for AWS; we're looking at GCP as well, and so on. Zelkova in AWS — I was very impressed with the Zelkova presentation at re:Invent on the work Bridgewater has done analyzing IAM policies and roles during check-in to see if they deviate from expectations. So I'm looking at Zelkova right now: how can we leverage that, or other components, to codify more and more of our decision-making. More automation, many more modules coming out, and hopefully we get more from the community as well — I'm looking forward to seeing that.

This is more the red teaming thinking of mine: leveraging more of our observe, orient, and decide platform to drive actions — so graph-driven security Chaos Monkey, or extensions to BloodHound, and so on, is something I'm interested in. Even though I'm managing a global security team, I still have that very red team, automated-weaponry kind of thinking, so that may come in the future. Thank you. We're hiring — if you're interested in helping us solve these problems, I need help, we need help, so we're definitely always hiring as well. Thank you.