Mary Racter - Vault on DC/OS: Secure Secret Management on Budget

BSides Cape Town · 35:33 · 225 views · Published 2017-12

Mentioned in this talk

Tools used

Consul, HashiCorp Vault

Platforms

Docker, Kubernetes

Transcript

Hi everyone, we'll be starting the next talk now. Mary will be telling us about Vault on DC/OS: secure secret management on a budget. Hi friends, I see I got all the devs. Thanks for coming. My talk is going to take you through some basics of secret management. If you're used to working with an in-house solution of your own, you're probably going to roll your eyes a little, because this is a different approach, but I hope there are still some valuable things I can show you. Great, so who am I? I'm Mary, and I work at Praekelt.org, which is a non-profit that works in the health and youth sectors. What it does is build mobile technologies.

They help people with supporting their lifestyle. For example, if you're pregnant, there's a service that helps you with messages like "hey, your baby is about six months along, go to the doctor, don't drink alcohol", that kind of thing. The other one is the youth portfolio, which is very cool. This is an aspirational picture of what people using our service might do: log on to the service with their phones and have interesting discussions about youth topics, like "my boyfriend does not want to use a condom, what do I do?", which is very relevant to youth today. So what is the purpose of this talk?

To anchor this talk, I'm going to give you a bit of an introduction to secret management from very basic first principles, and hopefully introduce some security tools and primitives for secret management at scale. The very last thing is some learnings about how and how not to handle secrets, because you can do it the right way, and then you can do it the wrong way, which people are happy to give talks about. So, very basic stuff: what is a secret? You all probably have an intuitive idea of what a secret is. Basically, it's some kind of knowledge or piece of information that's hidden from entities that are not supposed to know it, and knowledge of the secret is usually used to authorize people and confirm that their identity is what they claim.

Examples of secrets in computing are passwords, private keys, encryption keys and API tokens, and as with any secret, you shouldn't be telling them to your boyfriend. There are lots of ways to break into software infrastructure, but one of the low-hanging-fruit ways is to compromise passwords and look for places where people have been handling them carelessly. If you're in an offensive role, getting passwords has always been one of the things you try first, just to see if you can get in that way. Cool. So let's talk about the anatomy of a secret, not the regular anatomy, but the anatomy that hackers actually care about: the attack surface.

The attack surface can be made up of many parameters, but there are three very important ones we're going to look at today. The first is the temporal attack surface, which is the period of time for which a secret is valid: the longer it is, the more time an attacker has to get the secret and use it. Then there's the spatial attack surface, which is the number of interfaces on which your secret is valid. If you're using the same secret on a lot of different interfaces, you're increasing your spatial attack surface, which means there are more opportunities and more places where people can poke at it and get at that secret.

The last one is the algorithmic attack surface, which is the algorithmic determinism of the secret. People are likely to use admin/admin, for example, and that is a kind of algorithmic determinism. Basically, it's the properties of the secret-generation algorithm that might allow you to work out the sequence, figure out how the secrets are generated, and then make your own values. A full 80 percent or so of data breaches are caused by silly mistakes and people being careless when they're handling secrets. It's not necessarily people breaking crypto or being extremely sophisticated in their attacks; it may be secret management, simple injection, very basic stuff.
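To make the algorithmic attack surface concrete, here is a minimal Python sketch contrasting a guessable password scheme with one drawn from the standard library's `secrets` module. The `<service><year>!` pattern and both function names are hypothetical illustrations, not anything from the talk.

```python
import secrets
import string


def predictable_password(service: str, year: int) -> str:
    # Anti-pattern: a guessable scheme. Anyone who learns the pattern
    # (or one value) can generate the rest, much like admin/admin.
    return f"{service}{year}!"


def random_password(length: int = 24) -> str:
    # The secrets module draws from the operating system's cryptographically
    # secure random source, so one issued password says nothing about the next.
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(predictable_password("postgres", 2017))  # postgres2017!
print(random_password())  # 24 random letters and digits
```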

So, to anchor this talk, I'd really like to tell the story of our infrastructure. If you've been to the Docker talk earlier, that's really great, because there's some continuity. Cool, so here's the example. At Praekelt.org we use a lot of open-source software: we're a non-profit, so not so much money. We usually run our Python web applications as Docker containers on the open-source version of Mesosphere's DC/OS, which is a container orchestration platform. In the previous talk about Docker, if you were there, you were introduced to how to run Docker containers from the command line. But what if you want to do that at scale?

What if you want your web applications to be able to serve lots and lots of people? We are ambitious in that way. We're not huge scale, but we are medium scale: we expect to serve millions of connections and to send millions of messages every day, so our infrastructure needs to scale to accommodate that. The nice thing about cluster container orchestration is that it helps you put your Docker containers where you actually have resources. For high availability you have several servers, and depending on the availability of resources on those servers, you might provision Docker containers on one or on the other.

That's what an orchestration framework can do for you. The other important thing is that we host our codebases on GitHub: both our web application codebases and our configuration codebases, like Puppet, are all on GitHub. The configs are private, but they're still on GitHub. So what does this architecture look like in practice? This is vastly simplified, but say somebody wants to launch a web app. If you're not very familiar with a cluster setup, I hope this is reasonably intuitive.

What happens is you have some controllers over here, and they know basically what resources are available: the worker nodes and some stateful services. Somebody says to the controllers, "Hey, please run this Docker image with these parameters," and the controller asks, "Hey everybody, who has capacity?" Maybe one of the workers says, "Yeah, I'll take it," and you end up running your Docker container on that worker node. Container orchestration is pretty awesome. And as you can see from that particular container, example-1, it might connect to some stateful services.
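A toy model of the placement step just described: the controller looks for a worker with enough free capacity and runs the container there. This is a first-fit sketch under simplifying assumptions; the worker names, resource figures and the `schedule` function are hypothetical, and Mesos itself works by offering resources to schedulers such as Marathon rather than by running a loop like this one.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    free_cpus: float
    free_mem_mb: int
    running: list = field(default_factory=list)


def schedule(workers, image, cpus, mem_mb):
    """First-fit placement: run `image` on the first worker with room for it."""
    for worker in workers:
        if worker.free_cpus >= cpus and worker.free_mem_mb >= mem_mb:
            # Reserve the resources so the next placement sees less capacity.
            worker.free_cpus -= cpus
            worker.free_mem_mb -= mem_mb
            worker.running.append(image)
            return worker.name
    raise RuntimeError(f"no worker has capacity for {image}")


workers = [Worker("worker-1", 0.5, 512), Worker("worker-2", 2.0, 4096)]
print(schedule(workers, "example-1", cpus=1.0, mem_mb=1024))  # worker-2
```

First-fit is the simplest policy to read; real schedulers also weigh constraints, spreading for availability, and so on.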

In general, and I don't want to be too prescriptive about this, you don't really want to run stateful services inside Docker. Docker is meant to be kind of fail-fast: if it goes poof, you need to be able to restart it very quickly, and if you have state in there, you lose that state. So usually we keep stateful services outside of that paradigm and the containers connect to them: if you need a Postgres database or a RabbitMQ host, the containers make connections to those persistent services. Great. So our containers run apps that need stateful services, as I mentioned: databases, message queues and other stuff. So how do these web apps get access to the stateful services?

Why, they authenticate against them with a secret, of course. So what do we do at the moment? This is what not to do, you see. At the moment we create and configure stateful services manually using Puppet: we write the Puppet config, push it to GitHub, and then the Puppet agents on the hosts run those configs. Any usernames and passwords required on these services are described in the Puppet config, which is in a GitHub repo, and you can probably see how that might be an issue. But first: what we do on redeploy is copy and paste those credentials from the repo into the environment variables when we launch our containers.

So what happens is: "Hey, please run this Docker container. By the way, you need to connect to the Postgres database at a particular address, it's called postgres, and as environment variables, your username is admin and your password is admin." That's how we do things these days. As you can tell, this is pretty risky, but just to clarify why: there are two big issues here. The first is that we're storing static credentials in GitHub, and the second is that we're passing in the credentials as environment variables. If you've been on the offensive side of things, one of the first things you look at when you get a shell is the environment.

So, the risks of passing secrets as environment variables. If somebody manages to break into your Docker container, they can get your secrets. Environment variables also commonly end up in exception reports and application logs. Sometimes you don't expect it, but many web app frameworks will display environment variables in debug mode; by web app frameworks I mean things like Python's Django, which can be put into debug mode. There are some filters you can put in on which variables it displays, but if you're working with arbitrary environment variables, that's very difficult to control. The last thing is credential leaks through child processes.
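A small Python sketch of two of the leak paths discussed here: anything running in the process can read the environment, and a child process inherits it by default. The variable name `POSTGRES_PASSWORD` is a hypothetical stand-in for the credentials passed in at container launch.

```python
import os
import subprocess
import sys

# Hypothetical credential, passed the way the talk describes: as an env var.
os.environ["POSTGRES_PASSWORD"] = "admin"

# 1. Any code running in this process (including a debug page or an
#    error reporter) can read the whole environment.
visible_in_process = "POSTGRES_PASSWORD" in os.environ

# 2. A child process inherits the environment unless told otherwise.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('POSTGRES_PASSWORD'))"],
    capture_output=True,
    text=True,
    check=True,
)
inherited_value = child.stdout.strip()

print(visible_in_process, inherited_value)  # True admin
```

Passing an explicit `env=` mapping to `subprocess.run` is one way to limit what a child sees, though it does nothing about code running inside the same process.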

If your process forks to interact with a third-party application, that application then has access to your environment. This doesn't happen that much in practice, but it can happen. The next thing is storing static secrets in GitHub. Can I get a show of hands: does anybody do this, maybe in private repos? Anybody done this? I appreciate your candor. So here are some risks. You probably already know this, but just to spell it out: Git is designed to preserve history, so removed credentials in Git history are a point of exposure, and, as I'm going to mention later, even if those credentials are revoked, they can still expose you to a little bit of risk.

If somebody really wanted your secrets, they could get them. It's quite possible to make a private repo public by accident; that has happened, and it is not good. It's also very easy to expose more secrets than you intend to third-party contractors and interns. Say the intern needs access to maybe one Postgres database, but you're like, "Okay, I'll give you access to the Puppet repository," and that's got credentials for all the other databases, so you just have to pretend they won't see them. It's very hard to isolate access that way. It's also really easy for those with access to fork or clone secrets wholesale.

That has happened to us before, and it was not pleasant; in the end, they could potentially have gotten into a database. And as I mentioned earlier, this is very coarse-grained access control: you can't really say "you get access to this portion of secrets and you don't" if they're in the same GitHub repository, because the repository is the level at which access controls are enforced. The last thing is that you need to rotate and revoke credentials manually, which becomes really tedious the more stateful services you need to manage. Which brings me to the very important part: it does not allow for security best practices at scale, things like key rolling and revocation.

That's heavily manual with this setup, and in general just not a great idea. Okay, so let me outline some of the secret management tasks you should be looking at. The first is creating secrets: create good secrets; admin/admin is not a good secret. The next is good storage of secrets: secure storage, not storing them in plain text. Then there's the distribution of secrets, which is the really fun stuff: getting the secret from your storage to your consumer securely. And the last thing is managing the lifecycle of the secret. When I mentioned the temporal attack surface, secret lifecycle management is what helps narrow that.

In terms of creation and storage, we're migrating our secret creation and storage to HashiCorp Vault. Show of hands again: who's been looking at or poking at Vault? Ooh. So, Vault is very, very exciting. It's a secret management solution that is well maintained; it's open source, so it suits us; and it's designed with high availability and container orchestrators in mind. It doesn't have components that do the orchestration itself, but it has an API that plays well with those kinds of systems. So what can it do for you? It can generate, securely store, and control access to tokens, passwords, certificates, API keys and other secrets. It's gone through a couple of independent audits, and the audits came out pretty good. It can also help you with leasing, revocation, key rolling and auditing.

It keeps logs, and the leasing system is tied to the time when things were issued, so auditing becomes a bit easier, because you can narrow down the times when credentials were issued or revoked. And the nice thing is that it exposes all of this through a REST API, which is the fashionable way for microservices to communicate these days. But Vault is only one piece of the puzzle. As I mentioned, it plays well with all of these other workflows, but it doesn't actually do anything in terms of secure distribution.
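As an illustration of the REST API point, a sketch that builds (without sending) a request to read a secret from Vault using only the Python standard library. The address, token and secret path are placeholders (8200 is Vault's default port, and a real deployment would use HTTPS). The `/v1/` prefix and the `X-Vault-Token` header are part of Vault's HTTP API; the path shape assumes a version 1 key/value mount at `secret/`, whereas a version 2 mount adds `data/` to the path and nests the payload one level deeper.

```python
import json
import urllib.request


def build_vault_read(addr: str, token: str, path: str) -> urllib.request.Request:
    """Build (but don't send) a GET request for the secret stored at `path`."""
    return urllib.request.Request(
        url=f"{addr.rstrip('/')}/v1/{path.lstrip('/')}",
        headers={"X-Vault-Token": token},  # Vault reads the client token here
        method="GET",
    )


def read_secret(request: urllib.request.Request) -> dict:
    """Send the request; on a version 1 KV mount the secret sits under "data"."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)["data"]


# Placeholder address, token and path, for illustration only.
req = build_vault_read("http://127.0.0.1:8200", "example-token", "secret/purple-rain/postgres")
```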

Which brings me to the next part of secret management: distributing secrets. So what does that entail? It entails getting the secret from Vault to the correct consumer, keeping the secret safe from exploitation in transit, and then scaling that secret distribution to large numbers of consumers, because with container orchestration that's the kind of scale we're talking about. Before I go any further, I'd like to talk about some secret management primitives. One of these is trust. I hope everybody has a reasonably intuitive idea of what trust might be, but just to spell it out: trust between software actors means waiving the frequency or rigor with which an authorization routine is conducted for privileged requests.

An example of this in real life: you might consider installing burglar bars or an alarm in your house, but once you're inside the house, you probably wouldn't put burglar bars on your bathroom. Usually, once you've gone through the front door and the alarm doesn't go off, you're in a trust zone, and you don't need to authenticate yourself or use any keys or codes to get into any other rooms. In software, an example precondition is being in the same network trust zone: if you can SSH into one server in your internal cluster, you can get to any of the other servers, which is something people always exploit for lateral movement in pen tests.

The same kind of thing applies even in web apps: a valid session token is also a kind of trust. You don't have to keep putting in your username and password; as long as you have a valid session token, the web app trusts you, and you can move around the web app without re-authenticating. Another thing to think about is that trust can be one-way or mutual: with mutual TLS, that's mutual trust; with regular TLS, that's one-way.
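The one-way versus mutual distinction maps directly onto how a TLS server is configured. A minimal sketch with Python's `ssl` module: a server-side context created for client authentication does not ask clients for a certificate by default, and setting `verify_mode` to `ssl.CERT_REQUIRED` turns it into mutual TLS. Loading the real certificate, key and CA files is deployment-specific and left out.

```python
import ssl

# Client side: verifies the server's certificate (one-way trust by default).
client = ssl.create_default_context()

# Server side, one-way TLS: the server proves who it is; clients stay anonymous.
server_one_way = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)

# Server side, mutual TLS: the server also demands and verifies a client
# certificate, so trust runs in both directions.
server_mutual = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
server_mutual.verify_mode = ssl.CERT_REQUIRED

# A real server would then call load_cert_chain(...) with its own certificate
# and load_verify_locations(...) with the CA that signs client certificates.
```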

What I want to introduce here is coupling in terms of security, which is a combination of mutual trust and degree of interdependence. So let's talk about the paid solutions you could be deploying. Maybe you work for Amazon or something like that and you've got your cool in-house secret management systems, but in general, what's out there is that many enterprise container orchestration platforms have their own way of distributing secrets, and most of them leverage some trust relationship to get the secret to the recipient. It's mostly secret injection, or maybe mounting a shared volume. In the case of Kubernetes, you encrypt the storage, mount it into the container and decrypt it, and that's how the container gets its secrets.

Some of these other solutions mostly inject secrets as environment variables. Stay with me. There are also some open-source tools that leverage this paradigm, but they require ecosystem buy-in. There's envconsul, which works on systems that use Consul for service discovery. Consul includes a key-value store, and with envconsul the key-value store is coupled with the secret storage, and in the end it injects secrets into the containers at launch. But what do you do if you don't have control over your scheduler logic, you can't afford an enterprise license or product, and/or you don't want to invest in a new orchestration ecosystem?

Yeah, I guess we check all three boxes. So if you want to reason about this from first principles, you might say: okay, cool, how about I launch the container, and the container makes some calls to ask Vault for secrets after launch? The problem is that once you've launched a container, you shouldn't be trusting it. You can trust a container as long as you haven't launched it yet, but once it's launched and making calls on your internal network, that could be anybody. So you shouldn't be trusting your containers. And if the container authenticates to Vault with a secret, how did that secret get there in the first place?

If you're not doing the cool trust stuff beforehand, it gets really difficult to get that first secret to the container in a way that establishes it's legit. The other question is: how does Vault confirm the identity and permissions of the client container? You could solve this by trusting all the actors that can connect over a private interface, and that's possibly a solution, but it is too coarse-grained. What I mean by that is that you might have different classes of containers that need different levels of access to your stateful services. If you just say, "Okay, one size fits all: if you can connect over this interface, you get access to everything," that's how lateral movement happens.

So let's not do that. This brings me to an overarching concept in secret distribution called the secure introduction problem. What it basically means is that if we can securely get the initial secret that grants the container access to Vault, or your secret management system, then the container can securely fetch all subsequent secrets. But how do we get that very first secret? Jeff Mitchell, who is one of the engineers working on Vault, gave a really good talk about one of the patterns you can use, which is the secure introduction agent. If you're interested, here is his talk on YouTube; he has much cleaner pictures than mine, and I really highly recommend it.

The talk is called "Secure Introduction at Scale: Think Like a Vault Developer". So what does this secure introduction agent do, what does it look like, and how does it fit in? It's closely coupled with the cluster scheduler and maintains a mapping from container properties, for example the launched app name or container name, to Vault policies. To minimize the attack surface of the initial secret, we use wrapped tokens. What is a wrapped token, you might ask? It's basically a single-use token whose purpose is to encapsulate another token value.
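A toy model of response wrapping, to show why a single-use wrapping token is safer to pass around than the real token. This is not Vault's implementation (Vault keeps the wrapped response in a cubbyhole tied to the wrapping token); the `WrappingStore` class and its methods are hypothetical, and the clock is injectable only so that expiry can be demonstrated.

```python
import secrets
import time


class WrappingStore:
    """Toy response wrapping: a short-lived, single-use token stands in
    for the real value until someone unwraps it."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._wrapped = {}  # wrapping token -> (real value, expiry time)

    def wrap(self, value: str, ttl_seconds: float) -> str:
        wrapping_token = secrets.token_hex(16)
        self._wrapped[wrapping_token] = (value, self._clock() + ttl_seconds)
        return wrapping_token

    def unwrap(self, wrapping_token: str) -> str:
        # pop() is what makes the token single-use: it is gone after one
        # attempt, whether or not that attempt succeeds.
        entry = self._wrapped.pop(wrapping_token, None)
        if entry is None:
            raise PermissionError("wrapping token unknown or already used")
        value, expires_at = entry
        if self._clock() >= expires_at:
            raise PermissionError("wrapping token expired")
        return value


store = WrappingStore()
wrapping_token = store.wrap("real-client-token", ttl_seconds=60)
print(store.unwrap(wrapping_token))  # real-client-token
```

A side effect of the single-use property: if the legitimate container gets "already used" when it tries to unwrap, that is a signal something else got to the token first.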

One of the very nice things about a wrapped token is that it can bypass some of the concerns you might have with passing things around as environment variables. It is single-use: once the real token value is extracted, the wrapping token is useless. So, okay, who cares if an attacker gets it afterwards? It's not useful to them. It also lowers the risk of exposure through logs or the many intermediary services that the wrapped token might pass through. So how does this look in practice? If you're familiar with DC/OS or Mesos, or have touched that ecosystem before, this might make a bit more sense, because the scheduler here, Marathon, is from that ecosystem, but it should be fine if you haven't.

You've got some stateful services, Postgres here as an example; Vault is our friend here; and the new actor is the secure introduction agent. At the moment we're using Gatekeeper, but I'm sure there are other agents out there that can do this. So you go ahead and say to the scheduler, "Hey, please run this Docker image; by the way, it needs access to this Postgres." Schedulers are cool, so it launches the container. Once the container is launched, it's like, "Hey everybody, I'm a container, I need a Vault token so I can get my Postgres credentials," and what it does is ask the secure introduction agent for that.

The secure introduction agent, by the way, is very closely coupled with the scheduler. The secure introduction agent asks the scheduler, "Hey, is Purple Rain a container of yours?" and that's how identity is verified. So, as you're probably starting to think now, you really need to look at your secure introduction agent and make sure it's tightly coupled and secured. Once it gets confirmation that Purple Rain is a legit container, it looks at its own internal mapping of the Purple Rain container to Vault policies: okay, the container prefixed with purple-rain can log into Vault and read the default and Postgres credentials.

Then it goes off to Vault and says, "Hi Vault, it's me, the secure introduction agent. Please make my friend a token with the default and Postgres policies. Here's my auth token, by the way," which it got at some point. If all goes well, Vault passes back a wrapped token to the SI agent and says, "Yep, give this to your friend, who can redeem it for the real token value." It's important to note that at this point the secure introduction agent does not unwrap the value, and that's for good reason: if it has any logs, or if something intercepts that value, it can't be used to get access to Vault, so it's not useful to them. Then the SI agent passes it to the container.
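A toy sketch of the secure introduction agent's side of this flow: confirm with the scheduler that the task is really running, map its app name to Vault policies, and ask Vault for a wrapped token that is handed on without being unwrapped. The scheduler and Vault are stubs, the `<app>.<id>` task-ID format and the policy table are assumptions, and refusing a second request for the same task is an extra hardening step chosen for this sketch rather than something described in the talk.

```python
import secrets


class StubScheduler:
    """Stands in for the cluster scheduler's record of launched tasks."""

    def __init__(self, running_task_ids):
        self.running_task_ids = set(running_task_ids)

    def is_running(self, task_id):
        return task_id in self.running_task_ids


class StubVault:
    """Stands in for Vault: mints a token but only returns a wrapping token."""

    def __init__(self):
        self.wrapped = {}  # wrapping token -> (real token, policies)

    def create_wrapped_token(self, policies):
        wrapping_token = secrets.token_hex(16)
        self.wrapped[wrapping_token] = (secrets.token_hex(16), tuple(policies))
        return wrapping_token


class IntroductionAgent:
    def __init__(self, scheduler, vault, policy_map):
        self.scheduler = scheduler
        self.vault = vault
        self.policy_map = policy_map  # app name -> list of Vault policies
        self.already_introduced = set()

    def introduce(self, task_id):
        # 1. Identity comes from the scheduler, not from the container's say-so.
        if not self.scheduler.is_running(task_id):
            raise PermissionError(f"{task_id} was not launched by the scheduler")
        # Hypothetical hardening: at most one token per task.
        if task_id in self.already_introduced:
            raise PermissionError(f"{task_id} already received a token")
        # 2. Map the app name to policies. Exact match, so a task called
        #    "purple-rain-evil" does not inherit purple-rain's policies.
        app_name = task_id.split(".", 1)[0]
        policies = self.policy_map.get(app_name)
        if policies is None:
            raise PermissionError(f"no policies mapped for {app_name}")
        # 3. Ask Vault for a wrapped token and pass it on without unwrapping.
        self.already_introduced.add(task_id)
        return self.vault.create_wrapped_token(policies)
```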

"There you go, Purple Rain, have your token." What the container does next is make a call to Vault: "Please unwrap this token and give me the real value," because at the moment it can't really do anything with it. Vault unwraps it, and the real token value goes back to the Purple Rain container. Happy days: it can now make calls to Vault and ask for credentials, which it does in the next step: "Hi Vault, please give me secrets for Postgres. Here's my token, by the way," meaning the unwrapped value. If all goes well, Vault says, "There you go: for Postgres, the username is admin and the password is admin."

And then you can use those credentials to connect to Postgres. That's pretty cool, but some of you might notice that it's admin/admin again. Surely we can do better than that for all the clients connecting to that resource; that's how lateral movement happens. So let's try to kill this bird and a different bird with one stone. Once we've distributed our secrets, it's time to manage the secret lifecycle. As I mentioned, this is to narrow the temporal attack surface, but it does have some other surprising advantages. What you need to do is: revoke secrets from entities that no longer require them; revoke compromised secrets and issue new ones, which is key rolling; and destroy invalid secrets and prevent reuse of the secret value.

You can actually do all of this in a couple of fell swoops, and here are the benefits you get. You reduce the validity period of secrets, which narrows their temporal attack surface. You reduce the algorithmic attack surface by not exposing expired credentials generated with the same method. You reduce the usefulness of compromised credentials to malicious parties. And then you can think about automating this at scale, because scaling is interesting. So how do we actually do this? I'm going to introduce some primitives and some glue that puts everything together. Cool. The first secret management primitive is dynamic secrets.

Dynamic secrets are lazily generated, when they're needed, from one master secret. I'll explain how that works in a moment, but the advantages are that it prevents hard-coding of secrets, it prevents secret reuse by automating new secret generation, and it supports automated renewal and rotation of secrets. The nice thing is that it scales well for unique passwords in a one-resource, many-clients scenario: if you have n clients trying to connect to your single resource, all of them can have a unique password that can be rolled and audited, and you don't have to reuse the same credentials. Cool, so an example of this is connecting to Postgres.

The secret management service holds a master secret, maybe a username and password for a Postgres database. This master secret is authorized to create new roles: it has the CREATEROLE privilege on Postgres. When a consumer needs access to that database, it requests a new set of dynamic secrets from Vault. Vault then authenticates to the Postgres database with the master secret and runs some queries to create the dynamic secret; it actually runs CREATE ROLE queries. The important part, if you're familiar with Postgres, is that there's a password expiry, so the new credentials come out with an expiry period.
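A sketch of what the "create role" step could look like. Vault's database backend is configured with SQL creation statements containing placeholders such as `{{name}}`, `{{password}}` and `{{expiration}}`, which Vault fills in itself; the function below imitates that with Python string formatting. The `v-<app>-<random>` naming pattern, the `GRANT` line and `make_dynamic_credentials` are hypothetical, and a real setup would execute the SQL over the master connection rather than just return it.

```python
import secrets
import string
from datetime import datetime, timedelta, timezone

# Shaped like a Vault creation statement; the GRANT is a hypothetical example.
CREATION_SQL = (
    'CREATE ROLE "{name}" WITH LOGIN PASSWORD \'{password}\' '
    "VALID UNTIL '{expiration}';\n"
    'GRANT SELECT ON ALL TABLES IN SCHEMA public TO "{name}";'
)


def make_dynamic_credentials(app: str, ttl: timedelta, now: datetime):
    """Return a fresh (username, password, SQL) triple for one consumer."""
    # Letters and digits only, so the password can't break out of the SQL quotes.
    alphabet = string.ascii_letters + string.digits
    name = f"v-{app}-{secrets.token_hex(4)}"  # `app` is assumed to be trusted
    password = "".join(secrets.choice(alphabet) for _ in range(24))
    # Postgres enforces the expiry itself through VALID UNTIL.
    expiration = (now + ttl).isoformat(timespec="seconds")
    sql = CREATION_SQL.format(name=name, password=password, expiration=expiration)
    return name, password, sql


now = datetime(2017, 12, 1, 12, 0, tzinfo=timezone.utc)
name, password, sql = make_dynamic_credentials("purple-rain", timedelta(hours=1), now)
print(sql)
```

Every call yields a different username and password, which is what gives each of the n clients its own credentials that can be audited and revoked individually.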

Vault then wraps the new secret with some metadata and returns it to the consumer. What is this metadata, you might ask? It's the second primitive, which is leases. Leases are metadata for issued secrets that describe their validity. Each dynamic secret and auth token issued has a lease ID, and it also has some info on how long the secret has to live, which is the TTL value, and whether it's renewable: can I renew the secret and extend its time to live? Renewing allows the validity period of renewable secrets to be extended, and combined with a short TTL, this forces consumers to check in with Vault continuously to keep the secrets from expiring.
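To make the lease fields concrete, a minimal model of a lease with an ID, a renewable flag, an expiry time, and a hard limit that renewals cannot pass (Vault calls that ceiling the max TTL). Times are plain numbers of seconds on whatever clock the caller uses; the `Lease` class, the `issue_lease` helper and the lease-ID format are hypothetical.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Lease:
    lease_id: str
    renewable: bool
    expires_at: float  # when the secret stops being valid
    hard_limit: float  # issue time + max TTL; renewals cannot go past this

    def ttl(self, now: float) -> float:
        """Seconds the secret has left to live (never negative)."""
        return max(0.0, self.expires_at - now)

    def is_expired(self, now: float) -> bool:
        return now >= self.expires_at

    def renew(self, now: float, increment: float) -> float:
        """Extend the lease to `now + increment`, capped at the hard limit.
        Returns the TTL actually granted."""
        if not self.renewable:
            raise PermissionError(f"{self.lease_id} is not renewable")
        if self.is_expired(now):
            raise PermissionError(f"{self.lease_id} has already expired")
        self.expires_at = min(now + increment, self.hard_limit)
        return self.ttl(now)


def issue_lease(now: float, ttl: float, max_ttl: float, renewable: bool = True) -> Lease:
    # Hypothetical lease-ID format: the secret's path plus a random suffix.
    lease_id = f"database/creds/purple-rain/{secrets.token_hex(8)}"
    return Lease(lease_id, renewable, expires_at=now + ttl, hard_limit=now + max_ttl)
```

Usage: `lease = issue_lease(now=0, ttl=60, max_ttl=90)` followed by `lease.renew(now=50, increment=60)` grants only 40 more seconds, because the hard limit at 90 wins over the requested 110.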

That is a really great advantage, because it can automate secret cleanup: if your consumer dies, it no longer renews those credentials, the credentials expire, and you don't have to clean them up manually at all. So how do we put all of this together? In a very general sense, the container launches, goes through the dance, and gets its credentials. Maybe in your container you have a helper process that fetches the required secrets to a file or to the environment. A file is maybe a bit better, but honestly, if you've compromised the process of the application that's exposed to the internet, you can read it either way. The helper process also makes calls to Vault to renew the leases on the secrets.
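A sketch of the helper process's renewal loop, with the network calls abstracted behind two callbacks so the logic can be followed (and tested) on its own. `renew` stands for a call to Vault's lease-renewal endpoint and `container_alive` for a check on the main application process; both callbacks, and the choice to renew at half of the current lease duration, are assumptions for illustration.

```python
import time
from typing import Callable


def renewal_loop(
    renew: Callable[[], float],
    container_alive: Callable[[], bool],
    lease_duration: float,
    sleep: Callable[[float], None] = time.sleep,
    fraction: float = 0.5,
) -> int:
    """Keep a lease alive for as long as the container runs.

    renew() extends the lease and returns the new lease duration in seconds.
    container_alive() returns False once the main application has exited.
    Returns the number of renewals made.
    """
    renewals = 0
    while container_alive():
        # Wake up well before the lease would lapse.
        sleep(lease_duration * fraction)
        if not container_alive():
            break
        lease_duration = renew()
        renewals += 1
    # No explicit revocation here: once renewals stop, the lease runs out
    # and the secret store revokes the credentials on its own.
    return renewals
```

Renewing at a fraction of the lease duration leaves headroom for a slow or failed request, while still letting the credentials lapse soon after the container goes away.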

So while the consumer is still alive, the helper process keeps checking in: "Yeah, please renew those leases, still in use," like a library book that never gets returned. As long as the container is alive, presumably the secrets remain valid, which is really great. But if the container dies, the helper process stops renewing the leases, and the secrets expire. So, in conclusion: secret management in medium-scale open-source systems is still relatively unexplored. There are a lot of solutions out there already, but the mature ones are more in the enterprise space, or they're in-house solutions. If you want to do things open source, it's a really great playground to have a look and think about these workflows.

In a pinch, you can use your scheduler as an identity server for client-side consumer secrets. Moving beyond storing secrets in code repositories is possible without paying fiat currency. It takes some time to make it work, but you don't have to pay for a license or anything, because you can do this with open-source tools. And the last thing, which is pretty interesting, is that most secret management solutions for container orchestration platforms exploit trust and coupling to distribute secrets. I don't think you're going to see containers authenticating back to your container orchestration apps and proving their own identity; what usually happens is that they exploit a trust relationship established before the container is launched.

So see if you can spot where that happens. And here are some open-source tools that have to do with Vault that you can check out. For a secure introduction agent, there's one that's reasonably okay called vault-gatekeeper-mesos; you can pull it and have a look, and it has a Docker image, so it's very easy to try. There's also Vault Keeper: when I mentioned the helper process inside the container that fetches and renews secrets, Vault Keeper is that kind of tool for Docker containers, and it's especially good because it solves some problems that can come up with Python and Gunicorn applications.

And very lastly, there's envconsul, which is very cool. It's also one of the things that is coupled with the scheduler, but it injects secrets as environment variables. Thank you very much. If you're keen, feel free to drop me a mail at my email address. Cool, questions? Thanks very much. Questions? [Applause] Does anybody have any questions? Yes?

No, I think that's a very good point, because if you want to automate things at scale, you want to be able to store these things in Vault. The one solution we want to be using is keeping them in Vault, which leads you back to the same problem. At the moment I haven't really seen any good solutions for that, but it's definitely the next step to make this useful for people. Any other questions? Oh no, I wish. Yes? Yes, actually: instead of an initial auth token, Vault does have integrations for LDAP and other auth methods, so you can use those solutions as well.

At the moment we do have our own LDAP server, but yes, you can, so feel free to. Yeah.

Yes, you can use it for others. I made the example with Postgres, but Vault supports a lot of different backends that can generate dynamic secrets. Vault can even act as a certificate authority for your PKI, so it's got a lot of support for key generation and so on.

Sorry? Yeah, yeah. At the moment we're trying to serve HTTPS from the cluster, and it's a bit of a pain trying to distribute the certificates onto all our load balancers. So using Vault as a central place to store and distribute certificates is the next step; that's in the fantasy league, we'd really like to do that.

That's right.

That's a really good question. At the moment Vault has a lot of different modules that you can manage secrets with, and you're right: if you want to manage arbitrary secrets dynamically, you need a new secret backend in Vault, and at the moment they don't accept submissions for those, so that's maybe a bit of a weakness. They have a generic storage backend, but if you want to do dynamic secret stuff, you want native support for it, and at the moment they're not accepting those requests because they don't trust people who roll their own secret stuff.

Yes and no. For databases there is a plugin system, and you can write your own plugin, so that's the answer for databases. For non-database things it's a bit trickier, but the list of backend offerings is pretty good, so maybe have a look. Yes, there was a question back there? Yes.

Yes, yeah.

HSM I'm not so sure about, but it does have support for encrypting data streams and generating keys for that, so maybe have a look, because it's always adding things. Yes? Hmm, excellent. Yeah, as always there's an Enterprise version, and that's maybe a bit out of scope for the budget talk, but I'm very glad to hear that. Any more questions? Then thank you very much. [Applause]
