
feeling, so I'm going to do this talk in English, because I can't really talk about technology in Portuguese. It's really hard to say; every two words would be some kind of buzzword. So let me know if you can't hear in the back. If you have any questions, I'd love for you to ask them at the end; I'm going to leave a lot of time for you to ask whatever you want. I always do demos, so I have two small demos for you today, and then at the end feel free to come by. I'm going to be around the conference checking out a few talks, so if you want to hack on something or show me something or talk
more about certificates, I'm going to be around. Cool. So the title of my talk has an emoji, which I think is really adequate, because the audience that I talk to these days really is into Snapchat and all these other things, so I want to be more millennial and I really want to connect with the audience. So: mutual TLS in a microservices world. Mutual TLS is obviously, for those of you... I mean, it's a security conference. It means mutually authenticated Transport Layer Security, and it's effectively the way of using Transport Layer Security with client certificates and server certificates, with each side authorizing the other. The baseline for this talk assumes that you know what
TLS is; I'll then talk a little bit about the details and about the advantages, but it assumes that you know this, so if you don't, please put your hand up and I will very quickly explain what this is about. Onward. So a little bit before... wow, that was jumping way too many slides. Before I go into the talk itself, let me talk about why I'm here and why I'm going to talk about TLS. For those of you that don't know me, my name is Diogo Mónica. I'm Portuguese, which is great. I'm an alumnus of Técnico (IST); I did my PhD in computer science, and then during my PhD I was
actually flown to the United States, to San Francisco, by a company called Square. Square is the startup that was founded by Jack Dorsey, the founder of Twitter. It makes a small credit card reader that you plug into your phone and can swipe credit cards with, and right now it is moving 48 billion dollars a year. I was the second person in security. I joined when we were 50 people and I left when we were 1,300; I was there when it was a Series A company and I left when it was a publicly traded company, and now it's worth five billion dollars on the New York Stock Exchange. Throughout all of my experience at Square, the reality was that there were
no open source or closed source tools that were adequate to actually provide security for a payments company. It was a startup, and it was living after 2000: everything that was built was built on top of HTTP and had really no security for us. So across the four years that I was at Square we had to build a lot of software, and one of the things that we had to build, and that we did from day one, came from realizing that we should not be trusting the network. So every single call at Square, every single service-to-service call, even to localhost, used mutually authenticated
TLS with client certificates. It didn't matter if it was on the same host, across hosts, across switches, or across data centers: everything used mutual TLS. So today I'm going to go over a little bit of why we did this, how we did it, and what the advantages and disadvantages are. After Square I left for a company called Docker. I assume that by now you know what Docker is. I have a few stickers for your laptop; these are the small version of the sticker, they're really cute, so if you want them, come by at the end. About my transition from Square: a lot of people ask me why I transitioned. The reality is that everything I built at Square was
always going to stay inside of Square, and at Docker everything is open source, so every single thing that I put into the container for security, all of you are getting too. So I went to Docker, and the way that I looked at this was: why was it so hard for me, at a payments company, to actually have a secure infrastructure? Why wouldn't this be trivial for everyone? And so my job at Docker right now is effectively leading the security team and making sure that all the great stuff that's been used at scale, for millions of users and for 48 billion dollars a year, is actually available
to everyone else. I've been at Docker for two years. Docker has, as you know, a lot of traction; it is very successful as effectively the de facto standard for containers, and so we're following that up by building these things into orchestrators and into our images. Perfect. So now let me actually talk to you about microservices. I'm not going to describe what microservices are; we're all sick of hearing exactly what the advantages and disadvantages of microservices are, and we all know the concept: monolithic application, bad; small little microservices that you can scale independently, good. That's effectively what it is. If it's a monolith, it's bad; if it's
microservices and they're small, it's good. Whatever. But from a security perspective there are actually some very concrete goals that you have to keep in mind while you're migrating your application from a big monolith to small components, to microservices, to use throughout your organization. The first one is that, as a security team, you have to think differently. Beforehand, if you had a monolithic application, all of the security, authentication and authorization had to be built into the code. Business logic, authentication, authorization and security were all part of the monolith, because that's the definition of a monolith, which meant that the security team did not have the ability to create common infrastructure and common applications
that all the teams could use and benefit from. So one of the goals for microservices is: if everybody needs secrets, if everybody needs TLS, if everybody needs a tokenization service, then you as a security team should build it and provide it to everybody else as a service. The second thing is that for a long time we've known, especially in the military, that least privilege is a principle that should be followed, but somewhere along the line we kind of lost our path, and the security people just started building firewalls and saying, oh, let's block IPs and block ports, and now we have the crap that we have today. Unfortunately or fortunately, microservices give us another shot at
this, another shot at actually starting to think about our infrastructure in terms of least privilege, meaning that each service should only have access to the exact resources that it needs to do its business logic, no more, no less. The third one is obviously better security monitoring. Smaller services mean that you have behaviors that are more contained, and instead of having an intrusion detection system that monitors the whole network of your company, or even a whole host or a whole machine, now with containers and with small microservices you can monitor one application. So it becomes a lot easier for you to recognize malicious behavior and
effectively do pattern matching on what this application should be doing. The final one is obviously the fact that, with a ton of microservices, it's effectively impossible to do manual deployments, manual rollouts, manual scaling, and it's also impossible to do security manually. It's impossible to issue certificates manually or try to manage these things by hand, and therefore you need to automate the security of your whole development cycle, not just of your applications but of your production deployment, in an end-to-end fashion. OK, so given all of these goals, the major realization that I mentioned was that we have to stop thinking about layer 3 and layer 4, the network layers, as the security boundary. This is no longer true;
it hasn't been true for 10 years, so we should stop insisting on buying firewalls. The reality is that in the previous world we've all seen what's on top over there, which is effectively: we have an internet, and then we try to segment and segregate all of these different regions, areas and zones by just plastering a stateful firewall in between them. Well, it turns out there are a lot of problems with stateful firewalls, and if you actually go to any of the big players, Facebook, Google and so on, the only place where they're using stateful firewalls today is the corporate environment. Nobody uses stateful firewalls in production, for the simple
reason that they don't scale. There's really no stateful firewall out there that scales today. If you want to actually match a top-of-the-line Arista switch in terms of bandwidth, and you want a stateful firewall that does layer 7 inspection, you effectively have to buy two hundred Palo Alto firewalls, or whatever vendor you have, for each one switch. So effectively you have to pay two hundred times the cost of, what, 350 thousand dollars, just to keep up at line rate with what a switch does. So the picture on top is no longer feasible, and therefore we come to a situation where the internet is effectively split between your internal network and your external network, and everything inside
of your internal network is effectively one switch: a flat network, a flat hierarchy. The way that I think about this is, if you know Google's BeyondCorp and the ideas of exposing everything to the outside and having really good, strong authentication and authorization at the edge, what I'm calling this is effectively BeyondProd: you should not be isolating your production environment and thinking that it's behind a hard shell that is your firewall. You should be treating every single service as if it were exposed to the internet, and the security requirements for a service to be inside of your network are the exact same requirements as for it to be exposed to
the internet. This is effectively the model, and it's how you should be thinking. The reality, though, is that if you're in this environment, and you have a flat network, and every service can talk to every service, and your database is listening on the same network as your crappy WordPress blog for marketing, then you have a problem, and this is why we had firewalls in the first place. So the solution for this is obviously to bring security to layer 7 and bring strong authentication and authorization to the application layer. Every single call between every single service of every single application in every single one of your data centers should be authenticated and authorized, so you
should be able to know exactly what application is calling in to you, and you should be able to know exactly what privileges that application is supposed to have. To do this, the major realization is that you have a very difficult problem on your hands, which is that every single one of the nodes in your infrastructure and your data center now requires an identity. And I don't mean a fancy or cute hostname like kitkat.docker.com; I remember Técnico had girls' names for hosts, so Joana or Beatriz at IST. That's not what I mean. I mean strong identities that are cryptographic and allow you to
effectively have a way of identifying them and sharing confidential information. Why is this useful? Well, because if you have a strong identity then you can obviously establish strong, secure connections and then exchange things like secrets later on for your application, so it's obviously really useful. And identities should be per machine at least. We're going to talk a little bit about services, and about how services and applications should also have their own identities, but the baseline is that every single host or virtual machine that you have in your data center should have one of these. I'm saying certificates there, not identities, but it's effectively the same: we're trying to go for an identity in general, something
that identifies the host. But now obviously this comes to the old paradox: what came first, the egg or the whale? And I know that a whale is a mammal, so [unclear], but what came first? How do you bootstrap this? Yes, I have a machine, but how do I put the first certificate on it? What is the secure connection that bootstraps the trust for this? If I can securely put a certificate and an identity in there, then I've already solved the problem of the communication. So there are a couple of ways that you can bootstrap this whole thing. Well, there you go: there are a couple of ways that you can bootstrap
this whole thing. The first one is the understanding that you have a registration system, or at least you should. The registration system is effectively what is responsible for adding new nodes to your network, and given a registration system, that registration system should be responsible for actually minting the identities for every single one of your nodes. How does that work in practice? I've seen it done a couple of different ways. At Square we had secure networks, VLANs and switches where a machine would be indoctrinated, and serial numbers would be verified from the hardware against an actual registration authority, where they would have been manually added by a user when a
new rack is actually put into the data center. That network would effectively allow a machine to enroll itself with the registration authority by self-generating a key pair and a CSR, submitting the CSR to a provisioning system, and effectively having a signed certificate come out of that. So the machine comes up, trusts the remote system to provide a certificate, and then uses that. Another way that you can do this, if you're using a virtual machine, is to bake this into the image. So technically you could bake this into the AMI: you could bake in a public key, which is the public key that your system, when it's bootstrapped, is going to trust to actually
do this dance and obtain its own certificate. So those are two ways. Another way that people are using a lot today is trust on first use. It's effectively like a duck: the first thing that a duck sees is its mom, so even if the mom is a whale, well, now the ducks are all following the whale everywhere. This is a system that is working pretty well today, and one that works for a lot of web browser trust. HSTS and things like that are a little bit like this. With HSTS, granted, you can preload it, but they're based on the fact that you can do trust on first use, and then from that moment on you
trust all of these. HPKP is another example, something where you're being told exactly what it is. So these are the ways that you bootstrap all of the trust on all of your nodes. Now that you have this, I have to explain to you why I wanted certificates in the first place. Again, identities are not necessarily certificates, but certificates are really useful as identities because you can simply use mutual TLS and everything else to go from an identity to an actually secure channel that you can trust. And so now we have to go over the advantages and disadvantages of TLS. Mutual TLS has a lot of advantages, which
are pretty obvious, mostly. The first one: it's supported by effectively everything. Everything these days has the ability to do TLS communications and provide client certificates. The key material stays secret, both when you're trying to get an identity (creating a CSR and sending the CSR, which is only public content, to get signed) and when you're making a connection, because you get to use Diffie-Hellman and you get to use your key pair to prove who you are as a client and who you are as a server. So there's never private key material going over the network; all these things are computed independently on each side, the client and the server. And obviously it provides
the CIA: confidentiality, integrity and authentication for the connection. Unfortunately, it has a lot of disadvantages. It is incredibly confusing for engineers. If you've seen an engineer trying to configure TLS or certificates, even in the non-mutual TLS case, just serving a certificate that is correctly signed by a CA and doesn't present a red browser warning saying you have to accept this self-signed certificate, which we've all seen, you know it's actually really hard for engineers to understand what it is and how to manage it. I've seen people put the private keys inside of their pens; I've seen all sorts of crazy stuff happen. And people are obviously committing
private keys to GitHub, which happens every single day, everywhere. The other thing is that if you're trying to do this on a per-machine basis, and you have hundreds or thousands or tens of thousands of machines, that is a crap ton of certificates. How do you manage those? How do you rotate them? How do you distribute them securely? This is a huge, huge problem that this system has. Another thing is that you are now running a PKI, because you effectively need to sign certificates, so you need a private key that maybe needs to be in hardware, or at least protected in some way. You need to understand what revoking that private key means, or how rotating
the private key happens, or what your panic situation is: something got compromised, press the red button, something has to happen. So you have to have the organizational maturity to actually run a PKI successfully. For example, I don't know if you knew, but certificate revocation lists, the CRL that the majority of TLS clients automatically use to check whether the certificate they're using has been revoked, actually have an expiration. I've been in a situation where some unnamed mobile payments company's infrastructure goes down because the CRL ran out; it was effectively expired. Who'd have known, right? Now I know. It's burned into my memory. Every time I go in and do any
kind of audit or any kind of consulting, I go and check the expiration of the CRL, or its existence in the first place. But the reality is that you think you're being smart by having a CRL instead of not bothering, and then you get bitten, because TLS is, to the next point, completely unforgiving. If a CRL expires, if a certificate expires, I can guarantee you there is no amount of reboots or restarts of your service that is going to bring it back up. This is not how computers usually work: with TLS, you reboot it and it's still down, because it's just that unforgiving. However, this is actually the best that we have today, and so given
all these disadvantages, and all the advantages, we should still be using TLS, because it's the thing that we should be using. And have I mentioned that it's unforgiving? Because you really, really have to understand that the moment you turn this on, it's impossible to turn it off. OK, so now let me talk to you about how we're actually solving this at Docker. Docker 1.12 came with built-in orchestration. What that means is that when you use Docker, instead of just running containers and stopping containers on a local host, you actually have a full-featured orchestration system built in. So just by doing swarm init you can create a manager, and by doing swarm join you can join new
workers to the swarm and effectively have a distributed system that runs all of your containers in a distributed fashion. When I was working on this project I thought a lot about this problem of certificates, how to do the indoctrination and how to do rotation, and I decided: look, I really want to solve this problem once and for all, for everybody else, or at least for all the Docker users. So let me talk to you a little bit about how we actually did it and how the system actually works. In swarm there are two kinds of nodes. There's a manager node, which is really important. Managers are part of Raft; they
have a quorum, and you can add more managers. The usual number of managers is one if you don't care about high availability, three if you want to tolerate one failure, or five if you want to tolerate two failures. You can also have seven, but it's effectively a Raft quorum. The managers are the highest-privilege nodes in your cluster, and there are really few of them compared to the workers: three managers can run thousands of workers, which is the second kind of node that you have. The first thing that happens, automatically and out of the box, is that when you bootstrap this, when you do swarm init, you get a CA for free. It is a self-signed CA. I'm going to describe how
we do the bootstrap, but you get the CA for free. So the manager has its own certificate, has a certificate authority, and constantly rotates its certificates; I'm going to show you a demo of this. The second thing that happens is when a new node wants to join. What it does is a CSR: it generates a public/private key pair, submits the CSR, and gets a signed identity issued to it. Note that there is a token involved there; I'm going to describe how the token allows us to authorize joining the cluster. So this is how a worker talks to a manager to get an identity. At this point both
of those nodes have TLS certificates that allow them to use mutual TLS with each other. The third thing is that workers and managers are always identified by their certificate. What this means is that the certificate has embedded, in the OU and other attributes, what the privilege of the node is in the cluster. The certificate of a manager says that this node is a manager, and the certificate of a worker says that it is a worker, which means that there doesn't need to be a centralized place of authority where you say these are the managers and these are the workers, because the certificates themselves prove who you are: they were minted by a certificate
authority that says: this node has this privilege. And then finally, all the communications within the cluster use mutual TLS out of the box. You have to do nothing to configure it, and I'm going to show you how easy that is. I mentioned a token. When a node wants to join the swarm, wants to join this orchestration and this network, it has to provide a specific token, and the way that we did the token was pretty interesting. It's not that complex. The first thing is that we start with a prefix that all tokens have in common, and the easy reason for this is that
your security team now has a specific string that it can search for on CVS or SVN or whatever VCS, github.com, etc., to see if any of your developers actually committed a secret token to GitHub. It just makes grep easier; that's all it's doing. The second thing is obviously a version; that's self-explanatory. It's the third and the fourth that I'm more interested in. The first of those is a cryptographic hash of the actual root CA. What this effectively means is that it makes the problem of getting the CA certificate to all the nodes easier, because now you only have to share this token, and this token allows you to ensure that you're joining
the right CA, that you're bootstrapping and downloading the root CA of the right identity. That's effectively how the bootstrap works: once this token has been shared securely, you know that you're joining the right swarm, because nobody can fake that SHA-256. And then the last part is a randomly generated secret that you can rotate at any time. That secret is presented to the managers, the managers are in sync using Raft, so they all know which token is currently valid, and if a node presents a valid token then it's authorized to be issued a certificate. So it's as easy as this: we turned the bootstrap into the sharing of a token that has a few
characteristics that allow you to bootstrap the system securely. So, in terms of bootstrap: you have a worker, and the first thing that the worker wants to do is join the cluster. So it provides the token; sorry, first it downloads the actual root CA and, with the hash component of the token, checks whether the root CA of the remote manager is actually valid. If it is, it now knows that it's not joining a fake manager and that it's not being man-in-the-middled, because the certificate that it got is the valid one. The second thing that it does, number two, is generate a public/private key pair, generate a CSR, and submit this CSR
to be signed. The manager doesn't trust anything from the CSR except the public key (a CSR never contains the private key) and, obviously, the attributes related to it. Everything else inside of the CSR is ignored, and the certificate that is issued is issued by the manager from the point of view of saying: no, you are a worker, I am telling you that you are a worker, and this is the certificate that you should be using and presenting to everyone. The third step is that after you retrieve a signed certificate, as a worker you can now talk to any manager, and I don't need to actually put anything in the backend store, because
any manager that receives a TLS connection with this new certificate can understand that this is a node. The manager also issues a randomly generated identity inside of that certificate, so now we've solved our goal: every node that joins gets a randomly generated ID from the manager that is signed by the CA and can't be faked. So effectively we bootstrapped an identity for all of our workers. In terms of rotation, one of the things that is really, really hard in TLS is the fact that you have to rotate certificates. How many of you haven't had downtime or issues with certificate expiration? How many of you haven't had to scramble at the last minute because the
certificates expire in two weeks? One of the issues with TLS and mutual TLS is that revocation is really, really hard, so what people are doing is shortening the lifetime of certificates: from years, before, to one year, to two or three months, to maybe one month, maybe two weeks. In our system you can bring it all the way down to one hour. But if you have certificates expiring every hour for every node, there's no way in hell that this can be a manual process, and therefore I needed to build in very solid certificate rotation for you. So this is what we did. You no longer need the token, which was used to do the
original bootstrap, because you already have an identity. So what you do to rotate a certificate as a node is connect to a new endpoint, do mutual TLS, and prove that you own a certain identity, and then you get issued a new certificate with a slightly later expiration time. So it's effectively revalidation of certificates as time goes on, because you can do mutual TLS. And if at any point you're down for long enough for your certificate to expire, you're effectively out of the cluster; that is totally indistinguishable from an attacker stealing a certificate and then trying to use it some time in the future. So automatic certificate rotation is a default
for you. We're effectively revalidating and adding a new expiration; we're not extending the old certificate, we just reissue a new certificate. And we obviously didn't want you to depend on our CA. The reason why we have it is so that bootstrap is absolutely trivial, but we wanted you to also be able to take this to your companies, so we support an external CA: you can just point the managers at it and say, manager, do not use your local certificate authority, just use this external one that I have. So if you are sophisticated enough, and you have a company that already has a certificate authority, then you can effectively use the managers all the same.
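As an aside on the join token described earlier: its four parts (a greppable prefix, a version, a hash of the root CA, and a rotatable random secret) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Docker's actual token format. The `EXTOKEN` prefix, the hex encoding, the `-` separator and every function name below are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

# Hypothetical values for illustration only; not the real swarm token format.
PREFIX = "EXTOKEN"   # fixed string a security team could grep for
VERSION = "1"


def make_join_token(root_ca_pem: bytes, secret: Optional[str] = None) -> str:
    """Build prefix-version-<sha256 of root CA>-<random secret>."""
    digest = hashlib.sha256(root_ca_pem).hexdigest()
    if secret is None:
        secret = secrets.token_hex(16)  # rotatable at any time
    return "-".join([PREFIX, VERSION, digest, secret])


def parse_join_token(token: str) -> Tuple[str, str]:
    """Return (root CA digest, secret), rejecting unknown prefixes or versions."""
    parts = token.split("-")
    if len(parts) != 4 or parts[0] != PREFIX or parts[1] != VERSION:
        raise ValueError("not a recognised join token")
    return parts[2], parts[3]


def root_ca_matches(token: str, downloaded_root_ca_pem: bytes) -> bool:
    """Joining node: check the CA it downloaded against the digest in the token."""
    expected, _ = parse_join_token(token)
    actual = hashlib.sha256(downloaded_root_ca_pem).hexdigest()
    return hmac.compare_digest(expected, actual)


def manager_accepts(token: str, current_secret: str) -> bool:
    """Manager: only the currently valid secret authorises issuing a certificate."""
    _, secret = parse_join_token(token)
    return hmac.compare_digest(secret, current_secret)
```

The point of the design is that the joining node never has to trust the network for the CA certificate: it downloads the CA over an unauthenticated connection and accepts it only if the digest matches the token it was handed out of band.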
Think of us as last-mile delivery for the certificates that come from the backend. We still manage all the rotation for all of the internal nodes, and the only thing you have to expose is, right now, a CFSSL kind of API for certificates on the backend; we do everything by just forwarding the CSRs back and forth. So effectively we allow you to run automatically out of the box with our CA, or you can bring your own. So now let me show you how easy this is to set up in practice. Let me show you SwarmKit. SwarmKit is the base component that is actually imported into Docker. So
Docker actually imports some external components; they're all open source on their own. SwarmKit is a toolkit for building distributed systems that has all of this, and it's now used in Docker to actually provide it. So I'm going to sit down. The first thing I'm going to do is go over here. Is this big enough? OK, let me make it a little bigger. We'll have three terminals. So this is SwarmKit. This is effectively a slightly different branch from master; the only difference is that I'm rotating every certificate every five minutes, sorry, every five seconds, instead of actually respecting the validity time. The way that I do rotations in
swarm, by the way, is that you define what the rotation period is, imagine one week, and then I renew the certificate at a randomly assigned point between fifty and eighty percent of that one-week window. The reason why I do that is thundering herds. If you issue a thousand nodes and bring up a thousand nodes at the same time, they're effectively going to have their first certificate issued in the same one-minute or two-minute or 30-minute window. So by adding randomness to the actual rotation I get to distribute them uniformly across time, and as time goes on I will not have spikes of issuance of
certificates, and I won't have my managers go down because all of a sudden everybody decided to renew their certificates. So these are the small details that I've had to deal with in the past, that I've had downtime for in the past, and that I baked in as very opinionated defaults in the actual swarm. So the only difference here, again, is that this is SwarmKit with a five-second rotation window. Docker is a lot easier than this; this is the underlying component, so it's a little bit harder to digest. But the first thing I'm going to do is run a node. This is going to be the
first manager. This is the manager that, once run, is going to create a CA, issue its own certificate, and start rotating its own certificate. For now it's going to be isolated: a manager of one with a network of one. So once I run this you can see a bunch of things here. It actually created a certificate, it scheduled a certificate renewal, and now every five seconds you're going to see a new certificate with the role manager being issued. So you see these certificates are being renewed, all of this is being scheduled, and it's taking care of itself. OK, so now this network is not very interesting, right,
because we have one node and it's rotating its own certificates. What we actually need is to join a new node to the network, and for this we need a token, right? We need the secret token whose composition I showed you. So what I'm going to do is inspect the cluster: I'm going to do swarmctl, which is the controller, cluster inspect default, and it's going to give me the join tokens, one for managers and one for workers. So if you want to join a worker node you just provide the worker token, and if you want to join a manager node you just provide the manager token. Everything else is transparent
for you, and you can also do promotions and demotions, as I'm going to show. But for now let's copy this worker token. We have it, and now let's join a new node. How do I join a new node? Well, it's as simple as saying: create a new node, join at this address, and this is your join token. The join token is the thing in white. So once I press enter I now have two nodes in our swarm, and you can see that this node is an agent, it's actually a worker, and it's renewing certificates every five seconds. You see the role, swarm worker, and every five seconds it's renewing certificates automatically. And
I can show you this in two ways. Number one, I'm going to list the nodes of the network. We have a manager here, which is the first thing that we created; it has a certificate authority, and it is the one that is responsible for issuing certificates. We have a node that is effectively not doing anything here, and it's actually not responsible for running anything right now. The cooler part is if I do this: I run openssl inside of a watch, and now we're actually looking at certificates. So now you're seeing what you're expecting to see: you have a node that is a worker, and therefore the OU of the node is worker. The OU is here.
This is what we expected, because we're expecting the certificate of the worker. The second thing that we're expecting is a randomly generated identifier that is going to identify this node for the lifetime of the node in the cluster. And so every five seconds you see the certificate rotating, but note the portions that are actually being rotated: the only thing that is being changed is obviously the public key material and the serial number of the certificate, because all of the other attributes are attributes of the node. The identity is constant, the role, worker, is constant, everything is constant except the
actual certificate. So you see that this is the actual rotation that you would have to do manually, happening every five seconds for every single node in the network with no connections dropped. You could have 10,000 nodes connected to a manager, and the manager can rotate all the certificates from under the connections and nothing gets dropped, because everything is being handled in a transparent fashion and in an atomic fashion. So this is really cool, but now let me show you this is actually working. I'm not going to stop this watch; I'm going to go to a new window over here and I'm going to do the following: I'm going to do node ls. We see our
manager, we see our worker, and I showed you on the watch that this OU is worker, correct, because it's a worker node. So now what I'm going to do is a promotion. SwarmKit and Docker have the concept of promotion, which is, as I told you about managers, that you can increase the number of managers to have a high-availability cluster. We made it super easy for you to build this, and so you can promote any normal node to be a manager or demote any manager to be a worker. And when you promote them to be a manager, now they become privileged: you just replicated your CA, you now have a
high-availability CA and you have a high-availability system, because they're all participating in a quorum. So the moment I promote this node by providing the ID, the thing we're going to see is this changing: the OU now is swarm-manager. Nothing else changed, but now the privilege of this node in the network is manager, and it keeps rotating the certificate every five seconds. And we can actually go back and just demote the node and say you can no longer be a manager, and at this point it no longer has any manager privileges in the actual cluster anymore; it just got demoted to a normal node. And you can do this
transparently, scale up and down, and all the certificates are always being rotated and the identities are being taken care of for free, so you never have to worry about certificates. If you notice, we did two things: we did an init of the manager and we did a join of a worker. From this moment on everything else is taken care of for us. So why is this important? Well, the reason... The manager question: if the manager crashes, no certificates can be renewed, and this is why you should have multiple managers, because if you have multiple managers you have a high-availability CA automatically. [inaudible] A worker can't promote itself; a manager has to do the promotion, because
if a worker could promote itself to manager, then the worker would be a manager. So from a security perspective workers are completely unprivileged, which is, by the way, how I follow least privilege here. The worker only has access to the resources for the work it needs to do: if the worker gets scheduled a container, it only has access to the secrets, the resources, the IPs, and the code of the containers that it needs to run. It doesn't have access to other networks, it doesn't have access to what the managers have, it doesn't have access to anything else. Only the managers are privileged, and I'm actually going to go forward with this and go least privilege all the way: I'm going to make the managers
eventually even less trusted, so I'm going to make the workers not even need to trust the managers, such that the manager can't force the worker to run arbitrary code. [inaudible audience question about how a node authenticates] It uses mutual TLS: it has a certificate, it connects to an endpoint supplying the certificate, and does the dance. [inaudible audience question about joining a node without being authorized] You can't, because you don't have the secret token. The secret token has a component that is a secret that only you have, that only you provide to your Amazon CloudWatch, whatever, or to the thing that scales your nodes, and only that
token is allowed to actually join new nodes into the network. So that's effectively a secret, randomly generated code that nobody else has except you, and that's how the initial communication goes. [inaudible] So yeah, on the manager, with manager privilege: if you have access to the API of the manager and you're an authenticated administrator with root on the node that is running the manager, then you effectively can list the code. If you were a worker, you couldn't list the code. Perfect. Now, at this point, let's wait for the end. At this point we effectively have workers that have their own identities, so every single manager and every single worker has its own
certificate, so that it can participate in the system and say: this is the random ID that I am, and this is the privilege I have in your network. But this is not what we want. This has always been about application identities; this has always been about applications and services connecting to each other. So what we actually want is not one node, one ID. We needed one node, one ID because we want one app, one ID. What we actually want is for your orchestration system to automatically provide identities for every single one of your applications. So what should be happening is: you have an orchestration system, you deploy an nginx, you deploy a proxy, you deploy your Ruby on Rails
application, you deploy whatever it is, and your orchestration system should be responsible for getting a certificate with the properties of your application and securely delivering it to your node using the certificate of the node. So these certificates that we built for nodes were just a way to bootstrap what we really wanted, which is that every application has its own certificate with its own properties, exactly like the nodes have, and everything is rotated automatically. This is the Holy Grail. This is the Square model. This is mutual TLS between all the services. And so we should be using mutual TLS for every single service authentication, and this effectively removes all of the need for your
firewalls and allows you to have flat networks, because you're effectively exposing TLS, which is the same thing that you'd expose to the outside in the first place. So you already have the same surface of exposure, and you're actually concentrating it into one system of authentication and authorization. The way this works is: in the CN of a node you have a random ID that is the node identifier; in an application, the CN describes what the application actually is. So in this case you have a Go app that is called api01, for example; it's a payments API or some kind of authorization API. And then you have a database, which is MySQL, and
it has db01. These names have meaning. The moment this Go application wants to connect to the database, it connects to db01, and it will verify that the mutual TLS handshake is providing a certificate for something called db01. It can't be db02, it can't be db03. No other application or service or node on the network can fake that, because no other application has that exact certificate. And now that we have authentication on the applications, we can go even one step further and actually have authorization. So it's not just about me knowing that I'm connecting to a database, or the API knowing that it's connecting to Redis;
it's really about knowing what kind of properties the person or the service that I'm connecting to has. So we know that this API accesses this DB; sure, we know the identities. But we also want to know what kind of database it is and what kind of environment it is in. For example, in this case that database is a production database. I should not let my API connect to db01 in staging, because they're totally different systems. And I also know this is a credit card database from the OU, the organizational unit. So if I connect to db01 in another OU, this should not work regardless. And if you
extrapolate from this, you can now have something really cool. We built something called Guard Dog, which was effectively a framework that was across all of our applications. Everything was transparently taken care of for developers: developers had to create one file to have mutual TLS work. The moment a file like this was created, they were describing how their application behaved within our network. Number one, they were describing that the resource /user can only be accessed through a GET, and it can only be allowed for those three kinds of applications. So if there was an application that wasn't web, fulfillment, or payments, it could not access this info. The same thing for POST on user, the
same thing for DELETE on user. So I just restricted that the only application that can delete users is the web application. And so this ACL is using the properties of the certificates of the services, which in turn are using the properties of the certificates of the nodes, to actually have a secure chain of exactly what your application should be doing. And by the way, this does not replace higher-level tokens with attenuation, like macaroons and things like that. What this replaces, totally replaces, is your firewalls. It allows you to do micro-segmentation down to the HTTP verb, all reliant on self-regenerating and rotating certificates that are handled for you automatically. So now we need one more thing for this to succeed.
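As a rough illustration of the kind of ACL described here, the sketch below maps an HTTP verb and resource to the set of service names allowed to call it, and checks the caller's name taken from its client certificate. This is not Guard Dog's actual file format or code, which the talk does not show. The `ACL` table, the entry for POST, and the helpers `subject_field` and `is_allowed` are hypothetical, and the certificate is represented in the dictionary shape that Python's `ssl.SSLSocket.getpeercert()` returns after a verified handshake.

```python
# Hypothetical ACL: (HTTP verb, resource) -> service names allowed to call it.
# Layout and the POST entry are illustrative assumptions, not Guard Dog's format.
ACL = {
    ("GET", "/user"): {"web", "fulfillment", "payments"},
    ("POST", "/user"): {"web", "payments"},
    ("DELETE", "/user"): {"web"},
}


def subject_field(peer_cert, field):
    """Return one subject field from a dict shaped like getpeercert() output."""
    for rdn in peer_cert.get("subject", ()):
        for key, value in rdn:
            if key == field:
                return value
    return None


def is_allowed(peer_cert, verb, resource):
    """Allow the call only if the certificate's CN is listed for this verb and resource."""
    caller = subject_field(peer_cert, "commonName")
    return caller in ACL.get((verb, resource), set())


# A certificate as the TLS layer would report it after a verified handshake.
web_cert = {"subject": ((("organizationalUnitName", "production"),),
                        (("commonName", "web"),))}
print(is_allowed(web_cert, "DELETE", "/user"))  # web is the only service listed for DELETE
```

The same lookup could also compare `organizationalUnitName`, so that a caller from a different environment is rejected even when its name matches.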
We have a crap ton of certificates; now we need a sane way to get them to the applications. And this is where I get opinionated again. Let me show you a demo of what we are actually building into swarm, something that we just merged on Docker master. So we go back to our actual demo. We still have our certificates rotating in the background, we have our worker and our manager. What I'm going to do here is create a secret. I'm going to do secret create. This is kind of big, but what I'm doing here effectively is echoing some kind of string into a command called secret create diogo.txt. Okay, so now I have a
secret, and I can show you that I have a secret by doing secret ls, and you can see that I have a secret that was created, has a certain size, and has a certain digest. So what we created with Docker was a secret command that allows you to add secrets, remove secrets, and associate secrets. So if I am to create a service (a service is the concept for containers in swarm: you're not running one container, you're running a service, which could be Redis, could be MySQL, could be whatever, and then you can scale it up to more instances or down to fewer instances), so if I create a service, if I do service create, you can
see here two things that are interesting. Number one, I'm going to do service create, I'm going to say that this service has three replicas, the name is test1, and it's an image that is Alpine, it's a Redis Alpine. So I'm running three instances of Redis and I'm saying that I need the secret diogo.txt. But more than that, you can actually say: I need the secret diogo.txt, but inside of the actual container I want to expose it as a file called diogo2.txt. So if you run this, you effectively have a service. I can show you that we have a service with service ls: this is our service. And now
if I do an inspect of the service, we actually see that the service has two secrets: it has diogo.txt exposed as diogo.txt and as diogo2.txt. So what this actually means is that inside of Docker, when you're running Docker, you're effectively going to have this: once you run a container of Redis, you're effectively going to have a secret under /var/run/secrets with whatever name you want. In my case it was diogo.txt; in this case it's an AWS credentials YAML file. The way that this works is that it's a tmpfs that is created inside of every single one of your containers. It only contains secrets in memory. These
secrets are never sent to disk, they're never written to disk, and they're dynamic: they're created when your container is created, and they're only sent down to the worker nodes that actually need to run these containers. No other worker node can have access to any of these secrets. So now we have a sane way of distributing secrets that magically appear as files. This could be your certificate .crt or certificate .key or whatever it is; they appear as a file. You can load it on your HAProxy, you can load it on your nginx, you can load it on your Postgres, you can load it on MySQL. It's effectively just a file that you have access to. And by the way,
you can also control privileges and which users own the files in memory, whatever it is. But now that you have the actual way of distributing the secrets, and you have these certificates that are automatically being generated, swarm will, in the next version, create for every single one of your applications, by default, a certificate that you can use, that is signed by the CA of the whole swarm, and that everybody has access to. So effectively what I'm telling you is that the next time you want to run, or when we release the next version of Docker, the next time that you want to run an nginx proxy that does mutual TLS for your application, or the
next time that you build an API and you want to authenticate to your API, the only thing you have to do is point it at /var/run/secrets, service.crt and service.key, and verify that you actually have an ACL that matches the properties that you want. That's it. You get your certificates, they're rotating automatically for you, they have the description of the services that you described and that you set up, and everything else is taken care of for you. With these two things, service certificates and secrets distribution, you can finally get to mutual TLS by default in your infrastructure. So that was my talk today. Thank you so much.
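To make the "point it at the files" step concrete, here is a minimal sketch of a server-side TLS context that requires client certificates, using Python's standard `ssl` module. The file names are assumptions: the talk only says the certificate and key show up as files under the container's secrets directory, and the CA bundle path in particular is invented for illustration. The function name `mutual_tls_server_context` is likewise hypothetical.

```python
import ssl

# Assumed paths: only the secrets directory is mentioned in the talk;
# the exact file names, and the CA file in particular, are hypothetical.
CERT_FILE = "/var/run/secrets/service.crt"
KEY_FILE = "/var/run/secrets/service.key"
CA_FILE = "/var/run/secrets/ca.crt"


def mutual_tls_server_context(cert_file=CERT_FILE, key_file=KEY_FILE, ca_file=CA_FILE):
    """Build a server-side TLS context that rejects clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED        # the "mutual" part: client must present a cert
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    ctx.load_verify_locations(cafile=ca_file)  # trust only certificates signed by the cluster CA
    return ctx
```

Because the files are rotated, a long-running server would presumably need to rebuild this context when they change; the talk does not say how a service is told about a rotation.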
More questions? "What's the lifetime of the certificates?" So yeah, you get to define it. By default node certificates last for three months, and you can turn it all the way down to one hour. The reason why I didn't allow you to do less than one hour is because TLS has a big problem, which I actually didn't describe and should have, which is that it uses wall clocks, and wall clocks are out of sync very often. So in the original POC, the original code that I did, certificates could be turned all the way down to 15 minutes, but immediately we noticed that when we were deploying across multiple clouds, AWS,
DigitalOcean, Azure, Google Cloud Platform, and we had like a thousand nodes with swarm rotating certificates, we immediately started seeing issues where nodes would be dropped off, because with 15 minutes they would have clock drifts that would not allow them to rotate in time: they thought the certificate was going to be valid for a longer period of time, but the manager on the other side said nope, this certificate is expired, I no longer want to touch it. So we effectively prevent you from shooting yourself in the foot, and I made the call, not we, I made the call that one hour is enough for you to have this. If you want a more
conservative setting, do a week; the default is three months. [inaudible] "What's the key length?" The key length: everything is using ECDSA, it's the only thing I'm supporting right now, and it's P-256 for all the certificates that are being issued. And one of the reasons why this is P-256 and not 384 or 521 is that all the certificates are going to be rotated every three months, and all the certificates can be rotated every hour. So if you actually want to be paranoid, you don't have to be paranoid about key length; you can be paranoid about certificate duration, which is actually something that I prefer for this, for the reason that in Go, P-384 and P-521 are not
assembly-optimized, so it's going to take you 20 times longer to do verification and issuance. [inaudible] Correct. So I use P-256, and you should control your security level by controlling the lifetime of certificates instead of controlling key length. In the future, as soon as assembly comes for P-384, I'm going to move the CA and every single certificate to that, and then it's also going to be a configurable parameter. But again, I made some super reasonable assumptions, so by default you have to care about nothing: you spin up a manager, you join nodes, and you never think about it again. More questions? "What about the name resolution of the nodes?" Okay, name
resolution of the nodes [inaudible]. So swarm actually does that for you. The moment you create a service you already have internal DNS resolution within all of your containers that allows you to have name resolution automatically. So there's a DNS server that you query for db1, and it sends you the IP address of a container. And we also have a routing mesh, so it really doesn't matter. We use different ports for different services. Imagine you have 100 nodes and you have a service running on 20 of them on port 8080: you can hit any of the 100 nodes, it doesn't matter if they're running the service or not. Internally,
what I do is, if I receive a request on port 8080 on a node that is not running the service, I do internal routing through the mesh and I forward that to the actual nodes that are running the service. [inaudible audience question] So actually everything is transparent; it's essentially triangular routing. So you effectively see the original IP address and you receive from the original IP address. From the perspective of the caller and the receiver it's effectively indifferent; it's like the request comes directly. But that's a great question. Other questions? [inaudible question about certificates and secrets for services in Docker] So secrets are coming in Docker 1.13, and then service identities are
coming in 1.14. So as of Docker 1.14 you're probably going to be able to just run a container and then list /var/run/secrets, and you'll have certificates there for you that are ready to use. Perfect. I'm going to be here until the end, so feel free to ask me any other questions. Thank you so much.
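The clock-drift problem from the answer about certificate lifetimes above can be made concrete with a little arithmetic. The sketch below is a simplified model, not SwarmKit's renewal logic: the function name `expired_before_renewal`, the rule of renewing at a fixed fraction of the lifetime, the issue date, and the five-minute lag are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def expired_before_renewal(lifetime, node_clock_lag, renew_fraction=0.8):
    """Return True if the manager sees the certificate expire before the node renews.

    The node plans to renew once `renew_fraction` of the lifetime has passed
    on its own clock, which runs `node_clock_lag` behind the manager's clock.
    """
    issued = datetime(2016, 1, 1, tzinfo=timezone.utc)  # arbitrary issue time, manager's clock
    not_after = issued + lifetime
    # Manager time at which the lagging node believes it is time to renew.
    renew_at_manager_time = issued + lifetime * renew_fraction + node_clock_lag
    return renew_at_manager_time > not_after


lag = timedelta(minutes=5)
print(expired_before_renewal(timedelta(minutes=15), lag))  # short lifetime, little slack
print(expired_before_renewal(timedelta(hours=1), lag))     # one hour leaves more slack
```

Under these assumed numbers the renewal margin of a 15-minute certificate is three minutes, which a five-minute lag eats entirely, while a one-hour certificate has twelve minutes of margin.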