
This talk is on extensible DevSecOps pipelines with Jenkins, Docker, Terraform, and a kitchen sink full of scanners, by Richard McGuire. I'll let him do the self-introduction, but he's from Modus Create. Richard?

Hi, I'm Richard Bullington-McGuire. My wife and I hyphenated, so I'm the Bullington part and she's the McGuire part. I'm a principal architect at Modus Create. We are a software product consultancy out of Reston, Virginia, and we're one of the sponsors of the conference. Over the last several years I've been working with DevSecOps concepts and practices quite a bit, and I wanted to show you some of the work in progress and share the repository that contains all of the consolidated
demos. So let me go ahead and see if it'll present for me. Yes, okay.

So how does all this stuff fit together? You might have all these tools, and you might wonder, does this even make sense? You need to pick something that is going to run your pipelines and provide an auditable path that other people can see. There are so many different ways you can mutate infrastructure out there, and if you don't have a good trail of what happened, you're gonna be in big trouble when the next person comes along and tries to mess with things. So I'm gonna show a little bit of the setup that
powers this. Okay, so we have a Jenkins server that talks to our GitHub account, and we have a repository in it that has a Jenkinsfile. Are folks here generally familiar with Jenkins as a CI system? Yeah? Okay. It is not the sexiest CI system. It's getting a little older, a little creaky, but it's in place in a lot of different places and it's very flexible despite its shortcomings. It gets really heavy use in a lot of the accounts that I work with. One of the things that I've learned over the last decade is that you can use Jenkins to act as a task
runner, and if you have it act as a task runner and you set things up right, you can have it run anything. This includes running your infrastructure-as-code tools. So the pattern that this repository uses is to be able to run a whole host of infrastructure-as-code tools to deploy and maintain the little baby applications that live in this repository. So I can run Packer to build a new AMI, using Ansible scripts to build up the right set of packages to support the applications I want. I can create CodeDeploy archives. AWS CodeDeploy is one of these tools that lets you deploy applications that maybe have a
complicated install process. Maybe they're legacy applications. Maybe they've lived in the data center under someone's desk for the last 10 years, and now, lucky you, you have to get this stuff to the cloud, and there is no way you're going to spend less than, say, a year twelve-factorizing this thing and splitting it into a fleet of containerized microservices. What do you do? What's a practical approach? This provides kind of a middle way. So I could check some of the boxes here to tell it to package a new CodeDeploy archive for my latest codebase, deploy the current one, apply Terraform if there are any
changes to infrastructure. I could even tell it, if I've had Terraform rotate in a new AMI, to actually rotate all the servers out so that they use the new AMI instead of the old one. I can also run load tests from this, so I'll run a little baby load test, just a two-minute one, as part of this. This load test won't run unless the software deploys correctly, and that's been an issue for me since about two o'clock this morning. One of the nice things about CodeDeploy is that it will roll back to the last known good deployment. So you'll see in the CodeDeploy console, with all
the deployments, that there is a huge march of failed, succeeded, failed, succeeded, where the new things I've been trying get rolled back and then the older version that was good gets put back in place. That's one of the other nice things about CodeDeploy: it lets you do this. Okay, so I've got this in here and I'm gonna let 'er rip. Except, of course, if I'm doing things that would let me destroy my entire set of infrastructure, I don't want to do that by accident. So I've got a little CAPTCHA system in here that relies on modular arithmetic. It's not RSA, but it's very effective, so that I don't just
press the button by accident. So, 8 plus 10 minus 5, I think that's thirteen. All right. If I get it wrong, it's gonna fail fast and nothing's gonna happen, which has happened in multiple of the demos that I've done with this stack. Okay, so this is now happily running a bunch of software. A lot of it is running as Docker containers. This setup doesn't use Docker to deploy the applications, but it relies on Docker very heavily in order to spin up Terraform itself, to spin up Ansible, or rather to spin up Packer, which will then run Ansible on an EC2 host in order to provision it. And I'm just going to go
ahead and let this run. Right now it's doing some validations, and it'll take a little while, and we'll come back to this.

So why bother with DevSecOps at all? What you want to avoid is the swirling snowflake hell of all the servers that are maintained by hand, that are patched by hand, that nobody really knows how to make secure. At best you'll be able to get a golden image and keep punching away at that, and maybe you'll take your golden image and deploy that in your fleet, but you won't know how to get it up there. Or, if you're really unlucky, and I know that some of
you, given the sponsors here, must work in defense or intelligence, you'll have to deal with what used to be called information assurance and I guess now is the Risk Management Framework. I hear DIACAP is dead, which makes me very, very happy, because that security framework relied a lot on checking the boxes and not necessarily on assessing the risk of realistic security threats. There are some benefits to doing that; you want to make sure that you have a checklist-driven system in some cases. Now, the new way is to write infrastructure as code and treat your servers not as pets but as cattle. You might nurse the pet back to
health, but if one cow in the herd is sick, it gets culled and you pop a new one in its place. Or, since I'm a vegan, I really prefer the houseplants-versus-crops analogy, but it's never gonna catch on.

So the thing with DevSecOps is you use the same tools that you use to build out the infrastructure in order to ensure that you have a secure environment. It's really that simple. So, well, how do you make the system secure? There is no one answer, but a great answer that people keep coming back to is defense in depth.
You have firewalls. You want to have intrusion detection systems, maybe. You definitely want to make sure you know what ports your system is listening on. So the code here has two different checks on this. It does a check when you first build your baseline image, and in the architecture here you build kind of a baseline image that would be useful for deploying one or more applications. It's not the tightest, most focused way of doing this, but it's very practical if you have an enterprise that needs to trust that golden image and make sure that it's gone through the security checkpoints it has to have before it makes it to production, and
then maybe you put a PHP application on it, or a Go application, or a Node.js application, or a Java application, whatever. It doesn't really matter what goes on it after that; this lets you focus your efforts on that one image. Otherwise you have to do N images for N runtimes, and that can be really painful. I've presented on variants of topics from this repository before, and the innovation in this one is that it also attempts to check things when it deploys. We'll find out if it actually worked when we come back, and if it didn't, well, we can dig
into it a little bit and find out why it didn't work. This is an incremental, iterative process, and with the safeguards that are in place you can continue to hack at it until you get things to work all the way through, and then you can try to climb up the next ladder. What's not in here is checking on a schedule or checking on demand. There are other products that deal with that, but it's not in scope for this.

So, about me. I've been doing stuff on the internet for a long time. I've had my nonprofit, The Obscure Organization, on the internet continuously since March 1995. I had a 28.8 full-time SLIP
account when I started out, and now it's Verizon FiOS with tens of megabits a second of bandwidth. I've personally sort of shuttled between technology and business. I've run small consulting companies. I'm a versatilist: I know a little about a lot of things, but when it's necessary I can dive all the way down to the bottom of the stack to solve a really thorny problem. In the last decade a lot of these thorny problems have been in cloud deployments, scaling, and performance improvement. So in the last five years I've done a lot of work with AWS and
infrastructure as code. I did a lot of work on performance improvement, where I got to improve the performance of these systems. And performance and security are very closely related. You use the same kind of toolkit in some cases to hit performance as you do security. You measure; if it's not quantifiable, then your grasp on it may be weaker than you think. You also have to worry about validity. You can do as many load tests as you like that test your site under a 10-user load, but if a thousand people come and hit your site on the busiest day of the year, you're gonna be in big
trouble.

In 2017 the scale of the challenges that I got to tackle got much larger. I helped a large education company migrate their 14 mission-critical applications from a data center environment, where things were managed with Puppet, over to AWS, using a stack that was similar to the one you see here. It was more sophisticated in many ways, but it's very similar in concept: it has Jenkins, it has CodeDeploy, it has Terraform, it has task runners. So this kind of thing is a very reasonable way to migrate some legacy applications. Now, with that endeavor, the mandate was: don't make security worse. Okay? We already have enough
problems; don't make it worse for us. So, okay, we'll take this DevSecOps approach. We'll make sure that we have CIS baseline remediations in our golden image. We'll work with the security team and get that scanned. I have another talk, also linked to in this one, that goes into that in a lot more detail if you're interested. And there's a brand new thing I learned just this morning that's kind of an impediment to doing the same kind of work that I did over a year ago, which is unfortunate. I'll get to that. In the year since, I've done more work along these lines and built out this
demo, which is MIT licensed. You can use it and adapt it for whatever purpose, as long as you give credit and retain the copyright notice. With the education client that I dealt with, I learned that they actually called the combination of Terraform and CodeDeploy "Richard's stack," which really bugged me, so I tried to find a better name for it. The only anagram of Terraform and CodeDeploy in the English language is "corporately deformed." Okay, this stack is an ugly baby. It's got a horn sticking out of one eye, but it has muscles where it counts, and if you need to scale to massive size and have tens of thousands of
simultaneous users, you can use stuff that's in this toolkit to get there. This large education company ended up with over 100 running EC2 machines at the peak. And you can see just a little bit here of Terraform. Whenever you make a change to your Terraform code, it presents you with a plan to execute those changes, and in this case we were making huge numbers of changes quite a bit. It's like, oh, we're ready with the first six applications; okay, we have the next three out in two weeks; what do we do? They have new databases, they have new requirements for caches, all kinds of
weird, wacky stuff. So we would do things and we would end up with dozens or hundreds of changes with Terraform, and we were able to do this with successive environments: having a dev environment, having a staging environment, having a production environment. If you make sure that you use the same kind of discipline you'd use when promoting an application to release, then you can have the same discipline about your infrastructure as code and the same discipline about your security controls. So this was really large. This is probably the largest system that I've worked on in my career so far, and some of you
are like, that's not large, I deal with exabytes of data all the time. But for me this was a big deal. Before this, the system that I dealt with, at the highest peak load, had to have 2,000 simultaneous users, and I spent two years getting it to that point, very painfully, inch by inch. With this one, they already had tens of thousands of simultaneous users at peak, and it had to perform, when it was migrated off of their old infrastructure onto the new one, at the same level or better, because the whole company's future was on the line given past mistakes. And this helped get it done. We relied on New Relic
to monitor the whole thing, which was very important to be able to get it right. There are probably other tools that could do it as well, but this is the one that I've seen bring the best results when you have something at scale that's on the line.

What about Docker? What about Kubernetes? What about Docker Swarm? What about all these other things? That's great, but you might not have a year to rewrite your applications and twelve-factor them. The benefits that stuff brings are huge, but you're gonna have a huge refactoring in order to do it. If you have a dozen applications, it
might cost you three or thirty million dollars to redo all of that stuff in a containerized, microservices, twelve-factor app world, and then you'll have a 30% decrease in your operational costs. Well, that's nothing compared with the development time, sometimes. And it's really hard to stuff legacy apps into containers. If you need a writable store somewhere, or you need something really weird in the setup, it's just not gonna happen; you're gonna have to find another way. This way works pretty well.

This repo supports local development. It has a Vagrant and Ansible setup that allows you to massage your golden image using Ansible scripts before
running it on AWS through Packer. Then, once you've debugged your scripts a bit, you can use your local dev environment to initiate a Packer run: run Packer through Docker and have it create a template image on EC2, a fresh AMI from a vendor image, and then apply all the changes on it. This particular stack is a fairly simple nginx and Python stack with a little sidecar of Ruby to help with some of the automated testing stuff. I showed you a little bit about this, but really it's not good enough to just do this on your local system. If it's just you, then maybe you can get away with it, but if you're working with
a team of size two or more, you really want everyone to be able to see every run of Terraform anyone does, especially in your prod environment, especially when you apply the changes.

The way that this works for getting the applications to scale involves AWS Elastic Load Balancing and Auto Scaling groups. Modern apps should probably use the Application Load Balancer, and I think I had some of that in my notes in the abstract, but this one uses the old-school Elastic Load Balancer, which has a few advantages and a lot of disadvantages compared with the ALB, but it works. Now, all of the CodeDeploy bits get assembled in Jenkins.
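As a rough illustration of that assembly step (a sketch, not the script from the repository), packaging a CodeDeploy revision can amount to writing a zip file with an `appspec.yml` manifest at its root. The hook names below are standard CodeDeploy lifecycle events; the file paths, script names, and the `build_bundle` and `bundle_names` helpers are hypothetical.

```python
import io
import zipfile
from typing import Dict, List

# Hypothetical manifest. ApplicationStart and ValidateService are real
# CodeDeploy lifecycle events; every path and script name is made up.
APPSPEC = """\
version: 0.0
os: linux
files:
  - source: app
    destination: /opt/demo-app
hooks:
  ApplicationStart:
    - location: scripts/start.sh
      timeout: 300
  ValidateService:
    - location: scripts/validate.sh
      timeout: 300
"""


def build_bundle(files: Dict[str, bytes]) -> bytes:
    """Return a zip archive (as bytes) with appspec.yml first, then the given files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("appspec.yml", APPSPEC)
        for name, data in sorted(files.items()):
            archive.writestr(name, data)
    return buffer.getvalue()


def bundle_names(bundle: bytes) -> List[str]:
    """List the member names of a bundle, in the order they were written."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        return archive.namelist()
```

In a CI job the returned bytes would be written to disk and uploaded to wherever CodeDeploy is configured to fetch revisions (for example an S3 bucket); that step is left out here.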
There's just a script that creates a zip file, and it has a manifest. The manifest says where the code goes, and it has scripts attached to hooks, like: before the application starts, run this script; after the application has started, run this validation script. With that kind of structured approach to deploying applications, you can take very different applications and write scripts that will work for all of them, under Linux or under Windows. I've used it in both environments. It's easier under Linux by a factor of about 4.

You really want to use Packer, or some equivalent tool, to create those machine
images. This is the bakery pattern. Now, there are some companies, like Netflix, that take the bakery pattern to a further extreme, so they might bake their entire application and all of its code for every release into a separate machine AMI. At least they used to; I've read some articles about this. That's a nice way to do it, but it's also inconvenient, because you're going to have dozens or hundreds of AMIs floating around. You're gonna have to deal with the lifecycle of those. They all take up storage. You're gonna have to audit which ones are running in production. If you have one golden AMI, it's really easy to see, like,
oh, I have 40 instances running in EC2 and 38 of them are the new one. Why are those old ones hanging around? I don't care; I'm just gonna kill them off and let them get replaced by the new ones.

During the bakery process, run security scans. This is where the DevSecOps comes in, or at least part of it. This repository has two types of scans built into it: OpenSCAP and Gauntlt. OpenSCAP is one of these security tools that's come out of the Red Hat world and the world of big, checklist-driven security. So this will spit out a beautiful checklist
report for you if you use it correctly. And Gauntlt is like a Swiss Army knife. Has anyone used Cucumber? No? Maybe with Selenium, to do automated testing for web applications? It's one of these domain-specific languages that makes it easy to understand what the steps in a test are. So, I'm getting a lot of blank stares on that one. The nice thing here is you write some simple features with scenarios: given this, when that, then this is the result. When you're done, the test runner goes and runs steps that correspond to each of those things, and the test is green when it's done and you
know that it's worked well. Okay, so I have a couple of Cucumber reports here for the interim failures that I encountered this morning while trying to publish my talk. These are from Gauntlt. I started out over here, and I'm like, oh, what is going on with my stupid testing tool? This was working great before and now it doesn't work at all. Oh wait, I didn't try to run it like this before; I'm trying to run it a new way, so I have to make sure the environment's good. In this case it couldn't write to some temp directories, so I did a bunch of chown and chmod stuff on that and got it to the
point where the only thing left was, like, it can't find the XML file. So the last commit I did is intended to fix this up, so hopefully this one will run clean. We'll find out soon enough. But you can see that there's plain English in these scenarios, like: when I do this, the output should match blah. All the Gauntlt stuff is built like this. People have built attack components for many different tools. I just have Nmap and a file verification tool here, but there's all kinds of great stuff that you can dig into for that. The OpenSCAP stuff
has another set of reports. I have a good Packer build here, which will show one of the OpenSCAP bits. Is this gonna show up as raw HTML or is it going to show up nicely if I click on it? I guess we'll find out. Oh, it showed up nicely. Okay, so: Secure Configuration of Red Hat Enterprise Linux 7. This is a report that a manager or a security officer would probably love, especially if it had more than 9 items on it. One of the real bummers that I found is that the definitions for the security scan I had been using no
longer ship with CentOS. This is a much, much weaker set of stuff, and if you want the full-blown security test definitions, you really need to be a Center for Internet Security member, and that is kind of expensive. On an individual project basis, they'll give you a license to do it on one project for about a thousand dollars, but if you're an organization, it's membership on a sliding scale. So if you're a giant defense contractor, you're gonna pay them, I don't know how much, hundreds of thousands of dollars a year, and if you're a mid-sized contracting company, you're gonna have to fight for that budget. It's gonna be pretty dear.
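The HTML report is the human-friendly view of an OpenSCAP scan. Assuming the scan also wrote its XCCDF results as XML (OpenSCAP's `oscap xccdf eval` can do this with `--results`), a short script can tally the outcomes so the counts are visible before deploying. This is a sketch under that assumption: the sample document and the `tally_rule_results` name are made up for illustration, and real result files carry many more fields.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny, hand-written stand-in for an XCCDF results file (illustrative only).
SAMPLE = """\
<Benchmark xmlns="http://checklists.nist.gov/xccdf/1.2">
  <TestResult id="demo">
    <rule-result idref="rule_a"><result>pass</result></rule-result>
    <rule-result idref="rule_b"><result>fail</result></rule-result>
    <rule-result idref="rule_c"><result>pass</result></rule-result>
    <rule-result idref="rule_d"><result>notselected</result></rule-result>
  </TestResult>
</Benchmark>
"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def tally_rule_results(xml_text: str) -> Counter:
    """Count rule outcomes (pass, fail, ...) found in XCCDF-style results."""
    counts: Counter = Counter()
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "rule-result":
            continue
        for child in element:
            if local_name(child.tag) == "result":
                counts[(child.text or "").strip()] += 1
    return counts
```

Calling `tally_rule_results(SAMPLE)` gives two passes, one failure, and one rule that was not selected, which is the kind of summary worth reading before a deploy goes out.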
When you use these scans, you want to make sure you understand where your scan results are before you deploy: know what risks you're taking that your scanning tools come up with. You just don't want to go in blind with it. So, Gauntlt: I showed a little bit of that. It's a Ruby framework tool, so you have to have Ruby in your test environment somewhere. I chose to put it on the host that's being tested, which is convenient but has drawbacks too. Maybe you don't want to do that. But the extensibility of that is really wonderful. The big bummer, again, is that the C2S profile doesn't ship with
CentOS. C2S was derived from an earlier Center for Internet Security baseline. It was a US government security profile, but it's no longer there. Like, where'd my profile go? I had like 60 different controls that were imposed with this, and now all I've got are these seven lousy controls. So that was disappointing. You can see more; all this stuff is linked into the main talk too, so you can see all the links for the different talks in there, and their SlideShares, and the link to the repo.

Okay, bakery scans: Gauntlt and OpenSCAP. I showed a little bit about how they were showing up, but I'm
gonna also show how they're hooked in and where. You can see here that this Jenkins output is from a Packer run that worked. I'm like, yay! This is in UTC, as all good server times should be, and it's embarrassingly recent, but it is what it is. So when you run Packer, and I'll make this a little bigger so you can sort of see it, I have a bunch of scripts in here that do more than just run Packer. I lint all the things. I lint the shell scripts with ShellCheck. So if you so much as have some
missing quotes in your shell scripts, this thing is gonna be like, I'm not gonna run any of your garbage until you fix the problems in your scripts. You should hook in every linter you can, for every language that is used actively in your project, so that you can't have the development team screw it up by being super careless. They can still screw it up, but they have to work at it harder. So Packer, when it runs, spits out a whole bunch of output: hey, I'm spinning up an Amazon box; hey, I'm setting up all the key rings; hey, I'm gonna transfer the stuff over. And then it runs whatever script
you give it. So in this system, this comes in through Packer. Packer has a JSON file; it defines some variable wiring, and then it just runs some shell scripts on the target machine where you're building the image. There are some scripts here that will install the tools and then run Ansible to do the tests. Now, Packer has support for Ansible as one of its provisioners, but if you're gonna lock your system down, it's not gonna work. If you have a non-writable temp directory, that provisioner is not gonna work. So this has a lot of stuff to work around that, because when you have to lock it down tight, you also want your
scripts to work in an idempotent way. You're like, I'll lock it down once; can I rerun the lockdown suite on it? If the answer is no, you're gonna have to rebuild it every time you make a change. Even locally, you won't be able to do the nice thing where you have a terminal window that's open in your nice little local Vagrant environment where you can run the same scripts. It's really nice to be able to do that. It's really nice to be able to do a vagrant provision and have it kick off the same Ansible script you'd have running through Packer.
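To make the idempotence point concrete, here is a minimal Python sketch of a lockdown step that is safe to rerun. It is not from the repository, which drives this through Ansible; `ensure_line`, `demo`, the file name, and the setting are all hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True only if the file changed."""
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False  # already locked down, so rerunning is a no-op
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + line + "\n")
    return True


def demo() -> tuple:
    """Apply the same (hypothetical) hardening step twice to a scratch file."""
    with tempfile.TemporaryDirectory() as scratch:
        config = Path(scratch) / "sshd_config"
        config.write_text("Port 22")
        first = ensure_line(config, "PermitRootLogin no")
        second = ensure_line(config, "PermitRootLogin no")
        return first, second, config.read_text()
```

The first call reports a change and the second does not, which is the property that lets the same lockdown suite run from a local Vagrant box and again during the image build. Ansible's `lineinfile` module offers the same behavior declaratively.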
And at the very end here, it does a whole bunch of stuff, and at the very, very end it ends up scanning the server with Gauntlt. This makes very heavy use of Ansible as a task runner, but you could use whatever you wanted. You could use shell scripts, you could use Chef, you could use Puppet; whatever corporate standard you've got for that stuff, you could build something like this. And then at the end it copies the files out of the host that it's just templated and back over into the workspace that's been checked out to run things, so that your CI system can archive it. The CI system archiving it is
also great to have, and it's one of the huge benefits of running this stuff in CI, because if you didn't have the CI system archiving it, you'd probably just throw these away. Every time you do this, you can come back and see: oh, this has worked; did this one use that lovely system for doing the scans? I could come back to one of my older ones, if I find one of these. This is probably it. Yeah, the baseline. So from my previous thing, the C2S profile, this has, look at this,
this is great, right? It's like, wow, there's a lot of things in here. Yeah, but this doesn't ship with it anymore, so you're kind of stuck not having that now. It sucks. I don't know how to get it, but if I spent three hours on internet research I would probably find out.

Moving on. Any questions so far? This audience has been very quiet. Is it just that you're tired after lunch, or am I really boring? I don't know. A lot of information? There is a lot of information here. This is essentially the quick version of the last four years of heavy-duty work that I've done for multiple clients, you
know.

Hardening. If you are going to harden your image, you have to decide: are you going to do that hardening before or after you install your software? Maybe you have to loosen things up a little bit to install your software. Maybe you'd have to do that at deployment time. If you're running SELinux and you're running a legacy app, you might have to disable SELinux temporarily to get the darn thing installed. Hopefully you can turn it back on afterwards. But if you're using this CIS hardening baseline, there's a bunch of Ansible scripts in here that right now are turned off, because it interferes with CodeDeploy's correct operation. I have a bug in my
repo to fix that, and I wasn't able to get to it before this talk, but we have had that working before, where CodeDeploy and the CIS baseline remediated images play nicely together. But it was tricky. It was very tricky, because of things like: oh, there's no more writable temp directory, or you can't have an executable file in your temp directory, or no [inaudible] file is allowed in the home directories. So you run into some rocks and you need to do a lot of work. And here's the big question: do you want to fix all the upstream bugs in your vendor software that prevent it from installing correctly? Probably not.
Is that a good use of your time? Probably not. And if you're going to run some of these scanners, you also have to decide: am I going to fail fast on every violation, or is there a certain level of violations that's acceptable? A lot of Jenkins plugins, classically, like the Violations plugin, have a threshold. They say, if you have 90 percent success on these tests, it's good enough to pass; if it gets worse, then you're gonna fail. So you might want to consider doing something along those lines in order to start in a good place and then be able to push things out farther.

Terraform. Terraform is super
awesome. How many people have worked with Terraform? Now, how many people work with, say, CloudFormation? And the equivalent on, say, Azure or Alibaba or something else? Okay. All of these systems let you provision cloud infrastructure, database servers, load balancers, all that stuff, by writing code, and it is the way to do it in 2019. You do not want to be spinning up stuff in the console except as a proof-of-concept test while you're on your way to writing this stuff. Terraform is also great because it has plugins for every major cloud provider out there. It'll do Amazon, it'll do Azure, it'll do Google Cloud, and many more. You
can use it to provision your VMware machines if you want, and if you're not using anything to provision your VMware machines, you should strongly consider using something like this to do it. You can use Packer to spin up VMware machines, if you have to run VMware machines locked, you know...
Hi folks, Irongeek here. Unfortunately we had another video/audio freeze. Track 2 seems to be a problem child at this con. I let the person attending the video station know to stop and restart OBS if this happens, and we'll just do a part 1 and part 2 and combine them. Sorry for the inconvenience.