
Tyler Parrot - Shifting Security Left

BSides St. John's · 35:49 · Published 2025-05
About this talk
BSides 2022
Transcript (en)

All right. So, next up we have Tyler Parrot with the great shift left. Am I using this? Doesn't matter. I can use this. I probably won't venture too much from here anyway. Okay.

All right. So, can everyone hear me? Yes. Okay. First off, I wanted to thank the BSides committee for having me here. It's been a while since I've been in Newfoundland, and I'm a big fan of both this conference and the province, so it's always a no-brainer to try to get back here. I'm going to be giving a talk today about how we can leverage modern technologies like CI/CD pipelines, infrastructure as code, and cloud computing to get security involved at earlier stages of the development process, and also how to make the security-related data that's available in your environment more visible to you. I know shifting

left isn't exactly the newest topic to be talking about, but I think it's a really important one from a security context, and it's an area where a lot of organizations have room to improve. I've been around security for most of my career. Before joining AWS in 2021, I spent 15 years with the federal government at CSE, the Communications Security Establishment, where I was a cyber security analyst. In my experience, most of the focus in security tends to be on the detective side of things rather than the preventative side. But if you think about all of the different breaches and hacks that have made the

news over the last several years, many of those aren't the result of some hyper-sophisticated, nation-state-sponsored zero day, right? They're the result of poorly configured or unpatched resources, and sometimes both, like the one Pat was talking about this morning. I view cyber security a lot like I view my health: if I take the time and effort to take preventative measures, then it's a lot less likely I'll experience an adverse event as a result. The CTO of AWS actually has a pretty famous quote: "Everything fails, all the time." And he says it all the time, too. When he says it, it's normally in a

high-availability or redundancy and resiliency context. Basically, what he's saying is that when you're designing architecture, if you design it with the understanding that any component within your architecture can fail at any given time, then your design is much less prone to failure. And in my opinion, you can use the same mindset with security: assume breach and work your way backwards. So we'll start with a definition of shifting left for those who don't know. It's a practice intended to find and prevent defects earlier in the software delivery process. In a general context, that means getting testing involved in the software development process. But for today's talk, I'm also going to be

talking about how we can shift security left by baking security and compliance checks into our deployment stack too. In my mind, that involves getting your security teams, your operations teams, and your development teams to not only spend a little more time together, but also start understanding each other's languages a little better. I find it's quite common for these teams to be off in their own corners, in their own silos, doing their own thing until there's a problem with a deployment. And I think that's not just a missed opportunity for these teams to understand what the others are doing and why; it also has a

penchant for making the interactions between those teams a little more negative than they need to be. I'm not nervous. You're nervous. Anyways, this is a quote on screen that I really like, and I think it's a good primer for this talk; I want you to keep it in mind as we go through the presentation. So, if you're getting up there in years like I am, you probably remember a time when you'd come in to your desk with your coffee, sit down, SSH into your favorite server, probably named after some Battlestar Galactica ship or something, type uptime, and hope to see a big

uptime. It was kind of a point of pride, but it really shouldn't have been seen that way then, and it definitely shouldn't be seen that way in a modern IT environment. Instead, I think teams should be looking at their environments and their infrastructure in a more ephemeral way, where patching and iterating on your environment take precedence over uptime. And unlike when uptime was the point of pride, thanks to technologies like CI/CD pipelines, infrastructure as code, and cloud computing, we now have mechanisms available that make that patching and iterating easier without affecting the availability of our deployments. I'd also like to point out, it's no

secret, but I am an employee of AWS. So some of this talk is going to be centered around AWS-specific technologies and services, but everything I touch on is universally applicable; it doesn't matter if you're in the cloud, on premises, whatever. I just happen to know AWS the best. Okay. So at the core of this concept is data, and having data available at all levels of your deployment. Can we design deployments to answer questions like "are we deploying in Canada?" or "are we performing backups?" before we give the green light to actually deploy anything? As it turns out, depending on how you design your architecture and your deployment, the answer to that question

is yes, to a pretty surprising degree. There's actually security information available at every layer of your environment and your deployment. If you're using infrastructure as code to manage your infrastructure layer, then you have data that is both human and machine readable, which means it supports both manual and automated analysis. At the language level, your choice really matters, and knowing what languages you're using in your deployment also matters. As an example there, since it's a memory-safe language, Rust tends to be the language of choice for Amazon's security-sensitive projects; our Firecracker VMM, which underpins our Lambda and Fargate serverless services, is written in Rust.

Next is understanding your software dependencies. Understanding your dependencies is really important from a security posture standpoint. According to Sonatype's State of the Software Supply Chain report, up to 90% of modern applications are composed of OSS components. And we already heard about one this morning: Log4j. I'm sure that's seared into most people's minds now. So it's important to know where you're vulnerable: is that dependency there in my infrastructure? Next, we have workflow information. Using CI/CD principles gives us insight into not only who introduced changes into our environment but how, and that's often via an immutable audit

trail. And using tools like GraphQL for your API not only allows you to make input handling auditable from the outside of your environment, it also gives you insight into where you can be more specific about the inputs you're accepting. So we'll be taking a look at how to leverage all of these aspects throughout the presentation. Okay. So let's take a look at how we can shift left at the infrastructure layer, and I'm just going to level-set first: what is infrastructure as code and how can it help? As a formal definition, infrastructure as code is the process of provisioning and managing your

infrastructure and your resources by writing a template that is both human readable and machine consumable. The templates are usually in YAML or JSON format, or a JSON-adjacent format. Some examples you might be aware of are Ansible, Terraform, and CloudFormation for something AWS-specific. Basically, it allows you to provision infrastructure using code, hence the genius naming convention. Infrastructure as code has a lot of advantages, such as making it easier for you to edit and deploy environments in an automated and consistent fashion. That makes it easy to set up multiple environments for things like UAT, prod, and testing, or to deploy entirely new environments in the case of disaster,

and it can also help with compliance and security, as we'll see in just a minute. Some of the other things it can help you with: returning your environment to a known state at any given time, which I think is a pretty powerful concept from a security point of view, and testing at scale. I used to work on a production environment that had something like 700 cores and several terabytes of RAM; we took millions and millions of files a day for analysis. On the other hand, our staging environment was a 2U server with something like 12 cores, maybe 32 gigs of RAM, and it took a

tiny fraction of our deployment traffic. I can't tell you how many times we'd push a new feature or a bug fix to staging, have everything look hunky-dory, only to have everything catch fire as soon as we pushed to production. Cloud computing and infrastructure as code really help with that kind of problem. IaC can also integrate directly with your CI/CD pipelines, which means you can manage your whole deployment from top to bottom, or bottom to top, using CI/CD. And since it is code, you can treat it as such; that means things like feature branches for new functionality at your infrastructure layer. And as I just

alluded to, IaC is multi-purpose. Not only can you use it to edit and deploy environments in a consistent and automated fashion, but since it's code, it can be linted, or it can be ingested into other services for compliance purposes. Now, in my younger and more daring days, this probably would have been a live demo, but I'm older and risk-averse now, so I'm not going to tempt the live demo gods today. You're getting screenshots, and I apologize for that. But this here on screen is some JSON code, and what it's doing is setting up a storage bucket in our S3 service. If you're not familiar, S3 is our object storage

service in AWS. You'll see I'm giving it a name, I'm making sure to block all public access, and I'm making sure that if the CloudFormation stack (the IaC stack) that creates this bucket gets deleted, my bucket sticks around. This here is infrastructure as code in its purest form. Okay, so let's chat a little about the AWS Cloud Development Kit, or as I like to call it, cheating at IaC. While AWS does use CloudFormation for deploying infrastructure, the CDK allows you to define and build your infrastructure using programming languages, and then it automatically generates those CloudFormation templates for you as part of the deployment process.
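A CloudFormation template along the lines being described, a named bucket with all public access blocked and a retention policy so the bucket survives stack deletion, might look roughly like this (the logical ID and bucket name here are assumptions for illustration, not from the talk):

```json
{
  "Resources": {
    "TalkDemoBucket": {
      "Type": "AWS::S3::Bucket",
      "DeletionPolicy": "Retain",
      "Properties": {
        "BucketName": "bsides-demo-bucket",
        "PublicAccessBlockConfiguration": {
          "BlockPublicAcls": true,
          "BlockPublicPolicy": true,
          "IgnorePublicAcls": true,
          "RestrictPublicBuckets": true
        }
      }
    }
  }
}
```

The `"DeletionPolicy": "Retain"` line is what keeps the bucket around if the stack that created it is deleted.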

What it's doing is providing you high-level constructs for pretty much anything you can deploy in AWS, with sane defaults for those as well. For example, if you're using CDK, in most cases you have to explicitly enable a publicly facing component. CDK enables you to have all of your code not only in the same place, but in the same language. And since it's a programming language, it lets you apply software engineering concepts to your infrastructure code, like unit testing, for example. So we have another snippet of code here. This is a simple Python stack, and it's doing the exact same

thing as my JSON code from a few screens ago, with the bonus of creating an access control policy on my bucket as well. From here, it's as simple as typing cdk deploy on my command line to have this bucket created for me, and cdk destroy would tear it down. Now, the deployment process itself goes through several steps before it actually deploys anything. The first thing it does is instantiate all of the constructs in my code and link them together. From there, it verifies that my constructs are actually in a state that can be deployed in AWS. Then, if they are, they're going to be

rendered into a set of deployable artifacts; that's my CloudFormation code, or a Lambda application bundle, for example, if you're using a serverless model. And then it's finally deployed. So by the time the resources are actually being deployed in AWS, your CDK app has already exited. The code here on screen is actually what I used to generate the JSON code from a few screens ago; it was generated as part of the deployment process. So that's infrastructure as code in a nutshell: I have defined a piece of infrastructure in code, which I can then check into my repository and treat like any other piece of code.
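The Python stack itself was only shown as a screenshot, but the pipeline described, instantiate constructs, validate them, then render them into a deployable artifact, can be sketched with a toy model. To be clear, this is not the real CDK API; every class and name below is invented for illustration:

```python
# Toy model of the idea behind "cdk synth" -- NOT the real CDK API.
# A construct is defined in Python, validated, and then rendered into a
# CloudFormation-style JSON template as the deployable artifact.
import json


class Bucket:
    def __init__(self, logical_id, name, block_public_access=True, retain=True):
        self.logical_id = logical_id
        self.name = name
        self.block_public_access = block_public_access
        self.retain = retain

    def validate(self):
        # Mirror the "sane defaults" idea: anything public must be explicit.
        if not self.block_public_access:
            raise ValueError(f"{self.logical_id}: public access must be enabled explicitly")

    def render(self):
        # Render this construct into a CloudFormation-style resource.
        return {
            self.logical_id: {
                "Type": "AWS::S3::Bucket",
                "DeletionPolicy": "Retain" if self.retain else "Delete",
                "Properties": {
                    "BucketName": self.name,
                    "PublicAccessBlockConfiguration": {
                        k: True
                        for k in ("BlockPublicAcls", "BlockPublicPolicy",
                                  "IgnorePublicAcls", "RestrictPublicBuckets")
                    },
                },
            }
        }


def synth(constructs):
    # Instantiate -> validate -> render, like the deployment steps described.
    for c in constructs:
        c.validate()
    resources = {}
    for c in constructs:
        resources.update(c.render())
    return json.dumps({"Resources": resources}, indent=2)


template = synth([Bucket("TalkDemoBucket", "bsides-demo-bucket")])
```

The real CDK does far more, but the shape is the same: by the time anything reaches AWS, your program has already exited, leaving a plain template behind.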

That's a pretty powerful concept in and of itself. But like I alluded to earlier, defining your infrastructure this way provides opportunities for compliance and security as more code. Enter the CDK aspect. Aspects are a feature of the CDK that let you apply an operation to all constructs of a given scope. They can do things like add tags to the resources you're creating, or they can verify the state of a given construct. In this case, they're verifying the state of every S3 bucket in my deployment. In this aspect, we're actually doing a couple of checks. The first check ensures that encryption is enabled on

my bucket; that's pretty important, encryption at rest. Next, we're making sure that if encryption is enabled on my bucket, it is a KMS-managed type. There are several different ways to encrypt S3 buckets, one of them being KMS-managed. KMS is our key management service, and it gives our users more control and oversight over their key material and the keys themselves. One of the other types would be S3-managed, for example; that means the S3 service itself creates, owns, and rotates your key material. A lot of our customers prefer KMS-managed just for the oversight and the management they have over it.
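The logic of those two checks can be sketched in plain Python (a toy stand-in, not the actual CDK Aspects API; the property names are invented):

```python
# Toy version of the two checks the aspect performs -- not the real CDK
# Aspects API. It inspects bucket properties before anything is deployed.
def check_bucket_encryption(bucket_props):
    """Return a list of findings for one S3 bucket's properties."""
    findings = []
    enc = bucket_props.get("encryption")
    if enc is None:
        # Check 1: encryption at rest must be enabled at all.
        findings.append("encryption at rest is not enabled")
    elif enc != "KMS_MANAGED":
        # Check 2: S3-managed keys would also encrypt at rest, but this
        # aspect insists on KMS-managed keys for key oversight.
        findings.append(f"encryption is {enc}, expected KMS_MANAGED")
    return findings


# Two failing buckets and a passing one:
bad = check_bucket_encryption({"name": "demo"})
worse = check_bucket_encryption({"name": "demo", "encryption": "S3_MANAGED"})
good = check_bucket_encryption({"name": "demo", "encryption": "KMS_MANAGED"})
```

In a pipeline, a non-empty findings list would fail the build before any infrastructure is touched.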

As a reminder, these checks occur before any infrastructure is created or altered. So here's an aspect in action: I included the aspect with the CDK code I showed a few screens before, and I ran cdk synth as a build step in my CodePipeline. Since I didn't have KMS-managed encryption enabled, the build step failed. So that's pretty cool, right? Let's take that coolness a step further. And yes, I realize this is an uncomfortably nerdy concept, but I still think it's cool, so I'll defend that. CDK nag is what I like to call compliance as code. It's basically mapping aspects like the one I just outlined to compliance requirements in various different frameworks, as

rules. We call these sets of rules NagPacks. This is an open source solution, and we have NagPacks for frameworks like NIST 800-53, HIPAA, and PCI DSS, plus AWS Solutions, which is just a grouping of AWS best practices. You also have the ability to create and manage your own NagPacks based on your own context and your own compliance requirements. All of this can be baked right into your CI/CD process, and it ensures that every single time you create or alter anything at the infrastructure layer, it has to pass compliance checks before it gets created. So, because I'm a sucker for punishment, I also ran the same code with cdk-nag checking against the NIST

800-53 framework, and as you can see, it didn't go very well. Some of the things I missed were encryption at rest, encryption in transit, bucket versioning, and replication across different buckets or regions. I don't know if you can read that there, but you might notice there are errors generated for both S3 encryption being missing and KMS encryption being missing, which might leave you wondering: Tyler, if I need both KMS encryption and S3 encryption enabled on my bucket in order to pass my compliance checks, how am I ever supposed to get a compliant bucket with cdk-nag? And I'm actually super happy you're wondering that, because cdk-nag includes the

ability for you to suppress rules as part of this process, for scenarios exactly like this, and it will also print out a full run of all of the rules that were checked against your infrastructure, including the suppressed ones, with every build. I actually found out about this suppression feature when I was stuck in a weird circular rule-based loop. I created an S3 bucket and checked it in. It failed my build step and said, "Tyler, you need a log bucket for this S3 bucket." And I said, fair play, cdk-nag. So I added the log bucket for my S3 bucket, and I checked it into my repository again.

Build step fails. It says, "Tyler, you need a logging bucket for your log bucket." And it was then that I sought out this solution; suppression is useful in scenarios like that, and in the encryption one as well. So, all of that to say: it's not perfect out of the box by any means. It does take some fine-tuning depending on what you're deploying and what your compliance requirements are, but it's a very, very useful tool in terms of compliance and security. Okay. So, now that we've shifted as far left as we can with infrastructure as code, let's see what's possible at the

application level. I think this is an area that gets a lot of attention already, so I'm just going to touch on some topics that I think are important, at a pretty high level. I'm going to start by mentioning the concept of GitOps. GitOps is usually brought up in a containerized or Kubernetes-based context, but it can also be achieved with modern serverless architectures, or pretty much any architecture that can be managed with CI/CD pipelines. The idea here is to treat your Git repo as the sole source of truth. You would have your change management occur via pull requests from, or pushes to, your

environment, and you'd have your servers and your infrastructure adjust themselves to the contents of your repository. The security implications there are an environment where it's easier to tell the exact state your environment is in, and also an environment that is better equipped to quickly answer questions like "are we vulnerable?" It can also enable you to shift away from pushing code to your deployment environment to pulling, having your environment request those pulls, and that reduces your network attack surface and also helps protect your environment from drift. Now, the reason this concept is usually brought up with Kubernetes is that Kubernetes can really

streamline something like this through utilities like Flux CD, for example. Flux CD is an operator that consistently checks for newly built containers that are the result of a merge, and it never lets your environment deviate from the latest approved containers built via your CI/CD process. So if you had a developer, for example, developing a new feature for your UI, they would check that into your repository and create a pull request; it would have to pass build tests and code review; it would get merged down to main; a new container would be built; and then something like Flux CD would say: I have a new container for the UI.
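That reconcile behaviour can be sketched as a toy loop (the names and registry paths are invented; Flux's real mechanics involve controllers watching Git and image registries, not a function call):

```python
# Toy sketch of GitOps-style reconciliation -- not Flux's actual API.
# The cluster converges on the latest approved image tag and nothing else.
def reconcile(running, approved):
    """Replace any container whose image differs from the approved tag."""
    replaced = []
    for name, image in running.items():
        if image != approved:
            running[name] = approved
            replaced.append(name)
    return replaced


# Two UI replicas running an older build:
cluster = {"ui-1": "registry/ui:abc123", "ui-2": "registry/ui:abc123"}

# A PR merges, CI builds and approves a new UI container:
changed = reconcile(cluster, "registry/ui:def456")
```

Run continuously, a loop like this also reverts drift: anything changed outside the pipeline gets replaced on the next pass.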

Flux would pull that into your cluster and replace all of the UI containers in your cluster, and then your environment would never let your UI container deviate from that one until that exact process happened again with a new feature or a new bug fix. I also wanted to mention tools like the GitHub API and the CodeQL analysis engine. GitHub isn't just a place where you can centrally manage your code and your workflow data; it also understands the code it's hosting, and it records every single action taken on your repository, every issue that's opened and closed, every pull request, all of your

workflow data, and then it takes that and represents it as data via its API. That means with API calls, you can answer questions like: What programming languages am I using? What are my dependencies? What are my vulnerabilities? And you can pull your workflow data as well. Okay, so we just had a chat on zero trust, and I think we have another talk this afternoon touching on it, so I won't spend too much time here. But the reason you're hearing about zero trust so much is that it's a very important concept, especially when you're talking about modern, loosely coupled microservice architectures. You should start by assuming that every component in your environment has

a direct connection to the internet and work from there; that's the "everything fails all the time" mindset again. It means there are no areas within your deployment where any component should trust any other component. Gone are the days of the moat-and-castle security model, and thank the stars for that. From there, every interaction between your components should have some form of authentication, like mutual TLS, and you should have encryption everywhere: from your client to your cluster, everywhere within your cluster or your environment, and then strong encryption at rest. And I will say this too: HTTPS no matter what. I don't care what the context of your application is; even if it's unclassified, you

should still be using HTTPS. If you're using a containerized deployment, every commit you make should create an image, every image should be tagged with a hash of that commit, and every image should be checked into a registry. From there, you should be continually scanning every image in your registry for vulnerabilities. The reason I suggest you tag your containers with a hash of a commit is that with CI/CD, you already have a strong mechanism for rolling out new containers, but in the case of failure, having these tags on your containers makes it easy to roll back. And not only easy to roll back; it also

enables an easy starting point for your investigations into bug fixes and incidents. You should also be scanning your dependencies for vulnerabilities and updates; utilities like Dependabot are very good at this, and they can also help automate the remediation process. Now, this next one is mentioned more in a cloud-based context, but I think it's a really useful one, and it's also one that not many people focus on: one account per workload. Just to be clear, when I mention an account in an AWS context, that is an entire environment where you would be creating users and deploying resources. You don't have to make any trade-offs

or compromises with your design decisions. You would have a very predictable, and as a result controllable, blast radius for your workload. The users that have authorization to modify your environment would be very obvious to you, because they're the only ones working on that deployment. And it's also easier to determine what consequences changes would have on your environment, since you're the only one using it. That means you can also have different guardrails and controls in place depending on the workload, the context, and the use. As an example, I don't think I know of any customers of mine that use the same guardrails for their dev and their

security environments, and that makes sense. Now, in your accounts (Stefan mentioned this earlier in a different context), you should have continual audits of your users to ensure you're enforcing the concept of least privilege. Your users should only have the exact permissions they need to perform their tasking and nothing more. The other thing you should be doing is purging users. If you have an employee leave your team or your organization, you should be sure to revoke their credentials. I think it was actually last December that there was a breach at Cash App, which is pretty well known; they had an attacker breach

their infrastructure and exfiltrate data that had customer information on it, and as it turns out, the attacker was a former employee who still had valid credentials on their infrastructure. So that's one of those scenarios where that hyper-sophisticated, nation-state-sponsored zero day isn't required; he did plenty of damage without it. And finally, we have kind of a hodgepodge of remaining concepts that are still very much important. The first is automating your certificate provisioning. This is pretty easy to do with utilities like AWS Certificate Manager, but equally easy with open source utilities like Let's Encrypt. Now, if you're using Kubernetes, you can go a step further and terminate your TLS connection right

inside your cluster; there's that zero trust topic coming up again. Next, you should be using minimalist host or container images: only install exactly what you need to run your workload and nothing more, and if you're using containerization, ensure that you use a minimalist base image to start, something like Alpine, for example. And finally, test-driven development. Test-driven development essentially relies on your software requirements being converted into test cases early in the software development process; you then track your development by continually running those test cases against your software. Too often, I find the test cases and the unit tests you write for your software are kind of

tacked on at the end of a feature's development, and that leaves a lot of room for gaps and errors. All aspects of your application, from the lowest level of infrastructure all the way up to your HTTP headers, should have test cases written for them as part of the feature development process. And from there, you should be continually running your tests during development, running them on every pull request, and then again on merge. Okay. So let's see what all of this has given us: how does this shift left help us overall? In my opinion, at the infrastructure level, it means that all of your infrastructure is defined as

code, which at a bare minimum gives you a strong mechanism for automating your deployments and your disaster recovery. On top of that, since we have infrastructure as code, and infrastructure as code can be read and interpreted elsewhere, we have the ability to bake our compliance checks right into our deployment process, which helps ensure that we never deploy any infrastructure that doesn't pass compliance checks. I added a bonus here: a library of compliant resources, which could even be broken down into different contexts or different classifications. Why bother writing a new S3 bucket, a new load balancer, or a new VPC if you already have a compliant one that you can just copy and paste

into your application? I think the other really important factor here is that building compliance and security into your infrastructure layer has your developers thinking about security right out of the gate. Let's say, for example, you have a new developer on your team. They create an S3 bucket, they check it into your repository, and it fails the build step saying you need encryption on this bucket. They'll add that encryption, check it back in, it'll pass the build step, and the next time they go to create an S3 bucket, encryption is going to be on their mind. Next would be combining your IaC and your pipeline to support zero-downtime deployments. So

deployments can support a rolling model that replaces your images or your functions or your containers without affecting the availability of your application, and you should also be able to do the same in reverse in the case of failure. That's where containerization again comes in handy, and so does versioning. Next: push-button compliance reports. One of the last things I did in my career in the public sector was a compliance exercise; some of you may know the term SA&A, a security assessment and authorization. I had to take screenshots of my console settings, and I had to fill out what seemed like a never-ending spreadsheet full of the different security

controls I had to have covered. And I remember just thinking how difficult it would be for the compliance team to determine, using those artifacts, whether or not the controls I'm approved to operate under are actually applied in my environment. The whole idea of an SA&A is that you get authorization to operate within certain boundaries, and then you're supposed to continually monitor your deployment to ensure that you're still within those boundaries. If you exceed those boundaries, say you open a new port or add new functionality to your service, then you should return to the authorization step and reassess: either you rein your deployment back in or you

change your operating parameters. In my ideal world, every time we ran a build, we would get a compliance report printed out for us with the exact controls that are covered in that environment at that time. And if you have your infrastructure defined as code and you have your compliance checks built in, most of that work is already done for you. For the application layer, here's what I see as an ideal state. It starts with test-driven development: basing your development processes around testing, rather than having testing be an afterthought, is going to have your developers thinking about sound principles right out of the gate. From

there, we start using our code repository as a single source of truth. If we know that what is in our environment is an accurate reflection of what's in the repository, then we can leverage the repository for useful security information. From there, we're also scanning all of our containers and images, and our dependencies, for vulnerabilities and updates, and we're making sure we have a process in place so that if any issues arise, they're remediated in an automated fashion. And finally, we're being paranoid: we're treating every component in our deployment like an island, and we're making sure that every island has a pass,

or rather, that every visitor to the island has a passport. Okay. So what does this all mean for our live environment? To start, the components of our production environment are treated as ephemeral and immutable. As I mentioned at the start of this talk, we look at uptime as an indicator of risk, not as an indicator of pride; patching and iterating are our new indicators of pride. From there, we would have limited access to our production environment. Again, if we're using the one-account-per-workload principle, then we know exactly who should have access to that environment and why. I added a bonus point here for an

environment that is only writable by your CI/CD pipeline. When your application, following test-driven development principles, passes a new build to main, that should trigger automatic replacement of all of the affected components in your environment. You don't really need a reason for users to have access to a live production environment if you're using CI/CD. And finally, all of the existing threat detection and incident response apparatus you have in place still applies here. We're not replacing anything; we're always going to need doctors and prescriptions. What we've done here is just try to make it a little less likely that we get that far. But as an example

there, let's say we take what we have above and we have an environment that should only be changed by your CI/CD pipeline. Then you would be monitoring for that, and if you know that something outside of your CI/CD pipeline made a change to your environment, that's an immediate red flag that needs to be investigated. And at the end of the day, this is what it all boils down to. A few years ago, NIST released a guidebook that looked at organizational cyber security from top to bottom. Basically, what they did was break down all of the typical groups that you would see within an organization, like your

management, your IT, your security, your HR, and they outlined the cyber security responsibilities those groups have. It's a really great read if you haven't read it. Basically, we're always going to have security specialists, we're always going to have security operations centers (unless you're in Ireland), and we're always going to have incidents to respond to. But that doesn't mean that your operations teams and your development teams shouldn't be tackling security in their own way. Making security more visible through data and throughout your environment helps everyone tackle that challenge together. And I wanted to wrap up by talking about my favorite thing about this conference so far. So, I

arrived in St. John's, a city I love. I get to my hotel, I go up, I open my door, and the first thing I see is this big round window with this view. I see the city, I see Signal Hill, the Narrows. I just love it here. But this isn't my favorite thing. My favorite thing is that I sent this to my account manager, Jason. Jason's sitting over there, and he replied with the view from his hotel room. So that's my favorite thing. All right, thank you. And if any of you have questions, I'd be happy to take a few. Yeah, go ahead. Yes. Yes. Like anything like that. You

know, I see a lot of organizations just take the path of least resistance. And a lot of times, when you talk about things like that, things that work but also add value, the answer is no, because that's extra work, extra things I have to learn, and extra things I have to manage. But for anything like that, why wouldn't you? It doesn't cost any money, and it has plenty of advantages, especially from a security standpoint. So, yes, please. Anyone else? Okay, I guess you're all hungry, so thank you very much for your time, and enjoy the rest of your day.