
Check one.
Check one. Check two. Check it. Check. Check one. Check two. I'm like shaking a little bit. I'm a little shaky up here. Uh, that's okay. It's 3:01. I'm gonna start the timer and the backup
timer. Okay, ladies and gentlemen, welcome to BSides Rochester. You're in for a little bit of a treat today. It's 3:00. Uh, it's a little bit after lunch. We're all kind of nice and sleepy. If you've seen the weather outside, it's nice and snowy. We got some real calm, real chill vibes out here. And that is not at all what this talk is going to be. Okay. Okay, I hope you're buckled up and ready for a wild ride. All right, seat belts on. Okay, are we ready? Okay, let's talk about this talk. This talk, it's going to be a great talk. And I'll tell you why. It's because this talk, it's being given by me. Did I
guess right? I probably got 50% accuracy. You can't even see that arrow. That's okay. Uh, hi. My name is Ethan Witherington. I work for a company. I have a lot of co-workers in the audience. I'm not going to say the name of the company, because I am here trying to embarrass myself and hopefully not my employer, but it's easy to make the association if you try. I'm here of my own free will. I am not being coerced. I am not a North Korean insider threat, and I am excited to give this talk. My first piece of advice for new speakers, this is my very first BSides talk: if you're
nervous, if you have stage fright, just tell people you're excited. They can't tell the difference. I'm really excited to give this talk. I'm really excited to be up here. This talk, let's talk about this talk. This talk is going to be a great talk, and I'll tell you why. Because this talk has a great title: "DevSecOps Isn't Real, It Can't Hurt You, and Other Lies: Lessons Learned from Letting Engineers Approve Their Own Pull Requests." There's a lot of stuff to unpack in this title. Oh, thank you very much. Starting with: DevSecOps very much is real. It very much can hurt you. And I did definitely learn a
lesson from letting engineers approve their own pull requests, but it's not the lesson you think. And that's our first subversion of expectations here: letting people approve their own PRs has been going really, really well. Specifically, the safety mechanisms we've built in to get to the state where that's possible. And I'm going to talk a lot more about that as we go. I had a story that I wanted to tell on this slide, but I forget what it is. Pushing to prod is only part of a nutritious breakfast. This is where we go into: there are a lot more things here. There are a lot of
melon slices and little kiwi wheels on the side to make it quote unquote nutritionally complete. And there are a lot of asterisks. You remember those asterisks on the title slide? On the way here, I had my laptop in my backpack, and all the asterisks fell off the other slides and landed on the title. I'm not really sure where they go. We'll kind of discover that as we go through. But this talk's going to be a great talk, and I'll tell you why. It's because it has a great table of contents. So, let's talk about this table of contents. DevSecOps is a bad word, possibly the worst. Then,
we're going to rederive the entire field of economics from first principles. We're going to talk about the theory of constraints. We're going to go fast, but not that fast. This is my cue to slow down. We'll talk about unit testing. We'll talk about integration testing, a very specific flavor of integration testing. DevSecOps is actually a great word, possibly the best, but we'll save that for later. And then the talk actually starts around the 10th bullet point, where we talk about safety controls, and then that's it, and then it ends. So anyone who didn't put their seat belt on in the beginning, now's the time. All right, jumping right in. DevSecOps is a bad word, possibly the worst. Never use it. It is the buzziest buzzword. DevOps is already a buzzword. Then we take security, which is kind of a buzzword. We make it more of a buzzword by abbreviating it to Sec and then putting it right in the middle of DevSecOps. You might be able to get buzzier with DevSecCloudOps or something crazy like that. DevOps already includes security, or it should. When you say "oh, we do DevSecOps," I'm like, did you not have security before? DevOps includes everything, and I want to talk more about DevOps, but in order to understand DevOps
we're going to have to rederive the entire field of economics from first principles. We will skip some parts, don't worry. So, starting off, imagine you and five friends are alone on an island, right? And then, skipping ahead a bit, we can offset the collateralized debt obligations with credit default swaps. Is everyone still with me? We're good. Okay. So, taking that whole story from beginning to end, we're going to boil it down, and it all boils down to ROI. Here's the first slide that I actually stay on. We can calm down a bit. ROI, return on investment. This is the goal. This is the goal of all businesses. This is the goal of, I mean,
this is the goal. If you've read the book, The Goal, this is what that book's about. Um starting off in the beginning of the story where we're alone in the woods. Uh I'm a boy scout. I remember some things, but if I'm alone on an island, it's probably not going to end well. Uh you know, I'll try my best. But what I'm going to do probably in that first evening uh is I'm going to find some sticks and some twigs and some stuff on the ground and I'm going to try and put together a little bit of like a makeshift survival shelter. And it might take me like four maybe six hours. It's really not a lot of investment, but the
return is that I don't freeze to death. That's a great return for the amount of investment that I put in at the start, right? So that's pretty awesome. And it tells us that investing is not a zero-sum game. We take our people and we invest their time. We take our materials, or whatever our inputs are to our process, we do some work, and the result is more significant than just the sum of the pieces that go into it. Right? I want to make spaghetti for dinner. I could just eat the spaghetti raw and eat the sauce out of a jar with a spoon. Or I could cook it, and I invest just a little bit
of time to cook it, and now it's so much better. So, ROI. We all do investing every day in our day jobs. I go into work, I put in a little bit of time, I get some return from that in the form of a paycheck. There are different strategies of investing. You know, there's the return, and there's the speed involved in there. And the place I wanted to get to with this slide is that it's streams and not just one-time events. We invest continuously. You could take your day job and break it down into months, into weeks, into days, into hours, into minutes. It's a constant and ongoing thing. So, what I want us to think about here is investing in terms of streams of value and not just a one-time thing, right? If you do it as a one-time thing and it works, do it again. Now you've made a stream. So: more faster, more bigger, more better. I have a friend who works at Corning Glass. He works on the things that test the things that go into the things that eventually make your computer chips. These things are really, really hard to make and very, very precise. They might ship one or two units a year, but for millions of dollars per unit. So they're very much
taking the extremely high value but low speed path. They're in that upper left corner. On the other side of the spectrum, I have a friend who sits at a desk, and he has a whole bunch of papers, and the papers come across, and he just takes a quick glance, doesn't even read the whole thing. He stamps yes or no on the paper and then passes it along. Not a ton of value add, just a quick check. But he can do it relatively quickly. And that friend who sits at that desk, his name is a Palo Alto firewall. The papers are network packets. He's looking at the packet headers, checking them against a detection engine. And so he's doing millions to billions of little network packet detections per second. Um, yeah. So, that's big value versus fast value. We'd like to combine the two of them if we can. We want to get to the top right corner of this graph. That's our goal. Now, adding value to your system is kind of difficult. And if you're at, you know, the factory floor at the ground level, we don't get to make a ton of those calls. Those really come from the business saying, here's what would be more valuable to our customer, here's what our customers would find valuable, let's change our practices a bit,
and they kind of design the processes and the work streams. What most of us have a little bit more impact on something else we can influence is the speed and the efficiency with which we get work done to move further to the right. So business goals are going to move us up on this graph and our speed and efficiency is going to move us to the right. And to talk about that I want to get into the theory of constraints. The theory of constraints talks about work and this is where we get into the uh kind of study of work and study of flow of DevOps. Uh the theory of constraints makes two relatively bold
claims and we'll dig into both of these. The first one is that all systems are limited by one and only one bottleneck. Uh not zero, there's always a bottleneck but not two, there's always only one. Uh what do we think about that? Right? That's a little interesting. And um the theory there is if there was no bottleneck, the system would have infinite throughput and the system does not have infinite throughput. So there must be a limit somewhere. And if there were two bottlenecks, maybe you could balance them, but one's probably more of a bottleneck than the other. So there's one bottleneck. Our second claim from the theory of constraints is that the throughput of the system can only be
improved at the constraint. And we'll dig into that with a little bit of an example. So this is coming from thinking about a bakery near where I work. I go there frequently for sandwiches because I forget to pack a lunch, and they're really good about taking your order. You don't wait in line very long, and this can be represented by our first work stream. The big blue box that says 10 has a work capacity of maybe 10 units of work per some unit of time. They can do 10 things. Then our next work center, this is going to be the cooks in the back. And through no fault
of their own, they genuinely have a lot of work to do. They can only do like five units of work per unit time. And then our last section: once those sandwiches are done, they package them up, they put them in the bag, and that center has a little bit more capacity because it's a little bit easier. They can do 10 again. Right? So, I'm looking at this and it's pretty clear to see what's going to happen here. First thing is that we're going to take in more orders than we can process, and we're going to put them in a buffer. That's this pile of corn here. So, this buffer is going to start to pile up
before our kitchen. We have a number of orders, and during the lunch rush, that pile of orders is growing. And then the second thing is that that last work center is going to be work-starved. Those people are going to be sitting around on their phones for about half the time. And that's again through no fault of their own. They just don't have more work to do. So, how do we improve this system? Right? If we add 10 more capacity units and we put them in the first work center, we're only making the problem worse. We've added more capacity, but we're making it worse, because the size of that buffer is
now growing even faster, right? Let me tie this back to ROI. When work enters that first work center and you start to work on it, you're making an investment. The money boomerang is being thrown, and there it goes, and then it's sitting in this pile. And it's going to sit in this pile until it can eventually move through this pile, and it has to get out the door at the other end in order to come back as return so that you can then reinvest it. So we want to accelerate this cycle as much as we can. So, buffers. A buffer? Sure. A safety buffer? Sure. A massive buffer, this big old pile of corn here? No bueno. Now, so we see how improvements before the constraint just pile up at the constraint. Improvements after the constraint? Pretty clearly, you're still just work-starved; you haven't really added much. The only place that we can make an improvement is here in the center workstream. This sounds relatively simple. This all makes sense so far, right? Yeah. So these are the two keys: you're looking for the buffer that's piling up, and you're looking for the work starvation, because in reality your work stream is going to be a lot more complicated. There are lots of exercises to map your workstream. There's value stream analysis. This is blurry intentionally. We'll look
at something a little more concrete. And that helps you identify what your constraint is, because you have to find it before you can fix it. There's really good news and bad news here. The good news is, if you find the constraint, you can fix it. The bad news is, you have to find the constraint first. So yeah, now diving into this flow. This is the flow of what a developer's investment of work might be. I'm glad I have my backup timer; my primary just failed. So, in this flow, a developer makes a commit. It goes to a feature branch. Maybe it goes through a PR to get to a dev branch. Maybe it goes through a PR to
get to a release candidate branch. Maybe it goes to testing and QA before it goes to staging. Then it goes to an intern with a flash drive. Then it goes to another staging. My brother is the intern with the flash drive. He takes it from the first staging to the second staging. And I only recently understood why they do that. It's that they wanted to test and make sure that someone with no context about the code could do it. My brother is not a programmer; he studied mechanical engineering. And they wanted to make sure that he was capable of doing the whole installation process. So he was kind of built into the testing method.
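As an aside, the bakery bottleneck from a couple of slides back is easy to sketch in code. This is my own illustration using the talk's hypothetical 10/5/10 capacities, not anything from the slides:

```typescript
// Three work centers from the bakery example: front counter (10),
// kitchen (5), packaging (10) units of work per tick.
const capacities = [10, 5, 10];

// Buffers hold work waiting in front of each center; buffers[0] is fed
// by incoming orders (10 per tick during the lunch rush).
const buffers = [0, 0, 0];
let shipped = 0;

for (let tick = 0; tick < 10; tick++) {
  buffers[0] += 10; // orders arrive at the counter's full capacity
  for (let i = 0; i < capacities.length; i++) {
    const done = Math.min(buffers[i], capacities[i]);
    buffers[i] -= done;
    if (i + 1 < buffers.length) {
      buffers[i + 1] += done; // hand work to the next center
    } else {
      shipped += done;
    }
  }
}

// Throughput equals the constraint's capacity (5 per tick), and the
// pile of orders ("corn") in front of the kitchen grows by 5 every tick.
console.log({ shipped, waitingAtKitchen: buffers[1] });
```

After ten ticks, the system has shipped only 5 per tick, no matter what the other centers can do, and the kitchen's buffer has grown without bound, which is exactly the "find the pile-up, find the starvation" diagnostic.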
But we might go through staging again to get to production. Right? So this is an example value stream. Now we'll apply the theory of constraints to this value stream. We want to take developers, we want to take those commits, we want to get them through the system as quickly as we can so that we can reinvest. We can learn. We can get our feedback loop going. We're accelerating here. Developers push to production. Let's talk about CrowdStrike. So, this is my photo that I took. I was in the Atlanta airport two days after the CrowdStrike event. Not really visible, off to the side, is the line stretching the entire length of the terminal, two or three people wide. Uh,
the place was packed. Everybody was miserable. It was really a surreal experience. There were attendants in the hallways waving their hands in the air. They were like, "Hey, if you're here for a hotel, get out of line. We have no hotels. We can't help you with hotels." And the people stayed in line because they had nowhere else to go. I was very fortunate. My flight only had one gate change and an hour delay. So, I got to go in, witness this, and then leave. And I was very grateful for that. Not pictured is my checked bag somewhere else in the airport, carrying the legal limit of Tennessee whiskey coming back
to Rochester. And, uh, yeah. CrowdStrike. The theory of constraints tells us about speed. It tells us how to build our product faster. It doesn't tell us how to build it better. It has nothing really to say about quality. You still have to be delivering value. You can deliver it quick, but there has to be value to deliver. So with that, we have to talk about quality assurance. And we're getting closer to the security aspect of this, as you can tell. So let's talk about testing, and let's modify that last diagram a little bit. We have devs. They go to some decision process where we say: is the code good? If it's good,
it goes to production. If it's not, it goes back to dev to be fixed. If the code is good, that path to production, I know it's a short arrow here. It could be whatever. It could be a CI/CD pipeline. It could be an intern with a flash drive. The point is it's on the road and it's going out the door. And now that you have this process, you can accelerate that. CI/CD is really awesome for accelerating that. uh if the code is not good and it goes back to the dev and you want to improve it, that's where we're going to talk a little bit about unit testing and how you do that. The point is that there's this decision
process. Is the code good? It either is or it isn't. It either needs more work or it's ready to go. That's the determination we want to make. The ideal state is being able to answer that question. So, let's talk about unit testing. Unit testing of your code is awesome because it's really, really fast. It's really, really easy. And it's really specific. It's going to dive down deep into some specific tiny little part of your codebase, some little section. They're easy to write. You can say, "Okay, I know what this does. This makes a network call over here, and it needs to pass along three things. Test it. Does it do that? It either does or it doesn't." It's
specific. I talked about fast. They're really fast to run. You can have thousands, up to like 10,000 unit tests, and run most of them in parallel. It's really, really quick. So this example here, this is source/lib/components/skeleton/bone-factory/templates/tibia.java. We're going to test that it passes; push it to prod. What about fibulas? CrowdStrike. CrowdStrike had tests, and I've got to be honest, I don't particularly care. There was only one thing that they didn't test, and that's the thing I care about. So we do need more than just unit tests. Unit tests are specific, and you could say, well, why don't I just unit test everything, right? Who has a
codebase with 100% unit test coverage? There's only like one person in this room, I believe. Do you actually? Oh, that's awesome. That's really, really good. I hope you're enjoying this. I hope I have something to say. But we do have to talk about more than just unit tests, because there is another type of test that I like to call a smoke test. Smoke tests are really good, and they are a specific flavor of integration testing, which is where you take your whole system, stand it up, get it up and running, and, you know, poke it and see what happens. You want to make a GET request to the app's front page. This is going to go back
and talk to the database and return some data. As long as it's a 200, okay: you've implicitly tested so many code paths, right? The CrowdStrike smoke test could have just been: we stand up a box, we deploy our update, and see if it passes. It probably doesn't take that long. But these are slower than our unit tests. This is a slightly slower feedback cycle, but they're really nice because as you add more stuff, as long as it's in the hot path, it still gets tested by your smoke test. So smoke tests are no longer particularly specific about what piece of code they're testing. They're very much more broad. They don't tell you where the failure is, but they tell you
if there is a failure, which is what we want for our decision point of: is the code good, push it to prod. So yeah, back to this diagram. Our PR review, when we're manually reviewing someone's pull request, happens as one of the controls in this decision-making of "is the code good." So now it's time to change my tune a bit and say DevSecOps is actually a great word, possibly the best. You should use it all the time. We know exactly what it means. There is no point in saying that a transmission repair specialist in an automotive shop is just a mechanic. No, he has a specialization. We're looking into a specific area of this
field of study. DevSecOps really is: we're looking at how we can put security into DevOps. DevOps covers everything, right? That's really broad. We've got to make sure that each piece is in there, including security. We're focusing on that. We're doing DevSecOps. It is the art of going fast safely. So let's talk about our codebase. This will set up a little bit of context for our next section. So, we're doing infrastructure as code, but we're not doing Terraform. We're not doing YAML files. This is actual, legitimate code. There's a library out there called Pulumi. I highly recommend it. It's an awesome tool. This infrastructure codebase is a TypeScript monorepo, and under the hood it defines the
cyber security operations centers of multiple clients. So we have multiple instances of these cyber security operations centers, handling everything from log collection through to the SIEM, some of the detection rules in the SIEM, the underlying kind of data lake architecture. There was a hint about that in the keynote this morning; I really like that pattern. It's pretty important. We are a big target for insider threats, right? If an insider threat was going to target somebody, I think, you know, we're a good choice. So, this codebase is really important. And so I'm still standing here on top of this really bold claim that we let people
self-approve PRs. And we do, but we do that with the help of lots and lots of other controls. There are lots of ways to put the safety into the system so that we can get away with that, right? And another analogy that I like to use is: I can go faster by making my car lighter. I can make my car lighter by taking the brakes off, but that's a bad idea. But there are other pieces that I could get rid of, if I can have some other lighter system take over that role, right? So, diving into controls here. We already talked about unit tests. We already talked about smoke tests. What other controls do we need?
We need things that are effective at answering the question, is the code good? And answer it relatively quickly. Right? That's the goal here: to go quick. So, we'll talk about automated PR checks. I'm making the assumption here that you have a protected main branch that you can't land commits on. You have to go through a pull request process to get to this main branch. And this pull request is where we're going to apply most of these controls. You can adapt most of this to git flow models. I work in a trunk-based branching situation; I don't want to dive too deep into git models. But talking about controls and automated PR checks: you
have built-in policies. GitHub, Azure DevOps (which we work in), Bitbucket, they all support these. And you want to protect against silly mistakes. That's where you add in your unit tests. That's where a strictly typed language really helps. You could be using something that's not strictly typed, and then you have to dive a lot deeper into it. But having a type system to lean on and help you out really helps. I kind of like Rust for this reason. I have no shot at understanding Rust. I do not know Rust, but I do love it. I love the idea of it. And in what little playing with it I've done, I found that
the code really does not compile until all of a sudden it does. And if it does, it works pretty much exactly how you intend it to, because it has all those guardrails to push you towards the correct path, which is really nice. Maybe we want to talk about code style. Why do this manually? There are built-in tools out there. There are formatters. We'll talk a little bit about Prettier, which parses the code into an abstract syntax tree and then re-emits it to disk. There are no more arguments about code style; it's whatever the tool prints out. So we'll talk a little bit about tools. If you're using TypeScript, TypeScript has a ton of stuff built in already. We talked
about Prettier. ESLint is a very fast, very impressive piece of technology. We use a package called XO, which is a big collection of default ESLint rules. It's community-maintained; people add more rules to it all the time, and so we get the updates from that. If you're using Python, there's an extension called mypy, which will do some type checking for you. You add those type hints to your functions, and you can validate those and enforce that. There's a thing called Black; that's the PEP 8 one. And if you're using JavaScript: don't use JavaScript, but you could use JavaScript with some of this TypeScript tooling, right? TSLint is
awesome. The TSC compiler can do checking on top of JavaScript. You use little JSDoc comments to say what your types are. JavaScript is valid, but you're kind of going to be reinventing TypeScript the further you go down that path. And that's okay. I love JavaScript. It honestly is one of my favorite languages. So now let's get into the meat and potatoes here: manual PR review. It is a good control, right? I'm not saying it's a bad control. And the takeaway here is allowing engineers to self-approve PRs. We're not asking them to. We're giving them the capability to pull that lever when they need to. Manual PR review is a good control. It's 100%
allowed. It's encouraged. I ask people to ask for help, and we'll talk more about that later. It is required for some code sections, because the code defines some of the policies, and in order to trust the policies, we know that we can't change those without review. So, we actually have a policy in place that'll add me specifically as a reviewer if you want to change some of those core packages or some of our deployment pipeline scripts, which are a slow system. I took out the pace layers slide; come talk to me afterwards about pace layers. I love that idea. Where am I going here? It's not always a good control. Manual
PRs do vary in effectiveness, and they do vary in speed. I want to talk about PR review as a cultural control for a moment. And I'm going to do this with a story about the timber framers and their chisels. So, my dad runs a timber framing shop, where the timber framers have these big wooden beams that might be like 10 inches by 10 inches by 30 feet long. These are big structural components to build the frame of, like, a house or a barn. That's a big chunk of wood, right? That's like the size of a tree, because it is. It's the core of a tree. So, if you make a mistake,
somebody in like Northern California has to go into the woods and cut down another tree and put it on a truck and ship it across the country to Pennsylvania. That's expensive and more importantly, it's slow and it takes a while. So, we really don't want to have mistakes here, right? The equivalent control to requiring a mandatory PR review that's manual is taking away the carpenter's chisels until they ask for review of their layout. Right? They'll do layout on the beam and then they're asked to get a second set of eyes on it before they start to cut. Right? My dad doesn't do this. He doesn't take their chisels away. He trusts his team. We're building a high trust environment where
people are enabled to go through and do the work that they need to do. Right? A carpenter who ignores this rule and starts cutting beams anyway, you know, maybe he's a quote unquote rock star and does it right a couple of times, but eventually you make a mistake. You weren't following the policy, and you're not going to last very long. So people have this incentive to, you know, follow the guidance and the cultural idea here. Okay, this is going to be one of my most spicy bullet points. We need to operate as if insider threats are extremely rare, or maybe they don't exist entirely. This is in direct
contradiction to the keynote this morning. But it kind of is, and it kind of isn't, right? Insider threats. Do you trust your HR process? If you do, that's great. That's part of your perimeter. If you don't, you should fix that. Do you have insider threats on your team? If you do, you should fix that. And if you don't, then you don't, and that's great. We want to create a perimeter of control, right, so that we can have that soft, gooey center. And I realize this is the opposite of the zero trust kind of idea that's getting popular. We think about, like, Zscaler. The overhead of Zscaler authenticating
a network packet as it goes through: it can do that really, really fast, right? You can have the guards that guard the outside of the castle all around the inside of the castle, because they're super, super fast, and that's okay. Zscaler will authenticate faster than the Windows 2008 web server sends its response, and that is going to be faster than your overloaded work laptop can render the web page. So it's okay. But when the guards inside the castle are really slowing things down and we can't effectively work within that unit, we have to create this safe space, this high trust environment where we can be really effective, right? And so, that's kind of it. There are lots of
controls. This isn't a talk about insider threats, but that's kind of a goal that we want to pursue here. But what about the principle of least privilege? This one I can't really add caveats to. That's a legit thing. So there are other controls here. I talked a little bit about that policy of: you can't edit the code that defines the review policies. There is no circumvention of the guardrails here. The guardrails are in place, they are effective, and we need to be able to trust our guardrails. So we have review on those. Your automated review must pass. This is one of those asterisks from the title slide. We let people approve their own PRs. You can
approve your PR all day long; if the testing pipeline doesn't agree, it doesn't go through. So, you can approve, but the testing pipeline also has to approve. And you can't edit the testing pipeline without me approving. So, there are guardrails here. And honestly, probably the two biggest reasons that we're able to get away with this are team size and, again, the culture built within the team; there's a thing we call "the talk." So, team size: I'm working with a team of maybe six, maybe seven people, right? We all know each other. That's a really big step to creating this high trust environment where you're able to ask for help for things.
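As a concrete illustration of that "you can't touch the guardrails without a specific reviewer" policy: one common way to express it is a CODEOWNERS file. This is a hypothetical sketch with made-up paths and a made-up handle; the talk doesn't say which mechanism they actually use, and in Azure DevOps the equivalent would be a required-reviewers branch policy rather than CODEOWNERS.

```
# Hypothetical CODEOWNERS: changes under these paths automatically
# add the named reviewer, and branch protection can require their
# approval before the PR can merge.
/packages/core/        @repo-owner
/pipelines/            @repo-owner
/.github/CODEOWNERS    @repo-owner
```

Note the last line: the ownership file itself is owned, so the guardrail can't be edited to remove the guardrail.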
Number four, the talk. Anytime a new person is going to come into this team or gain access to this repo, I make sure to talk to them, webcam to webcam, and get a feel: okay, is this a real person? Are they located in the US? Are they where I expect them to be? I say, "Hey, you now have push access into production." And they look terrified, right? And that's a good sign. If they look terrified, I'm like, okay, this person I can probably trust. And then number five, rapid response as a cultural control: a culture of asking for help. You will get help if you ask
for it, and you'll get that help very quickly. This is another really big one, right? Because if you have the ability to self-approve your PR, you're under a lot of pressure to get this code out the door. And you know that, hey, I have this itch: "Hey, can you review my code?" But that's going to take multiple hours. You might just push it through, and we really don't want that to happen, right? So, when someone wants review, I kind of drop everything to go do that review. When there's a new PR in the repo, I get an email alert, and I'll typically go in and look at it and help the person make sure that it
lands right. Honestly, part of the reason for that is because the XO linting rules are extremely strict, and I can help you get through those guardrails to make sure the code is doing what you expect, and we can land that. So: rapid response, and a culture of being able to ask for help, and you'll get that answer quickly. You can ask for review, and everybody on the team is capable of doing review, and that really helps us operate and move quickly. So with that, let's get into who should review: who should you add to your pull request, and who can hit yes? And the
people with the most context are probably who you want. So, at the bottom of this graph here, I don't know how well that's showing up, we have the person who's making the change, they have the most context about the change that they're making in the area that they're making it in. Uh, if you're working in a pair programming model, the other person in your pair programming combo also has all of that context, right? So, your original author, their seal of approval should go on the PR, right? So, we always require at least one review. You have to approve your own work, right? If you don't believe in your work, why are you making a PR? And
then your your pair programmer should also have buy in and say, "Yeah, I I like this." Now, as we get above that, we get into this yellow re region where we might have a team member who is uninvolved in the change in the process or like a team lead. These are people who have context about that kind of region of the code or kind of the you know what's going on over here, but they don't know specifically what your change is, but they might have context that you don't have or experience that you don't have. And these are the people who it's okay to ask. you say, "Hey, could you just double check this real quick?
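As an aside, the "you have to approve your own work" rule maps onto a real knob in Azure DevOps: the minimum-number-of-reviewers branch policy can count the PR creator's own vote toward the requirement. Here's a sketch of the relevant settings; the field names follow the Azure DevOps policy-configuration REST API, but the repository ID is a placeholder and this is not our actual configuration.

```typescript
// Sketch of an Azure DevOps "minimum number of reviewers" branch policy
// where the PR author's own vote counts toward the requirement.
// Field names follow the Azure DevOps policy-configuration REST API;
// the repository ID is a placeholder.
const minReviewersPolicy = {
  isEnabled: true,
  isBlocking: true,
  settings: {
    minimumApproverCount: 1, // at least one approval is required...
    creatorVoteCounts: true, // ...and the author's vote counts
    allowDownvotes: false,
    resetOnSourcePush: true, // votes reset when new commits arrive
    scope: [
      {
        repositoryId: "00000000-0000-0000-0000-000000000000", // placeholder
        refName: "refs/heads/main",
        matchKind: "exact",
      },
    ],
  },
};
```

With `creatorVoteCounts` enabled, a one-approver policy becomes exactly the self-approval model described here, while the (separate, blocking) build-validation policy still gives the pipeline its veto.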
Hopefully, they can come in legitimately real quick and go through that. When I talk about speed here: the feature branches on this project live for maybe an hour, maybe two days at most, before they go through the PR process to get into main. We are trying to operate that quickly. Now, above this yellow zone, we get into the red zone: maybe the director of the service line, or a change advisory board, or the CTO's office (hopefully not). These are people with zero context about the actual codebase. They should not be involved in code review. They're involved in business requirements review and making sure that you're working on the right goals. They don't particularly care about the implementation, right? The gray text down here at the bottom is a joke (it's supposed to be hard to read, though I didn't expect it to be that hard to read). It says, "Guys, I just fixed a spelling mistake in the readme. Please do not drag the CTO into this." So that's a little bit about who should review our PRs. We're almost done, I promise. Speed is also a control. I talked a lot about the speed with which you get help if you ask for it, or the speed with which you get review. This is the raw speed of the repo: feature branches living for hours, not days, not weeks, and the main branch being continuously recycled and refreshed, with a lot of eyes on it all the time. This codebase isn't rotting in a corner somewhere; it's really the focus. Trunk-based branching, which I talked a little bit about, and CI/CD, because once the code is good, CI/CD helps so much in getting it out to production. We're going to automate that process as much as we can, to make it repeatable and not require any manual steps, because when we talk about compliance frameworks on the next page: you want the change to be made by a person, then approved, and from there nobody should touch it. It's approved, we know it's good, and we don't want anyone to be able to go in and touch it from there on. So our build server has a lot of controls around it. It's really hard to get into the build server; it's extremely locked down because of the credentials and the access it has. And so from the moment we say the code is good, through the rest of its cycle, nobody can really touch it, which is good. So with that, maybe I've convinced you, maybe you still think I'm insane. I'll assume the former. I see the
light, but oh wow, gee golly, is it far away. You might be working in a really large organization with a ton of developers, where this idea of letting people approve their own PRs is completely ridiculous. And that's okay, because we can make small, incremental improvements in the speed with which you can move, without compromising on safety, with just a couple of ideas. First, understanding the theory of constraints helps you move quicker and identify those constraints, no matter what world you're in. First bullet point here: build your automated testing. We love automated testing because it's quick, it's reliable, and it's repeatable. You want to build confidence in your testing: confidence from you, and confidence from the rest of the organization that, hey, if the tests pass, it's probably good. If the tests passed and it's not good, that means you need to keep working on your tests and fix them, so that you can get into accelerated review of what I call standard changes. ITIL categorizes changes into three categories. There are normal changes; these are things that are probably going to go before a change advisory board if you have one. There are emergency changes. And then there are the mythical standard changes: pre-authorized, low-risk, we know what's going on here, we're following a process. We want to get as many things onto this standard-change path as we can, and we want this standard-change path to be as low-friction as we can. Our automated tests are how we get there. The idea of one person approving a PR is probably insane in most places. Make it two, make it three, whatever it takes. You can accelerate, and it helps a lot. Last note: what if I work for the government? I'm sorry, there's not much I can do for you. So, I promised you war stories. (This is my timing marker, to make sure I hit time and/or make the cut in case I was over.) We have time for a little bit of a war story. The code is perfect and it
does exactly what you tell it to. The automated tests ran, and they said the code is perfect: the formatting is perfect, all the options are defined, all the values are correct, and here is exactly what I'm going to go deploy to production. A little bit of the flow: when you make a pull request, the full system runs and tells you exactly what changes it's going to make in this infrastructure-as-code system. Pulumi is really nice for this; it shows exactly the resources it's going to create. You have to approve that comment. Someone has to sign off: yep, this is what we want to do. And then you have to approve the pull request and say, "Yep, no, I really mean it." And then it goes through and into deployment. So, we had a developer, an engineer, who did this. Wrote perfect code. Said, "Hey, this is what I want to do. I want to deploy. I want to stand up this new environment for this new client." It went through, it looked great, it got deployed. And then maybe two or three months later, we discovered we had turned on one setting that cost like $400 a day over the last couple of months. That's pretty tough. Now, that setting was defined perfectly. The formatting was great. If you wanted to turn that setting on, that's exactly how you'd turn it on. It was beautiful. So it's not something that a technical control could catch. Now, someone very high up the food chain was actually very upset about this: "Oh my goodness, I can't believe this happened. Who could have approved this?" So we go through the logs. He had approved it. Which goes to show that technical controls can't exactly catch this kind of stuff; you really have to have the right intention going into it. Mistakes are going to happen, and what matters is how you recover from those mistakes and keep that relationship with the client going. It doesn't matter that we had manual review; manual
review would not have fixed this. All right, so that's a little bit of a war story. And with that, I think we can kick it to some questions, some feedback, and some deploying. I see a question in the back already. This is going to get exciting. I'm ready for this. Don't be scared. Okay. Thank you, I appreciate your kindness. Yeah, this is my first presentation, so please be nice. Is it really your first presentation? Okay. First off, Ethan, I'd like to say that it was both a very memorable and a very well-presented presentation. Thank you. Excellent job embracing this, especially for your first time. Wow, what a way to pop your cherry. Great job. Okay, so I do have a question. I really do feel like your model assumes a very senior, close-knit, well-trusted team. How do you use this methodology while still encouraging mentorship of new junior engineers on your team? That's a great question. The first thing I'll say is: you are correct, and one of the big helpers for how we get away with this is that I work at a cyber security consulting firm. All the people working on this codebase are cyber security consultants. Nobody's trying to open up port 22 to the internet with root login enabled, and that really helps a lot. As for mentorship and junior access, again, it goes back to that cultural control of being able to ask for help, and that fear people have when I say, "Hey, you can push to prod now," and they're like, "Oh my goodness, really? Why did you give me this?" And I tell them, "Hey, you can push to prod, but you probably shouldn't. What you're going to want to do is work hand-in-hand with someone who's experienced, who's been in here for a while, and collaborate for a while until you build confidence in the kinds of changes you know you can tackle on your own and the kinds of changes you
can ask for help with." And the rest of the team is really bought into this idea of collaboration, being able to ask for help, triple-checking with other people. So when that new person says, "Hey guys, I think I'm opening this port, right?", the rest of the team will go in and continue to build that culture of fast feedback and fast results. I hope that answers it. Did anybody else have a question? Okay. Ethan, I'll give you a little more background on my perspective. I'm a digital forensics incident responder, so I work that side of the house. I'm also a small business owner, a company of 35 people, where we also produce tools in addition to doing incident response. So I have both engineers and incident responders, but we always try to bring in junior talent and raise them. Is there any concern when you're doing this model (and maybe this exists, and maybe this is the part I'm missing) about a threshold, some kind of proficiency bar? Because even on incident response reports, we do multi-layer peer review before things go out. For course creation, we do it. And for code development; we treat all three of those the same in my organization. I love the idea of optimizing speed. Man, I'd love to get my code out quicker, so I am encouraged by things that would speed up that process. But I am petrified, because I do work incident response and have worked insider threat. My bigger concern is that I'm not sure I would be doing justice to my junior examiners. And this is what I want to ask about, with the controls in place. I find that junior engineers who are coming in and developing for the first time in a real codebase, right outside of school, are scared to admit they don't know things. This is a very common problem with junior people. So I want to know what controls are in
place when you're talking about this encouragement of "tell us when you don't know" while also putting in the fear of "hey, this is prod, so make sure you're ready." Are there any controls in place to ensure that that threshold is clear? That's a really good question. In this specific codebase, the bar to get access, even in the junior position, is pretty high. The people who get there have usually been with the company for a bit of time. So it's really tough to relate something I'm doing in this codebase to your situation. But if you're working with some of those junior people who are trying to prove themselves, that's where I would say: yeah, maybe you should have two people reviewing each PR. You should start with a very broad pool, which includes everybody, of people who are allowed to make branches and make PRs, and a slightly smaller pool of approvers. That's the model most places are probably going to go with; they're probably not as extreme as this. So you have your base layer, and then your approver group. And I would argue: get people into that approver group as soon as you can, and build that culture of fast feedback in the approver group, because then you can have a little bit of a more focused impact than trying to handle a larger organization. And this applies to large teams as well as young teams. Hey, first of all, awesome presentation. That was really cool. It touched on a lot of stuff that's near and dear to my heart as someone who wears both the dev and the devops hats. One thing I did feel was missing was any mention of security. Like, where's the Sec in DevSecOps? I personally feel very strongly that, at a minimum, SAST and SCA scans are a really important part of your CI/CD gateway to production, and I wanted to hear what your thoughts were
on that. No, that's a really good observation. This code we're building in-house is kind of like our own code, and we don't really lean on too many libraries for it; we're on like two libraries. But I am starting to get involved with a different team that is building other code, like Docker container images or VM images, those kinds of things, where vulnerability scanning is super important. There's some Java under the hood, and if you work with some of the Java web libraries, you're like, "Oh no, we definitely need to scan that." So scanning is part of that CI/CD process, and it's part of that middle block in the diagram, "is the code good?"; that's a scanning control we could add in there. So I love that idea. I know I didn't hit on it in this presentation, but that's a really good callout. Thank you. You talk about how you want people to be comfortable asking questions within your team, asking for help. Have you ever gotten into a situation where you're like, man, this kid won't stop asking questions? They need to leave me alone, I need to get my own work done. Have you ever run into that in this kind of environment you're trying to build? I've got to be honest, I haven't run into too much of that. But I'll share a little bit of philosophy: my time is 40 hours a week, and that's all the time I will ever get. But if I can help out other people, I've found (tying it all back to ROI) that the ROI on helping people, even when they have a ton of questions, is insane. Because I know I'm capable of doing something, and I want to make everybody else capable of doing that thing too. I'm not just making X units per hour; I'm accelerating the number of units per hour that we can handle. So upskilling and investing in your team like that has a really, really high impact, and I highly recommend it. It can get annoying, I will admit; it certainly could. Fortunately, I haven't run into that yet. Yeah. Thank you.
Right. I was going to ask: you mentioned earlier about smoke tests. Do you find value in, or how do you approach, building smoke tests post-production-deployment? How do you get feedback loops like that issue you found that you mentioned? How do you approach getting those feedback loops working quickly after you've pushed to production? So, feedback loops post-deployment. I do want to talk a little bit about the smoke tests first. When you're using the Pulumi library and you make a new feature branch, make your code changes, and want to merge that into main, Azure DevOps builds kind of a proposed merge branch under the hood that represents what the codebase looks like after the merge, and that's the code that runs to do the preview. All of your code paths are going to run to generate the manifest at deployment time, and we do that exact same thing at preview time, during the pull request. We run all the code paths to build that preview. So if that preview builds, it means all your code paths are good. That's our smoke test, at pull request time. As far as feedback post-deployment, I assume you're talking a little bit about infrastructure drift: what if one of these environments has a change we didn't account for in the codebase? How do we detect that? We used to say that in order to do any change in the codebase at all, we were going to revalidate every single environment. And we did that for a while, and it became a lot of overhead, because you'd want to do a really tiny change in one environment, but then some completely unrelated thing (somebody had turned on some TCP reset setting they didn't understand) would block you from deploying. So that was tough. What we did is we made it so that we have to revalidate any
environment that we want to work with or touch. So if you're doing a change that's really specific to one environment, we revalidate that environment: you have to adopt all the drift, you have to get the code and that environment in sync, before you can do your merge. And there are actually a lot of really wild tools out there for TypeScript monorepos, for doing dependency resolution and figuring out what depends on what, what your map looks like. So we have these base-layer packages. I wish I'd talked about pace layers: you have your fast-changing systems and your slow systems. To boil it down to one quote, "fast systems learn, slow systems remember." Your fast systems are experimenting and trying new things, and the things that work and win get integrated into your slower systems. So we have these libraries under the hood that define, say, what a SIEM looks like when we deploy it, and when we make changes at that level, that's going to impact multiple clients, and that is where we still require every client to be drift-detected. We found that, organically, our rate of change at that low level that requires full drift adoption is fast enough that we're adopting drift across every environment, checking for changes and revalidating, multiple times per week at minimum. If that ever slowed down, and we got to a point where there are a lot of environments and the background radiation of drift is starting to increase, we have this idea, a future enhancement we call the cache buster, which would inject changes automatically on a schedule to force drift adoption across different parts of the environment. So, I hope that answers your question; that hit smoke testing and ongoing adoption of changes. Anyone else? Where did you get that awesome shirt? Yeah, I got this awesome shirt from the cowboy boot store. I went to Boot Barn in Henrietta. I went to the clearance
rack, the buy-one-get-one, where they put the shirts that are too ugly for people to buy, and I got two fantastic shirts for the price of one. That's a good one. I recommend clearance racks wherever you can get them. So, thank you. Yes. Thank you for the presentation. So you say that your engineers have the ability to push to prod, but you don't encourage it; you make it sound as if it's only for exceptional circumstances. And I just wonder, do you have any kind of break-glass procedure they have to follow? Do they have to notify anyone, "Hey, I am going to break traditional procedure and push to prod"? What kind of records are produced when someone does this, I guess, is my question, and what sort of procedures are initiated, if any? So, the goal here is to get everybody on the team able to work independently, in parallel, and streamlined, with as few blockers as possible. And that's the goal for what I would call those standard changes. The most standard change, the most common scenario we run into: the infrastructure-as-code defines the incoming log sources that hit the log collectors going to the SIEM, and you might need to open a port, or maybe add an IP address to the allow list for a port. We want to open a specific port number, on a protocol, to some allowed sending-IP list. It's pretty locked down. And that change is very low-risk: the only things you're really touching are one network security group rule and a load balancer. We've actually abstracted that; it's like three lines. You can copy and paste a common config and be like, "Yep, this port number, this list of client IPs, we'll name this Aruba or whatever." That change is very low-risk, and that's the one our senior engineers are going to be making and then self-approving. It's kind of like, I don't want to get too far down this rabbit hole, but if you look into platform engineering and self-service developer portals, it's that idea of enabling people to go in and follow an approved path, and then it just follows the standard protocol. So I would say the self-approval scenario is probably the most common scenario, because we've optimized the hot path; we've optimized the self-approval scenario into being this low-friction thing. Any time someone really deviates from that, that's where we start to see those requests for outside help. "I'm going to deploy a new piece of infrastructure to a client environment." That's where we're probably saying, "Hey, could someone look
over this real quick? I want to make sure I'm deploying this infrastructure right." And those are much more rare. So it's interesting: the self-approval scenario isn't really break-glass. We've almost made it normal; we've made the normal scenario the self-approval scenario, because of how common it is. So there's a little bit of forwards-and-backwards thinking there. I hope that covered it. Okay, thank you. Okay. So, following this methodology (excuse me, sorry, apparently I have allergies), this was the constraint and you fixed it, right? So have you identified what the next constraint or bottleneck is, if any, in the process, and are you working on that next? So, we work in a cyber security consulting firm, and we say, "You can't fix the client." You email the client, like, "Hey, we've got to roll these TLS certs, they're going to expire in a month and a half," and there'll be weeks of silence. We can optimize as much as we want, and we've optimized the deployment; we have some of the fastest stand-up times around. But at the end of the day, we really move at whatever the client's speed is. We want to make sure we can move as fast as the client is going. So if the client's moving really fast, we want to make sure we can keep up. And if the client's going slow, we're going to match their pace and, you know, respond and work with them where they are. So there are other constraints we could optimize on our side, but they're not the biggest constraints. Thank you. Cool. All right, I think we're pretty much at time. Guys, thank you very much. I hope you enjoyed it.
[Applause] I just