Bryan Davies

BSides Calgary · 44:15 · 49 views · Published 2022-12

Thanks for attending my talk. My talk is going to be on DevSecOps: it's not a buzzword when you have the goods. So we're going to go over a few things. It's like the weirdest, longest title ever, so I apologize if you ever have to type up any kind of agenda for today.

All right, who am I? My name is Bryan Davies. I've been in the software world for over 20 years. I've been a developer for most of it, and also a manager, a director, and in sales. I have done a lot of investigation of insecure code, and I've done a lot of product management work designing secure systems, so I have a little bit of experience in a little corner of all the areas, I guess. I'm currently an SE, and I work at Splunk.

All right, so what's my talk about? A big thing I want to focus on: when you start talking about DevSecOps, a lot of people don't necessarily know what that means. But I also want to talk about the idea that production is too late, and that's going to be a big part of my talk today. We're going to talk about what we can do about it, and I'm also going to share some resources with you.

So who's the talk for? It's for developers, operations teams, and security teams. And just out of curiosity, I know there are a lot of students going to today's talks. How many students are in here today? So, like, a group. Okay, good. And the students, are you mainly in the security world? Operations? Cybersecurity? Okay. So there'll be some stuff in here for you.

One more thing: if we get the subs going for later, there's going to be a dance party at the end. It'll be a surprise. And if you have questions during the talk, please don't be afraid to interrupt me. I'm going to be going through a lot of stuff, and going kind of fast, so just interrupt me. You don't have to wait until the end.

All right, so why am I giving a talk? Well, I guess I'm a sucker for punishment. I did one last year, so I'm going to do it
again this year. It's actually quite enjoyable. It's a lot of work to put them together, but not as much as what James is dealing with. But the big reason is containers and microservices. They help us scale our infrastructure and they help us scale our applications, but they also massively scaled our problems. They introduced a lot of problems that never existed prior to using those systems.

Okay, so what is the problem? I'm going to go over some of the stuff we know first. I've also got the little sleepy red panda: this is the boring stuff, I don't have to worry about it too much. We still have security bugs going into production. Performance issues are still in production. We're still doing last-minute changes to infrastructure. Who hasn't done a midnight patch? They suck. For the students, I don't know if you've ever had to do those, but it's a terrible thing. It's always a panic, you can never get a hold of people, there's no food, you're stuck in the office. It's a nightmare.

So containers, in theory, are easy. It just works, it's magic. They're light, they don't need much in the way of IT resources, they're set up right away, they disappear to clean up. They're really, really nice. However, that's not the reality. Container management is super hard. Containers can crash and disappear and you don't even know. There's no indication that the container was even running unless you're capturing logs, which a lot of people aren't. They just spin up and then they're gone. So you have an error in the application and you've lost all that information. You don't know what's going on.

Another side effect of containers is that we started using microservices for architecture. That's also really hard. It's complex. It's such a disaster to try to troubleshoot a microservice, because these containers are going up and down while you're trying to trace something through. It's really, really
hard. And another side note that a lot of people don't even realize: containers are so lightweight because they share the OS kernel. So if you haven't patched your OS, you essentially haven't patched any of those containers. Everyone thinks they're kind of magic, they're off on their own, they're going to be fine. But they do rely on the OS kernel, so keep that in mind.

All right, so we have this mess, and we're going to add people to the problem. When does that ever work? We're now adding more people to an issue.

First we have the development team. What are some of the things we have to deal with on the development team? Are they actually testing for security problems? I don't know if we have any developers in here, but testing for security wasn't a thing. Even now, for a lot of shops, that's not a thing. Are you tracking the infrastructure changes that you're making to UAT? That's a weird thing about the operations and dev groups: they have this weird understanding that UAT is for the developers and everything else is for the ops team. So developers can change stuff in the UAT environment, but they don't always track those changes. Then all of a sudden it gets deployed, and we have deployed to an environment that doesn't line up with the UAT test environment. And development teams don't really monitor containers. They don't have the tools to. All the tooling around developers is about local debugging, or even some remote debugging. They don't have a lot for handling containers.

The security team: do they actually share what they find? Security teams like to be secretive. They like to keep all this stuff to themselves. It's like their bonus is dependent on how many times they bust into the systems. You've got to share that information. If you don't share it, it can't be fixed. So that information has to be shared, and it has to be shared in a usable format. If you just
tell a developer, "Hey, we broke into your system"... yep? Oh, sorry, I'm standing back. If you don't tell the developers that you've broken into their system, how can they fix that problem? So you've just got to share that information.

The other thing that security teams sometimes run into, and this applies to the operations team too: are UAT and staging set up the same way? Again, you could potentially be testing in one environment and deploying to something that's completely different. And the operations group: are they actually monitoring their UAT and staging?

I should go back for a second. Does everyone here know what UAT means? There's a bunch of students. User acceptance testing is typically from the dev world. That's their last stop before it goes to production. So is the operations team looking after UAT? Because, again, that's your last stop to find performance problems or architectural problems. And then, who actually owns changes to the UAT environment? And is the dev team working with the operations team to figure out what the design should be? This is actually a pretty common problem. Devs like to do things on their own. Developers like to work at different hours, and operations teams too like to do magic stuff in the middle of the night when all of a sudden they get inspired. You can't really share that information with an ops guy to help you figure out the best path if you're working those hours.

So you end up with two worlds: a product world and a corporate world. The product world is what the development teams care about. That's basically where the product lives and where the product gets ironed out: UAT and staging. And then you have the security and operations teams. They live in the corporate world. They're worried about threats, they're worried about performance, they're
worried about things like: I have a client-facing application and it cannot go down, I have to make sure the data is secure, I have to make sure we don't have insider threats trying to steal our stuff and go work for a competitor. They care about the corporate world, and they leave the UAT world completely alone. And vice versa: the developers don't care at all about the corporate world. They don't care about the accounting system or any of that stuff. They care about their little UAT world. So you end up with siloed teams using siloed environments.

So the result is essentially a mess. Again, the security group only cares about corporate, so they only test their security controls against corporate. If you have an application that's being sold into certain companies, you might have to follow certain audit requirements, like C-SOX audits or PCI audits or whatever. That would be the one time the security team comes in and works with you on audits. Other than that, they essentially leave you alone. Infrastructure performance is, again, only tested in corporate. They only care about things like: if clients are trying to purchase whatever I'm selling, is my cart going to survive? I don't care if it breaks in UAT or in staging, no one's actually buying there. I only care if it breaks in production. So you have these two groups looking after two worlds, and they don't really share any of their findings, which is another problem. Oddly enough, that's a common theme I've been seeing in the other talks. There's been a lot about sharing knowledge at BSides this year.

Okay, so the real problem is that production is too late. I'll get into it, but it's just too late. You've got to solve these problems before you get into production. There's stuff you already know about this. Fixing production is expensive. If you are a company that's selling stuff,
e-commerce, and your production site's down, that's bad news. It's very stressful on the teams. I know people who have quit their jobs because their production environment was just broken all the time. All they're doing is working nights and weekends. It's a very stressful environment if your production site is broken. It's always the middle of the night, and it's always hack fixes. Who here has actually done a proper middle-of-the-night fix in production? Oh, one. Two. Was it after a hack fix, and you figured you could patch it a little bit better? Yeah. It's always the hack fixes, and if it works, you're like, "I'll solve it in the morning." So you've hacked together a fix and gone back to sleep for a few more hours. Have you actually fixed the problem, or maybe even introduced another one? Because you've just got it limping again. So that's not necessarily the best way. And you always need multiple teams. If you're doing these middle-of-the-night things, there's always one more guy you need to get a hold of, one more girl you've got to call. There are never enough people, and they're never around. They know it's broken, so they turn their phones off and they're not going to answer.

Again, production's too late. A Black Friday outage can result in tens of millions in lost revenue. I know one company that had a payment service go down for an hour during Black Friday, and they estimate that it cost well over 10 million dollars. One hour. They had the proper systems, so they were able to see it and reroute it. They knew within five minutes what the problem was. It was just the rerouting of the payment service that took a while. But still: one hour, 10 million dollars, they figure. A data breach cannot be undone. If it's gone, it's gone. It's out there, and you can't go and collect it all. And critical infrastructure: it can't
recover. And I don't mean your network. I'm talking about stuff like pipelines, bridges (the ones that boats go underneath), hospitals. That's what I mean by critical infrastructure. If someone brings down a hospital with a ransomware attack, it's not up and running fine the next day. There are a lot of problems with stuff like that.

All right, we're going to go on a bit of a story here, picking a bit of a path. Who uses a WAF? We had a couple of hands. Okay. That's a web application firewall, for those who don't know. How many rules are turned off? I see some snickering. So those rules are turned off because it doesn't work otherwise. The WAF just stops everything. A proper WAF lets zero traffic through; that's kind of how they work. So out of those rules that you have turned off, how many are injection rules? A few, a couple of heads nodding. They're the harder ones to work around from the software perspective. Typically that's where they end up kind of getting left on.

All right, so we have a WAF, we have some rules turned off, and they're injection related. So are you monitoring your web request logs? Your application logs? You might even have some other firewall logs, even though they're not going to see all that information. If you're going to turn off a rule, how are you watching it? Are you watching what you just disabled? You can do things like watching your web requests.

Now the other thing, and this is really important, a lot of people overlook this. With all the WAFs I've ever used, anyway (there might be some that don't do this), you have three options with a rule: you can block, you can disable, or you can log. Disabling is what most people do: "Let's turn it off, I don't even care about this rule." But instead you should log it. That means that if the WAF still triggers on it, saying, "Hey, something's going on,"
it at least puts a note in the log so your security teams can find this stuff later. Too often you see people say, "I'll just turn it off, we don't need it. We don't care about PHP injection because our site uses .NET, so let's turn that rule completely off." Well, you may still want to log it. That's just one of those little things to keep in mind.

So in this scenario we have some rules turned off, just disabled completely. Then the audit comes and they find these. They always do: they always find and complain about all the stuff that you don't have turned on. The rules are turned back on, they test it, everything works. "Okay, well, maybe we fixed that rule and didn't even know. We had that one release when we switched the framework. It's all good." Off it goes to production.

All right, now it's a problem. The checkout process starts to randomly fail. It doesn't fully fail, just randomly, and not very often. Just random, light failures. So it's like, "Well, maybe it's that rule. But when we enabled the rule, did we push any other code? Did we do anything else? What's going on?" You start looking into it and you see orders are being abandoned, not completed. You dig into it a bit more and you start to see what's going on: essentially, they're starting to time out.

So what is it? Well, it's your always-popular free-form text box. You see this on every checkout: shipping instructions, special shipping instructions. "Leave at neighbor's", apostrophe s. Oh, an apostrophe. That always trips it up; the rule just looks for an apostrophe. Anyway, people use special characters in those boxes. Maybe they'll put "the ten dollars is under the rock", and there's a dollar sign in there now. Whatever it is, those special characters will trigger the rule, and now that you've turned that rule back on, this shipping box is starting to trip up the WAF, but only in certain situations. This is a really hard problem to
find. And then what happens? You have things that are timing out. What happens to your customer? "I've been sitting here for like 30 seconds." In internet time, that might as well be three days. They'll go somewhere else. "Oh well, Best Buy also sells it. I'm going over there; their site doesn't pause on me." So now you're broke, because your site's no good, it's stopping all these orders. And again, it goes back to: production is too late. You have to catch these ahead of time. You can't just wait and hope to catch them in production.

So what can we do? How can we fix this? I could use a drink of water.

All right, stuff we know. We're already doing this; we've seen talks on this stuff even today. CI/CD, DevOps, unit tests. Infrastructure as code is a good one, because that's starting to be a lot more prevalent, which is good to see. I don't necessarily agree that people are using it properly, but that's more my opinion, and I'll get into that in a bit. Threat hunting, log monitoring: that stuff is all happening right now, which is good.

Okay, so what do we have? In order to get better, I'm going to say let's first take a step back and see what we actually have. Maybe we don't need a bunch of new stuff. So what do I actually have? The dev teams have a crap ton of tools. Profilers, profilers, more profilers, analyzers, it's crazy. Debuggers, every IDE you can imagine, source control. They have a lot of things. Their tools are really focused on the local world, though: their own system, maybe a dev test environment, or a very minimal container system, or, if it's more of a monolith, a single server instead of a server farm. That's where their tools really shine.

Your ops teams hopefully have infrastructure-as-code systems, and I really hope their scripts are in source control. They're probably not, most of the time, but that's a really good habit. If you get stuff into source control, you can
actually do iterative changes to your infrastructure as code. You can go back. Imagine, you won't have Notepad++ open with 50 tabs across the top, one for every little change you've made while working through a problem. Ops teams have really good container performance monitoring systems, or at least they have some available to them, and in-depth infrastructure monitoring. They have a lot of really good monitoring systems. And they have a budget. Typically, not always, but more budget than a dev team.

Security teams: they have more open source attack tools than there are JavaScript algorithms. There are just so many of those tools, it's amazing. They have really good monitoring systems, detection and response systems. Security groups are really protective of their little world, though. They call it a SOC, their security operations center. No one's allowed in there. You're not allowed to see the tools they have. They'll give you a report, or maybe send you a screenshot, but there's some pretty interesting stuff in there. And they have even more budget. Who paid for the SOC that they have? They have some money, those guys. So if you're in dev or operations, become friends with your security teams. They have some neat stuff.

So now we know what we have. What can we do? We've got to share this stuff. Share the knowledge, share the tools. I'm going to go through this a little bit.

Source control: developers know this inside and out. Operations teams and security teams do not. Where are our operations teams storing all their infrastructure scripts? Where are security teams storing all their attack scripts? How many of you just have a folder on your desktop with attack scripts? That one. Oh, it's Windows, so a Windows folder. Why? If your machine crashes or whatever, it's gone. Source control manages it and makes it easy. You hook that up with an IDE and it's even
better. IDEs talk to source cont