← All talks

BSidesCharm 2025 - Inch By Inch: a Case Study in Maintaining & Scaling a Modern XDR Product

BSides Charm · 39:42 · 41 views · Published 2025-05 · Watch on YouTube ↗
About this talk
Delivering security products to millions of users is a monumental task. From building & deploying to mitigating performance issues & false positives, securing systems requires constant coordination between multiple teams of researchers, engineers, and other stakeholders. This session will highlight lessons learned from our experience as an effective cross-functional team building an XDR product. Jessica David is a Principal Data Engineer on the Security Intelligence Team at Elastic. With a background in software engineering and data warehousing, she brings her expertise to the security researchers & detection engineers around her by building data pipelines & services for processing first- and third-party threat intelligence.
Transcript [en]


We have hit 3:00, so we are going to get started. First of all, I do want to thank you all for staying to the end. I really appreciate it. I know that at conferences sometimes you get burnt out maybe 60% of the way through, so I really appreciate you all staying on for this very last talk of track one. This is both an intimidating and kind of relieving spot to be in, because it's right before you get to closing ceremonies. But you know, maybe first talk of the day is even worse. So thank you all for being here. I just wanted to really quickly

express my thanks for you all being here. So hello, I'll first introduce myself. My name is Jessica David. I am a Principal Data Engineer at Elastic, on our Security Intelligence team. I'll explain a little more about what that is later, but in general I am a career data pusher. I come from more of the software side of things, where I have been doing data warehousing slash data engineering slash whatever data-related buzzword is currently in vogue for the past about 10 years. I am also a requirements herder: a lot of my job is pulling in everybody's requirements and making one thing. I started in

consulting and kind of still do it here today. I am also a doting cat mom to Xander and Rupert over here. If you want to see more photos, you can come find me later. This is also my very first BSides, not just BSides Charm, but BSides in general, and this is very exciting. I've had a really good time, so thank you all for the very warm welcome, and I hope you enjoy this talk. So, I don't have the greenest of thumbs, but there is one thing that I really understand about gardening, and that's where we're starting this talk. We're going to start with gardening. It's a delicate balance of letting

nature run its course and helping your plants flourish. Each part of this ecosystem is very important, and maintaining a healthy garden means knowing when to let things ripen or when to harvest your fruit, when you let something grow, when you cut it back, whether you water or wait a few days. And the system is very dependent on constant feedback from everything involved. And while it may not seem that obvious, this is how I like to think of my team at Elastic. We're one giant garden with many complex systems that interact with each other every day. So, if you don't mind entertaining my metaphor, in today's talk I'm going to discuss what we've learned at Elastic while growing

our garden of XDR, or extended detection and response. I will provide a brief snapshot of our experience navigating the journey of building and deploying a security product with Elastic's Security Intelligence team. While I plan to use our experience as a case study in building an effective cross-functional team, I'm going to highlight both technical and team accomplishments that have helped us along the way. And hopefully you can take something away from this that will help your security garden grow inch by inch and row by row. So, a quick little outline of where we'll be going today. I'm going to start by introducing my team and how we fit into the wider context of Elastic as

a company. Next, I'm going to review some of the internal tooling that my team helps build and how we contribute back to growing and maintaining our security solution. Then I'm going to get into the meat and potatoes of the talk: a few case studies of how we interact not only with each other but across the company and with our users. And finally, some key takeaways, the fruit that we have harvested along this journey. So I'm going to start out by talking about who we are, and break this down into two sections. I'm going to talk first about who we are as a

solution. Over the past decade, our team has successfully maintained and iterated on both endpoint detection and response and extended detection and response products. Does anyone here know of the company Endgame, or been around BSides Charm long enough to know them? They're all my co-workers now. I was the first non-Endgame hire at Elastic after they were acquired. They all say hi. One of them was supposed to be here with me this weekend but couldn't make it. So, hi. This experience goes all the way back to when the endpoint product was the company Endgame, which was acquired in 2019. Endgame was at that point a leader in endpoint threat protection and

endpoint threat prevention and detection, as well as response-based actions taken based on the MITRE ATT&CK matrix. By combining it with Elastic's at-the-time up-and-coming SIEM, this was where the security solution really started to grow. I'm going to briefly talk about this not as a sales pitch but mostly as some context, because it's important to understand how we fit into this ecosystem, so I hope you can bear with me for just a moment. Elastic Security as a solution helps you protect, investigate, and respond to threats. It's a fully capable SIEM, but as I mentioned, we now have our endpoint security and XDR product as well as

cloud security and cloud detection and response. I'm sure there are probably about 20 other acronyms involved in here in some way, shape, or form, but again, this is mostly just quick context so we can get the idea of where we fit into this whole picture. So, Elastic Defend is the official name of our endpoint security integration. This was released in beta in the summer of 2020 and made generally available in August of 2021. The Defend moniker, I believe, came about at the end of 2022 or so. Our endpoint security is a robust security solution that prevents ransomware and malware, looks for advanced threats, and arms you with

vital investigative context. Where the XDR side comes in is that we can help collect telemetry from other third-party sources. So let's say you run some CrowdStrike endpoints as well, or you have some other threat intelligence coming in. All of that can be viewed in the same pane of glass and various integrated dashboards in Elastic. You can have the native protections that we provide, you can have other protections, and all of it comes together, so you can take a look at all the bad stuff happening in one place. And you can also extend your investigation with osquery and other integrations that are built into the

platform. Where I mostly come in, and sort of my bread and butter, are the endpoint artifacts. These are the protections that are authored by the Security Intelligence team. They range from YARA signatures to EQL detection rules. We have machine learning-based classification models which run and help detect on the fly whether you're running malicious or benign software. And my team is basically working every day to continuously improve these protections. We update them as new threats come to light, and we generally have improvements that come out anywhere from every day to every month. We also always want to keep our customers aware of the latest threats and make

sure they are protected. But now, who are we as a team? That explained Elastic as a whole, but I want to explain where our team fits into this whole thing. The Security Intelligence team helps our security users by building these artifacts, as I mentioned. But my team has a broad set of skills that range from security research to threat hunting to being one of those developers who, I guess, doesn't get along with the InfoSec people, but we'll get to that. And sitting alongside our team we have, like I said, the data engineers and the software engineers. So if you go around the circle, we have data scientists who

are developing our machine learning models. For those of you familiar with the Endgame days, this is something called MalwareScore, which was released, I believe, in 2018. We've been maintaining and iterating on that for the past, oh gosh, seven or eight years now. We then have our security researchers. These are detection engineers, people looking for the latest and greatest malware for Linux and every other operating system. We have a really great section of our website called Security Labs, so you can see all the cool research that my co-workers are doing. You have folks like me, the data engineers. We're building data pipelines, which I'll go into in a little bit more

detail later, making sure that all that data is available for the researchers. They're the ones looking for what alerts are coming in, what kinds of threats we're seeing, and how we can better improve our detections. And I also fall into this bottom one of software engineers, where I help build and deploy services for the internal team that help automate some of their day-to-day. So if they built a cool thing that takes in a file and outputs some detections or some sort of information about it, we can build that into our automated pipelines. Then we take one step back. Part of the security team is us, of

course, doing our research and protections, but we work with these other teams, and one of them is endpoint detection and response workflows. They're looking at how we onboard our endpoints, how the end-to-end user experience happens if you are entering your endpoints into our system, and what kinds of response actions you can take. We also have our integrations team, who not only help build the endpoint security application itself, so they're the ones building the Defend app, but also all the other third-party integrations, the analytics we get back from our customers, etc. We also have our analyst experience team. I don't know if we have any security

analysts here, but you would be the ones in the SIEM every day getting really intense with all the alerts happening and things like that. And then finally, our cloud security team, because cloud security is sort of a unique posture within this whole security realm. So they have their own team, and they're working on how to make ourselves secure in the cloud. And then product, of course, is always talking to us, saying users would like this, can you please do that? Of course. But not only that, now we have to go all the way out to the wider context of how we fit into the giant company that is Elastic, because we're about 3,000 to 4,000 people

now. I can't remember the exact number, but we fit into engineering, which also overlaps with InfoSec, and our community team, support, sales, marketing, legal. There are so many different divisions. So we can already see how this ecosystem gets very complex. While I may be on a small 40-ish person team, in the wider context we start to see where these complications can start to grow as you're trying to build and maintain your product, because a lot of people have a lot of interest in what's going on. It's not just going to be you. So, how do we start to help all these people? The first thing I'm going to talk about is the things that I build

within my team on threat data and security intelligence. I come from a software engineering background. I got into cybersecurity and this community when I joined this team five years ago. I love DevOps. I'm a big DevOps person, and that's kind of my bread and butter. My team is called Threat Data Services within Security Intelligence, and we embrace this DevOps mindset not only when we are building our internal tooling and delivering it out to our colleagues, but also when we're productionalizing their workflows as we create and build these protection artifacts. We have a similar mindset there as well. We're going to talk about

this iterative development process in the context of how we collect information and put it back into our protections. I want to show how the output of these systems continuously helps us build and scale our product. The first is first-party telemetry collection. This helps us better see how our product is being used. So you can see here, Kibana is the user interface into Elastic. We are collecting telemetry, and we have a little web API that helps gather that information. We have a system called Supernova, which is amazing. It will take all of the inbound data and help us put it into

various data stores so we can analyze the alert telemetry and other kinds of telemetry coming back from users. The alert telemetry really helps us see how our artifacts are doing. It's basically our key way of seeing: have we accidentally false-positived everybody? Did we release something that's not good? Can we see what's going on? We can also see how the machines are performing. This is basically used for business decisions and research. So if we can get some first-party telemetry back, how does that help us continue to build out the product? We also have our threat intelligence pipeline, and this is my team's main project:

this helps us take our internal research to further improve our insights into the current threat landscape. We have a few data sources here on the left-hand side: you can see we have some threat intelligence feeds, some of our own telemetry that we put through this pipeline, and other APIs. In the middle we have the services that we've built. These are custom services that we have developed over many, many years. We can also join other third-party data with this first-party telemetry to make larger inferences and conclusions. And then we put it in whatever kind of store one of my co-workers needs, and then maybe

they take that in and do more automated tasks. We try to automate as much as possible of what our researchers are doing manually, so they can focus on doing awesome stuff and not have to spend three hours manually doing the same thing over and over again. For my DevOps folks, this is big toil-reduction territory for us. One of the really cool instances of that is a service we've built called Detonate. What Detonate will do is: you can give it a file hash, and it will either download it from VirusTotal or some other places we have access to, or you can provide the file yourself, and it will bring up an endpoint,

install everything it needs, detonate it, blow it up, and then we can see what alerts we've gotten back on our own stack: hey, did we detect this? What kinds of rules are firing? Is there anything weird in the logs? Did we miss something? Do we see a rule that should be firing but isn't? This is especially important when a customer comes to us and says, "Hey, I didn't get any rules on this thing, and I was expecting X, Y, and Zed to happen." Sorry, I am from Chicago, but I live in Canada now, so I apologize for the zed.
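The core of that comparison — did the rules we expected actually fire? — can be sketched in a few lines. This is purely illustrative and not Elastic's actual Detonate API; the report shape and rule names here are hypothetical, assuming only that a detonation run yields the set of rules that fired.

```python
# Hypothetical sketch of the check a Detonate-style service performs:
# after a sample is detonated on a fresh endpoint, compare the alerts
# that fired against the rules we expected to fire.
from dataclasses import dataclass, field


@dataclass
class DetonationReport:
    """Minimal stand-in for a detonation result (names are illustrative)."""
    sample_sha256: str
    fired_rules: set[str] = field(default_factory=set)


def triage(report: DetonationReport, expected_rules: set[str]):
    """Return (missed, unexpected) rule sets for one detonation run."""
    missed = expected_rules - report.fired_rules        # should have fired, didn't
    unexpected = report.fired_rules - expected_rules    # fired, but wasn't expected
    return missed, unexpected


# Example: a Mimikatz-like sample where one expected rule didn't fire.
report = DetonationReport(
    sample_sha256="deadbeef" * 8,
    fired_rules={"credential_access.lsass_read"},
)
missed, unexpected = triage(
    report,
    {"credential_access.lsass_read", "defense_evasion.token_theft"},
)
print(sorted(missed))  # ['defense_evasion.token_theft']
```

A non-empty `missed` set is exactly the "a rule that should be firing but isn't" case described above, and would be what gets surfaced back to the researcher.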

This really helps us with our research and also with validating things we're seeing from customers. Did we miss something? Is there something interesting happening? The team likes to use this to help build out their body of research and make sure it all goes well. So, this was something that started as a side project that we've helped automate and run at scale. And finally, I want to talk about artifact development. This graphic is from one of our articles on how we build our behavior maturity model. But instead of rules only, we also

do this with all of our protections and all of our artifacts. So we have this iterative process that basically helps us find a problem, looking for either the perfect rule or an update to a protection. You take that rule set, you maintain it, test it, look at a criteria assessment, and go back through this whole process. We're going to talk about this in a little more detail in a couple of slides, but this is something we help not only our research teams maintain, but also our detection engineers and the security machine learning team, and we help with automating this process. So let's get to the

interesting things now: the case study aspect of this talk. How are we interacting with all these teams, with these internal services that we've built, to have an awesome product and do what we need to do on a day-to-day basis? And, going back a little to the garden metaphor: we have now talked about how we have the land, we are the tenders of the land, and we have started to plant seeds, but we need to be able to continuously make them grow. This is where we're going to talk about the interaction between the tech and the process. I'm happy to go into more technical details if you have questions,

but a lot of what I want to focus on here is more culture-focused: how do we grow everything? Because when you go to talk about the teams, it's especially interesting given the way that our team operates. So for example, let's start small and work our way out, and I want to start with the folks that I directly work with in Security Intelligence and the endpoint team. A small tangent that I haven't mentioned: Elastic's engineering team is completely distributed. My team in Security Intelligence is spread across 17 hours of time zones, from Melbourne to Vancouver, with at least 10 distinct time zones within that. So, not only are we

trying to solve very difficult security and software questions, but also the question of when "today" is, because that really depends on a lot of factors, as well as cultural differences and perspectives, even on our team of not quite 50 people. This can be a very challenging endeavor as we figure out how to build out these systems and continue to scale. So for my first case study, I want to return to artifact development. We have a very good testing process to help us catch issues before an artifact goes out to customers, but this was something that took a while to develop. We're

trying to answer questions like: are the artifacts in the correct format? When an endpoint loads the artifacts, do we get logical errors thrown? Do we have alerts at least for known hashes? I would think most folks here are familiar with Mimikatz, sort of a basic test hash: did I accidentally add an endpoint exception that makes Mimikatz not fire? Because that's not great. You wouldn't want a customer to have that kind of experience. We typically refer to this process as a whole as smoke testing. This might be stretching the definition a little bit, but for those of you who aren't familiar, it's called smoke testing because where

there's smoke, there's fire. So we want to make sure that we put out any fires before we burn down our users, or, as we were talking before the talk, burn down BSides if we lit the curtains on fire. It's cold in here, for you watching the recording. So, we're trying to go from protections to production. When I was helping design this artifact testing workflow, these were some of the questions I was asking: what information do artifact developers need when you promote new detections? What's a good baseline for determining if an artifact or protection is impacting a release? We talked about Mimikatz; we don't want to have something that should be caught slide under the

radar when we make an artifact update. And finally, if there is an error, who needs to be notified? How do we propagate that error out to other team members? And this is where the testing pipeline was born. This is where I started to interact with pretty much everyone on my team, including the endpoint developers, on how we take this notion and expand it out into an automated pipeline that runs any time we have updates. So this is my little testing pipeline here. This has been my baby for about four or five years now at Elastic. We're validating our endpoints. We spin up a staging artifact

environment, so we're able to take the new artifacts that we want to push out and mock them to pretend they're production. We then run our end-to-end test, where we create a full-stack solution that we're able to update to point to these staged artifacts. We run a full policy validation test, and this is essentially what I mentioned earlier: hey, is there a rule that's weirdly formatted? So when I load it into Elastic, it says, "No, I actually have a problem with that," because even those small warnings are a really bad user experience. And then finally, we initiate our smoke test. So the Detonate service I talked about

earlier: we pull up a local Dockerized version in CI every time we have an update, and we're able to detonate a small set of smoke-test hashes to make sure that the malicious alerts are malicious, the benign alerts are benign, and we don't have any major changes between artifact updates. And then finally, we tear down all the infrastructure, process our results, and post them back into either PRs or Slack channels. We're able then to have this nice end-to-end process to determine if we've accidentally, you know, done something bad, but mostly to make sure that the experience of using our artifacts and using our product is still

awesome. So, what are some lessons that we've learned here, and how does this help us grow? Automating that feedback is really important. As I mentioned earlier, toil is the worst thing, not just for developers, but I'm sure for security professionals too, if you have to constantly do the same thing over and over again and spend six hours testing something. Before this process, you'd have to spin up a little local server that faked being the staged artifacts. Then you had to have an endpoint available, and that endpoint had to be locked down so that if you ran something, it didn't accidentally leak out into the rest of the network. And all these various steps

had to happen. Now the security researcher can say: I made my pull request, everything's going here, great, I'm going to keep working while this test runs. They don't have to worry about doing all these manual steps, so it's really nice for them because it automates that feedback. This also helps us ship with confidence. We now know that our users hopefully will not have any big issues when we ship something out. And finally, this helps us scale as safely as possible: we're able to reduce the risk as we add new artifacts, update existing artifacts, change the logic, etc. So, we're hopefully reducing

this risk as we keep growing. So, now we're going to move a layer outward and take a look at our internal company stakeholders. As I mentioned earlier, my team is a small drop in the wider ocean of not only engineering, but also product, InfoSec, etc. From time to time, we interact with them to get a better sense of what's going on and to make sure that everything is copacetic. But since this is a BSides, I specifically want to talk about our interactions with our InfoSec team. And if you were here for Kathy's talk yesterday, or even the talk right before this one, you may see a slightly familiar theme here, which is

great. Great context setting. So for my second case study, I want to take us on a little side quest to talk about incidents, because unfortunately they happen to everyone. Although, please raise your hand if you've never had an incident. Good. There are no liars in here. They happen to all of us. We maintain code; computers are the worst. Inevitably you will run into some CVE, a leaked credential. Something will happen no matter how careful you are, no matter how diligent you try to be. And everyone in this room, I'm sure, as we have now proven, has been on one side of this: whether you are

pinging folks to help triage an incident, or you are being pinged to bring in additional context for responders. So the text on this slide may look familiar. I'll pepper it with some anecdotes and some thoughts on how we deal with incidents with our InfoSec team. When we receive the dreaded ping, it never feels great. I have a co-worker named Darren who is amazing, but Darren sometimes will pull me into things at, like, five o'clock on a Thursday, and I'm just like, "Darren, I was done. What are you doing?" But we know we're in good hands. This is where we help with the incident triaging from our

side, when we get the ping of, like, "hi." We then work hand in hand with our InfoSec team to help investigate. One thing that I really enjoy about how our team interacts with InfoSec is that we have a really great relationship with them and are able to engage in this cross-team investigation at a pretty low level. We have a unique position of having a team full of threat researchers. InfoSec as a wider organization also has a set of threat researchers looking more generally at things happening at the company, while we're more focused on the end user. But since we have threat researchers on the team,

when something like this comes up, we can ping certain people to pull them into the task and say, "Hey, we got pulled into this incident, but we know that you have expertise in X, Y, and Zed. Can you help us debug this? Can you help us figure this out?" And so it's really awesome to be able to have that kind of subject matter expertise directly next to us to help with our InfoSec team. Then we have mitigation. And from time to time we get this really cool thing that happens, which is: not only do we resolve the problem internally, but sometimes the threat researchers that I

work with say, "Oh hey, this would also be really useful for our customers," and they will write rules and push them out to production within, like, a day. So that's really cool too. We're able to not only help with protecting ourselves internally, but then give that back to our users as well, because surely we're not the only ones who will hit this. So that's very cool. And then finally, if the incident is serious enough, we will engage in a post-mortem to communicate our findings back to the InfoSec team. This is really important because part of our post-mortem is also how we communicated with InfoSec, like:

were the right people in the channel? Was there too much noise? How did we have that back and forth? Because that helps us continue to foster and build this relationship moving forward. So while in the last talk we talked a little about how devs and security are at a bit of an impasse, having that kind of post-mortem is like, hey, what could we do to better this relationship with our InfoSec team? And I wanted to bring up this slide as well. This is from our company source code. Elastic is an open source product, so there's the GitHub source code, and this is like our company handbook of values and things. And this is

what it talks about here, about being humble and ambitious. Part of our company culture is to not be an… and that is really important with how we work with InfoSec as well. They are always so kind and understanding that stuff happens. So when we're brought into these situations, their attitude is: our job is to get this fixed. There's no blaming that goes on. There's no "why didn't you just X, Y, and Zed?" I was telling someone just yesterday that I hate the word "just," because it implies that we should not have been so stupid as to not have done something in the past. So this always rings really true

to me when I think about how we interact with our InfoSec team, no matter how unserious or serious an incident is. Sometimes it's nothing; sometimes it can be quite large. They're there to listen and to help. And if you're on an InfoSec team having trouble working with devs, I think there was also the fear talk yesterday from Kathy. Our goal is to work together to make something awesome and to fix the situation. There's no time for being a jerk during those times. So, how does this help us grow? Not only do we have this great relationship, but InfoSec uses Elastic Defend and the SIEM. We have Defend deployed all over the place. We're using

detection logs. We have all of our things connecting back to various clusters. They want us to make a great product because they use it too. If you look up the Elastic InfoSec customer zero stuff, there's a really great talk from my colleague Aaron Jwitt, I believe. You can take a look at how they do that. But we have this nice symbiotic relationship. We know that we're safer when we work together, and we try to lift each other up, not punch each other down. This is a hand-in-hand way that we help build our product. And like I said, because they use Elastic, because they're using Defend, because we're

working together, this is also going to benefit everybody else. And speaking of everybody else, the final case study I want to talk about deals with our users. Elastic has a long history as an open source product. I won't get into all the nitty-gritty details right now; you can go read about it. But we have a really large and passionate user base of open source users, and they matter a lot to us. They're our customers just like our paying customers are, even if the word "customer" may have a different implication. So we really want to make sure that we give back to the community and that we are

supporting them, even in a security sense. So my case study three is: oops, all artifacts. I think I mentioned this earlier: we went into production with this and out of pre-release in August of 2021. I believe it was the next day that somebody tweeted, "just decrypted all the elastic security edr signatures including ML models," blah blah blah. And in theory, this is kind of terrifying, because you're like, "Oh no, my artifacts," and it feels a little overwhelming. But the next day, my at-the-time team lead made this post in our Discuss forums where he said, you may have seen something regarding our artifacts now being out in

the open, but our vision is protecting the world's data from attack, and to do that, you sometimes need to be more open. We don't want to frustrate users. We don't want people to feel like they're having the rug pulled out from under them. But we want people to know that we do some of these things in public. We have other detection rules that are public. We're blogging about it. We're trying to make things more available. We have our models up on VirusTotal. We have a bug bounty program. We wanted to make sure we addressed it before everybody was like, "Oh, you thought you were protecting and hiding

these secrets." No — our goal actually is to continuously lead in open and transparent security. So about a year after this, we published a blog called "Continued leadership in open and transparent security," which led to us publishing some of our protections artifacts. So we have two public repos. The first one here has our behavioral endpoint rules and YARA rules, and the SIEM detection rules repo on the bottom has actually been open since June of 2020. And I think I mentioned the protections artifacts repo on top; we've had that open since August of 2022. These are continuously updated — anytime we're publishing updates, these are usually updated automatically, sometimes

manually. But we want to make sure that users are aware of what we're doing and what's going on, and to keep them in the loop. And this is important for us growing, because like I said, we really do believe in bringing customer feedback into the dev cycle. We also have a public Slack where we have our users, and sometimes they'll even bug us: "Hey, a noisy rule is happening right now. Did you push an update? What's going on?" And we see their feedback. The detection team will take it back. They'll double-check the rule parameters. They may fine-tune it. But they're constantly listening

to users to be like, "All right, let's go." And part of open-sourcing these rules and protections was a way for us to get that feedback from probably some of our most valuable customers — but I'm a little biased, being on the security team. Additionally, being open by design is part of our team's mission. We believe it's not necessarily great to be shrouded in secrets. So we want to make sure that we are open by design; users feel really empowered when you show them what's going on and you give them confidence in the solution. Not everything is public — we do have some things we need to keep closer to our

chest — but it really does make users feel like, "Oh, cool, I'm part of this. I can make a contribution here or make a difference." And before we move on, I just want to make this really clear. This is partially my opinion, but I do think it rings very true: you can't scale or maintain a product if you don't cultivate these relationships. It's not just with the people you're next to. It's not just the people who rely on you. It's not just the users, or the people who pay to use your product. All of these different

things help you with scaling and maintaining your product. I fully believe this in general, even if you don't work in security, but I think it's especially important in this context. These are people who are relying on us to keep them safe from bad actors, from phishing emails, from ransomware — all these different things — and to help protect people. So it is my humble opinion. It's like with your garden: if you don't water it, everything's going to die. But if you water it and never prune anything back, you're still going to have issues. There are going to be

things that overgrow each other. So all of these things are very important, and it can seem kind of daunting — but as one person, you don't have to be responsible for all of these relationships. I'm sort of summarizing the several people around me into one talk. But these are really important; I think that's how you help build a successful security solution. So, how have we done? People like big numbers, so I'm going to talk about some big numbers first before I get a little more esoteric. If you look at our published rules in our public repositories, we have published over a thousand detection rules, I think around 1,200 at this point

for SIEM and other security integrations. In there you'll see things like, for example, some of our SIEM rules: if you have the AWS Bedrock integration set up, one will help detect if somebody's trying to maliciously use your API keys there. That's a cool rule, and you can see all of them in that public repository. We've also published about a thousand or so endpoint rules that not only work cross-platform but also in various operating-system-specific ways. And finally, I think there are about 2,000 YARA rules as well; those are specific to each operating system, but you can go to that first repository I showed earlier. Happy

to go back to the slide if you'd like to see. You can take a look at all the rules we publish, which is very cool. We've also, since January of 2021, published a little over a thousand artifact updates — not only for the ones in the public repository, but also general maintenance that we do internally. It works out to about an update per business day. Some of them are very small, and as the next slide says, these artifacts are updated at varying frequencies, but we do actively maintain about 35 different kinds of artifacts, which are helping keep our users safe. And

internally, we have over 180,000 unique detonated hashes. With our Detonate service internally, this is what we're using to help perform research; this is how we ensure that we're protecting against all these attacks. That's since about January of 2023, and that number continues to grow as we dump more and more hashes into Detonate. So there are a few evergreen reminders I want to bring up. Embrace continuous improvement: Dev(Sec)Ops really works for our team. I put the "Sec" in parentheses because we work with security folks — this isn't pure DevSecOps — but this really works for us: seeing this telemetry, bringing it back in, and

doing this continuous improvement, sort of in this big infinite figure eight. Empower your team. "Teamwork makes the dream work" is a real cliché, I know, but it matters when people feel empowered to make strong decisions and to work together. I don't like the "your work is your family" thing, but I do believe you should work with people who have your back — it's "I have your back no matter what happens; I will help you through this incident; I will help you through this bug." Empower your team so that whether you're a dev, or in infosec, or anywhere in between, or an analyst,

whatever — make sure that you feel empowered to help your team make the right decisions. And finally, listen to your community. Security doesn't have to be all secrets. What I've learned from working with the threat researchers I've been very fortunate to work with over the years is that someone's going to figure it out even if you don't say it. So sometimes being more open, while it may seem counterintuitive, can make your product better, because people see, "Oh, this is a way that I can help defend against this," and you help grow a community. And I'm really excited about that. And it

could just be because I'm an open-source dev nerd at heart, but I think that being open is really cool. So I'll leave you with one last thing — that's obviously me being the Canadian. As I was working on this talk, I kept thinking about what I wanted my final message to be. And five years is a really long time to be working on a project, committing myself to a mission of helping protect the world's data from attack. I've watched this product go from pre-release to fully deployed in large environments, scaling over a number of clusters and endpoints over the years, and maintaining our tried-and-true and also bleeding-edge

artifacts. But ultimately, what's kept us successful is a combination of so many things, most of which I've been able to talk about today. I hope there was something in this talk that you can take back with you — whether it's how to update your internal development process to better serve your end users, or how to interact with a team in a way that maybe needs some TLC. I ultimately want to remind folks that we're all human and have the same goal in mind, which is protecting people. It's a scary world out there, and so, as the BSides Charm code of conduct says, please be excellent to each other. It sadly

doesn't say "party on, dudes," but you can also party on if you so choose. With that, I just want to give a few acknowledgements before I open it up for questions — I definitely talked a lot faster than I meant to. I want to thank some members of my team: Mark, Hez, Joe, and Devin are all part of the security intelligence team and helped me with this talk. They're also all ex-Endgamers, and they're still with me at Elastic, so that's very cool. Also, please say thank you to Rupert and Xander; they are my at-home team members, who are very helpful in my day-to-day. The photos come from

Unsplash and the Canadian Internet Registration Authority — that's where all the awesome Canadian stock photos come from. They're available for free as long as you cite them, and I think they're very fun. And of course, thank you, BSides Charm, for having me. I really appreciate a lot of the conversations I've had over the past couple of days. And with that, we have lots of time for Q&A. I would really love to either hear your stories or talk about how to make awesome teams. And yeah, that's it. Thank [Applause]

you. I can't answer questions about moose attacks, though — I actually don't know anything about those stats. I wish I did. Oh my god, I left 10 minutes for questions and I have left you all speechless.


Is everyone just too nervous? Oh, no. Okay. I mean, I'll be around. So, I guess with that, I will let you have a little bit of extra time before closing ceremonies. Also, thank you. And if you're going to be at RSA Conference, come by the Elastic booth — I will also be there. So if you'll be on the opposite side of the country in two weeks, come say hi. No, thank you so much.