
GT - Building an enterprise security knowledge graph to fuel better decisions, faster - Jon Hawes

BSides Las Vegas · 58:46 · Published 2019-10
About this talk
Ground Truth track, BSidesLV 2019 · Tuscany Hotel · Aug 06, 2019

Hey everyone. So, a few quick housekeeping notes before we begin. First, in the spirit of what happens in Vegas stays in Vegas, this talk reflects my own views and not those of my employer. We are going to move fast through a lot of material. Please keep questions to the end. And also don't worry about trying to absorb everything on the slides. Instead, look at this talk as field notes for the deck, which you can come back to later, or indeed now. The presentation with full speaker notes is pinned here on Twitter. Just a few more seconds. Your first week as head of incident response at DigiCorp will draw to a satisfying and uneventful

close. You walk past the last of the meeting rooms, along the corridor that takes you to the elevator, down to reception, and out to the weekend. That's when you hear it. The unmistakable ping. The email has the CEO on CC. Thousands of dollars have been lost. Something about an application called Sunways. Code has been deleted. The suspicion? Sabotage. You're needed immediately on the 120th floor. As you press the button to call up the elevator, your brain starts to cycle through all the things your team will need to find out. But it's cold comfort that you've dealt with incidents like this before. Because the first thing you did when you got to DigiCorp was ask everyone about the firm's most critical applications, and this is the

first time that you've heard anyone talk about Sunways systems. Past experience is telling you you're going to need to pull together a picture with only a few pieces to go on. You know where the rest of them are likely to be, though, in all their fragmented, partial and inaccurate glory. And so it begins: the blur of phone calls, emails and messages, coffee, late meetings, more coffee, a tragicomic and constant struggle to keep everyone on the same page of what's going on, punctuated every now and then by a few slices of cold pizza. Eight days later, your team have solved the puzzle. You've found all the pieces and you've joined them up. The picture about

what happened is clear. You've just come from briefing the board. And as you think about the story that your presentation told in that hastily put-together slide deck, part of you is just glad that the last few days are over. Relief that you managed to join the dots. But another part of you is frustrated. You know that the mental model that the team's built up will soon fade from corporate memory, slower for some people, faster for others, until eventually it returns to the fragmented state you found it in a mere eight days ago. It's a privilege to be back at the Ground Truth track. No mask this time, a little bit of music. And a huge thank you

to Gaben Urban for once more curating an amazing space at BSides here where we can share ideas at the intersection of data science and security. Make it stop. So, in the next 45 minutes we are going to look at how knowledge graphs can help security teams address the problems that we've just touched on in our make-believe incident scenario, and also how we can flip the script on thinking in lists to reap the rewards of thinking, and indeed operating, in graphs. This talk is the product of nine months' work in a live operational environment testing a hypothesis that ran as follows. To solve the problems we face, we need to be able to join all the component parts that relate to

security across business and technical dimensions in a scalable knowledge graph, so it's easy to capture, link, visualize, contextualize, share, interrogate, and update information in seconds for executive, management, and operations stakeholders. At its core, this hypothesis focuses on a user need which extends far beyond incident response and security operations, because this problem doesn't just affect every function in a security team; it also affects colleagues in many other areas, as per this quote from a CIO: "Each quarter my control functions bring me a report, probably a PDF, about what went wrong, but I need to run my business today using today's information." The data analytics struggle to produce meaningful, timely insights that can help understand security status,

justify priorities and track results is not disappearing anytime soon. However, even if we solve that problem, we've only won half the battle, because, as per this fantastic blog by Chris Swan, once we've mined valuable insights from that data, they then need a proper insertion point into the decision-making process. That means identifying stakeholders who should receive the information we have, identifying the right lens for presenting it across the various different levels of the business, adjusting that lens to provide the right amount of zoom based on the audience's context, concerns, priorities and accountability, and last but not least, finding time for them to consume it. This is not a simple problem. It's also not a problem

that's unique to the world of cybersecurity. As data engineers and data scientists know only too well, this is a problem that appears regardless of the industry they work in. So, in this talk we are not just going to look at building knowledge graphs, we're also going to look at how we can create and deliver context-relevant and stakeholder-appropriate interactions with the information that the graphs link together and of course which needs updating continuously. Is our approach valid? Is our implementation practical? Can the concepts we're working with transfer to other organizations? Please share your opinions and questions with us during this talk and on Twitter. For all things philosophical and technical, please @Dinis, who at some point might be in Portugal and might be online, I'm not quite sure.

He also wrote most of the code we're open sourcing today. George, GG, has been operationalizing a lot of what we'll look at and is the best person to go to for a programmer's and detection engineer's perspective on day-to-day usage of the stack that we're going to look at. And for questions about knowledge graph ontology and user needs, you can point them at me. A note of reflection before a probably ill-advised live demo attempt. Years ago, the paper "A Market for Silver Bullets" described a dynamic in which neither buyers nor sellers in cybersecurity had the information they needed to know what effective solutions look like for their problems. I would argue this largely still holds true today.

If recognizing that no one has the right answers is one necessary step to surviving in this industry, the other point the paper asks us to acknowledge is that any deviation from best practices will incur costs where individual members go it alone. Our team are big believers in the value of open source and Creative Commons. The continued perpetuation of bad API outputs, two-dimensional dashboards, endless Excel joins and mirror mazes of macros and pivot tables makes it clear that we need to collaborate if we are ever to break this current equilibrium and move from lists to graphs. Yes, as with anything that requires process change, the shift to thinking and operating in graphs in a hyperlinked way does indeed have side effects in cost, time and effort. I can also

tell you from personal experience it can be a massively frustrating journey to navigate. No, not everything you see here will be immediately transferable or indeed applicable at all to your business. But the goal today is not to suggest that this is how things should be done, just to share one possible path, the experiences we've had and the mistakes that we've made. So please treat this talk like a meal. Eat what you like, leave what you don't. Let us know what dishes confuse you and let us know what you would like to see added to the buffet. With that, may the odds be ever in our favor, let's cast our mind back to the imaginary incident

we talked through earlier, the early blur of phone calls, emails, and messages, the problem of keeping everyone on the same page, and the dots we had to find and then join up. So, here is a different version of how our story could have unfolded. Let's imagine that when we joined DigiCorp, it had actually been building a security knowledge graph for about nine months, and that, excuse me, they'd also grabbed some readily available data sets, stuff like HR data, application user lists, alerts from some endpoint technology, a few cloud systems, and of course good old manual data entry. So, let's just check that we're still live and alive. Okey-doke. I feel very ill right now. So, as we step into the elevator to the 120th

floor, and we're going to be very glad there's so many floors in this building over the course of this demo, our first concern is we want to know who we're going to be talking to, who's going to be in the room when we step into it. We've got an email with the CEO on CC, and what that means is we don't want to walk in and not know all the names that we're going to be dealing with. So... We are going to do a search for the SVP of Special Projects, who was the gentleman who'd emailed us about the incident. We're going to do a Jira search; we don't have time to look for Pete Smith manually, and there could be many of him. And

what we've got back is a result. Great. Pete Smith is in our knowledge graph. This is good news. So what we're now going to do, now that we know he exists, is we're going to ask our knowledge graph to show us Pete. We've got his unique identifier in Jira. And what we've pulled down is the information that lives in Jira, the node, the edges around Pete, so that we can start exploring them in Slack, on our mobile phone, in a lift. So, we've got some options here. Let's get a quick screenshot of the Jira page, because we want to see what else that can tell us. And let's view some links, because that'll probably tell us some good stuff. So here we are. Off we go. So, first

we've just rendered a little graph. This takes Pete and it gives us all the links, the direct links that Pete has. And here we've got a screenshot. So let's look at the screenshot first. This is the data that we have in Jira. We can see that Pete is assigned the role of SVP Special Projects. He reports to the CEO. Interesting, good to know. This must be a fairly senior guy. He owns a risk. That's interesting. We'll come back to that later if we have time. He manages a few people and he's funded by the data science team. Okay, well that's fairly helpful. Let's skip out of that and go and take a look at the

graph. Here's exactly the same data visually represented, a little bit easier for us to consume. Okay, cool. We know what we're doing, and that risk looks interesting, because it says in the next 90 days, if vulnerabilities need mitigating or the app needs taking down, there's no single person who can make a priority call. Expected losses: $1 million if a similar incident occurs. Interesting. All right, so now we're going to look at the CEO. We now know that the CEO has a data tag here, GSP353, and we can use that tag to go and see who reports to the CEO. So what's the reporting line? We can get that if we go down a bit. So

now we're going to get another graph and have it tell us some stuff. And here we are. So, the CEO is the manager of SVP Special Projects. Again, this looks like the reporting line. Obviously, this is demo data. Usually, if you went down from the CEO, you'd get a much bigger hierarchy. It's just showing the concept. So, we know Pete's a senior person. He's got an engineering team underneath him. The question is, though, who are these people who have these roles? Who's the lead engineer? Who's the platform specialist? Who's the engineer? We know from our training in the knowledge graph that when we have nodes like this, we can ask who that role is assigned to, using a search called "role is assigned to", and we're going to hit that.
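To make the mechanics of that "role is assigned to" search concrete, here is a minimal sketch of the idea, not the actual implementation from the talk: relationships are stored as labeled triples, the way issue links behave in Jira, and a query just follows one named link type out of a node. All node names and link labels here are invented demo data.

```python
# Relationships as (source, link_type, target) triples, Jira-issue-link style.
# Everything below is made-up demo data for illustration.
EDGES = [
    ("role:svp-special-projects", "is assigned to", "person:pete-smith"),
    ("role:lead-engineer",        "is assigned to", "person:bruno-lyon"),
    ("role:platform-specialist",  "is assigned to", "person:norman-ligos"),
    ("person:ceo",                "is manager of",  "role:svp-special-projects"),
]

def query(source, link_type, edges=EDGES):
    """Follow one named link type out of a node, like the
    'role is assigned to' search described in the demo."""
    return [tgt for src, link, tgt in edges
            if src == source and link == link_type]

print(query("role:lead-engineer", "is assigned to"))
# → ['person:bruno-lyon']
```

Because every relationship carries a human-readable label, the same tiny query function covers "is manager of", "owns risk", or any other link type you add, which is what makes natural-language-style graph exploration teachable.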

So we're going to try and use some natural language now to explore the graph, so that when we teach people this work, they know where they're going and it's fairly easy for them to do. And we get an expanded graph. Right, this is fairly handy. This is what we were after when we started our search. We want to know who's going to be talked about in this room, who the people are that we're going to be meeting with. And right here we can now see that our CEO, who we haven't met yet because it's our first week, is Alan Lee. We know that Pete Smith is the guy who wrote the email, Bruno Lyon, Norman Ligos,

Sophia Boleyn and Alan Champion. Great, okay, so now we've got our org structure. So, we know that we are dealing with an incident about an application called Sunways. We have absolutely no idea what Sunways is. So the next search that we're going to do as we continue going up the elevator is for Sunways. So let's see what Jira tells us about this. This does not look good. It looks like Sunways has had some pretty serious incidents associated with it before. For example, we've got admin accounts here shared across users. We've got questions about detections for malware and hacking. And if we expand that out, we can see there's really an awful lot of stuff that

looks like we should know before we walk into that room. Let's take a look at one of them. So we're now going to dive from here into 1296. Come on, brain. There we are. So with any luck... ah, right. Sorry, that's done a pop-up of SEC 12296 on my current screen, which is not helpful. So rather than put that in there... All right, cool. So we have now dived into Jira. We've gone from Slack because we want to explore the graph through Jira. And what we're seeing is a question that was clearly asked during this previous incident. And we've got a ton of stuff here where it seems that it's been established that there have been an awful lot of detections from the EDR software

against certain users. So obviously we are interested in taking a bit more of a look at that. So something jumps out at us, he says, or it did at least when I was preparing the demo. Oh, there we are. Brute forcing. So obviously we're going to be interested in detections of brute forcing. Okay, great. We've got a detection here, and here we've got a computer. We've got a device that it's linked to. That's pretty cool. We want to see who owns the device. Let's take a look. We skip into that, and Jira is telling us this is owned by Alan Champion. Okay, interesting. We saw Alan at the bottom of the chain. He's the one who had

this weird detection for something that we were interested in. We're now going to jump into Alan, and what we're going to see here... ah, he is the admin for DigiInk. Hmm, okay. So, if we jump into this, we can now see that at some point that had a login detection against it from a non-DigiCorp IP. Bad news: it doesn't seem like there's any option to enforce or enroll 2FA. That seems fairly serious. And we can also see who else shares that admin credential. So what we're going to do now is go back to Slack, and we're going to dive into data updates about Sunways. So we've gone on a little exploration through Jira, but Jira wasn't

working for us. We got caught up in this kind of minefield of information. We want an easier way to consume that. So here, once more, we have our handy view in Slack. And what we are going to ask is what are the linked issues? So as we go down, we can see loads of incident facts. So at some point, the team that have dealt with this incident have gathered a ton of data as they've been going through it. This all looks very helpful. We can see the admin account. We can see the user account. If we want, we could edit some fields. We could change the assignee. We could add a description. We could add some labels. We could change the workflow and so on. And if we

dive into this, let's have a look at what the first fact tells us in our graph. Lambda functions are working away, hopefully. Ooh. Ah, there we are. So: no single accountable lead for technical support and triage in the team in the event of an incident. That doesn't sound great. We can see there's a vulnerability there. Let's take a look at that, see what that vulnerability is. Okay, so in the event of another incident or a requirement for architectural change, there's no person who can make a final decision on the impact or appropriateness of a change. That doesn't sound too great. Let's go into Jira and take a little look at that. Oh, must have deleted that

one. All right. So long story short, what we're beginning to run into is that we're having to jump back and forward between Slack and Jira to try and find the information we want. And that's fine in some circumstances, in some instances where we're going to want to do that. But really, what we're after is an easier way to consume that information. So let me see. If this works, this is going to be a transcontinental live demo. George is currently either not watching this like he said he would be, and telling fibs, or he's going to come online, and I might need him to do the Jupyter support bit. George, if you're out there,

George, I need your help now. Don't leave me, man. All right, I'm going to assume that there's a long lag on the online support. I'm going to forgive you. Oh, my man. So... Awesome stuff. So, George, are you able to run the Jupyter server command here? Because I'm now out of the lift, I've got to run down a really long hallway, and I just don't have time to call up this Jupyter server. So give me some form of signal if that's okay, or I'm going to try and do it from the commands I've got here.

Now this is how live demos should be done. I don't know why I haven't done this before. Right, quickly before we... Very busy, very busy. Okay, so... What George is going to do, hopefully, is call up a Jupyter Notebook, which basically has a pre-packaged way for us to view all the things that we're interested in in this demo. And once that comes up, fingers crossed, once he starts doing it, we're going to see some more stuff. While he's doing that, I'm going to show one other thing. So one of the things I mentioned was very often getting hold of information in kind of a consumable format that's stakeholder-ready is difficult. And we've seen a number of components in this knowledge

graph today. We've seen risks, we've seen vulnerabilities that trigger those risks, we've seen people who own those risks, and we've seen a ton of facts, a ton of ground truth data that's been collected and evidence that's been gathered during our day-to-day work. So one of the things that Dinis did when he built this: a lot of the pain of our lives in security is putting together slide decks. And what we've done is create auto templates, so that all that data that lives in the graph, once you begin structuring it in the way that we'll discuss in the ontology section of this talk, you can begin automating the generation of PowerPoints. Not only that, but

you can begin sending them through Slack or Mattermost or whatever messaging service you have to the right stakeholders, and you can put in interactive buttons for people to go "I accept that risk", or "I want a meeting", or "I don't have time to read this slide deck right now, can you give a briefing to one of my team?". And in doing that, you can solve a huge amount of the pain we go through to extract information. So this is just an example that Dinis put together of what one of these slide decks looks like. I'll ping George back in a sec just to see if he

can finish off the demo. So what we've got here, essentially, what's happening in the background to this is: an instance of Chrome is being spun up by a Lambda function. We're logging in non-interactively through Chrome. We're opening this page in Jira. We're taking a screenshot of it. We're sending it back to the place it's stored. We're putting it into a slide deck. And that all happens in a matter of seconds. So this is programmatic creation of stuff from applications, which is super cool. Dinis has also done a huge amount of work to clean up the format of Jira, because it's super messy when you actually first start doing this. But with a few tweaks,

you can actually begin to create really nice consumable information out of the way that Jira presents things. You can generate timelines. So great thing about Jira, everything has a timestamp. If I want to say this happened, this happened, this happened, all I need to know is how I want to create that path in the knowledge graph, and I can go create it. If we want to pull down graphs, we can do that. If I wanted to, I could instead ask to just show a part of that graph and to break the text out of the graph and put it in a table, for example. If I wanted to show all the detections that we have

against our EDR software, I could create a heat map of that. And so we saw earlier that we map roles and teams. So I could create that for each team I have, and I could send a heat map of the detections that the various teams have had via Slack to the various team owners in our business to say, this is what your lay of the land looks like this week. That dude who's had like 20 detections for adware? That's all software that he needs to do his job. You should probably buy him a good licensed version rather than the crappy one that he's downloaded, which comes packaged with a load of adware and McAfee. *laughter* So this is, you know, just another example. If I know I

want to understand accounts that have access to things in an application scenario, again, once I have those paths in the graph, all I have to do is create a reusable, almost microservice-like Lambda function to pull that data out of my graph and dump it into the format I want. You can do a ton of stuff with this. Dinis, to his credit, is on holiday and has basically worked 18 hours a day to make this demo happen. So, as has George, and I won't... Ah, here's your server. Great, we're back, we're back. Cool. So fingers crossed, I can click on this and this is going to load up. Yeah, that's it. Let me just try and get my mouse back to base. Let's create

another desktop. Drag and drop this in. Expand that out there. Here we are. Okay, here we are. So here's the BSides presentation. Thank you, George. "Legend in your own time. What a time to be alive," as he likes to say. So that's the slide generation thing. But the thing that George has done when he was running this incident is create a prepackaged recipe book in Jupyter. So one of the things that pains me in our industry is very often we end up asking the same questions multiple times. And very often, I end up asking questions that many others in our industry have asked before me, because certainly the issues that we generally face aren't new, right? Especially if we work in incident response. Wouldn't it be great

if there was a way to codify that knowledge in a way that everyone in our industry could access, open source? And this is kind of how we're beginning to do it. So what you have here is a recipe book that gives me, as the head of incident response, the lenses that I want into the data when I have any incident. And in order to create this, all I have to do is drop in the title of the incident. So once I create an incident in Jira, once I create an application, this recipe book works for anything, because it's agnostic of the particular data. It just asks questions through a path in a graph. So we

can now create this scalable thing that says: right, okay, here's the playbook, basically. These were the questions that the person who managed that incident had in their brain, that they knew to ask in this sequence. Like, here was the way that they went from A to B to C in that incident. We haven't got the timeline for some reason. That was working this morning, for which I apologize. But hey, at least the demo has worked up to now. We're almost at the end. Breathe a sigh of relief. So we're just running through here. So you can see here what we've got: we've got an incident view. We've got graphs

to help us understand the application. Oops, "Subways" still there. Sorry, it's Sunways; this is not an incident about Subways. Control capability failures. Cool. Here we are. So the other cool thing about this is that we can bulk edit data right now from Jupyter, so we don't have to create things individually. If I'm in a meeting, if I'm in a war room meeting for an incident, I can literally be adding questions, adding decisions, adding people to my knowledge graph in real time through this interface. This is like the control panel to begin graphing stuff. And once you kind of get this power, you end up graphing everything. You can get a little bit,

we did get a little bit too much into it, but I think we've pulled ourselves back from the brink. But here you can see, like, these queries here. These are the queries that are generating this graph. And as long as my graph has standardized queries, again, I can replicate this for loads of other stuff. And if I wanted to (I'm not brave enough to, because I'm not technical enough), I could change this now, add something else in, and it would change the graph live. So if I know what questions I want to ask of the graph, I can use this recipe book to begin building my own whole new dish based on the scenario

I have. So this is the building block for me to expand stuff out, so: understanding Sunways accounts, etc., etc. You can graph things like this. We use PlantUML just because it's super easy to read, but if you want to do more kind of Neo4j-esque stuff, you can do it like this. So this is showing, in the middle, the Sunways application; the accounts are around the edges, so the accounts are the next layer in, and then we can see all the alerts and the people who own those accounts that are associated with them. And also, I know we've talked a lot about incident data, but when we run incidents, one of the things that

we do is we look at what the facts that we discover during that incident tell us about successes and failures of our security controls. So, in PIR, what this enables me to do after an incident is go through here and say, right, well, for example, this fact that there's no authentication gateway for an application indicates a failure of identity and access management, secure architecture, secure engineering, and I can actually begin to build out a picture of how successful my controls are for certain coverage across certain elements that I'm interested in in my environment, and I can begin to look at that across my entire enterprise. All I need to do is start linking the data

together. So that concludes the demo. Let's just go... Cool. Thanks, George. Thank you. The goat sacrifice was not in vain, people. It was not in vain. Turns out you can have anything delivered to your door in Vegas. Is it about to go horribly wrong now, though? This would be typical. Ah, we're back. Right, cool. Okay. So... Let me navigate back to the slide and let's have a look. So how am I doing for time, please? As in, time left. Ready yourselves, people; this is not going to be easy. So let's have a look at the tech stack. There were quite a few moving parts there. Before we go behind the scenes, a couple of design principles. First, we want tools that

help us move faster in partnership. The primary function of our tech stack is not to automate pain away; it is to enable us to deal with that pain faster. It should basically be like an Iron Man suit, which any member of the team can put on to help manage new problems quickly and efficiently. And once knowledge and analytics are baked together into a process that is stable and reusable, only then will we add automation. This approach draws inspiration from a super cool article called "Automation Should Be Like Iron Man, Not Ultron". And basically the goal is to enable us to do the creative stuff, enhance our

data system, and keep ourselves moving faster, rather than doing boring, repeatable tasks. Second, we want to make it easy for anyone who joins our team, and eventually the wider business, to be able to find and understand patterns in data, both at a systemic and a local level. So rather than providing set ways of looking at something, we want people to be able to benefit from our knowledge base, but also add their thinking, evolve it and take it to the next level. To do that, we basically want to take a list of questions like this, and we want to build them into something like this. So these are lists of ingredients in data, examples of recipes, and

eventually a library of microservice runbooks, which can be taken and joined up as patterns emerge similar to the ones that created them. So this slide provides a rather abstract visualization of this process. Basically, from the top: to deliver a specific mission, for example an incident response (you know, as you're in diagnosis mode versus solving versus mop-up and PIR), you're going to use different data building blocks. These are your ingredients. Those ingredients are going to be combined in different recipes over that life cycle to uncover knowledge, answer questions, complete tasks, etc., etc. And in future, what we hope is that those recipes become reusable. Finally, there's a heavy focus in our system on

the ETL phase of transformation. Why? Curveballs in consuming data are sadly the rule, not the exception. Messiness is a feature, not a bug, especially, it seems, when you need to consume and correlate data at short notice, so we designed heavily for that. So, let's take a tour through the data system. Here are its current components, the lines between them indicating a route of input or output for data to flow. At this point, well, probably before this point, you may have been thinking: Jira? More on that in a moment, but when we began building this, we had a shoestring budget. We needed a system that we could choose to scale as we wanted it to scale.
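To give that transformation emphasis a concrete shape, here is a rough, hypothetical sketch of the pattern: two messy feeds that disagree on field names and formats, a small per-source mapper for each, and a generic merge into one consistent node shape ready for the graph store. The source layouts and field names here are invented for illustration, not taken from the actual system.

```python
# Two made-up messy sources: an HR export and an EDR alert summary.
RAW_HR = [{"EmployeeName": "Pete Smith", "Dept": "Special Projects"}]
RAW_EDR = [{"user": "pete.smith", "alert_count": "3"}]

def from_hr(rec):
    # Per-source mapper: normalize the HR layout into a node dict.
    return {"type": "person",
            "key": rec["EmployeeName"].lower().replace(" ", "."),
            "team": rec["Dept"]}

def from_edr(rec):
    # Per-source mapper: normalize the EDR layout, fixing types as we go.
    return {"type": "person",
            "key": rec["user"],
            "alerts": int(rec["alert_count"])}

def merge(records):
    """Join normalized records on their key, so each person ends up as a
    single node no matter how many sources mentioned them."""
    nodes = {}
    for rec in records:
        nodes.setdefault(rec["key"], {}).update(rec)
    return nodes

nodes = merge([from_hr(r) for r in RAW_HR] + [from_edr(r) for r in RAW_EDR])
print(nodes["pete.smith"])
```

The point is the shape of the design: each source gets its own small, disposable mapper, and the merge step stays generic, so the next curveball data source only costs one new mapper rather than a rework of the pipeline.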

And sure, it may look a little bit weird at first glance, but sometimes you have to sail with the ship you have rather than the one you want. If I'm honest, of all the data systems I've seen built and worked with to try and solve analytics problems in security, this is by far the most elegant. So, a few definitions in terms of how we think about the data system components. Jira is shorthand for our graph data store and ontology management system. As well as having a lot of highly configurable fields, it also logs every single change that's made under a ticket, providing a full audit trail of who did what updates to what ticket and

when. And who doesn't love a good audit trail? ELK is our friendly neighborhood index, where we store Jira data so it's easy to search and visualize via Slack. It's also a good place for us to analyze and visualize trends relating to nodes and edges, albeit more in terms of how people are using the data system rather than actual operational scenarios. Slack is basically our command line tool, as well as the communications fabric that our company runs on. This lets us automate all kinds of feedback loops via a medium that our colleagues are already familiar with and are engaged with a huge amount of the time. Big shout out here to Ryan Huber, whose blog

on distributed alerting informed a lot of our thinking; all we've done is really add a sprinkling of Jira into the mix. Here's a few examples of the kind of command lines we can call down through Slack. Here's another one. And then finally, Jupyter. This is our more advanced interface for creating and working with both ingredients books and recipes. Here's an example of an ingredients book. Basically, what this is designed to do is enable easy exploration of relationships between all our asset, vulnerability and risk data. So when I think about this tech stack, these are effectively a choice of interfaces, either for users like me, who

are non-technical, or those like my colleague George who are highly technical. So we can pick and choose how we want to interact with our graph. Then in the middle, we have GSBot acting as the API broker to make all these various connections possible. That's the tech, but what about the problem set that this actually solves for real people? Here's the frame of reference that we use for thinking about the modes and triggers people have when they need to interact with data or information. And a huge thanks to Russ Thomas, Mr. Meritology, for sharing the triggers part of this with me years ago as it inspired my thinking. We can map our modes. to different parts

of the data system where they fit best and consider what interface is best under what triggering condition. This helps us consider routes for inputs into our knowledge graph, and indeed those outputs that support feedback loops. So with our robot army of Lambda functions acting as the glue between these systems, we can now do things like this. When we're graphing manually on the fly, as you'll see in a moment, we're effectively under a kind of crisis trigger, in interrogate mode, and this is basically what's happening in the background: tickets created from Slack, tickets synced to Jira, Jira synced to ELK, and then as we query, it goes back to Slack. It all happens in a matter of seconds. Dennis is, if nothing else, super focused on speed of execution in this kind of stack, and it's very, very impressive to see how quickly it responds to commands. For batch graphing, we can basically dump data, as we kind of touched on in the demo, into Jupyter, sync it through, index it in ELK again, search it in Slack, and explore. That's the workflow for that. Here's an example of a Jupyter recipe book. I'm not going to go through this; I will make the video available online. All this is saying is that ultimately, once you've taken two or three messy data sources and you've done

the stuff once, the parsing, why not batch that, right? Let me take that data and start to identify weird things about it. So this is a representation of an ontology. It's amazing how much business context you can get from just two or three data sets: your HR database, some random application access lists, anything else you can get your hands on. The gray boxes represent data which is going to need to come from elsewhere. But even with a few data sets, you can start building out context between business and technology dimensions, which, after all, is exactly why we set out on this journey. And it's a short jump from getting data like this to using the parsing process to identify inaccurate, incomplete, and incongruous data. For example, who owns that active generic account in that SaaS system which has a dot-com email domain that isn't yours and seems to have last logged in three years ago? And why is there a disagreement between System X and your HR database about whether or not this employee still works for you? These are the kind of questions that leap to the fore when you start taking these very basic data sets and asking simple questions of them. Those sound like facts you might want to capture and present to management, and potentially even a vulnerability. And now we have the beginnings of building

out our graph from just a few data sets that we already have, and we can begin to operate in graphs. It really doesn't take much, which is super cool, and we're more than happy to talk to anyone about how you can speed up that journey if you're interested in going on it. Here's an example of a recipe: what we've effectively done here is map an application via its reporting line up to the CEO, just so we can better understand the stakeholder landscape. I showed you that example earlier of mapping down from the CEO; this is a real example that we use in our company. And here's another example with a slightly different lens, where we actually want the names of the people. Maybe there are people leaving, maybe there are people joining, and we want to add some names in during the meeting we're in, while we're discussing changes to the application during a threat model. So all of this becomes interactive every time we go out and touch a stakeholder, either via a meeting or a conversation on Slack. Here's a workflow that combines automation with user interaction in the loop. This is Ryan Huber's distributed alerting. This is a video of the user experience built by the man and the legend, Gigi, who you've already met. So let's have a quick look at what's

going on here. So this is distributed alerting. What's going to happen in a mere moment is this: an alert has come in, triggered by something going off and being put into ELK; a Lambda function saw that data in ELK and sent it to George, saying, "Hey, we've seen an unauthorized login. It's come from a non-company IP address. Here's the date, using the password method. Can you please let me know?" So this is him just indicating here. This is now synced to Jira, right? That's auto-synced; you can see the workflow there is in progress. Usually this would be an alert that we would have to go triage, right? Figure out, hey, was it you? But thankfully George is going to click, this was me. And that is going to update our workflow: not only does it log that the person clicked "this was me" and who clicked it, it also changes the workflow, closes the event, and we don't have to touch it. All that's had to happen there is a user doing an interaction they're very used to doing, which is going to a Slack channel and responding with a single click. These are the kind of workflows that you can begin to build, and they can be much, much more complex

than this. Once you get into it, really your imagination is the limit, and then it's just a question of figuring out who writes the code and maintains it. So, what this enables in the long run is micropopulation analysis. Don't worry about the details on this slide; the main takeaway here is that what you've just seen begins to enable us to better understand a specific set of users who have a specific pattern of life within our business, and we can use data as a security team to better understand their reality. So let's say we have a shared email account which multiple users access. With normal detections, all we would see is a detection against that account.
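To sketch how this micropopulation idea plays out (the account name, users, and hours below are hypothetical, not real data): once each detection on a shared account is acknowledged by the person behind it, the account's alerts can be resolved into per-user patterns of life.

```python
from collections import Counter

# Hypothetical acknowledged alerts: (account, user who clicked
# "this was me", hour of day the activity happened).
acked = [
    ("shared-inbox", "george", 2),
    ("shared-inbox", "george", 3),
    ("shared-inbox", "maria", 10),
]

def per_user_alerts(events, account):
    """Resolve detections on a shared account to the users behind them."""
    return Counter(user for acct, user, _ in events if acct == account)

def out_of_hours(events, account, start=0, end=5):
    """Users acknowledging activity in the small hours: pattern of life."""
    return sorted({user for acct, user, hour in events
                   if acct == account and start <= hour < end})

counts = per_user_alerts(acked, "shared-inbox")
night_owls = out_of_hours(acked, "shared-inbox")
```

With normal detections you would only ever see the shared account; with acknowledgements in the loop you see which individual is working the small hours, which is the kind of signal that may be worth sharing beyond the security team.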

But with the workflow you've just seen, because there is someone who goes and clicks that and acknowledges that it was them, what we then begin to see is actually alerts against a specific user for that inbox. And we begin to build up that pattern of alerts. For example, if this person is burning the midnight oil, seems to be working 24/7 because their colleagues are on holiday, frankly, that might be data that HR should get, right? So there's a huge amount of benefit we can begin to give other parts of the organization by doing this. The aim here, however, and I just want to stress this, is not to be a 1984-esque security team. We want to use data to better understand the business process, so that if we do need to apply controls to it, hopefully we give ourselves every opportunity to minimize friction or avoid it altogether. This is a work in progress. Essentially, someone here reports a vulnerability, an alert goes into Slack, the risk team get notified, the risk team explore it, the risk team link stuff to stuff. They then change some data in Jupyter. So this is at the moment a great example of where we have a much more involved manual workflow that we're trying to identify opportunities to automate. But basically how we want it to end is that the exec gets sent that neat slide deck in a Slack channel to

go, yes, I accept the risk, or no, please have a meeting with me. So if we can use feedback loops from these kinds of interactions, perhaps this actually gives us a better pattern for what risk appetite truly is, because what we're doing here is mapping and creating a system of record for the decisions that our business is taking about issues it has in the real world. So this becomes a great proxy for understanding different risk appetites as they exist across our business. The code for all this is here. We've created a fake mini company, a serverless version of Jira. There are basically incidents, alerts, people, roles: everything that you've seen today in this demo exists in this environment. You can go have a play with it. The integrations with Slack and Jira aren't out of the box yet; watch this space as we begin to develop it out. But basically, feel free to go have a play. It's a graph database that you can mess around with and begin to see whether or not there is stuff there that's useful for your business. So, I've got 13 minutes left. Probably not going to get through all the slides, which frankly is a good thing. But the reason that we wrote this deck was actually to provide almost like a book, right? This is designed to be the guide to the thinking

as other teams, if they want to go on this journey or are on this journey, can go, "Oh, that's interesting," or, "Wow, that was wrong, and you shouldn't do that," which of course we would love to know, because then we don't have to make the same mistakes. But let's whiz quickly through the ontology section, see how far we get, and take a look at it. Ontology is obviously at the core of graph thinking: a set of concepts and categories that shows their properties and relationships. The one we've arrived at in our knowledge graph has evolved over time. It is definitely not a butterfly at the moment. And this section of the talk gives you an overview of the evolution process and some learnings. Let's start with some early work that we'd be ashamed to put on our kitchen fridge. This is a flow diagram of our incident response process about eight months ago; there's a blog online about this if you search for it. The highlighted areas just show the different Jira tickets we'd raise across this workflow. And frankly, while this created a varying amount of administrative overhead during an incident, from "oh, this is okay" to "Dennis, I will never work in this way again," the detail it enabled us to capture was really invaluable, both for post-incident reviews to look at what went

well and what needed improving, as well as for capturing knowledge about all the business context that we have. So early on, we were using graphs just to map out our incidents in the PIR: how many questions did we have to ask against a thread, which threads got connected to each other, did we successfully complete the incident or not? Unfortunately, loads of really valuable data was getting lost, and it was getting lost in the place that we capture incident tasks, which is the free-text description field in Jira. This was immensely frustrating, because incidents would pull on lots of threads across the business and there was no good way to weave this data together. We began experimenting, and after realizing how damn expensive it is to refactor node and edge ontologies in Jira, because at that point we did not have Jupyter and I was doing this manually late at night, much to the frustration of my very patient wife, we moved to PlantUML. This made it cheaper to discard mistakes and rebuild the graph differently, but frankly, the results weren't helping. Almost everything was linking to everything, and while we developed some key components of the graph during this challenging phase, overall things were getting more confusing. When people would say, "That looks complicated," Dennis would buoy our spirits, for

which I thank him to this day, pointing out that the complexity we were creating was a reflection of a complicated reality. But that didn't change the fact that our system of nodes and edges was increasingly hard to work with. This is an example of the nodes we have in just one project, and we have multiple. And that was before you got to the problem of deciding what edges to use to link what nodes. While our graph certainly lacked nothing in terms of freedom of expression, the consequence was inconsistency, which made it incredibly hard to navigate the graph and ask it questions with confidence that you were seeing all the data that you wanted. The result was confusion in our own team, let alone when we tried to use the data to communicate with the business. And to a large extent, frankly, this was because our graph had become removed from operational reality. Our nodes and edges reflected abstract concepts that we were trying to mold together that didn't make sense to anyone outside our team, and after a while, not even to some of those inside it, or indeed creating them. So forcing functions are funny things, and just as incident response has been a trigger for us to work in graphs, with practical and beneficial results at the operational level, budget season helped us make an evolutionary jump in a

more strategic direction. One of the many challenges we face in security teams at budget season is articulating effectively what won't be done, based either on the investment the business is prepared to make, or on our security team's ability to actually operationalize the budget we're given. So we began focusing on the function of data to solve this problem, drawing some inspiration from the Bauhaus movement. Our need for fact-based narratives was clear. We needed a common theme of... questions, and the things that were coming our way needed to be translated into business context without requiring a human to sit there. So in classic two-choice presentation style, we stole an idea from a friend of mine who works in management consultancy, who once said there are only two presentations that you give to management: cloudy day, sunny day, in which things are bad, but if you do X, Y, Z they will get better; and sunny day, cloudy day, in which things seem good, but they won't stay that way. And so once we started to develop narratives like this one here and transferred them to our graphs, all of a sudden the graphs began to get a lot simpler and a lot cleaner. Even when our plot lines got a lot more complicated and had a lot more actors in them, we were confident that the storylines were still easy to track, and

we cycled back to Jira and began implementing the ontology we had trialed. While our nodes and edges often did not correspond to a human-readable version of the storylines we were telling, that mattered less and less, because you could read those stories across the data in the graph. The nouns and verbs that we needed were becoming far more human-readable, emerging through the shape of the graph, and the storylines were working really well as we went and presented them to stakeholders, even if they never saw the graph. So began the era of the great refactoring: the informal creation of the Entropy Crushing Committee. Hello, James, if you're watching. We started standardizing and formalizing our nodes, our edges, and the relationships between them that could exist in the graph, which represented a trade-off. Yes. I'll come back. Come on. Don't dial me now. So we chose a rigorous structure over the ability to enter arbitrary data, for these reasons: we wanted a logical narrative, we wanted it to be human-readable, and we wanted people to have a clear idea of the possible outputs when they asked the graph questions. This made it easy to see when data was missing. There's nothing more helpful, when you ask about the knowledge your collective graph has, than knowing what isn't there as well as what is, especially if you have an incident and you might need to call in your colleague who works in threat modeling to do a super urgent threat model of an

entire application. So, thankfully, this choice fitted hand in glove with the way Jira allows you to organize data. This is a rough translation of how Jira organizes information into graph speak. Happily, from an administrative perspective, this structure does support innovation, so the goal of the Entropy Crushing Committee is not to stop experimentation and innovation; it's to stop pollution of the graph that leads to not being able to ask those structured questions upon which so much of our recipe books rely. This is important because change to an ontology is a feature of knowledge graphs, not a bug. Until it becomes cheap to mass-refactor your knowledge graphs, however, I would highly recommend avoiding the pain of doing so. So, two quick examples of lessons learned, which I failed to learn at the time, and which cost me an awful lot of sanity. This is a generic version of what an incident graph can look like. This is an edge narrative from a few months ago, just to give you a picture of past history compared to where we are now. Here, just for reference, are the workflows that we associate with each of those different layers in Jira, so that as incident tasks, facts, etc. move through their life cycle, we can track them. I'm not going to dwell on those. One of the things

that I failed miserably to capitalize on was investigating the metadata that people were adding to issue types in the knowledge graph. So here's an example from our security incident issue type, which shows all the various fields that were added over time as we tried to tag things we needed, and add in details that we wanted to search for or organize by. This is just a different view of all those fields, and what I should have done earlier was to look at those fields and ask the questions: are there other issue types in other projects that could benefit from what we are trying to do? Have they come up with a better way than we have to do it? And is there value in creating a new node and edge relationship that bridges across our projects within the graph? Had I done that, I would probably have found that we were all trying to tell the same story in slightly different ways, and we could have identified what the common themes were throughout those narratives and glued them together a lot better than we did. A second example of lessons learned is from the Red Team project. This is the project's ontology as it exists today, and this is the narrative it supports. The ontology did not start like this. Originally, it reflected more of a very tightly constrained pen test that we called a red

team as we began to get the business comfortable with testing in production on a regular basis. Over time, our scope got a lot more freeform. And as that happened, as our red teamers began to think creatively within the structure, we began to introduce things like security control observations from the red team: observations about controls that shoulda, woulda, coulda maybe have stopped them quicker, faster, and cheaper when they were able to jump from one task to another, or exploit or find a vulnerability. And then, once we got the blue team fully involved in end-to-end tests and evaluating findings, we began linking that concept of control coverage to the information the blue team had about IT systems and specific IT assets. That way we could understand, and then begin searching across our knowledge graph to see, whether we had similar gaps in other systems beyond the one attack path that the red team was able to take. Because even though they only get one, you can then mine a graph to see, right, okay, what are the commonalities across other systems that might indicate control gaps, control failures, changes in operational states, for example, or lack of coverage. So at a certain point it was obvious that security controls and IT systems and assets should not live in the Red Team project. Unfortunately, we developed the control ontology pretty much in isolation. We missed some

major opportunities to evolve it and make our data richer across all projects, for example in relation to how regulators articulate controls. And when we changed it again, we had to do a lot of refactoring. So, lesson learned: get out of your bubble, go speak to members of your own team who are probably battling with similar things. It's easy to get into a spiral of focus and lose the bigger perspective when you're doing this stuff. So where are we today? The focus at the moment is leaving the detail behind. Our time is spent thinking about the big building blocks so that we can begin to fit the detail into them in the best way. Here are some examples. This is the business structure ontology, which you've already seen. Here's the IT asset ontology. Here's projects. Here's interactions, which, once you get into graphing meetings, is actually really valuable. "Well, that decision you made led to that task, that task hasn't been done, and it's been three weeks. So when you said in the meeting it would be done last week, could you move that into your workflow?" You can really begin to drive accountability through doing this. Cool, five minutes left. Here's a good one for third parties; this is the kind of stuff that I want to know and link to. Here's our incident one as it is today. Here's

the, meh, I mean, yeah, this is the threat model. This is still very much an evolution. There are some parts, and we'll come to this in a moment, some confusing parts of this graph that don't quite work yet. But again, this is where we are, and the goal of this is to share what's going on in all its imperfection. And here's the Red Team project that we've already looked at. The PlantUML code for all of these, if you're not a coder like me but can kind of understand PlantUML, is all on GitHub too. So you can now go in, grab all these ontologies, experiment with them, change them, whack them back on GitHub, and show us what you're doing and where we can take the next evolution. Just by way of a few guiding principles, here are some things that I have found helpful to avoid mass refactoring. There can be many ways to explain relationships between nodes in different projects, but there should be just a few ways to describe node relationships within projects. This is okay, because the people project is separate to the incident response one. Man, tell me about it. If you feel... There's a guy who goes like... So I know it's a lot of information. I apologize. It'll be over in

three minutes though. So, for both of us, like, I can't, I just need, I need a beer. I need to go and cry in a corner. So, this would not be okay; this is not a good example of an ontology. Next, be careful of node-to-edge paths that create unstable narratives. Here's a great example. You start building your knowledge graph: this incident used a vector; the threat actor in this incident used a vector and caused that impact. Right there, it's super clear what's happened. Bad things happen when you start linking through non-mutually-exclusive relationships, because the context for which node links to which node quickly disappears and you have no idea which storyline links to which storyline. Finally, look for node-to-edge joins that create narratives with the fewest number of touchpoints. Here's a great example. This is just a random example: Joe Bloggs, who works for... I don't even know what it is, Acme Metals. He reports an incident. If the incident concerns an application (PlantUML does weird things with the way the visual is laid out; you just kind of have to live with that), here's a link into the application so we can begin seeing that. And then this example, which is kind of cool. What we're doing here is actually saying, we ran a project. So for example, if you run a project to do technology oversight and investigate, say,

a firewall you have, or maybe a WAF, you might find a bunch of issues with that WAF, right? And those vulnerabilities can then be articulated to the business in terms of the things they affect, and they can be linked to the people that need to make decisions about those risks. And what we can then do is... propose another project that says: if you want to fix these things, this project is going to do these tasks over this time period to deliver that shift. In that way, you can begin to articulate the movement of time and resources through the graph, which allows the business to understand where its money is going. Then at the end of the year, all you need to do is basically say, well, here are the graphs and here's how we've changed them. This is basically where your money's gone, which is very, very helpful, and stops the usual zero-based budgeting thing that everyone has to do, going back to the beginning and building everything up in a nightmare of Excel spreadsheets. That said, sometimes you just have to accept that security makes a mess out of everything. Here is a project migration graph before security gets involved. And here's what it looks like after you've started adding in what I reckon is a fairly minimal required set of relationships in a threat model. So in closing, I think I've just

got time, hopefully, for this: like, two minutes. What next? I have been thinking a lot about this quote recently from my friend Dan about how to put graphs in context. One of the reasons that we rely so much in this industry on generic patterns of play and best practices is that there is no widely shared knowledge base that helps us identify the right pattern of play for our business. Despite the allure of consultants selling us what good looks like, there is no single repeatable pattern. It's more like playing 50 games of chess where changes on one board affect all the others, as we desperately try to tailor our strategy and operating model to deliver stage-appropriate results and build the boat while we're rowing it. It's very hard, I find, to understand in a given moment what the best choice we have is, if we lack a picture of the landscape. And every visit to SFO reminds me of this: a short geographical distance that does not account for hills may not be the smartest route. Simon Wardley has written a lot on maps and patterns of play. When we think about the inputs and outputs within controls that rely on data and analytics, and we think about what that would look like if we started connecting it to the business, perhaps what we need to do is start considering and combining graphs and

maps to understand where we need to put the focus of our investment and those data scientists that we are hiring, who are trying to do such great work for us. For example, if the internal feedback loop in our SOC looks like this, and the one for our red team looks like this, and the data feedback between these two controls involves this... then maybe the smart place to invest is here. And sorry, incidentally, along the bottom axis, what you're seeing is things moving from genesis, a state of kind of unstable creation, into commodity at the end. So we have a bunch of ideas on this that we just haven't had time to work on. If anyone loves graphs and maps, please get in touch. This is the next phase of experimentation for us, where we're going to take this stuff. I'm also super excited about building the FAIR model into our ontology. Jack Jones has done some great, great work in helping us articulate risk, and I think some of the stuff we can do with graphs is especially helpful in quantifying unknowns, for example through the lens of Knightian uncertainty, where we're actually unable to say what a risk is because we don't have the information to make that judgment. Last two slides. We face a really tough challenge in this industry to hit a target that's basically context-dependent, moving all the

time. I hope that some of what we've shared today helps us all escape a common enemy, which is the risk-a-tron. And that's all. Thank you so much for listening, and I hope it's been helpful. Thank you.