
about how you're going to deploy it with compliance restrictions. How are you going to scale it up? How are you going to evaluate it and show your management that this system actually works and is adding value? And don't discount livesite: if you're running a production service, you need on-call guarantees. If the system fails, there needs to be someone who's able to step up and fix it. My broader team has a bunch of personas that we rely on to ship a production solution. We've got machine learning engineers, but we totally rely on ops engineers for monitoring performance, and on our compliance and privacy folks, who help us make sure that we're maintaining our commitments. We've got product managers that go talk to the businesses and define our metrics. So it's not like a one-person security data science effort gets something working into production. That's pretty much the end of my talk. If you're interested in any of the roles that I showed you, we're hiring for my team and the broader team, and you can always reach me at my email. I'm on Twitter; that's where all the cool kids hang out. I mostly just lurk, but my DMs are always open. And if any of what I spoke about interests you, or if you feel like you can come and add value, I'm going to hang around; I'd love to talk to you guys. So we can take about two questions.

Hi. What was your training size for the PowerShell attack? The PowerShell one, it was, I think, 400 gigabytes a day. A day? So that's basically all the exhaust that we get from detonating suspicious files. Thank you. Absolutely.

First of all, very interesting talk; thank you for presenting. You mentioned you're collecting, I think it was, over 80 terabytes of telemetry from Office 365. It sounds like in your first case study you relied on labeled data for a lot of the training. Can you talk a little bit about the strategy for generating labeled data from that sort of volume? Yeah, absolutely. That's a
great question. So the first one is, we rely on things like our red team. They would go after a service, and then we have the tainted records. That's strategy number one. The pro of that is that it's as close to attacker data as you can get; the con is that, for a service like Azure, not every service is going to get attacked every single day. So that's strategy number one, with pros and cons. The second strategy is to inject synthetic data. Now, this might sound extremely trivial, but it really works for you as a sanity check; it's a canary in the coal mine. If your system has changed so much that your injected data does not pop to the top, you know, from a unit-test perspective, that something doesn't make sense. The third thing we rely on is open-source tools that our red team doesn't use, so we have out-of-distribution tools to evaluate our system against. That way we're not relying on only one particular set of known badness. We've also looked at it from a purely machine learning perspective, where we have two strategies. The first is out-of-the-box techniques like SMOTE, where you under-sample and over-sample. The second thing, which we've gotten good preliminary results with, is generative adversarial networks: you have some malicious data and you try to bootstrap from that. So from a security perspective, those are the three things, and from a machine learning perspective, it's those two. Also, one of the things with Microsoft is that it has products across the stack: you've got the Windows team covering the host level, and Office 365 covering the application level. So one of the things we do is cross-pollination between products. Say you want to find out whether a VM is compromised, and the hypothesis is, hey, a VM is compromised if it's sending out spam. Then we get the spam labels from O365 and join them with our Azure data. So we also have this cross-pollination that's kind of different
from the security and the ML strategies. Yeah. Oh, perfect. Thank you. And let's give a round of applause. Thank you so much.

Good morning, and welcome to B-Sides Las Vegas. In Ground Truth, this talk is given by Derek Thomas on PowerShell Classification, Life, Learning, and Self-Discovery. Just a few announcements before we begin. I would like to first thank our sponsors, especially our inner circle sponsor Rapid7, and our stellar sponsors Amazon, Oath, and Simmel. It's their support, along with other sponsors, donors, and volunteers, that makes this event possible. Now, as you are probably aware, these talks are being streamed
live. And as a courtesy to our speakers, please make sure to silence your cell phone; you do not want that to go off in the middle of it. And if you do have any questions, please raise your hand and I'll come by with this microphone. So go ahead, Derek, take it away.

Okay, can everybody hear me? Okay, good to go. Let's get this party started. So, this is an overly dramatic title, really; it's developing a PowerShell classification model. And the reason it's kind of overly dramatic, the self-discovery part, is that it's kind of my first model, right? So this is the process I went through from start to finish, from observing high-level features and saying, "Hey, this might work," through that whole process, and then also the pitfalls and solutions we came up with along the way. This follows pretty well after Ram's talk, with the PowerShell work he mentioned; that's one of the references I listed at the end. That was an academic paper published fairly recently, and this has kind of been our process for the last two years.

So who am I? I'm Derek Thomas. I'm an applied security researcher at eSentire; we do managed detection and response. I'm a Converge Detroit conference organizer. There's a couple of us around here, so we hold a security conference
every year. And I'm a member of the Michigan security community. So I consider myself a security data enthusiast. My whole life has involved logs, so you're probably saying to yourself, "he's lived a really hard life," and that's probably true. But if you work with logs your whole life, you've got to be pretty good at doing analysis, or that's a tool that should be picked up. You can hit me up on Twitter, DTom, or on LinkedIn, or by email if you want to contact me afterward; I'm always open.

So I have a couple of goals here. I figure the audience is made up of data scientists and security professionals. For the data scientists, I think they would like to learn about this problem: the strategies for detection, and maybe working with subject matter experts, how they see things, how they identify malicious activity, and the processes they go through. For the security professionals, this is a concrete applied ML use case. You see ML all the time as buzzwords; someone says ML, blockchain, AI, and it's like, what does that actually mean? Well, this is a concrete use case and how we've derived value from it. And I think everyone should start looking at detection a bit differently. A lot of security professionals are very focused on atomic IOCs, very detailed observables, and things like that, and I think that moving up a level to more generalized detections raises the bar
for attackers. And everyone can just learn from my experiences: I think that if old Derek had seen this presentation, I would have saved myself a lot of time.

So this is our first version of the PowerShell project. It was implemented probably about two years ago. In the process, I developed a proof of concept, and that proof of concept was used to say, "Hey, this is valuable; we need to operationalize it, and spend time and resources to implement a production model doing real-time PowerShell classification across our entire environment." As we go through this, know that you're going to see some pitfalls, some things that make you say, "Hey, that doesn't really work," so I'm going to cover the pitfalls and the solutions to those problems at the end.

So here's the agenda. We're really going to talk about PowerShell: the problem, why it's important, how to get the data, and the properties of malicious code; and then the PowerShell classification in part two, which is really just how I started and went through my machine learning process. This all really started back in 2016, a while ago, which is about when I feel the malicious use of PowerShell started to gain extreme popularity; that held true with Ram's slides there for 2016 and 2017. And it began, as it always does, with a simple request from one of my clients: monitor and alert on malicious PowerShell
activity. So following that, it's like, whoa, what does that entail? What is malicious PowerShell activity? Being from the world of logs, SIEMs, and creating use cases, I created rules: hundreds and hundreds of rules. Then I bypassed those rules, then I created more rules, and then I'm thinking, this is insane, there's got to be a better way. That's when it clicked, and it just hit us that maybe a classification model, looking at the things that we as analysts look at, would be the way to go. So this is that story.

We'll start from the beginning: what is PowerShell? It's a scripting language legitimately used for administrative activity within your environment. Basically, administrators can do anything they want and automate it through PowerShell, which is awesome; it's super flexible. But for those same reasons, it's also increasingly leveraged by adversaries and pen testers to compromise servers and workstations, in what we call living off the land. PowerShell is available in almost every organization and on every workstation your users are on. Living off the land means using legitimate tools to meet your adversary's objectives: they have the tools available, and they can do almost anything they want once they have access to a system. So we dubbed this project Blue Steel, and I'm going to talk about that later.
Everyone either understands that reference immediately or not, so we'll see who gets it, or who may be scared to say they do. This was born out of client requirements for effective detection of malicious PowerShell.

Okay, so why is it important? You might say, "Well, okay, Derek, you can detect this. Why is it important? Why should I do this?" Well, PowerShell can be leveraged at every stage of an attack. If you look at the MITRE framework, they have one tactic referencing PowerShell, which is true: it's for code execution. But with code execution you can perform any step in the kill chain and meet your objectives, from recon all the way to exfiltration and command and control. It also appears increasingly used by adversaries. I'm basing my observations strictly on the new exploit frameworks that are coming out; we mentioned a few, PowerSploit, PSAttack. It seems like every day there's some new tool to fill a void of offensive capability. Just from going to conferences like this, I've seen an increase in PowerShell talks saying, "Hey, here's how easy it is to use PowerShell for offensive activities," and defensive ones as well, but I'm more concerned with the offensive. We observe PowerShell phishing payloads against our customers significantly; they've been on the rise over the last two years. And then there's increased use in penetration tests; I think penetration testers often rely on PowerShell. With Microsoft's implementation of logging and script block logging, it's going to become more difficult, and I
see this being less the case in the future, but right now it's being used drastically. So, PowerShell is extremely flexible; it's a dual-use tool. You can hide in plain sight while you achieve your objectives, steal creds, or add a user to the domain. One of the cool things (and not-cool things) is that it can execute encoded commands: you can feed it a base64 string of code and it will execute that natively. That makes it tough for me when I'm looking at logs and I see a base64 blob of text. Is that good or is that bad? I don't know. At first I thought, okay, this is always going to be bad. In reality, across a large sample size, we see significant use for legitimate reasons, say things like Chocolatey or Ansible deploying or doing activities using encoded commands. So I don't think that just flagging encoded commands really works. With PowerShell you can also obfuscate and execute code, and I'm going to show some samples of this. Basically, you can slice and dice, replace, rearrange, compress, or encrypt the data, and then execute that code any way you see fit. And you can use it for fileless attack techniques, which we've seen significantly from more advanced attackers: execute code in memory, run a scheduled task, and pull the
code out of maybe a registry entry or something along those lines.

So, okay, PowerShell's important to monitor; I hope we're all on the same page there. First you've got to get the data. Like any data science or use-case creation, you've got to have the data. So where does this data reside? PowerShell is very nuanced, so this is difficult: the data resides in multiple places, and each place has different amounts of data associated with it. PowerShell can be executed in memory, from the command line, or from a script file, and each one logs differently. One of the main places we use is the Windows security log, which lists the command line straight from the process-creation event, and those are easy to get; usually, most organizations are collecting these already. You might have to check whether you've enabled process auditing, but oftentimes this is enabled already. Even better is the PowerShell script block log. PowerShell version 4 or 5, I think, enabled script block logging, which means that any code ever executed by PowerShell is logged and available for analysis. But this is a different log channel than the security log, so sometimes it can be more difficult to get. Enabling script block logging is more difficult by far than enabling the audit settings; I'm going to show you a resource for that, but it's GPO settings, you've got to have the correct Windows Management Framework, you've got to have the correct .NET versions, and many organizations may not be up to date. We see mature organizations implement this easily; in some, it's more of a challenge. Then there are Sysmon events. If you're familiar with Sysmon, I don't want to call it an EDR agent, but it does significant process monitoring on Windows machines; it's a Sysinternals tool, very popular in the security community. It logs process execution and can be used for getting command lines. And then any EDR solution: if you're familiar with EDR solutions like CrowdStrike or Carbon Black, they're doing significant process monitoring. Oftentimes they have logs that can come out of that, or you can query them and say, hey, show me
all the PowerShell commands and command lines.

So here's an example, and I'm going to try to outline the nuance of collecting these logs. From the command line, I'm executing an encoded command: powershell.exe with -EncodedCommand and then just a blob of text. That's what we would see if we're monitoring these logs, and that can be difficult. But in reality, this is just showing $PSVersionTable, the version of PowerShell, and I'm going to show you what it looks like in the actual log. So here's the Windows security log, event 4688. You can see at the bottom that it shows the encoded command. What I'm really concerned with here is the process command line at the very bottom. This is a newer feature from Microsoft; you have to enable it through GPO, and I suggest everyone does. There's so much more value there, and it's easy to extract the command lines from it; I'll show you a reference for doing that in a bit. It's important to note that basically exactly what was typed on the command line is what shows up down there. And here's a Microsoft script block log. You'll notice that in the script block log you see the actual code that was executed; that was the base64-encoded command right there that shows you the version. If you're not seeing this already, it gives you so much more value, because you can see the decoded commands. That's a step you can then skip in your monitoring solution: identifying the encoded command and then decoding the base64 blob. That might seem easy, but it's difficult, because PowerShell has aliases and is very flexible: -e, -ec, -EncodedCommand, et cetera. There are a lot of ways to do it. And really, we use this for identifying legitimate uses of encoded commands, because we'll see a ton of obfuscated commands that cannot be decoded automatically if we haven't seen them before, and we've never yet seen a legitimate use of that. So here are the references for enabling command-line process auditing and enabling script block logging. I definitely recommend the second link, "PowerShell ♥ the Blue Team." That blog post from Lee Holmes, I think in 2015, is what got me into really looking at PowerShell, thinking this is important, and at some of the features Microsoft was implementing. At the time it was a little difficult for many organizations to transition to script block logging and implement some of these features, but now it should be much more doable.
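In Python, that flag-spotting and decode step might look something like this. This is a minimal sketch: the alias set below covers only the abbreviations mentioned above and is illustrative, not exhaustive.

```python
import base64

# Aliases of -EncodedCommand (as mentioned above: -e, -ec, ...);
# this set is illustrative, not exhaustive.
ENCODED_FLAGS = {"e", "ec", "enc", "encodedcommand"}

def find_encoded_blob(cmdline: str):
    """Return the base64 argument following an encoded-command flag, if any."""
    tokens = cmdline.split()
    for i, tok in enumerate(tokens[:-1]):
        if tok.startswith(("-", "/")) and tok.lstrip("-/").lower() in ENCODED_FLAGS:
            return tokens[i + 1]
    return None

def decode_blob(blob: str) -> str:
    # PowerShell's -EncodedCommand takes base64 of UTF-16LE text
    return base64.b64decode(blob).decode("utf-16-le")

cmd = "powershell.exe -enc " + base64.b64encode("$PSVersionTable".encode("utf-16-le")).decode()
print(decode_blob(find_encoded_blob(cmd)))  # $PSVersionTable
```

Note the UTF-16LE step: a plain `base64 -d` of the blob yields interleaved null bytes, which is a common stumbling block when eyeballing these logs.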
And we're collecting these logs on the workstation or server, so how do we get them for analysis? If you have a SIEM, your preferred SIEM vendor should be able to help with this, but I'm going to go through a built-in function many people don't know about. It's easy to use Windows Event Forwarding, through GPO, to configure forwarding of the PowerShell logs from the workstations to a central collector. At the central collector you can install an agent like NXLog and say, hey, send all of these events from the Forwarded Events store to a syslog server, to a database, really to anything you want. So that's just one way to get them for analysis; if you have a SIEM, it most likely can collect the script block logs and Windows security logs from the endpoints.

Okay, so analyzing the PowerShell data. I spent so much time trying to create rules for detecting malicious PowerShell, only to realize that I didn't think it was going to work. So, you know, I sat in the corner and cried for a little bit, then got up and said, "Okay, let's figure out how we're going to do this." I had two major findings from that. First, many of the malicious samples were obfuscated, and when they were obfuscated, they immediately stuck out; after you've reviewed hundreds
and hundreds of logs, they stick out like a sore thumb. No programmer in their right mind would write code like that, or if they do, they're messing with somebody; I have yet to see a legitimate use of obfuscated code. Second, if the samples were not obfuscated, they frequently used strings known to be suspicious or malicious. So basically, if I look at a suspicious event, almost immediately I can tell that it's doing some shady stuff, and I don't even program in PowerShell; I'm a threat analyst, I create use cases, things like that. So it's pretty straightforward to understand, at least in my opinion, once you've looked at some and studied this a little bit, what's bad and what's good.

So we started identifying high-level features that seemed to indicate malicious code. If you can look at it and understand that it's malicious, why is that? I started asking myself that question, and here are the reasons. The first one is quantities of known suspicious and malicious modules. This example is taken from a real malware sample, deployed through a Word doc and executed, and it has references to Invoke-Shellcode, PowerSploit, Meterpreter, persistence. If you see any of that, it's a red flag immediately; you know something's going on there. Hopefully it's a pen test, and if it's not, you've got to start digging in. And you see a high quantity of these in the events being executed. We noticed, too, a lot of evasion tactics like strange capitalization: the code would be randomly uppercase or lowercase, and most code is not going to do that. So how would you identify that? I would imagine that most code sits at a known ratio of uppercase to lowercase, based on first letters and things like that, so when you have two times the uppercase, or an equal amount of uppercase to lowercase, that's pretty significant. Then there's obfuscation and encoding. This entails any custom way an adversary might try to hide their code: they can slice it, dice it, rearrange it, replace it, put it back together, compress it, encode it, anything. There's a great framework called Invoke-Obfuscation that can take your nice, short $PSVersionTable command and turn it into a command-line argument a thousand characters long, illegible to any human and encoded in multiple different ways, but it can still be executed. What this does is hide the code from automated analysis. Logs are going into your SIEM, and you're looking for specific strings, like "I'm going to look for Meterpreter, Invoke-Shellcode," or maybe you've got an awesome list. But
this is gonna bypass that. Then, in a lot of malware samples, we saw a high ratio of special characters. You look at this, and I don't think it reads as normal PowerShell to anybody; I hope not. It just looks ridiculous, and why? It has so many special characters. So we figured that would be a good feature. And finally, cosine similarity. There's a great blog post from Lee Holmes, referenced here, where cosine similarity basically asks: how similar is this event to known-good events? If we have a set of known-good labeled data, we can score an event based on how similar it is to those. Here's an example of taking two events and scoring them: the cosine similarity of a good log entry to another good log entry is pretty high, while the similarity of a good log entry to a malicious one is very low. What we do is take our set of labeled known-good and compare it to the potentially-bad event we're examining at that moment. And knowing that, it's pretty easy to implement in Python or R; it's just a function you can use to derive your features.
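As a rough sketch, that similarity function might be implemented over character-frequency vectors, which is one common choice for this (the talk doesn't specify the exact vectorization):

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of two strings, using character-frequency vectors."""
    va, vb = Counter(a.lower()), Counter(b.lower())
    dot = sum(va[ch] * vb[ch] for ch in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

good = "Get-ChildItem C:\\Users | Sort-Object Name"
obfuscated = "&((gv '*mdr*').naME[3,11,2]-joiN'')"
print(cosine_similarity(good, good))        # identical strings score ~1.0
print(cosine_similarity(good, obfuscated))  # a very different character mix scores lower
```

In practice you'd average the similarity of the event against a whole set of known-good entries rather than a single one.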
So we've outlined a lot of high-level features. These are the things our analysts look at; they're what analysts use to judge how suspicious an event is. Now we're going to get into why machine learning, and how to classify this and train a model. As I learned, there's a never-ending cycle of creating rules to detect bad behavior, then detecting the bypasses, then detecting the bypasses of the bypasses; it can go on forever. So we have a better way. We had identified the high-level features, and those would be tougher to bypass than just string matching and the like. And honestly, as a data enthusiast, I wanted to work on a problem that wasn't classifying flowers, right? I had gone through some classes, and you see the same stuff on the internet, and I'm like, okay, this applies directly to me, and I think we can do this. So it was a lot more fun. My machine learning is really self-taught: I'm not a programmer, and I'm not a data engineer or machine learning expert. I've studied Coursera courses for probably the last three or four years and just tried to learn, because I think it's a tool that any data analyst or threat analyst should pick up; I think this is where the field is going. Adversaries are making their behavior fit in with normal activity, and that's getting tougher and tougher to identify. Everything I did initially was in R, but I built this in scikit-learn, just because it transfers to my team better; a lot of people work better with Python, and I was the only one who wanted to work in R. Scikit-learn has great documentation, and since I had learned the frameworks in R using the caret package, I was able to do this pretty quickly. The raw events came from both the Windows security logs and the Windows script block logs. We had samples from friendly people who said, "Hey, Derek, I'm interested in what you're doing. Here's a whole
ton of data. We even generated some malicious samples for you. Here you go." So we took those and collected them. Then we really needed to get it into a form suitable for learning. If you look at this, it's free text, and free text in this case isn't going to work too well in our learning algorithm, so we need to transform it into a form that will. So we reviewed: my partner and I locked ourselves in a room for probably months on end, classifying. I think we initially had thousands and thousands of known-good and known-bad events. We also generated adversary samples and collected samples from our phishing campaigns, things like that; we really just begged, borrowed, and stole as many samples as we could. Well, we didn't actually steal anything; it was all legit, so I have to say that. So for each event, we create a vector of values: the ratio of uppercase to lowercase, like I mentioned; special characters to total characters; alpha characters to total characters; the cosine similarity; the count of suspicious modules; and the count of malicious modules. Keep in mind, this is what we used at first; it's not what we use now, and we'll go into the details later. But this did work pretty well.
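That vector can be sketched as a small extraction function. The suspicious and malicious keyword lists below are placeholders, not the team's real lists, and the cosine-similarity score would be one more column alongside these:

```python
# Placeholder keyword lists -- illustrative only, not the production lists.
SUSPICIOUS = ("downloadstring", "hidden", "bypass", "-enc")
MALICIOUS = ("invoke-shellcode", "invoke-mimikatz", "meterpreter")

def feature_vector(event: str) -> list:
    """Derive the numeric features described above from one raw command line."""
    upper = sum(c.isupper() for c in event)
    lower = sum(c.islower() for c in event)
    alpha = sum(c.isalpha() for c in event)
    special = sum(not c.isalnum() and not c.isspace() for c in event)
    total = max(len(event), 1)
    text = event.lower()
    return [
        upper / max(lower, 1),                          # uppercase-to-lowercase ratio
        special / total,                                # special chars to total chars
        alpha / total,                                  # alpha chars to total chars
        float(sum(text.count(k) for k in SUSPICIOUS)),  # suspicious-module count
        float(sum(text.count(k) for k in MALICIOUS)),   # malicious-module count
    ]

print(feature_vector("IEX (New-Object Net.WebClient).DownloadString('http://x/a.ps1')"))
```

Each labeled event then becomes one row of numbers plus a label, which is exactly the CSV shape described next.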
So here's what we did with a sample event: some command obfuscated with Invoke-Obfuscation. We created functions to derive each one of those values; those are the values there, and basically it's a CSV file full of labeled data.

Yes? Out of curiosity, for the raw data, was that the traditional Microsoft event format or the newer XML format? So, when we store the events, they're stored as raw text; they're definitely not stored in XML. We collect them through the process similar to what I showed earlier: the log sources are collected at a central location, and NXLog transmits them as text, so it's not XML or any other format. It's just open text that we process, and essentially we extract the last field, the process command line. It's formatted almost exactly the same, too, with the formatting characters. So here's just an example of how we derive the values, and what each record looks like; it's just a CSV file when you think about it. And it ends up looking like this. If you're familiar with machine learning and you see a file like this, you're like, okay, all the hard part has really been done at this point. There are some issues with this data, but it's there and it's usable. So it's time to start evaluating algorithms. There are so many; when I look at scikit-learn, there's
so many algorithms, and I'm not a data scientist, so I wasn't sure which ones would work best. What I found was, just try all of them. I found some good sources for creating a test harness to iterate through each one, compute the accuracy, and go through that. These were the initial ones; we later added more, like an MLP classifier, so we ended up looking at others in addition to these, but they all performed very similarly. We used k-fold cross-validation. We took our labeled set of known-good and known-bad samples and divided it 80/20: the 80% is our training set, and the 20% was our hold-out validation set. The training set is what we used to train the algorithms, and this is the process of how we trained. K-fold cross-validation says, hey, we're going to split the data up k ways, in this case five. Then we iterate: train on four pieces of the data and test the metrics on the fifth, then rotate through each piece, so all the data has been trained on and all the data has been tested.
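That split-and-rotate procedure is what scikit-learn's `cross_val_score` does. A rough sketch on stand-in data; in the real pipeline, each row would be one of the derived PowerShell feature vectors:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in data: an imbalanced two-class problem, like the benign-heavy event set.
X, y = make_classification(n_samples=600, n_features=6, weights=[0.93], random_state=0)

# 5-fold cross-validation: train on 4 folds, score on the held-out 5th, rotate.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv, scoring="f1")
print(scores, scores.mean())
```

`StratifiedKFold` keeps the class ratio roughly constant in each fold, which matters for imbalanced data like this.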
it worked pretty well. So initially we had scoring with accuracy. Our accuracy was off the charts and I'm like okay 99.7% I'm done. Here you go guys. And really that was pretty good, but there's issues with this. Here's the distributions of the different algorithms and how they worked out. See the random forest, linear regression, KNN and CART worked pretty well. There were some of the other ones that did not work very well. And I'm gonna talk about why those didn't work well at the end. You may be seeing this already. But anyways, I saw that there's four that worked approximately the same. Each time I ran through that, it would change slightly. That's one thing
to keep in mind with machine learning algorithms: you won't get the same results every single time. You get slightly different accuracies and metrics each time you run through the process. So, okay, accuracy's 99.7%, but my data set is highly imbalanced. We'll talk about that issue later, but even if I were to guess benign every single time, I would have like a 93% accuracy, which sounds good. In reality, that's the worst I could possibly do. So we need to look at other metrics: precision and recall. These two metrics are what we're interested in, because we're interested in a good ratio of catching
everything but not sending false positives to our SOC too often. So, you know, precision is what percentage of suspicious classifications are truly suspicious. If I flag a hundred samples as malicious, what percentage of those are actually malicious? Recall is a little bit different: what percentage of truly suspicious events were classified as suspicious? So let's say there are 150 truly suspicious events; what percentage of those were actually classified as such? It's a slight difference; it took me a while to wrap my head around it. But Wikipedia has the classic picture for understanding precision and recall; I suggest taking a look at it. Anyway, we end up
using the F1 score, but here's the precision. The precision for random forest is pretty high, 98%: 98% of our suspicious classifications were truly suspicious. Our recall, what percentage of the truly suspicious events were classified as suspicious, was 96%. So that was pretty good. And then we end up using our F1 metric, 97% on random forest. We're pretty happy with this. It's significantly better than our rule-based approach, which was triggering thousands of false positives to our SOC and missing some known things that we thought it should be catching, which we ended up catching through other means, detections through other products, et cetera. So what we're interested in is a
good balance. We can't send too many false positives to the SOC, but we want to catch everything. And how you show that is through the ROC curve; I think this is obligatory for any machine learning model. This ROC curve looks ridiculous. Keep in mind, it's based on one customer's worth of data from when I created this. Without an extremely varied data set, you're going to get a really good-looking ROC curve. But what this shows is that it's way better than luck, so we're doing pretty well here. So we end up with the random forest algorithm. It was the most accurate of those we tested at the time. We ran that for a long time against all
of our data. So, the algorithm needs to be tested against previously unseen data. We were training and testing on our training set of data. Now I take that 20% I set aside, run the model against it, and ask: how does this do on previously unseen data? It's very important, if you're not familiar with machine learning, that you don't train on the data that you're validating on. That's like memorizing version one of a test and then being given version two: oh, crap, none of this is the same. So we train the model and test the accuracy on the validation set, and it was extremely high. So you see that on this
test we had roughly 1,800 samples, 99.8% accuracy, and you see the F1 scores were really good there. So we thought this was really good. At this point, when I showed this, this is where I'm trying to justify putting additional resources in: to get more data, to talk to more clients, to build the pipeline, to say, hey, we want to apply this to everybody. So, random forest had the highest accuracy. We used this for a while, and it was very successful. What I showed is that we reduced false positives significantly. Our SOC was looking at every instance of encoded commands, which are used quite often for legitimate purposes. So they would have to
decode the encoded command, take a look, and make a judgment call there. That's time-consuming and a waste; it ends up being like 1% of the encoded commands that were actually malicious. We increased our true positives, so things that were being missed before are not being missed now. When we detect that malicious PowerShell actually caused a service to run, and that service running is what triggered, say, a Carbon Black watch-list rule, we think, wow, we should have seen this PowerShell event earlier. Why didn't we? But now we're seeing those. And we decreased false negatives. We could measure this because we ran in parallel with our existing rules, so we could see the things that were missed
by them that we would never have known about if it wasn't for this. We also test on sandbox events. We get tons of phishing documents, and we detonate them in a Cuckoo sandbox, get the logs, and analyze those. If we see a PowerShell sample from a Cuckoo sandbox, most likely it should be malicious. And someone reviews every single one of those. So that helps us say, hey, okay, this thing's working well. And that's it. No, I'm just kidding. So if you look through here: pitfalls and lessons learned. This is the gold, right? If I saw this, this is what I would be happy with. So that was the end of
the POC. Definitely not the end of the story. This was probably early to mid 2017 when this portion was done. We learned a lot of lessons, and we enhanced this model significantly. So I'm going to go through some of those enhancements and talk about some of the problems. Obviously I can't talk about the exact specifics of the pipeline that we're running for our clients, but these lessons are what we derived it from. So it's really worth its weight in gold in ML. One of the biggest questions I get: how do you account for overfitting? How did you do the feature engineering? Our data was pretty good. We
had a lot of malicious samples. We generated a lot of malicious samples; we were able to get them from sandbox events and from clients. And we took care to separate our training set and validation set. But one of the issues is our hand engineering. You can see that these are really hand-engineered predictors, right? We created these. We decide what's suspicious, we decide what's malicious; it requires our judgment to do that. Also, each of those malicious and suspicious indicators is not as malicious as the others, right? You have a suspicious event, you see Invoke-Expression, often seen in malicious samples, but also seen in legitimate samples. At
first I thought, okay, these were always malicious. That's not the case at all. So how do we fix that? We already paid really close attention to separating our training and validation sets. What we did here was update our feature engineering to address this issue. We ended up drastically increasing our feature set. I can't go too much into this, but basically we stopped using the suspicious and malicious count lists. They worked pretty well, better than our rule set, but they did not work as well as we needed them to. So that's about as far as I can go there. And if you think about it, if I'm creating a list of suspicious and
malicious modules, that's from my experience reviewing these logs. So even though I split the training and validation sets, I've essentially looked at the validation set and generated this list of suspicious events from that set, and now I'm training the model on that. So really, that's overfitting to that data, and we learned that afterwards when we introduced more data: you start seeing things like common modules used at one client that are not used at another. You're like, oh, this is actually used legitimately? I can't believe that. Things like encoded commands, Invoke-Expression. So that should help if you're working on this. There are also some references at the end.
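That leakage lesson can be sketched with scikit-learn: keep all feature derivation inside the cross-validation loop, so nothing is ever learned from the held-out data. This is a toy illustration with made-up command lines and labels, not the talk's actual features or pipeline; the character n-gram vectorizer stands in for whatever feature engineering you use.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Toy corpus: hypothetical command lines and labels (1 = malicious, 0 = benign).
commands = [
    "powershell.exe -enc SQBFAFgA",
    "powershell.exe IEX (New-Object Net.WebClient).DownloadString('http://a/b')",
    "powershell.exe Get-ChildItem C:\\Users",
    "powershell.exe Get-Process | Sort-Object CPU",
] * 10
labels = [1, 1, 0, 0] * 10

# Because the vectorizer sits inside the pipeline, its vocabulary is refit on
# the training folds only; the held-out fold never influences the features,
# unlike a hand-built suspicious/malicious list derived from all the data.
pipe = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 3)),
    RandomForestClassifier(random_state=0),
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, commands, labels, cv=cv)
print(f"mean accuracy across folds: {scores.mean():.2f}")
```

The same pattern applies to any learned preprocessing step, including scalers: fit it on the training folds, apply it to the validation fold.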
Bad data: I found that certain algorithms prefer normalized data rather than the raw numbers. If you saw, we had specific counts like 16, and then we'd have a cosine similarity of 0.05. Those are not normalized or standardized in any way, and a lot of algorithms require that. Basically, when I was looking at the research, standardizing or normalizing those features really helps most algorithms work. So now everything's a number between zero and one: you take all those suspicious counts and convert them to values between zero and one, like I said. And this is probably the reason for the really poor performance in some of those
algorithms that I was showing earlier in the distribution. We didn't go back, because random forest still worked really well even when I normalized the data, and we'd have spent a lot of time increasing the accuracy for very little gain. So here in scikit-learn, you have the Normalizer and StandardScaler; you can look those up. Basically, you just apply that to the data set and then train and predict on the transformed data. Here are some links if you're interested. I really just suggest looking at the scikit-learn docs, especially if you're new; they walk you through everything. And really, make sure the data fits your algorithms. You know, I'm kind of blindly saying
normalizing data is best. There may be some edge cases where that's not true; I'm not sure. But it's worked well for us. Probably the biggest one in any machine learning or data science problem: not enough data, and a lack of diverse data. I'm in a decent position, not as good as Microsoft, but we have quite a bit of data. Each model makes its predictions based on what it's previously seen, so of course you want it to see as much as possible. You need enough data, and you really need diverse data. I had a good quantity from one client, but when we introduced other clients, they do things differently. What's normal to them is
not normal to others. So we were able to differentiate that, learn those features, and apply that to everybody. And really, the way to do this: get more data. Beg, borrow, steal; well, don't steal it. Generate additional data. Luckily, with PowerShell there are lots of ways to do this. Like I said, there are those PowerShell frameworks for doing bad stuff where you don't need to know PowerShell that well. I'm pretty familiar with adversary techniques and tactics, so I'm able to run these in a virtual environment and generate the data we want without much of an issue. But it's not that difficult. Also, if you get phishing samples, you can take a look at those. So
also, even just looking at Twitter: we literally saw samples on Twitter, and they were pretty interesting. Oh, one other thing would be: label, label, label. Make the labeling as easy as possible. We had a system where it was difficult; I'm literally copying these features from one file into another and combining them, and that sucks. If you have a system where you can review the event, see the prediction, and classify the event, that's great; then use that for subsequent training. From the beginning, measure your performance. I'm a security guy, not a data science guy. I hardly measured anything at first, probably nothing. I just thought, oh, it's classified most everything pretty well. Then I started getting asked questions like, well, what's the false negative
rate? What's the false positive rate? Show me a confusion matrix. Show me an ROC curve. Show me all these things. And I learned that as I went. From the beginning, make sure you can measure those things; we're still working on that a little bit. Just have good measurements to say, okay, this is operating well. Because one of the questions I get a lot is about concept drift: what are we classifying now as malicious that we may not have classified back then? That's hard to understand, and we take a look at that through QA-type activities. When we see things that are being classified from a sandbox, we take a look
at them if they're classified as benign, and we need to understand why they were classified as benign. So it's easy to avoid collecting metrics, but, and this is probably the one thing to take away, your ability to describe the value of the model is as critical as the value of classifying the events. Because if you cannot describe the value of this to the people who are going to give you time and resources, then it's never going to be implemented. This has to be done, and you have to be able to show that it will provide value. And I'll show you how I did that. So for our use case, precision and recall were far more important than accuracy; we had
a highly imbalanced data set. One thing I learned from this conference was that if I had known about SMOTE back then for compensating for the imbalanced data set, that might have helped a little. We didn't really have that many issues, though, in terms of our precision and recall being affected by it, so I'm not sure how much an imbalanced data set causes problems in this case. Measure everything from the start, and make sure the value is easy to convey. Oh, and this last one: codenames tend to stick around. From the beginning, we did not know if this was ever going to work. This was kind of a pie-in-the-sky activity. My
partner said, "Hey, let's call it Project Blue Steel. It sounds cool. It's from one of the best movies known, probably next to Casablanca. So if you haven't seen it, go check it out. You'll be overwhelmed, I'm sure." So, obscure references are obscure: not everyone has seen Zoolander. I learned this one the hard way, because Blue Steel just kind of sounds cool if you've never seen the movie, right? It's like steel, it's blue, you know, but no. It stuck around, though, and you may see it if you take a look out there. So here's really how I communicated the results. We took a look at our detection efficacy. We had an increase in
our true positive rate of 28%, so we're catching more stuff with this. Our false positives were reduced by 99%, and that's because we're no longer investigating every single instance of an encoded command and every single instance of Invoke-Expression. And we see those a lot, especially where people are orchestrating updates and things like that. That alone said, hey, this reduces time, saves money, and reduces analyst load; and time is money, right, in the case of a SOC. So we're like, okay, let's move on this. That was the end of the POC. At that point we moved on it. We went to productionize it, and we implemented metrics collection. At first, in
my POC, it was batch processing: I'm just classifying a whole batch of events, not doing this in real time. Now we've got a real-time pipeline. We're taking, at large scale, pretty much all of our clients' data and pushing it through this. So we've gotten so much value from what ends up being, I think, far less work than constantly creating rules, constantly updating those rules, constantly whitelisting for those rules, and constantly creating new ones as attackers try to bypass the rules, things like that. So we were quickly able to iterate and generate a robust model. It was easily updated, so we updated it with new samples constantly. We had a
way to train the model efficiently. And we constantly monitored for false positives and false negatives. We obviously take a look at any false positives that come in; anything classified as malicious is being looked at, and anything generated from a sandbox that's classified as benign is being looked at. I mean, it's impossible to say I'm looking at all false negatives; that's obviously one of the hardest things you can do, but we're doing a pretty good job there. And with this, we're raising the bar significantly for our detection. It's far more difficult to evade this than to change the capitalization in your script to bypass some case-sensitive rule that's in place. And we improved analyst efficiency. So
that's really what this was all about. And that's about it. There's emerging work in this area. "Detecting Malicious PowerShell Commands Using Deep Neural Networks" is an academic paper that was recently released. I'm not sure if that was the one that Ram had referenced, but it looked like it could have been. Revoke-Obfuscation is one where they're using machine learning techniques to detect obfuscated code. And FireEye recently released a blog post describing similar detection mechanisms. So if you're interested in this subject, take a look at those. Every one of us does things a little bit differently, every one of us has had pretty good results, and we've operationalized this for really two years and
it's been, we've caught nation-state attackers, pen testers, really highly advanced threats across many locations. So it's probably the best work I've personally implemented. Here are more resources. I have the data set and a Jupyter notebook online if you want to go through it. Be kind; like I said, I'm not a programmer or a data scientist, but it's a walkthrough that you could take a look at and do yourself. The data set does not include the PowerShell code, because I can't share that. It's usually specific to the organization; you might even have passwords or usernames in it in some cases. We do our best to remove
that. Invoke-Obfuscation and Revoke-Obfuscation are from Daniel Bohannon; I apologize if I got that incorrect. Really, that's how to generate adversarial samples and how to detect obfuscated samples. Palo Alto has a ton of encoded commands, thousands and thousands, if you want to take a look at some samples. Let's see, Lee Holmes has a blog post on cosine similarity; that's kind of what got me down this road. Using that cosine similarity detects obfuscated code very well. And then Machine Learning Mastery has a process for working through machine learning problems. A very wise man showed me this process. Here's how
you can go through it, from training and testing to showing metrics, things like that. It's a good way to do it. So that's about it. Do we have any questions? The things you were classifying, were they an overall PowerShell script? Was it a line of code? Was it an individual execution within a line? So, there were two things. One would be the command line, just the command-line process: the binary and then all the arguments provided. That's an event, and we're pulling one field from that event, the very bottom one. The second would be the script block log. So if you were to execute, say, PowerShell.exe dobadstuff.ps1.
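A quick aside on those command-line arguments: an -EncodedCommand value is just the base64 of UTF-16LE text, so decoding one for review or for feature extraction is a one-liner. A minimal sketch with a made-up command:

```python
import base64

# Made-up sample; PowerShell's -EncodedCommand expects base64 of UTF-16LE text.
cmd = "Write-Output hi"
encoded = base64.b64encode(cmd.encode("utf-16-le")).decode("ascii")
print(encoded)

# An analyst (or a feature extractor) reverses it the same way.
decoded = base64.b64decode(encoded).decode("utf-16-le")
print(decoded)  # Write-Output hi
```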
You wouldn't be able to tell if that's malicious through classification of the command line alone. But the code that's executed from it is stored in the logs, and they divide those up; there's a limit, like a thousand characters, so they'll divide those up, and we take a look at those as well. So, I'm not trying to be an asshole, and I don't come from a blue team. Thank you. I'm not going to repeat the first two lines that I said. But I get that admins need to execute PowerShell to do their job. What if they do it in a window, and if you notice PowerShell being executed outside that window, you just raise an alert? Well, that would absolutely be the best, right? That's kind of like
a whitelisting approach. And if you're in a mature organization, that would work very well. But I find many organizations have a hard time even logging PowerShell, so implementing a policy like that would be extremely difficult and unreasonable in many places. But yeah, absolutely, I would say do that. Or you could even create a whitelist of people who can execute it; normal users may not execute PowerShell, right? Exactly, things like that. So you could potentially do that whitelisting approach, but I find it to be difficult, and in reality it isn't implemented that well. If you can do it, absolutely do it. All right. Any other questions? As far as detecting these malicious PowerShell scripts, is there anything that you've been working towards
to make the case for preventing them, so that in real time they would be stopped before the damage was done? So, things like that are done, but that's kind of out of my realm of concern: I develop threat analytics, right? In reality, if you can do preventative measures like least-privilege-type stuff, do them. We have advisory services that say, hey, here's what you should do: use good passwords, don't get exploited to begin with so they can execute PowerShell. I don't look at that too much. People have talked about signed PowerShell, things like that, and I think that would help. But the only thing I really like to do is detect bad stuff. Yeah. Unfortunately not
a ton. But I'd be interested in what other people are doing there. We have time for only one more question. Any last questions? Hi. So you got data from the command line and also from the script block, and the scripts can be very long, right? So do you see any difference, or does it affect things when you try to extract the features? Does the ratio look different for the data from the script block versus the command line? Not really; I haven't noticed that. Not to say that there aren't differences; I hope that if there are, they're being learned in the process, because the data is intermixed. Also, anything you see in the command line is typically included
in the script block log as well. So if you saw powershell.exe with an encoded command, you'd still see that in the script block as a separate event. So that would be included even if you were just looking at the encoded command. I mean, definitely the scripts are longer, right? You've got huge backup jobs that are very long, but those would also be broken up. Whether you reassemble the script blocks or analyze them block by block is getting into too much of how we've actually productionized this, and I don't want to go too far. But you could reassemble the script blocks, or you could look at just the individual
script blocks, because one of the attacks you see is, okay, we'll mix in lots and lots of known-good code to sway these types of heuristics. But if you don't reassemble them, then maybe there's one block that contains the bad code, though potentially you could spread the code throughout the whole thing. What I know is that attackers don't do that yet. So that doesn't really answer the question, but I haven't observed that myself. Yeah, thank you. Thank you. Yeah, Derek Thomas, D-E-R-E-K, dot Thomas at eSentire.
Good afternoon, and welcome to BSides. This is Leila Powell, presenting "Can Data Science Deal with PAM?" A few announcements before we begin. We'd like to thank our sponsors, especially our inner circle sponsors, Rapid7, and our stellar sponsors, Amazon, Oath, and Simmel. It's their support, along with other sponsors, donors, and volunteers, that makes this event possible. Now, as you're probably aware, these talks are being filmed, so please silence your cell phone. If you have any questions, feel free to raise your hand and I'll come by with a mic. Until then, take it away, Leila. Thank you. My name's Leila, and I'm a security data scientist from Panaseer. And today I'm here to talk to you about PAM. PAM is Privileged Access Management. Essentially, these are the
tools and processes that allow security teams to control the assignment and use of administrative privileges. So why am I talking about PAM today? Why is it so important? Well, it's one of the top five critical security controls. And it actually went up in the charts in the latest version, up to four from five. But that's not the only reason. PAM is one of those things that's kind of captured the interest of the board. It can be notoriously difficult to get the board to buy into security and the importance of it. But for PAM, the concept of a super user who can wreak havoc on your business critical systems because you've effectively given them the keys
to your front door is one that really grabs their attention. It's an easy concept to understand. So now security teams find themselves thrust into the spotlight after longing for attention from the board for many years, and they're trying to deal with PAM under time pressure and under scrutiny. So today we're going to reframe PAM as a data science problem and see what we can do with data analysis to help security teams deal with this challenging area. I'm going to take you through a number of areas. First of all, PAM panic: why is it so hard? Secondly, we're going to look at how to be a bit pragmatic with PAM. It's a
massive area, so we need to bring down the scope a little. Then we'll talk about how to do PAM in practice. I'll talk about what data we're using, what analysis we're using, and the benefits the security teams are getting from this. And finally, I'll recap a bit before we finish up. So, PAM panic. The way to understand some of the reasons why PAM causes so many challenges is to first of all start to look at what good privilege access management looks like. So let's have a look at what you need to do to do good privilege access management. All you need to do is: Determine approved access paths. Probably install a password vault or a one-time password system to reduce risk exposure. Monitor use of privileged access everywhere
at all times. Identify access patterns outside of those approved. Notify or re-educate users, probably also their bosses. Work out exactly who should have access to what when and remove all privileged access you deem to be unnecessary. And all the while, do not disrupt business as usual. So faced with this kind of situation, security teams can find themselves effectively in their own version of a horror movie. So why is PAM so challenging? Many reasons, but here's my top three. First of all, in other areas of security, the goal is to drive things down to zero. So vulnerability management, another challenging area. You want, in an ideal world, zero vulnerabilities. Never going to happen, but fine. That's
the goal. In PAM, you can't aim for zero privileged access. You have to have some privileged access so people can do their jobs, so people can patch servers or do other changes that are necessary. So it's particularly challenging because security teams have to walk a line. They have to find out what is the minimal acceptable amount of privileged access so that people can get their job done while still keeping risk low. And this leads us to the second reason why PAM is so challenging. Working out what that minimal acceptable level of privileged access is, is really hard. First of all, because it's really hard to know who has access to what. The way permissions are
assigned in large organisations can be infinitely complex. You have the hierarchical nature of Active Directory, you have groups nested in groups nested in groups. You have companies grown by mergers and acquisitions, you have different systems across the subgroups within the organization, different areas of the globe. People can get local access assigned directly for a particular job and never get it taken away. As one CISO said to me, we're really good at giving people extra access so they can be efficient at their job. We're just not very good at taking it away again. And this brings me to the final reason, well, of my top three, why PAM is hard. If we look at how permissions
are at the moment, you've basically got this kind of Jenga tower that's tottering around, that all business as usual is built on top of. But from the business's perspective, it's kind of an "if it ain't broke, don't fix it" situation. Everyone can do their job right now, so why are we going to try and pull some of those Jenga blocks out and hope the entire thing doesn't fall down? The reason is that from a security perspective, it is broke. Having all these additional permissions that people don't need is a massive attack surface; it's a massive security risk. So unfortunately, we do have to go and play Jenga. So how can data science help security teams
trying to do this? First of all, we can help with that visibility problem. I said it was really complex to understand who can actually do what. So by going and getting data sets from across the organisation, linking them together effectively, we can give some much more complete, accurate and timely visibility on the permission situation. We can also start to look at how we can use data visualisation to allow security teams to consume this knowledge better. So if you just have a list of permissions, it's quite hard to understand what that really means, what the impact in other areas will be. But if we switch to something like a graph visualisation, suddenly people can consume a
lot more information a lot more effectively. And finally, we can look at decision support. This is where we move on from saying, "Hey, here's a really clean, accurate, nicely visualised set of data. Go and do your work," to saying, "Hey, how about we run some more analytics and try to provide you with a prioritised list of areas to address?" So it's really starting to take away some of the manual work. So if this is a data problem, what are our data challenges? The first one is what I like to call "Do you even log?" As with the first talk this morning, I'm going to just refer to John Hall's talk opening the track yesterday to tell you why it's really hard to get data and get data
where you need it to be, and I'm not really going to say any more about that. The second challenge is the variety of data sets that need to be combined to have a good view on PAM. As you'll see later, I'll be talking about things like event logs, Active Directory, HR data, identity and access information, and things like asset inventory or a CMDB. Finally, when we talk about privileged access management, we're not just talking about devices; this is about access to databases and applications as well. Each of those layers of the stack has many flavors. We've got Windows, Linux, Unix, macOS. We've got different flavors of database. We've got all your in-house application GUIs. We have different ways of logging and different formats. We've got to try and
get all that data, which is kind of problem one. But then we've got to understand it, clean it, analyze it, and bring it all together in some kind of coherent picture so you can see where the risk really lies. And finally, we've probably got multiple AD domains as well, so that's just another potential issue to contend with. So, as you can see, the scope of dealing with PAM from a data perspective is potentially absolutely enormous, and we have to be a little bit pragmatic in terms of how we're going to approach it. The two pragmatic principles we're going to go with are start slow and start small. So, start slow. If you think about the three V's of big data, volume, variety and velocity, you've already got a bit of a variety problem, as I just talked about, so you've got enough of a challenge dealing with that. The approach we're taking, and our suggestion, is: forget about velocity for now. You don't need streaming log events at first. Why not just focus on getting the variety right, and pick an update frequency for your analysis that's suitable? Something like a day would be fine. The second reason for forgetting about streaming, and velocity if you like, for now, is that there are really two phases of applying security metrics when you're talking about the basics of good cyber hygiene. The first step is when a security team first gets
visibility on an area. So there's a new analysis program, there's new data coming in, and effectively that's kind of like lifting a rock and looking under it. You know it's going to be horrible, so there's going to be a lot of cleanup to do. The second stage is after all that cleanup, when you're just trying to maintain your situation. You've lowered your risk, everything's looking good, you're happy with your metrics; now you just have to keep things ticking over. When we're in the first situation, which we are as soon as PAM comes onto the radar for most security teams, you're going to focus on cleanup, which means you absolutely do not want anything like real-time monitoring, because it'll just be a bunch of noise. So here's yet another reason to just start with, say, a daily refresh of your data, because slow and steady is going to win the race in this case. How about starting small as well? What I mean by this is focusing on a few systems that are important. If you look at this very complex Venn diagram here, you've got systems you care about on the left and data you can get on the right, and basically where they overlap is where you should start doing your analysis. An example of this might be: suppose your company has, I don't know, a few payment
systems that run off Windows servers, and you know that you can get the event logs for those pretty easily. Then focus on that; focus on doing this kind of PAM analysis on those servers. That's not to say, however, you shouldn't start to request the data you'll need to expand to the Linux servers, or to the databases, or to the application layer itself, because that data will probably be a long time coming. But the point is, don't wait until you have all the data to solve PAM across your entire organization before you start to do something.
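That scoping rule from the Venn diagram can be written down almost literally as a set intersection. A toy sketch (the system names and data coverage below are entirely invented, not from the talk):

```python
# "Start small": do analysis where the systems you care about overlap
# with the data you can already get; request the rest in parallel.
# All names here are hypothetical examples.

business_critical = {"PAY-SRV-01", "PAY-SRV-02", "HR-DB-01", "LEGACY-APP-07"}
have_event_logs   = {"PAY-SRV-01", "PAY-SRV-02", "BUILD-SRV-03"}

start_here   = business_critical & have_event_logs   # analyse these now
request_data = business_critical - have_event_logs   # ask for logs, don't wait

print(sorted(start_here))    # → ['PAY-SRV-01', 'PAY-SRV-02']
print(sorted(request_data))  # → ['HR-DB-01', 'LEGACY-APP-07']
```

Trivial as it looks, keeping the "start here" and "still need data" sets explicit is a useful way to track POC scope as more feeds come online.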
As many of the other talks have demonstrated, there's a lot of power in just doing a POC first and proving value. So, systems you care about: the approach we take is to focus on business-critical systems. Now, of course there are other aspects to privileged access management. If we put low-priority systems out of scope, there's always a risk that an attacker might come in through your aircon system or your IoT toaster and move laterally across your network. But we're not addressing that here; we're securing our crown jewels, if you like. We had a talk earlier about detecting that kind of lateral movement, so that's not what we're looking for here. But there's still a bit of a problem: how do I even know
what my business critical systems are? If I asked you all to go back to your organization today and get a list of all the applications, databases, and servers that were business critical, how successful would you be? Because in my experience, the security teams I'm working with simply don't have access to this information, and this makes their job even harder. How can you protect things if you don't even know what you've got? Trying to rely on a static CMDB or inventory that often has to be updated by hand is really problematic. You get situations where you might start to look at some of the analysis we're doing and people will say, "Hey, why is everyone logging into this production server?" And then you go to the team and they say, "Oh no, it's a dev server. It's just misclassified." You can't get anywhere with this. Now, building a kind of smart inventory is an area that could be a talk in itself, so I'm not going to cover it in detail; I've just linked a blog there by one of my colleagues. But here's a quick intro so you can see what I'm talking about. Essentially, when you're doing data analysis for other parts of your security program, you'll be getting data about your devices. And if you combine that data, particularly from tools that
have a different perspective on devices, you can start to get quite a rich picture of what's on your network. For example, vulnerability scanners will go out, scan a range of IPs, and might find things that weren't even registered in your device inventory in the first place. Then you've got network-based data: you can start to see what's talking to what, and maybe infer the importance of machines based on traffic to and from them. Then you've got things that sit on the endpoint. Maybe now you can get an accurate read on what the operating system is. Maybe you get something like a MAC address. You start to get more solid identifiers that don't change all the time like IPs do. Essentially, this is a process you can go through to start to build up a better inventory, or to challenge the one you have to work with in your organization. At least when you have this, you can start to go to business owners and application owners and say, "Hey, what's this server that sits in your region that your team is logging into on a daily basis? What does it do? How important is it?" Assuming you've got your smart inventory, or at least you've phoned a bunch of people and cobbled together a spreadsheet, we can now move on to PAM in practice.
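The inventory-enrichment process just described, merging device records from tools with different perspectives on a stable identifier, can be sketched roughly like this (all data below is invented for illustration; a real pipeline would read from scanner and agent exports):

```python
# Hypothetical "smart inventory" sketch: combine device records from a
# hand-maintained CMDB, a vulnerability scanner, and an endpoint agent,
# keyed on a stable identifier (here, MAC address). Data is made up.

cmdb     = {"AA:BB:CC:01": {"name": "srv-01", "env": "dev"}}   # static, possibly stale
scanner  = {"AA:BB:CC:01": {"ip": "10.0.0.5", "open_ports": [443]},
            "AA:BB:CC:02": {"ip": "10.0.0.9", "open_ports": [22]}}  # found an unregistered host
endpoint = {"AA:BB:CC:01": {"os": "Windows Server 2019"}}

def build_inventory(*sources):
    """Merge per-device fields from each source into one record per device."""
    inventory = {}
    for source in sources:
        for mac, fields in source.items():
            inventory.setdefault(mac, {}).update(fields)
    return inventory

inv = build_inventory(cmdb, scanner, endpoint)
# "AA:BB:CC:02" appears even though the CMDB never knew about it
print(sorted(inv))               # → ['AA:BB:CC:01', 'AA:BB:CC:02']
print(inv["AA:BB:CC:01"]["os"])  # → Windows Server 2019
```

Later sources overwrite earlier ones for the same field, so ordering the merge from least to most trustworthy source is a deliberate design choice here.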
PAM is a many-faceted topic, and what we've been doing is working very closely with our early adopter customers, who we really consider to be development partners. We've seen the areas of PAM that they wanted to focus on and prioritize first, and working with them, we're trying to understand how we can use data analysis to facilitate that. The two areas we're working on together are, first, evaluating what's going on now: who is logging in to these business-critical devices we've identified? And second, discovering the art of the possible, where what's possible is normally bad: who has administrative permissions on these key devices? So let's start with what's happening. The kind of entry-level data to be able to solve this
problem, you're going to need your event logs. I'm going to focus mostly on how we're doing this for devices, just in the interest of time and not reading out a bunch of lists; but keep in mind that PAM covers all the other types of logins as well, not just device stuff. For Windows, you've already had quite a few mentions in previous talks of Windows events and ways you can collect this data: log forwarding, NXLog, and things like this. You can also use syslog to bring these events in. Once you've got this data, you then need to identify the events that are important to you; in our case, it's successful logins. For these we can use Windows event IDs to cleanly pull them out. For syslog it's a bit more complicated, because you have to start to parse the event message itself. But anyway, once we've got that, we need to define what access is allowed. We know which accounts are logging into these key devices, but are they doing it in the appropriate way or not? If you remember, I said right at the beginning that one of the key things for PAM is to define what the allowed access paths are: how people should log in, from which account, and from where. This is really something the security team has to put in place; this is all about policy. This might be where the security team has installed a kind of password management vault, and you know that only the accounts managed by that vault should be used to access those systems. Or maybe a jump box has been specified, or maybe there's a certain type of account people should be using. Whatever the rules the security team has decided upon, we can codify those and then classify all the login events that we see as good or bad.
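As a hedged illustration of that codify-and-classify step (this is not the speaker's actual pipeline; the event fields, device names, accounts, and policy rules below are all invented), a minimal version might look like:

```python
# Sketch: classify successful Windows logons (Event ID 4624) against an
# allowed-access-path policy. Everything here is a hypothetical example.

ALLOWED = {
    # device -> vault-managed accounts and approved source (e.g. jump box)
    "PAY-SRV-01": {"accounts": {"vault_admin1", "vault_admin2"},
                   "sources": {"10.0.5.10"}},
}

def classify(event):
    """Return 'good' or 'bad' for an in-scope successful logon, else None."""
    if event["event_id"] != 4624:        # 4624 = successful logon
        return None
    policy = ALLOWED.get(event["device"])
    if policy is None:
        return None                      # device not in scope
    ok = (event["account"] in policy["accounts"]
          and event["source_ip"] in policy["sources"])
    return "good" if ok else "bad"

events = [
    {"event_id": 4624, "device": "PAY-SRV-01",
     "account": "vault_admin1", "source_ip": "10.0.5.10"},
    {"event_id": 4624, "device": "PAY-SRV-01",
     "account": "jsmith_personal", "source_ip": "10.1.2.3"},
]
print([classify(e) for e in events])  # → ['good', 'bad']
```

In practice the policy table would be maintained by the security team and the events would stream in from the log collection described above; the point is that once the rules are codified, classification is mechanical.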
The other thing we can start to do, if we bring in a little bit more data, is start to ask who. You can see what accounts are logging in, but who actually owns them? So, for the next level of data, we can go and get identity and HR data, and go back to that system inventory, or Excel spreadsheet, whatever you're working with, to see who owns and maintains the system. From an identity and access management tool, we can start to map actual humans to the accounts that we see logging in. This is valuable for a number of reasons. First of all, one of the main ways of doing PAM is to re-educate users to use the approved paths. Now that we know who they are, we can send them an email and be like, "Hey, you should be logging in this way." We can also find out from the HR database who their line manager is, who the accountable person is. You always need this, right? If you're going to drive change in an organization, you need someone to ring up and say, "What the hell is your team doing?" So this is always important for things to actually work in practice within a business. The other angle we can take is trying to see what team the person works in. We can go back, hopefully, to our system inventory and find out what team is responsible for maintaining that device. If the person that has that account is not part of the team administering that system, why on earth are they logging into it? You can also use this the other way around and say: look at everyone in the team that is responsible for managing this Linux server. Do they all have the right access? Or one day, when they come to do something critical, will they find they're locked out? This is particularly important when you're making changes, because remember, we have to preserve BAU.
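The identity join described here, accounts to people via IAM, people to managers and teams via HR, teams to systems via the inventory, might be sketched as follows (the data and field names are invented; real IAM and HR exports will differ):

```python
# Hypothetical sketch of the identity join: map accounts seen in login
# events to real people, then pull manager and team from HR data, and
# flag logins by people outside the team that maintains the device.

iam = {"adm-jsmith": "jsmith", "svc-backup": None}         # account -> human (None = service acct)
hr  = {"jsmith": {"manager": "mbrown", "team": "DB Ops"}}  # person  -> HR record
system_owner_team = {"PAY-SRV-01": "Windows Ops"}          # from the system inventory

def enrich(event):
    person = iam.get(event["account"])
    record = {"account": event["account"], "person": person}
    if person and person in hr:
        record.update(hr[person])
        # flag logins by people outside the team that maintains the box
        record["outside_owner_team"] = (
            hr[person]["team"] != system_owner_team.get(event["device"]))
    return record

e = enrich({"account": "adm-jsmith", "device": "PAY-SRV-01"})
print(e["manager"], e["outside_owner_team"])  # → mbrown True
```

The "outside_owner_team" flag is exactly the "why on earth are they logging into it?" question made mechanical, and the manager field gives you the accountable person to ring up.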
So what does the security team get from that piece of analysis? They're going to get information about all the login events to business-critical devices via unapproved access paths. They're going to know the person using those unapproved routes, and they're going to know their accountable manager, so they can start to produce reports and retrain people. And they can start to validate whether people on teams have the correct accounts and permissions to allow them to do their job. The second phase is looking at the art of the possible. Just because you can't see someone logging in, it doesn't mean they don't have administrative permissions that they shouldn't. Now we actually want to go and see who has local admin rights on these devices. The data we're going to need for this is: groups and accounts with local admin permissions; the AD domain of those groups, because in a multi-domain environment you can't assume a group name is going to be unique, so we need to check we're querying the right data; and also AD group membership data. In terms of the group membership data, we're going to query Active Directory directly and get a list of groups. You have a group membership list, and you can walk the hierarchy to find out, for each group, what all its members are. Local admin permissions for Windows turn out to be fairly straightforward to get. Many vulnerability scanners have informational vulnerabilities that you can use to retrieve a list of local admins. Or there are things like osquery, or you can use endpoint
protection, where you can query the agent to find out properties of the device, like who has local admin permission. It turns out doing this for Linux and Unix is considerably harder. You can still use the approach of going in via an endpoint tool; however, we're now trying to parse the sudoers file, the location of which differs depending on which distribution and which version of Linux or Unix you're using. And it's very complex. It's not just a list of who has permissions: it's got a complicated syntax all of its own, where you can say this group can do this but not this, and this user can do that but not that, and this group and that user but not that group and that user. And you can define aliases for groups and users within the file. So this is a really messy data-cleaning problem. But assuming you get there in the end, we can combine those two sets of data. We've got the data from the local admin lists or sudoers files, so we know which groups are directly assigned permissions, and by looking at the hierarchy we pulled out of AD, we can see who else has inherited permissions. And if we do that for multiple devices, we start to get something that looks like this.
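The combination step, direct local-admin grants plus nested AD group membership, can be sketched as a small recursive expansion (group, account, and device names are invented; real membership data would come from LDAP queries against each domain):

```python
# Hedged sketch: enumerate every path by which an account gets local
# admin on a device, by expanding direct grants through nested groups.
# All names are hypothetical.

# group -> members (accounts or other groups)
ad_groups = {
    "SrvAdmins":    ["OpsTeam", "alice"],
    "OpsTeam":      ["bob", "LegacyAdmins"],
    "LegacyAdmins": ["bob"],             # bob inherits twice -> two paths
}

# device -> principals granted local admin directly
local_admins = {"PAY-SRV-01": ["SrvAdmins", "carol"]}  # carol is directly permissioned

def admin_paths(device):
    """Yield every (account, path) pair granting local admin on device."""
    def expand(principal, path):
        if principal in ad_groups:               # a group: recurse into members
            for member in ad_groups[principal]:
                yield from expand(member, path + [principal])
        else:                                    # an account: one concrete path
            yield principal, path + [principal]
    for granted in local_admins.get(device, []):
        yield from expand(granted, [device])

paths = list(admin_paths("PAY-SRV-01"))
print(sorted(acct for acct, _ in paths))
# → ['alice', 'bob', 'bob', 'carol']  (bob has two distinct routes)
```

Even this toy example shows both cleanup targets the talk mentions next: carol's direct grant, and bob's duplicate inherited routes. (A production version would also need cycle detection, since real AD nesting can contain loops.)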
This is just an example, but it's very much representative of what we see in real data, although it's not the most complex example I've seen; that would just be, basically, millions of lines. So it's pretty messy, right? You can see that there are only three devices here and already we've got an absolute chaos of lines and group memberships. The kinds of things the security teams are trying to clean up are as follows (this is just a simple example to show a few specifics). You've got a directly permissioned account here. This is bad because it means that, to have any visibility that this account has admin permissions on this device, you need to go to the device itself. The same is true if you want to revoke that access. The other kind of problem we see is accounts with multiple paths to privileged access. Because of group nesting and multiple group memberships, certain accounts have multiple ways to get the same permissions, and in real data we've seen examples where one account can have 20 to 30 different routes to local admin on the same device. This is problematic because it's hard to manage. And also, if you think of that cleanup mode I talked about, every one of those paths is almost like a wire in a bomb that you're trying to clip as you go along without exploding a business process somewhere else, because you suddenly shut off someone's access to something they needed to be able to do. Still, this is a good start. What the security team gets from this analysis is that they can see all the paths to local admin on business-critical systems, and they get a list of all the permissioning problems they want to remove, and their location. The challenge here, though, is the scale: how do we fix all those permissioning problems for every system (remember, there are only three devices here) without impacting other systems? If you think about what we're doing here, we're using graph visualizations to say: for these given devices, show me everything that has local admin. And you might see a group that doesn't look very important. We'll just take
that away. But outside the view of your current data scope, there's a system over here which you've just shut off access to, and tomorrow you're going to get a bunch of angry phone calls that no one can log into their production database. So there's a real challenge here, which is one we're still working on. We can't show the entire graph of permissions in one go, because it's simply not human readable, but we need to come up with a way to give context about what else is happening. The next steps, and the things we're working on at the moment, are about further helping security teams untangle this: essentially, taking them away from having to manually view all that lovely graph data. And this is where we get onto the decision support part, which is a work in progress. One of the things we're looking at is what-if analysis: can we provide some kind of sandbox where people can play around with the graph without actually touching the permissions themselves, and see the impact of the changes they make? We can also look to standard graph theory. We can apply pathfinding algorithms to find shortest paths, and we can find examples where, between two nodes, an account and a device, there are many, many different connections, so we can highlight these to be removed first.
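A minimal version of that what-if sandbox is just reachability on the permission graph with candidate edges removed. This sketch uses invented names and a plain BFS; a real system would run it over the full AD-plus-local-admin graph before any change is applied:

```python
# Hypothetical what-if check: before deleting a group, verify whether
# any account would lose its only route to a device it administers.

from collections import deque

edges = {  # directed membership/grant edges, account -> ... -> device
    "bob": ["OpsTeam"], "OpsTeam": ["SrvAdmins", "DBAdmins"],
    "SrvAdmins": ["PAY-SRV-01"], "DBAdmins": ["PROD-DB-01"],
}

def reachable(src, dst, removed=frozenset()):
    """BFS: is dst still reachable from src if `removed` nodes are deleted?"""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in edges.get(node, []):
            if nxt not in removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Proposed change: delete the SrvAdmins group. What breaks?
print(reachable("bob", "PAY-SRV-01"))                         # → True
print(reachable("bob", "PAY-SRV-01", removed={"SrvAdmins"}))  # → False: bob locked out
print(reachable("bob", "PROD-DB-01", removed={"SrvAdmins"}))  # → True: DB access unaffected
```

Running a check like this for every account-device pair before clipping a "wire in the bomb" is exactly the kind of impact preview the sandbox idea is after.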
We're also investigating whether we can use techniques like clustering of users and groups, and of the ways things are permissioned, to identify outliers for us, so we can start to see if certain things jump out. Basically, we can prioritise these so security teams have a bit less manual work to do. So, what should you take away from this talk? First of all, access to data via instrumentation is essential. Check what you can get your hands on first, and ask for more, but don't wait to get started. Don't boil the ocean; that's probably my number one piece of advice for any data science problem ever. Try to focus on, say, daily reporting, or whatever time scale seems appropriate for you, and focus on your business-critical systems. Try to demonstrate value early; this will make it much easier for you to justify getting more data. And finally, PAM is essentially a bit of a data jigsaw puzzle. You can do quite a bit with only a few pieces, but the more pieces you put in place, particularly in terms of bringing in business context around application or device criticality, identity and access management information, HR data, and your device inventory, the richer the picture gets and the better security teams are able to prioritize and clean up this challenging area. So I'll leave it there. You can find me on Twitter; I'm around this afternoon and happy to chat. And if you have any
feedback or suggestions or comments on how you're doing this yourselves, I'd be really interested to hear them. Thank you. Does anyone have any questions? We can take maybe one or two. So you were looking at pathfinding algorithms. Have you looked at minimum cut? No, not yet. That's really one of the next steps to look at. Have you talked at all to the BloodHound guys? No. I think BloodHound's an interesting one, because although what they're doing has parallels, it's quite different. They're looking really specifically at the Active Directory hierarchy; we're just looking at membership, basically. They're looking in a lot of detail at how you can add users to groups and how you can move across the Active Directory tree and elevate privileges. So I think that's not really the angle we're coming at, but certainly some of the customers we work with use BloodHound in parallel with what we're doing here. And what were you using to visualize the graphs? This one was... I'll have to check. We're trying out a bunch of different libraries, so I think this one might have been Cytoscape, which is an open source project. I can check with our front end team. Yeah, we can figure it out. All right. Thank you, Layla. Thank you.