
Adverse Engineering

BSides Belfast · 2018 · 43:47 · 123 views · Published 2018-10
Transcript [en]

Hi. Can I just, actually, I was going to ask: how many people here are involved in any sort of offence-related stuff? Okay, cool. So I'll introduce myself first. I'm a pen tester with NCC; I've been with them for two and a half years. I mostly do stuff relating to appsec, I do some hardware and embedded stuff, and a little bit of automotive. This is probably going to have some bearing on the rest of the talk: I'm not actually a formally trained engineer, but I'm striving that way, because I think it's quite important, and I think there's a need in

pen testing, and particularly in offensive security, to increase the quality of the engineering perspective in the way we approach offensive work, and I hope that also feeds back into better defensive mitigations and responses in general. So I'm going to start off with a short and fairly unfair caricature of pen testing, based on some standards we have for what a pen test looks like; I'm going to make the case for my "adverse engineering" sort of perspective on things; I'm going to walk through some work that kicked me off down this track and has really been inspiring me; and I'm going to do a walkthrough of how I've applied it. I've got a disclaimer here first of all: this is very subjective,

this is very much my own opinion and not any sort of official position statement from my employer. The perspective I've got comes from day to day doing pretty much exclusively offence, so I'm very aware that anyone working from a defence point of view probably has a very different perspective, and that there are a lot of things I don't have as much consciousness of when I'm developing my ideas in this respect. I don't want to tread on people's toes; I'm just very aware that I've got blind spots, because I just break things, and other people have different priorities. So, in terms of what a pen test is,

we've got this classic pen-testing methodology model. It's a standard that's incorporated into CREST, into the OSCP, into CEH and all the rest of the various accreditation bodies and standards; I'm not calling any of them out in particular, it's just a part of all of them. There's this idea that we've got these stages: scoping, intelligence gathering, potentially threat modelling, enumeration, exploitation, post-exploitation, and then reporting. And this sort of lifecycle has certain flaws. The scoping stage typically involves a degree of wishful thinking: as was mentioned in a previous talk I attended today, the client doesn't necessarily

always know what's involved in their own system, so the scoping procedure is often theoretical, because it happens in the absence of the system itself. Then we've got intelligence gathering, which is often search engines, aggregators and various other things relating to social media. We've got theorising about a potential security environment which we haven't looked at yet. We've got scanning things. We've got rummaging through exploit libraries, not just the things you can download, but the catalogue of things we've exploited in the past and know to be potential issues, and trying to apply all of the experience and expertise we've got to the target. And then, once

we've breached it, doing build reviews, and then telling the client bad news. So I'm claiming here, possibly being slightly flippant about it, that from a technical security point of view this is essentially goofy cyber-warrior stuff. We're focusing more on the idea that we're attackers, and less on how we solve the problems we need to solve to get from outside the system to inside it, to the resources and assets we want. We're focusing more on the lifecycle of an attack and a threat and a kill chain. This comes from the Lockheed Martin Cyber Kill Chain concept, which is absolutely critical if you're

threat hunting and trying to understand how people persist within your networks, but it turns pen testing into a sort of cyber-warfare exercise. It doesn't really inspire us, and it doesn't create guides to how we actually work: it's focusing on the game and not the technical challenges we're actually facing. It is, nevertheless, a really good way to bound scope for testing into the sort of commodity units that can be sold; it's just not necessarily useful to us as testers. Security, particularly from an offensive point of view, is decided by your capability over the target system; it's not decided by the lifecycle processes. This idea of the pen-test

lifecycle is descriptive of what happens in general, but it doesn't prescribe how we should go about doing it, and it leads us to not have a great bigger picture about the connections between the different silos we have. In pen testing there are typically people who are infrastructure people, or web people, or people involved in IoT and embedded stuff and hardware, and we have these different silos, but we don't have good common ideas, shared with each other very effectively, about what is common in the information we've developed through

hacking into things. Getting some sort of bigger picture of that is critical, because there's new technology coming out all the time, and we need to be continually innovating and finding ways to get through things. We need a bigger theory for dealing with all of the "how am I going to do this?" problems: how do we create new tools, how do we investigate new protocols, how do we review build security in places where there is absolutely no best practice? We're seeing new frameworks coming out all the time, and you end up with requirements saying "we need

someone with five years' expertise in this two-year-old framework". You need to be able to bootstrap from the framework not having existed very long to having the ability to act on and hijack that system, or to understand the ways it could be hijacked in order to protect it; I hope the parallel between offence and defence there is clear. We also need to get to the root causes of vulnerabilities. A typical example that comes up a lot is XSS, or SQL injection, and the idea that what we need to be doing is correctly filtering all of the bad stuff that is

potentially going to result in something being exploited. If we actually step back from that and understand it on an engineering level (and this has become clear in the years since SQL injection first started being found), the issue isn't that we need better bad-evil filters. We don't need to find malicious inputs; what we need to do is create systems that we can trust to handle any input, regardless, in a valid and safe way, which I'm going to develop a bit later. Now, a couple of caveats about the perspective I'm putting forward: all of this high-minded philosophical stuff about

pen testing is quite resource-intensive, and frequently there isn't time for it on any given pen test; occasionally you have a few spare moments to work through some of it. But there are cases where pen-testing-type assessments do lead to this sort of thing specifically: where we find oddball projects and new technologies; where we have to validate a third-party vendor's assurance claim by scrutinising it and looking for gaps they might have missed; where we're trying to go beyond compliance because there's an ongoing, long-term business risk, and the client doesn't just want a tick-in-the-box kind of test, where

they're willing to invest in a much deeper look at things. M&A is a really big example of that. And also, typically, in research scenarios, where you're trying to look at things in a way people haven't looked at them before and to develop new ideas about how things might be able to fail. From an engineering perspective, I've completely cribbed this definition from the American Engineers' Council for Professional Development, because it was the shortest and clearest example that matched what I'm trying to argue here: engineering is "the creative application of scientific principles to design and develop structures, or to construct or operate

the same with full cognizance of their design, or to forecast their behaviour under specific operating conditions". And that's what pen testers, before we get to the reporting stage, are essentially trying to do: we're trying to understand the nature of how the system works, and we're trying to build things in what is essentially a hostile environment. We're essentially developers, up until the stage where we need to report what our capabilities are and how they could have been mitigated. So my argument here is that we can deal with the challenges we come up against by applying scientific principles, by making experiments and constructing tools in order to build on the client's, or the

target's, or whatever environment. For me this is stepping back from the whole "think like an attacker" thing, which is such a big cliché around how you hack things. What does an attacker think like, and why does it matter what they think like? You know, plenty of attackers were unsuccessful; how can we do better than them? I'm suggesting that thinking like an attacker is probably not as helpful as just getting on with being engineers and developing capabilities. And, as Sergey Bratus pointed out a few years ago, in some sense we're scientists: we develop proofs about

unexpected behaviours of systems, behaviours that differ from the shape of the thing the developer thought they built. We come up with hypotheses, we make experiments, and we prove true or false whether a system is exploitable in a particular way. This is a standard part of what we do; it's not a wild departure, it's just something we don't really have a formalisation of and don't necessarily formally think about that way. One of the things that really kicked me off thinking in this direction was seeing a talk from Trey Forgety at DEF CON a couple of years ago, where it turned out that there was a major

incident: several regional 9-1-1 networks got taken down by a trivial cross-site scripting attack on an iPhone, which was put on a prank website and then shared over social media. The really boring thing is how the actual exploit worked: it was just a URL, "dial this phone number", and the link was embedded in the page. The really interesting bit was the reason the phone networks came down, at least according to Trey: a lack of ongoing provisioning. As the number of users on the phone networks had increased, they hadn't actually brought up the number of nodes in the system; they

hadn't brought up the ability to sustain the amount of traffic that could potentially be driven by that many users. What we've got up in the top right-hand corner is the Erlang B formula, which gives the probability of network blocking with a certain number of nodes in the system. This is how you predict what you need; and you can invert the formula, and you get how to predict what you need in order to resist a certain scale of attack. That's the sort of wider-scale, engineering kind of thinking I'm after.
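The Erlang B point can be made concrete. This is a generic sketch of the standard recurrence for the blocking probability, plus a numerical inversion; the function and variable names are mine, not from the talk:

```python
def erlang_b(traffic_erlangs: float, trunks: int) -> float:
    """Erlang B blocking probability via the numerically stable recurrence
    B(E, m) = E * B(E, m-1) / (m + E * B(E, m-1))."""
    b = 1.0  # B(E, 0): with zero trunks, every call is blocked
    for m in range(1, trunks + 1):
        b = traffic_erlangs * b / (m + traffic_erlangs * b)
    return b

def trunks_needed(traffic_erlangs: float, max_blocking: float) -> int:
    """Invert the formula by search: the smallest trunk count whose
    blocking probability is at or below the target."""
    m = 0
    while erlang_b(traffic_erlangs, m) > max_blocking:
        m += 1
    return m
```

This lets a network engineer ask the attacker's question in reverse: given the call volume a prank link can drive, how many trunks keep blocking below an acceptable level?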

Rather than just going "well, it's an XSS, we can fix that XSS, we'll just cross off that URL handler or make sure it can't be clicked from JavaScript", we need to try to elevate and think: what is the bigger picture going on here from an engineering point of view? Why are these systems failing, and at what scale of protection and mitigation can we win the game? Another area that's been a massive inspiration for me has been cryptography: cryptanalysts and cryptography engineers. Cryptanalysts, for anyone who doesn't know, are the people who break codes; cryptography engineers build crypto systems. And there's a mutual language that's been built

up over the years out of fundamental attack models, such as IND-CPA, which is indistinguishability under chosen-plaintext attack; IND-CCA1 and IND-CCA2, which are indistinguishability under chosen-ciphertext attacks, two different types, hence the numbers; and models of data integrity and non-repudiation. The really important thing that formal language gives you is a formal understanding of which sorts of protections are appropriate to which contexts, and what you actually gain, what you can trust. That's really important because, as is pretty notorious, trying to develop any sort of crypto is a complete minefield; it was absolutely necessary to develop this language of attack models

and defence models in order to be able to reason about the systems properly. Another big inspiration for me in the last few years has been LangSec, which essentially takes fairly basic computational linguistics and programming language theory and draws insights from them about how we deal with bad data and bad input: when can we know that we have a chance of winning against arbitrary input? The outcomes of this are things like "no shotgun parsers", which is to say: you validate everything first, and then you do all of the processing that you might be

doing; you do not do those together as one mixed process. Throw out Postel's law, the old software development principle that you accept input liberally and send output conservatively: what's been shown is that if you accept input liberally, if you accept slightly invalid data, you're just shooting yourself in the foot and giving your attacker many more options for bypassing your filtering and your attempts to sanitise or protect yourself from malicious inputs. There are also basic results from Turing here, for instance that if you have an input language that's Turing-complete, you cannot decide what computing on it will do.
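The "no shotgun parsers" idea in miniature: fully recognise the input against a deliberately weak (here, regular) language before any processing touches it. The message format below is invented for illustration:

```python
import re

# Toy input language: semicolon-separated "name=value" pairs, alphanumerics
# only. Keeping the language regular means full recognition is decidable
# and cheap, which is the LangSec design goal.
RECOGNIZER = re.compile(r'[A-Za-z]+=[A-Za-z0-9]+(?:;[A-Za-z]+=[A-Za-z0-9]+)*')

def handle(message: str) -> dict:
    # Step 1: recognise the WHOLE input up front; reject everything else.
    if RECOGNIZER.fullmatch(message) is None:
        raise ValueError("rejected before any processing")
    # Step 2: only now act on it; validation and processing never interleave.
    return dict(pair.split('=', 1) for pair in message.split(';'))
```

The point is not the regex but the shape: there is no code path on which unvalidated bytes reach the processing step.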

Even if your input language is deterministic context-free, which is less complex linguistically in the Chomsky hierarchy (a set of concepts from computational linguistics), if you have a certain degree of programmability in the input data you're trying to parse, at some point you cannot guarantee that you will know how it ends up behaving in your system. That's a really important thing to recognise if we're trying to make systems we can trust. It also develops, somewhat off to one side because it's fairly vague and informal in a lot of

the early LangSec material, the idea of a "weird machine". A weird machine is the machine made up of all the unspecified behaviour latently living in your system, which you didn't deliberately put there but kind of put there by accident, and hackers then go and program that weird machine rather than the machine you intended to build. The final example I'm going to give, before I move on to how I applied this whilst hacking things, is a paper by Halvar Flake, Thomas Dullien, called "Weird Machines, Exploitability, and Provable Unexploitability". He takes the finite-state-machine theory from a typical

computer science education and starts formalising the idea of an exploit, and of the intended machine versus the actually implemented machine in the concrete implementation, formalising an understanding that exploit developers have had for decades. None of this stuff is a new idea, and he says as much in the paper; it's more that we don't really talk to each other about what we're doing at this sort of scale. Typically we talk about, you know, "I bypassed ASLR by leaking a pointer from some place and then doing

this and that". We're very close to the bone when we're dealing with that stuff, and we're not really communicating what's happening at a higher scale, so that we can make more fundamental changes and understand how we find those sorts of issues at a formal, more fundamental level. So, the key ideas I'm putting across, and I'm hammering these, sorry: maintain an interest in the theoretical fundamentals of your domain. One of the things I've heard a lot of people working in security say, specifically people who studied computer science formally, is "well, I've never used any of this; why do I need to know about finite-state

automata, or all of these other abstract theory things? It's just not relevant to breaking into a system, or securing a system, or reviewing firewall rules." And it isn't directly related to that stuff: we have to make those connections, and make them at a big enough scale that we can make better judgements about how to fix things, rather than fixing one firewall rule. If you have the resources to do so, if you have the time available, which admittedly is pretty rare, but it happens, try to go beyond looking for existing solutions to unsolved problems; try to find potentially new ones;

try to develop your ability to understand the problem a bit better, and start making experiments for it. And keep a healthy scepticism: we all have these received understandings of how things work, and those need questioning on a regular basis, because that's a big place where all of us are prone to fall down from time to time. There are plenty of things that just shouldn't work, and we're in the business of finding out what shouldn't work but happens to actually work, and then fixing it and making the machines more trustworthy. Okay. So what I'm arguing for is stepping back from the vulnerability-first point of view, because

vulnerabilities are a long way down this chain; they're not what we're starting with, and if we start by looking for vulnerabilities then we're closing our minds off to all of the kinds of vulnerability people haven't really thought of yet. We need to be a bit more creative. If we're thinking like engineers, we're looking for ways to build bridges across gaps; we're looking for ways to build bigger levers. Often, simply operating within a system that you've got a limited understanding of is difficult in itself, and being able to build that ability on a short

timescale is really, really useful. Now, I was quite irreverent at the start about the core model for pen testing; in most cases it's not really possible or relevant to go into the deeper computer-science-type theory, and I'm not trying to slam anything. I'm trying to argue that there's a real motivation, and a real need, for us to take a bigger-picture look: bridging gaps between different information security silos, and trying to understand, at a higher level, the sorts of things that join different fields together. So, an example from my own experience. Once upon a time I was

investigating an IoT gateway device. It was based on a Java firmware system with a proprietary JVM, and the device bridged to some back-end network over GSM, with the back end on a private APN (a private access point name). So from a capability point of view, owning that device was interesting, but bearing in mind it's a gateway to another realm, what I really wanted to do was be a bit more grandiose and see if I could take it further. Being a GSM device, it has a SIM for connecting to the private network. I don't know what's on there. Are there routes to own all of the other devices that are also connected to

this network? Is the back end interesting in some way? I've got questions, and I need to find out how to get answers. I've got a device; the device is somewhat restricted; can I build a device that will do the things I want to do? Yeah, pretty much. There aren't any freely available ways to plug that SIM into a mobile dongle, but you can just solder wires between them; it's just a hot iron and a bit of time fiddling with tweezers. However, in order to get through to the APN you need creds, and this is all typical pen-testing stuff: you look through the Java environment for

something you can exploit to find the creds, you look through data on the SIM for creds, you sniff data on the buses for creds. And eventually, in this case, I found file chunks: zip file fragments. If you're familiar with Java, you'll know Java .jar files are in zip (PKZIP) format, and I'd found my J2ME application data. This is the point where I hit a brick wall in the usual methodology, which is where the rest of the talk becomes relevant: binwalk yields corrupt garbage or incoherent decompression. Which is to say, my disk immediately fills when I unzip things using binwalk; when I try to unzip the

fragments of compressed file, my hard disk immediately fills up, because the corruption in the deflate stream completely hoses the output and doesn't actually turn up any useful data. We can see that the data is fragmented, shuffled out of order into regular-size pages, but there are no common file system indicators: this being an embedded system, there's a custom operating system and a custom file system. It's writing the files in pages, but we don't know much more than that, and it's a hell of a lot of reverse engineering to figure out how the file system works; I can't even find any page tables. That's starting to look a little bit sad.
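One way around that disk-filling failure mode is to inflate fragments with an explicit output cap rather than shelling out to an unzip tool. This is a sketch of the idea using Python's zlib, not the speaker's actual tooling:

```python
import zlib

def bounded_inflate(fragment: bytes, cap: int = 1 << 20) -> bytes:
    """Inflate a possibly-corrupt raw-deflate fragment, but never emit more
    than `cap` bytes: corrupt streams can otherwise 'decompress' endlessly."""
    d = zlib.decompressobj(-15)  # wbits=-15: raw deflate, as inside zip entries
    out = bytearray()
    data = fragment
    try:
        while data and len(out) < cap:
            out += d.decompress(data, cap - len(out))
            data = d.unconsumed_tail
            if d.eof:
                break
    except zlib.error:
        pass  # corruption: keep whatever inflated cleanly before the error
    return bytes(out)
```

On corrupt input this returns the clean prefix (possibly empty) instead of flooding the disk, which makes triaging thousands of fragments practical.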

From a pen-testing point of view, anyway. So I've got a demo here; excuse me.

This is me running binwalk on the memory dump, and you can see all of this zip archive data, which is lovely; it looks really promising. Then you try to decompile one of the class files and: nothing. Really sad. So binwalk didn't really work out, and at that point I thought, well, I'm really keen to sort this out. So: data storage theory, and zip files. I don't know if anyone's into file forensics and file-format type stuff, but zip files are really unusual compared with a lot of other compression formats in that there's lots of redundant data internally. There's lots of internal

metadata; they self-reference; they tell you things like what operating system they were zipped on; and different parts of the zip file point to each other, and that structure forms a sort of tree, which is going to be useful later. At this point I'd completely given up on reverse engineering the file system from the kernel drivers in the memory dump; there was a lot of obvious code in there, but it was going to take far too long to work out anything useful. And at this point I realised this is actually quite an interesting puzzle-solving problem: I've got data

that tells me at least pieces of information about how the file ought to be structured, and I have fragments of that file that have been shuffled out of order. If I assume, for argument's sake, that I've completely lost the file system information, can I use the information from within the file to start piecing it back together into the correct order and decompress it? Here's a picture from Ange Albertini's corkami project; if you want to look at any file format, there are these amazing diagrams on his GitHub which will walk you through all sorts of different file formats, and this one in particular is the zip file. You can see it

starts with a series of local file headers and file data; then you have a central directory, which is convenient; and then at the end of the file you have the end-of-central-directory record, which contains a pointer back to where the directory starts. So you have a sort of listing of where the different parts of the zip file are supposed to be, which is pretty useful. Looking at the dump itself, you have these pages, and I'll just skip ahead a bit here: this is an example of what a page looks like. A page has been written at 0x800400, and that page ends at

0x800800: it's a one-kilobyte page. And it's really obvious here that there are three different fragments: one of them is ending, one is fully visible, and one is just starting, and they've been shuffled out of order in that way. We can also see that the page size this system writes in is one kilobyte. If we start looking at these pages across the file system, we see things like this: we've got pages that are just central directory blocks, we've got pages that are bits of compressed

data stream with local file headers spread through them, we've got a couple of end-of-central-directory records in there, and so on. We can see, for instance, that the second one from the bottom is where the central directory starts: we can recognise that because it's where the data ends, and these things come one after another. So we start building a model of the file in little pieces, and we're going to start with the end, because the end-of-central-directory record happens only once in a file, and it at least tells us how many files we're looking at in this dump.
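That page triage can be sketched as a few signature checks. The signatures come from the PKZIP format; the category names and the 1 KiB page size are my own framing of what the talk describes:

```python
# PKZIP record signatures: 'PK' plus two type bytes
EOCD_SIG = b'PK\x05\x06'  # end-of-central-directory record
CDH_SIG = b'PK\x01\x02'   # central directory file header
LFH_SIG = b'PK\x03\x04'   # local file header

def pages(dump: bytes, size: int = 1024):
    """Split the raw dump into the fixed-size pages the file system writes in."""
    return [dump[i:i + size] for i in range(0, len(dump), size)]

def classify_page(page: bytes) -> str:
    """Rough triage of one page by the highest-level record it contains."""
    if EOCD_SIG in page:
        return 'end-of-central-directory'
    if CDH_SIG in page:
        return 'central-directory entries'
    if LFH_SIG in page:
        return 'data with local file headers'
    return 'bare data stream'
```

Running this over every page gives exactly the kind of inventory shown on the slide: mostly bare data, a cluster of central-directory pages, and a couple of end-of-central-directory hits.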

It's also got a very easily searchable magic value, the bytes "PK", 0x05, 0x06, so we search for those. As well as telling us where the central directory starts, it tells us the size of the central directory, so we have a bird's-eye view of what the file is going to look like. We search through the dump and we find two of these records, which means we're looking at two archives. That's a little bit problematic, because the internal pointers within the zip file, in the central directory (which is like the library of files), are only relative to the start of that zip file.
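Searching for that record and pulling out the bird's-eye-view fields looks roughly like this; the field layout follows the PKZIP end-of-central-directory record, while the dict keys are my own naming:

```python
import struct

EOCD_SIG = b'PK\x05\x06'

def find_eocds(dump: bytes):
    """Scan a raw dump for end-of-central-directory records. Each one says
    how many entries the archive has, how big the central directory is, and
    where it starts relative to that zip's beginning."""
    records, i = [], dump.find(EOCD_SIG)
    while i != -1:
        # after the signature: 4 shorts (disk numbers, entry counts),
        # 2 longs (central directory size and offset), 1 short (comment length)
        _, _, _, entries, cd_size, cd_offset, _ = struct.unpack_from('<4H2IH', dump, i + 4)
        records.append({'entries': entries, 'cd_size': cd_size, 'cd_offset': cd_offset})
        i = dump.find(EOCD_SIG, i + 1)
    return records
```

Two hits in the dump means two shuffled archives mixed together, which is exactly what forces the clustering step described next.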

So if you have two zip files, you have two zip-file starts, and you need to disambiguate between the two groups of file fragments so that you can reorder them correctly, separately from each other. So how do we classify them? It's a classification problem. I hesitate to say machine learning (someone said "machine learning" to me when I explained this at a previous event), but we have classification vectors we can extract, so we can turn this into a classification problem and use the k-means clustering algorithm, which takes a bunch of data and the classifiers and separates them into a pretty reasonable estimate. You can see here what it does: it puts

values randomly within the data set and then moves them so as to maximise the separation of the classification; the lines splitting the data in this example, cribbed from Wikipedia, move until they more accurately represent the data set as a whole. You don't strictly need to know this, because you can just figure out what the classifiers are and plug them into a k-means implementation; there's one in the SciPy library, so you can just import the module and crack on with it. But it does help to understand how these clustering algorithms work, at least to the extent that you understand why they're going wrong when they do go wrong.
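For two clusters on simple scalar features, the whole algorithm fits in a few lines. This toy version (deterministic initialisation, one-dimensional features such as build timestamps) is mine, not the speaker's script, which he says just used the SciPy implementation:

```python
def two_means(values, iters=20):
    """Minimal k-means with k=2 on scalar features pulled from
    central-directory headers. Returns (centroids, clusters)."""
    centroids = [min(values), max(values)]  # deterministic init for k=2
    clusters = [[], []]
    for _ in range(iters):
        clusters = [[], []]
        for v in values:
            # assign each point to its nearest centroid...
            nearest = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            clusters[nearest].append(v)
        # ...then move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters
```

With timestamps from two different firmware builds as input, the two groups fall out immediately; `scipy.cluster.vq.kmeans2` does the same job for real multi-dimensional feature vectors.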

So now we've got the two separate lists of central-directory headers, clustered on things like the compression flags or the timestamps, fields that are going to be distinct between two different builds. Now, to reorder them: in those headers (I don't know if the text is too small) we've got the "relative offset of local header" field. The specification doesn't require entries to be laid out in offset order, but it's a pretty much unavoidable consequence of how a zip file gets written, which

is that you're writing it from the start of the file to the end. You don't have to (a lot of file-format stunting involves not doing this), but typically you're going to start with the first offsets and work your way through, adding more data as you go. So we can use those local-file offsets as a correct ordering for the central-directory headers; it's free information. And by taking the pages we found those central-directory headers in, and reordering those pages, we get something like this: a fully formed central directory back, with the end-of-central-directory record at the end of

it, because all of the pages those were on have been correctly resequenced relative to their file. We don't know what's going on in the rest of them, and there are a lot of gaps in the rest of this skeleton model of the file we're building up, but we're getting there. The local-file offsets point to corresponding portions of the file that we haven't filled in yet, and we can find the local file headers matching those central-directory headers, because they carry pretty much all of the same data. This is the redundancy feature I was talking about with respect to zip files: we can just construct what we expect to see in a local

file header, do a binary search through the memory dump for it, and then put it where the central-directory header says that local file header is expected to be. And we don't just put the header in there; we put the whole fragment we found the header in. We end up with a picture more like this: we still have gaps, but now any data that was broken or corrupted by being on a page boundary is lined up nicely, so we've recovered the stream in between. We've recovered quite a lot of the data, because of the run-on effect between the different fragments we've managed to fit together.
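A sketch of that step: predict the local-file-header bytes from a central-directory entry, find them in the dump, and drop the whole containing page at the offset the directory dictates. The field order follows the PKZIP local-file-header layout; the dict keys, the no-extra-field assumption, and the page handling are my own simplification:

```python
import struct

LFH_SIG = b'PK\x03\x04'

def predict_lfh(cd: dict) -> bytes:
    """Local file headers repeat most central-directory fields (zip's
    redundancy), so we can predict their bytes and search for them."""
    return LFH_SIG + struct.pack(
        '<5H3I2H',
        cd['version_needed'], cd['flags'], cd['method'],
        cd['mtime'], cd['mdate'],
        cd['crc32'], cd['csize'], cd['usize'],
        len(cd['name']), 0) + cd['name']  # extra length: assume none

def place_fragment(skeleton: bytearray, dump: bytes, cd: dict,
                   page_size: int = 1024) -> bool:
    """Search the dump for the predicted header, then copy the whole page it
    sits in to where the central directory says the header belongs."""
    hit = dump.find(predict_lfh(cd))
    if hit < 0:
        return False  # a hole we cannot fill from this entry
    in_page = hit % page_size
    src = hit - in_page                        # page start in the dump
    dst = cd['local_header_offset'] - in_page  # page start in the file
    skeleton[dst:dst + page_size] = dump[src:src + page_size]
    return True
```

Copying the whole page, not just the header, is what produces the run-on recovery: bytes before and after the header land in the right place for free.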

We're left with only a few holes in the overall dump. Okay, so here's the script I wrote that automates all of this. It's searching for the different headers; you can see it found 924 in one case and roughly a thousand, 1126, in the other, and we've got a recovery percentage of about seventy-eight per cent in one case and a little over seventy per cent in the other. So I'm going to start unzipping those and looking inside them. And here we go; sorry, that skipped some of the stuff there, but inside there's a configuration file, and in the configuration file

there are some general-purpose creds, and that was essentially what I was looking for. There's also a lot of other data that was usefully reconstructed, data that just wasn't there when I was binwalking the file and unzipping the fragments as they were. So that's start to finish: that's a job done with some clustering algorithms and an understanding of how a file format is put together and how it works. There are some limitations to the solution, which is on GitHub: I can't recover any data segments where there's a gap bigger than about two pages, and the worst case

is pretty bad, but it does recover a lot of the data in the cases I've been looking at. I haven't made use of the fact that there's also CRC data, so I could hypothetically brute-force search for chunks that could be fitted into holes so as to make the CRC values validate. There's a potential problem with that in the long run for bigger chunks: you can end up brute-forcing a series of chunks that happens to match the CRC value but isn't actually the correct ordering, and doesn't decompress to anything useful. So there are benefits and downsides to taking that approach, and I didn't really go there, because it wasn't that helpful.
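The CRC idea he decided against would look something like this: try orderings of candidate chunks in a gap until the archive's stored CRC-32 validates. As he notes, false matches become likely as holes grow; and for real zip entries you would have to inflate each candidate assembly first, since the stored CRC covers the uncompressed data. This sketch just shows the shape of the search:

```python
import zlib
from itertools import permutations

def fill_hole(prefix: bytes, candidates, suffix: bytes, want_crc: int):
    """Return the first ordering of candidate chunks that makes the
    stored CRC-32 of prefix + chunks + suffix check out, else None."""
    for order in permutations(candidates):
        if zlib.crc32(prefix + b''.join(order) + suffix) == want_crc:
            return order
    return None
```

The factorial blow-up in `permutations` is the other cost he alludes to: for more than a handful of candidate chunks per hole, this search becomes impractical.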

So, anyway, that's pretty much everything from me. Does anyone have any questions?

That's a marketing issue, I think. I've kind of taken for granted that the issue I'm really looking at solving here is the technical one: how to deal with unforeseen issues, situations and environments, and how to build quality solutions and quality advice and guidance. I think, strictly speaking, the sort of methodology I'm proposing is much better suited to more in-depth reviews; whether it's called a pen test or not, it's more suited to engagements where you've got more time available and investment from the customer. So, like I said before, M&A-type situations, where they want to know

not just "is this safe from all commonly known vulnerabilities right now?" but "how much risk are we carrying five years from now, because of the general quality of what's going on, and what sorts of things might we be nipped by?"; there there's a big motivation for a much deeper level of assessment. But again, I don't think what it's called really matters. I'm mostly just making the case that computer science is really much more useful than people give it credit for, and that a lot of people in infosec have got some sort of computer science

background, and are possibly doing themselves out of the benefits they could be getting from applying it.

[Applause]