BG - From EK to DEK: An Analysis of Modern Document Exploit Kits - Joshua Reynolds

Name: BG - From EK to DEK: An Analysis of Modern Document Exploit Kits - Joshua Reynolds
Uploaded: 2019-10-17
Duration: 39 min 41 s
Description: BG - From EK to DEK: An Analysis of Modern Document Exploit Kits - Joshua Reynolds Breaking Ground BSidesLV 2019 - Tuscany Hotel - Aug 07, 2019


Mentioned in this talk

Tools used

Adobe Flash Player, Equation Editor, Microsoft Word, Taskkill, Wireshark, WScript




Good morning, and welcome to BSides Las Vegas Breaking Ground. This talk is "From EK to DEK: An Analysis of Modern Document Exploit Kits" by Joshua Reynolds. In his bio there's a URL listed that's not actually associated with him, so just to make that correction there. A couple of announcements before we begin: we want to thank our sponsors, especially our Inner Circle sponsors, Critical Stack and Valimail. We also want to thank some of our Stellar sponsors: BlackBerry Cylance, Secure Code Warrior, and Paranoids. A couple of other announcements: this talk is being streamed live to YouTube, so if you have questions, please raise your hand and I'll bring the microphone over so everyone can hear you. If you have a

phone, please silence it as a courtesy to the speaker and everyone else here. And with that, I'll turn it over to you. Thank you. Thanks so much. Everybody can hear me okay? Yeah? Cool. So welcome to my talk; thanks so much for coming, I really appreciate it. Let's get started. Just to give a brief background about myself: my name is Joshua Reynolds, and I'm a senior security researcher with CrowdStrike. I've previously presented at a number of other BSides events and RSA on various malware areas, mainly ransomware, malicious documents, and cryptojacking malware. I have to say it's a real honor to present at BSides Las Vegas. I met Jack and Chris at BSides Calgary; they

came up for the very first BSides Calgary when I was there a couple of years ago, and it's a real honor to be here. I'm also the co-author of a malware analysis course that's taught at a local college in Calgary, Alberta, Canada, and I'm also the co-founder of YEGSEC, which is a local security meetup in Edmonton, Alberta, Canada. If any of you want to talk to me directly, there's my Twitter handle there. Okay, so a number of you might be asking yourselves: what are document exploit kits? This is a term that we've coined for a number of documents that we've seen contain multiple exploits embedded within a single document. So in relation

to the term exploit kit, that relates to traditional exploit kits, which typically try to exploit the browser environment. A couple of popular and well-known exploit kits are RIG and Angler, and essentially how they work is they'll fingerprint the browser environment in order to assess whether or not there is a vulnerable version of a plug-in or the browser itself, in order to deliver a final exploit. Typically that exploit will download and execute a malware binary in some way, shape, or form. The document exploit kits that we're going to be talking about today specifically are called ThreadKit and VenomKit. These are two prominent document exploit kit families that we currently track in the wild. They are RTF documents, or the rich

text format, and we'll go into what that format looks like. Again, they contain multiple exploits within a single file, and there are a number of different requirements for code execution once successful exploitation is met, so they perform different what we call infection chains. I'll be going into how those look as well. Probably what's most interesting is they've started using whitelisting bypasses in many of the ways that you would commonly see red teamers or penetration testers use, so I'll be going into how those look as well. We've observed them distributing what we would consider more common malware variants, such as FormBook, AZORult, LokiBot, and NetWire,

but we've also seen ThreadKit specifically being used in targeted attacks. We track an adversary called COBALT SPIDER; you've probably heard of the Cobalt Group or the Carbanak gang. The COBALT SPIDER group is the one that was specifically using ThreadKit. So just to give a brief example of what we see in the wild with these documents: there was a campaign conducted by COBALT SPIDER in which they were posing as the European Central Bank, and they were using ThreadKit to drop a malware variant called CobInt. CobInt is a modular backdoor that basically downloads a second stage, which then requests numerous other modules that do things like take screenshots of the

desktop, and they use that to establish a foothold within network environments once they compromise them. So this is an example of an email. Essentially, they were social engineering the user into thinking this was a change in transaction rules, and they provided a link at the bottom to a ThreadKit document. [Music]

Okay, thank you. This is my odd request, it's the peanut M&Ms, so if anybody wants them... Yeah, so basically they provided a social engineering message to try and seem like they were actually a part of the ECB, and then they provided a link at the bottom to a ThreadKit document which, once successful exploitation was achieved, downloaded and executed that CobInt binary. So, just an example. Okay, so before we get into how the ThreadKit and VenomKit documents function themselves, I just wanted to provide a brief overview of the RTF format. RTF, or Rich Text Format, is a proprietary document format that was developed by Microsoft. It supports a wide

array of embedded content, which you would expect with a file format for a Word document, but this includes OLE objects, which we'll be going into in detail and which are very important in relation to ThreadKit and VenomKit. RTF documents support things like pictures, fonts, annotations, drawing objects, pretty much anything you would expect. In terms of format terminology, they use something called control words, so I'll show you what those look like, and control symbols and groups, and those make up the formatting for the document itself. It's pretty nice because it's a plain text format, so it's pretty easy to analyze if you are looking at these manually. So here's an example. These

are those control words, which define the format, and then you have these groups, which are marked up by these curly braces, like those commonly used in programming languages to denote blocks of code. Then you have what are called destination control words, which essentially reference data elsewhere in the document. In this case, the one that we're actually really interested in is the \objdata destination control word, which can be used to reference embedded OLE objects within the document format, and embedded OLE objects are represented as hexadecimal-encoded bytes. Okay, so I know I alluded to the OLE format. It is a proprietary binary format that was also developed by Microsoft. It

is a subset of COM, or the Component Object Model framework. Essentially, the idea with OLE is to have an application able to create a document that references data, either internally within the document or externally to it, that is supported by a separate application. In terms of terminology, the application producing the document is called the container application, and the application that the OLE object is tied to is called the creating application. A really good example of that is you can embed Flash objects within Microsoft Word documents: the OLE objects will be Flash OLE objects, and the creating application will be Adobe Flash. So that's what those OLE

objects are going to be loaded by. There are two types of OLE objects. The first is embedded objects, which are application data content embedded within the document itself, and then there are linked objects, which reference external application data and the creating application. So again, if we go back to that Flash example, Adobe Flash being the creating application, that is identified by a CLSID, which is essentially a GUID that's used by the operating system to look up how it can load that object. There are also cases where it uses class names, and I'll show an example on the next slide. But the most interesting thing about OLE is that it's widely supported by a huge number of

applications. So the implication is that since you can load a huge amount of different types of data in Microsoft Office, the attack surface is fairly large, because the more external applications you support, the more means you have to try and exploit those applications through a Word document, which most people would view as being a benign file format. And we'll be going into how those applications are supported. Okay, so here's an example of a binary OLE object. We have things like the OLE version, so there have been multiple iterations of this format. Then we have the format ID, so this will denote whether the object is embedded or it's

linked; in this case, this is an embedded object. Then we have things like the class name. In this case the class name is Package, and a Package is essentially an embedded file within the document as an OLE object. Then we have the native data size, which is the amount of data that's stored within this object, and then this is the file data itself. As you can probably tell, this is a bat script: an embedded bat script that's used by ThreadKit once it gains successful code execution. Okay, so I just wanted to give a general overview as to what happens and how code execution is achieved, and

then, once code execution is achieved, what happens after that, with both ThreadKit and VenomKit. In this first example, we start with a single RTF document that first attempts to exploit a logic bug, CVE-2017-8570. Based on how that logic bug works, it requires you to execute an SCT file, or a scriptlet file, which is a more common term, and that's used to execute a bat script. We'll go into how this logic exploit works in detail. There is also a second exploit that is attempted, which is actually in a DCOM-dispatched server called the Equation Editor. I don't believe it's any

longer supported, but essentially this is an external application that processes equation data for Microsoft Word when it's seen, and the document attempts two separate CVEs for that application. Finally, it will attempt a Flash exploit as the last measure. Probably what's most interesting about the ordering of the exploits is that it goes from a logic bug, to a bug that tries to exploit an external service, to a Flash bug that relies on a use-after-free in the memory space of Word. What's significant about this is that if the first exploit crashed Word, then no further exploits would be loaded and attempted. But since they start with a logic bug, it's not likely that Word will crash on that

first exploit attempt. Subsequently, it tries to exploit an external application; if that external application crashes, it's still not a big deal, and it can still continue to attempt exploits. And finally there's the use-after-free vulnerability in the process space of Word; if that crashes, we've already attempted all the exploits, so it's still not a big deal. Finally, if successful exploitation is achieved by any of these means, there is a batch script that is executed. That batch script does a whole bunch of things, but the two most important things it does are: it executes the final payload, which is a portable executable that is embedded within the

ThreadKit document that it delivers, and it overwrites the original document with a decoy document. So one, this destroys the forensics of the original document so it can't be recovered, and two, it displays a document that attempts to social engineer the user into thinking that nothing's wrong, or that they haven't just gotten owned, right? So that's essentially the main flow of ThreadKit. VenomKit also exploits essentially the exact same CVEs in the exact same manner; however, it does things a little bit differently once it executes the batch script, in that it executes CMSTP.exe. This is a valid Windows executable, and it can be used to download and execute remote scripted content. So that's significant because, one,

ThreadKit has all of its data and eventual payloads stored within the document itself, locally, whereas VenomKit hosts everything remotely. This makes analysts' jobs a little bit harder, because if the command and control server is down, you're not going to be able to get the final payload and you're not going to be able to get the final decoy document. So it does things a little bit differently, and it's fairly similar, in its modular approach, to what some red teams and penetration testers do. Okay, so let's get into the document exploits themselves. The first is that logic exploit I was talking about. This is using what's called a composite moniker. I won't be going

into the COM theory for monikers specifically, but essentially what you have to take away is that this is the CVE, CVE-2017-8570, and that it's a logic vulnerability. It uses the standard OLE link class objects to embed two monikers that make up a composite moniker. The first moniker is called a file moniker, and essentially this references the scriptlet on the local file system, and then there's a new moniker, which essentially triggers this vulnerability. Haifei Li and Bing Sun from McAfee (I'm sorry if I messed up that pronunciation) did some really good research in this area. I'd highly suggest checking out their detailed blogs and research on both

these exploits, as they were the ones who actually found the initial bypass for CVE-2017-0199, which resulted in CVE-2017-8570. So this is what one of those composite monikers looks like. Essentially, the most important things are the CLSIDs that are used to load the object, and this, which is that final scriptlet path. As you can see, there's the CLSID that distinguishes to the operating system that this is a composite moniker, then you can see the file moniker and, again, that file reference, and then this is that new moniker. So this together will execute that logic vulnerability, or that logic exploit, rather.
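As a rough illustration of the structure just described, the hex-encoded \objdata payload of an RTF file can be decoded and scanned for the moniker CLSIDs. This is a minimal sketch and not the speaker's tooling: the function names are hypothetical, the regular expression ignores the obfuscation that real samples use, and the new moniker CLSID is taken from public CVE-2017-8570 write-ups, so verify it before relying on it.

```python
import re
import uuid

# CLSIDs of the monikers discussed above. The composite and file moniker
# values come from the Windows SDK; the "new moniker" value is an assumption
# based on public CVE-2017-8570 analyses.
MONIKER_CLSIDS = {
    "composite moniker": "00000309-0000-0000-C000-000000000046",
    "file moniker": "00000303-0000-0000-C000-000000000046",
    "new moniker": "ECABAFC6-7F19-11D2-978E-0000F8757E2A",
}

# Matches the text that follows an \objdata control word, up to the next
# brace or backslash. Real samples often hide extra control words here.
OBJDATA_RE = re.compile(r"\\objdata\b([^{}\\]*)")


def extract_objdata(rtf_text):
    """Return the decoded bytes of every \\objdata destination found."""
    blobs = []
    for match in OBJDATA_RE.finditer(rtf_text):
        hex_digits = re.sub(r"[^0-9A-Fa-f]", "", match.group(1))
        if len(hex_digits) % 2:
            hex_digits = hex_digits[:-1]  # drop a dangling nibble
        blobs.append(bytes.fromhex(hex_digits))
    return blobs


def find_moniker_clsids(blob):
    """Return the names of known moniker CLSIDs present in a decoded blob."""
    found = []
    for name, clsid in MONIKER_CLSIDS.items():
        # CLSIDs are serialized in the mixed-endian GUID layout (bytes_le).
        if uuid.UUID(clsid).bytes_le in blob:
            found.append(name)
    return found


if __name__ == "__main__":
    # Synthetic example data, not taken from a real sample.
    payload = uuid.UUID(MONIKER_CLSIDS["composite moniker"]).bytes_le
    rtf = "{\\rtf1{\\object\\objemb{\\*\\objdata " + payload.hex() + "}}}"
    for blob in extract_objdata(rtf):
        print(find_moniker_clsids(blob))
```

Matching on raw CLSID bytes is deliberately simple; it says nothing about whether the monikers are actually combined in an exploitable way.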

Here's an example of one of those scriptlets that's eventually executed through that logic vulnerability. Basically, they're using Visual Basic Script to instantiate a WScript.Shell ActiveX object, and then they are using that to execute this task.bat as well. So, pretty straightforward. A lot of the other ones are a lot more interesting to look at because they're highly obfuscated, but this one is pretty simple. Okay, so the next vulnerabilities that I had mentioned earlier are the Equation Editor vulnerabilities in that external service. These are CVE-2017-11882 and CVE-2018-0802. In this instance, using that OLE terminology, the creating application is the Microsoft Equation Editor, known as

EQNEDT32.EXE, and what's significant about this is that it's launched by the DCOM Server Process Launcher service, and it uses the DCOM methodology, which is essentially a client and server methodology (that's how you can think of it in your head), to communicate this equation data from Microsoft Word to this external service over DCOM. So again, what's interesting about that is since it's an external application that's running, even if it crashes it's not a huge deal, and multiple exploits can be attempted. So let's look at that first vulnerability. I've called it the font record buffer overflow, but essentially it's an unprotected string copy, which results in

a stack buffer overflow with user-controlled data. So very straightforward; this is like stuff from the 80s, and it's very easy to get code execution. The font record format that it's using is called the MTEF record, and again, there's no DEP, no ASLR, no stack cookies. When this vulnerability came out, there were basically no modern security mitigations on this old, crusty Equation Editor binary that had been sitting on every Windows operating system since like 2000. So it's pretty interesting, and basically all you had to do in order to get successful code execution was return to a known WinExec address, and then the font name actually contains a string

for WinExec to execute, and basically they use that to get code execution. So, pretty straightforward. This is what the entire exploit looks like; as you can see, it's pretty small. Basically, this is that WinExec address that they're overwriting the return address with via the buffer overflow, and then this is that command, so this is the final batch script that's executed on successful exploitation of the buffer overflow. The next one is the LOGFONT buffer overflow, CVE-2018-0802. Again, this is another string copy stack buffer overflow, which was in the lfFaceName field of the LOGFONT structure. This was not addressed in the initial patch for the previous

vulnerability. They added some security mitigations, but unfortunately these were bypassable because of the fact that the binary was 32-bit. As you can see, this exploit is a little bit more complicated, but it's still pretty straightforward. What I should start with is this: this is the return address that's overwritten, and as you can see, it's only two bytes in size. The reason is that 32-bit ASLR will only randomize the first half of the address, so if you just overwrite the second half of that address, then you can actually return to a valid address and bypass ASLR. So it's very straightforward, and then you have a

first shellcode segment that's returned into (I'll go into that in detail); that segment then jumps into this other shellcode segment, and then this is the final command that's executed. So again, I already spoke about the ASLR bypass, which is pretty straightforward, and then the overflow overwrites a word in the return address to return to, again. And this is that first shellcode segment that I pointed out. Basically, this is actually the reference to our execution string that they're pushing onto the stack, and then they jump to the second shellcode segment. This is the second shellcode segment, which basically just moves

this stack address into EAX, and that is a valid code address that's returned to in normal operation, when it's not exploited. Basically, they just subtract a value from EAX, and then that's the WinExec address that is jumped to here. So, pretty cool, but pretty straightforward. I was pretty amazed that these vulnerabilities existed in 2018. Okay, so the last exploit that is used by ThreadKit is the Adobe Flash use-after-free bug known as CVE-2018-4878, and essentially it is a Flash embedded object. In this case, what's interesting is when Microsoft Word sees an embedded Flash object, it knows that it needs to load the Flash ActiveX DLL into

the memory space of Microsoft Word. Once it does that, it's able to read the embedded SWF file, or Shockwave Flash file. What's awesome about exploiting Flash vulnerabilities in Microsoft Word is that there is no sandboxing. Every once in a while you'll see, like, "Oh my god, there's a new Flash zero-day, everybody update." What's nice about modern browser environments is most of them have plugins like Flash sandboxed, so even if you exploit Flash successfully, you still have to find a sandbox escape, and those are very difficult to find as well. But within Word, currently, from what I'm aware of (this might have changed), there is no sandboxing, so once you get code execution it's pretty much game over.
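For readers following along with a sample, locating the embedded SWF inside a decoded OLE blob can be sketched as below. This is a minimal sketch under assumptions: the function name carve_swf is hypothetical, only the uncompressed (FWS) and zlib-compressed (CWS) signatures are handled, and a bare three-byte signature match can produce false positives on real data.

```python
import struct
import zlib

# SWF files start with a 3-byte signature: FWS (uncompressed) or CWS (zlib).
# LZMA-compressed files (ZWS) exist too but are not handled in this sketch.
SWF_SIGNATURES = (b"FWS", b"CWS")


def carve_swf(blob):
    """Return (version, declared_length, body) for a SWF found in blob.

    Tries the FWS signature first, then CWS. Returns None if nothing
    usable is found. declared_length is the uncompressed file size stored
    in the 8-byte header; body is the data after that header.
    """
    for signature in SWF_SIGNATURES:
        start = blob.find(signature)
        if start == -1 or len(blob) < start + 8:
            continue
        version = blob[start + 3]
        (declared_length,) = struct.unpack_from("<I", blob, start + 4)
        if declared_length < 8:
            continue  # implausible header, likely a false positive
        rest = blob[start + 8:]
        if signature == b"CWS":
            try:
                # decompressobj tolerates trailing bytes after the stream
                body = zlib.decompressobj().decompress(rest)
            except zlib.error:
                continue
        else:
            body = rest[:declared_length - 8]
        return version, declared_length, body
    return None


if __name__ == "__main__":
    # Synthetic example data, not taken from a real sample.
    body = bytes(range(32))
    header = b"CWS" + bytes([10]) + struct.pack("<I", 8 + len(body))
    print(carve_swf(b"\x00\x01" + header + zlib.compress(body)))
```

The carved body could then be handed to an ActionScript decompiler of your choice; that step is outside this sketch.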

Okay, so how does that look? Once you extract that embedded Flash object, you can actually decompile it back into ActionScript, so this is the actual source code for the original exploit. Just to give you a general idea of how the use-after-free works (I won't be going into how use-after-frees work specifically): they get the shellcode byte array, so this is the shellcode that's eventually executed by the use-after-free. They set a timeout, which executes this start exploit function. What was nice is that all the naming was pretty straightforward, like, "oh, this is where the exploit starts." It then creates a new use-after-free generator

object, and this is what's executed upon creation of that object. First it executes this method 2 function. This method 2 function creates a media player object, then it creates a DRM object, it initializes that media player object using that DRM object, and then that DRM object is freed. So this is the free part of the use-after-free. Then we have a call to [unclear], then we have this try/catch block, which will basically try to connect to "foo", which the exploit is expecting will not connect properly. This throws an exception, and then this catch block will essentially create a new DRM object. So it's quite complicated, but basically this new DRM object that was used is a

part of the use part of the use-after-free. They then create a new timer, and they add a method 1 function to the timer as an event listener, and then they start that. If we look at that, basically what this method 1 timer function does is check whether that newly created DRM object has a different value. If the value that's checked is not equal, that means the use-after-free succeeded, and then they continue the code execution. So that's the general trigger. There's still a whole bunch of memory manipulation that occurs in order to get a write-what-where condition and a read-what-where

condition, and basically they use this to analyze the entire process space for the other DLLs it needs to get code execution. One thing it does is check a built-in ActionScript function to make sure it's running on Microsoft Windows, and it searches the memory space for kernel32.dll. Then it uses kernel32.dll to enumerate the memory space for the VirtualProtect function and the CreateProcess function. What was really interesting, and it took me a while to figure out, was that it's using what's called function object corruption in order to call these APIs, and then it uses those to execute the final shellcode. So for that shellcode that was loaded initially, I just want to give a general,

basic overview as to how that works. Basically, it walks the in-memory module list of the current process environment block. It resolves kernel32.dll using function hashing, or sorry, using DLL hashing; for those of you who do any malware analysis, this is pretty straightforward stuff. It will then enumerate the export table of kernel32.dll for the GetProcAddress function, and then it will use that to load WinExec. Once it's resolved WinExec, it uses that to execute the task.bat file, which is the bat file within that execution chain. So, kind of a whole lot for just executing a bat

file, but still pretty cool. Okay, so probably one of the most interesting aspects of ThreadKit and VenomKit was, and is, their use of whitelisting bypasses. For those of you who do not know, a lot of organizations will only whitelist certain applications to run in their environment. A lot of the time they will whitelist Microsoft applications by default, which essentially means that if anything is executed that's not signed by Microsoft, it will be stopped; if it is signed by Microsoft, it will be executed successfully. Now, there's a whole bunch of different whitelisting bypasses that are currently in the wild. Again, I don't believe that Microsoft considers it a security boundary, but the significance is that

you can use a whole bunch of binaries that come by default on the Windows operating system to execute additional JavaScript code and a whole bunch of other things. So again, I alluded to that earlier, but VenomKit was using the CMSTP.exe binary to download and execute that remote JavaScript, and that would execute successfully in an environment that had whitelisting for processes. I've also seen them using Squiblydoo, which is essentially using regsvr32 to load a DLL, and I've seen odbcconf.exe being used to execute Squiblydoo. For those of you who are in SOCs, or in the security space in general, when you have something like an EDR product and you see

these benign processes being used to load DLLs, unless your EDR product is tracking DLL loads, it's pretty difficult to distinguish whether or not that might be malicious activity. Okay, so this is an example of one of those bat scripts that contains whitelisting bypasses. Here you can see odbcconf.exe being used to execute REGSVR to load this DLL. This is a DLL that's embedded in a ThreadKit binary, or sorry, a ThreadKit document, and essentially that results in the execution of the DLL and the final malware payload. Then they use taskkill (I know it's hard to see, but they're using taskkill) to kill WINWORD.EXE, so that will close the current Word

window. Then (I thought this was pretty cool) they'll enumerate the registry for the most recently used document, so ideally that would be the current document that's open. They put that into this variable, and then they copy the decoy document into that path, so that overwrites the original ThreadKit document. Then they open that document using that variable, and then they clean up. This block.txt is actually what I would consider a lock file: they produce it upon successful exploitation, so that once it achieves successful exploitation, it won't attempt to execute the infection chain multiple times. Then they basically just clean up that SCT

file and the original decoy document, and then this del syntax is used to delete the executing bat script. So, pretty cool. I know that was a lot, and please ask me as many questions as you can or want to, but basically what I want everyone to take away from this talk is that attackers are moving away from traditional exploit kits because of modern security mitigations in browsers, and they are moving toward different attack surfaces. We've observed Office being used as a primary attack surface for pretty much any type of compromise that we've seen recently. The design of OLE is fantastic in that it allows a single pane of glass for a

whole bunch of different types of functionality, but with that comes a giant attack surface. Basically, if an application supports OLE embedded objects, that will probably be used by somebody at some point, and it basically allows the attacker to have that application's code as an external reference for potential exploitation. And again, what was cool is that the load ordering allowed multiple attempts at multiple exploits without crashing the application. We're seeing time and time again that eCrime actors are adopting red teaming and penetration testing tactics. We're seeing this in services engagements; we're seeing it all the time. So we definitely believe that adversaries are going to continue to adopt these techniques,

and everybody has to be prepared to deal with those techniques. And again, I know I spoke to the fact that Microsoft has this attack surface, but they addressed all these vulnerabilities in a very timely fashion, and you can prevent all this stuff by just patching. None of these are zero-days, and none of these are critical vulnerabilities that aren't fixable. All you have to do is patch, and then you'll be protected against both of these document families. Great. I know I went pretty quickly, but I just had a few acknowledgments. Axel F and Matthew Mesa from Proofpoint did some really good research, which I referenced in this research. Jeong Wook Oh (again, if I botched that, I'm very sorry)

did some fantastic Flash reverse engineering research at Black Hat a couple of years ago; that's how I figured out that function object corruption stuff, which was pretty difficult. And my friend Michael Gorelik did some really good research on that Flash vulnerability when it was released, and you should definitely check out the Morphisec blog on that. Great. So I know I still have much time, but does anybody have any questions? Yes? Okay. All right, so you said that they were using a Flash exploit for VenomKit, but I thought that Flash was already deprecated. Or is it just deprecated in browsers, and Microsoft Word, even the newest versions, can still run Flash objects? Yeah, so it actually wasn't

using VenomKit, only ThreadKit. But just because something's deprecated doesn't mean people are going to stop using it right away, right? So unfortunately, until organizations basically mandate the requirement for people to rip Flash out of their environments, you're still going to see it installed, it's still going to be loadable, and it's still going to be exploitable.

What do you suggest for detection techniques, even if, you know, I am patched? What are the best detection methodologies for finding these particular kits? So they're super noisy. In those bat scripts they'll do a bunch of registry operations, and with all those bat file executions, basically you're going to see either the Equation Editor process executing a command, which is super sketchy and which you should never see, or, for the Flash exploit, you're going to see Word executing WinExec and executing a command, which you should never see, et cetera. So if you have an EDR product, that's obviously the best, because you can see all the command execution across your

environment. But if you enable the proper logging for command executions, then you should be able to use your SIEM or whatever to look for these sketchy patterns, like Word should never be executing a subprocess, or the Equation Editor should never be executing a subprocess, and things like that. In terms of general mitigations, basically all of these campaigns start with phishing emails. I know some people argue against education, but if you can conduct test phishing campaigns within your corporate environment, or whatever you're protecting (I'm not sure what you do for work), and if you can educate people that there are these threats and people are going to try and

break into your company in these ways, then that's the primary thing that should take place in terms of education. But again, if you can stop things from happening once they get code execution, that's ideal as well. I hope that answers your question. As far as patching goes, what do you recommend for a wait time to make sure that the patches aren't bad patches and aren't going to destroy things in your environment, balancing bad Microsoft patches against staying patched and secure? What are your thoughts on that? So, I've never really led a security organization or a SOC, so I don't really think I'm a great person to ask that question to,

but I think if you can establish some sort of a golden image in your environment and then test what you need to, to make sure that it isn't going to break things... I always suggest patching as quickly as possible. It wasn't in this talk at all, but we saw the COBALT SPIDER group using a Flash vulnerability that was patched the day before, and they were using it the next day. So I realize most people have people that they answer to, and they have responsibilities of not breaking environments, but at the same time, exploits are going to be implemented the day after their release to the public, and they might even be used the same

day. So, yeah, I hope that answers your question. So in your investigation of these campaigns, did you find much use of obfuscation or encryption? Yeah, mostly just obfuscation. So, plug for tomorrow: I'm doing a workshop at 11 a.m. (there might be some no-shows), and basically the entire workshop is four hours of taking apart these documents, so if that interests you, I definitely recommend coming. But yeah, there is a high amount of obfuscation in the scriptlet files. I know the one I showed was very basic, but most of them have obfuscation in the JavaScript and VBScript in the script files, and then they have obfuscation in the batch scripts that they execute as well. Okay.

Have you looked at [unclear] already? Sorry, [unclear]? I have. [crosstalk] Oh, it is cool. We can talk.

Okay, thank you so much. [Applause]
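The detection pattern from the Q&A, that Word and the Equation Editor should not be launching subprocesses, can be sketched against process-creation records. This is a minimal sketch under assumptions: the event field names (parent_image, image, command_line) are hypothetical and would need mapping to whatever your EDR or logging pipeline emits, and real environments may need an allowlist for benign child processes.

```python
import ntpath

# Parent processes that, per the talk, should not be spawning commands.
SUSPICIOUS_PARENTS = {"winword.exe", "eqnedt32.exe"}


def flag_suspicious_spawns(events):
    """Return the process-creation events whose parent is Word or EQNEDT32.

    Each event is a dict with the hypothetical keys 'parent_image',
    'image', and 'command_line'.
    """
    hits = []
    for event in events:
        # ntpath handles Windows-style paths regardless of the host OS.
        parent = ntpath.basename(event["parent_image"]).lower()
        if parent in SUSPICIOUS_PARENTS:
            hits.append(event)
    return hits


if __name__ == "__main__":
    # Synthetic example events, not taken from real telemetry.
    sample_events = [
        {
            "parent_image": r"C:\Program Files\Microsoft Office\WINWORD.EXE",
            "image": r"C:\Windows\System32\cmd.exe",
            "command_line": "cmd.exe /c task.bat",
        },
        {
            "parent_image": r"C:\Windows\explorer.exe",
            "image": r"C:\Windows\System32\notepad.exe",
            "command_line": "notepad.exe",
        },
    ]
    for hit in flag_suspicious_spawns(sample_events):
        print(hit["parent_image"], "->", hit["command_line"])
```

The same parent-child rule can usually be expressed directly as a SIEM query; the Python form is only meant to make the logic explicit.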
