"GetInjectedThreadEx - improved heuristics for suspicious thread creations", John Uhlmann, BSidesCbr

Uploaded: 2023-10-26
Duration: 1 h 2 min 15 s

BSides Canberra · 2023 · 1:02:15 · 324 views · Published 2023-10

Speakers

John Uhlmann

Tags

Category: Technical

Difficulty: Advanced

Team: Blue

Style: Talk

About this talk

Since its debut in 2017, Get-InjectedThread.ps1 has been a blue team staple for identifying suspicious threads via their start addresses. However, red teams have subsequently identified low-cost evasion techniques to counteract this: obfuscating their shellcode threads with start addresses within legitimate modules. This talk will outline the memory artifacts that each evasion leaves behind and the development of an updated script which may be used to detect them.

John Uhlmann (he/him) is a Security Research Engineer at Elastic, where he focuses on scalable Windows in-memory malware detection. Prior to this he did similar work at the Australian Cyber Security Centre.

Transcript [en]

So our next talk is John Uhlmann. The title of his talk is up on the screen: "GetInjectedThreadEx: improved heuristics for suspicious thread creations". Big round of applause for John.

Hello everyone. So this is actually the third iteration of describing my journey to detect anomalous thread creations, and I read the blurb that I submitted to Kylie last night and realized I've lured a few of you here under false pretenses. I gave the first version in Canberra in August last year, and at the time I was using the research topic to teach myself a little bit of PowerShell. The PowerShell script that I was most familiar with was Get-InjectedThread, so I thought, "Hey, let's learn some PowerShell and let's improve on this existing detection technique." Now, PowerShell is excellent for many, many things; this was perhaps not one of them.

The second iteration was a technical blog that I did for Elastic Security Labs, and that was the first public release of the script, which is what the blurb for this talk describes. But when Kylie and Silvio said yes to me coming and talking here, I thought, "Let's try and do things a little bit better," so I rewrote my thread start address scanner in a little bit of C/C++. That gave me access to a disassembler, and it literally solved the performance problems without my even trying. I also discovered a few more things on the way. So if you were looking for a little bit of PowerShell, there will be none (unless I failed to update the slides), but there will be C code instead.

A quick little whoami. I'm a security research engineer at Elastic. We're mostly known for our SIEM product, but we do have an endpoint security (EDR) product since we acquired Endgame a few years ago, and I joined slightly after that. I'm also a technical blogger for Elastic Security Labs. I kind of put "EDR" in quotes: I got to write an opinion piece for Elastic at some point about the taxonomy of EDR bypasses, because I think sometimes we're a little loose with our definition of an EDR. Anyway, that's a talk for another time. I'm former ACSC, former ASD, for about 15 years. I'm also an AWSN supporter; you'll find me in the Perth chapter room if you want to hit me up on the Slack and have a chat.

Just a little bit of OS architecture to set the stage for this talk. A process on Windows is a little bit of a lie: it's really not much more than a shared address space. On its own it does nothing; it's threads that actually do all of the execution. Yes, the first thread in a process starts at the entry point of the initial executable, but after that you should see threads popping up for all kinds of things. They make developers' lives easier, mostly. They also introduce synchronization and awesome race condition bugs, but that's not today's talk. If you're looking at which processes are running, there are only tens, maybe hundreds of them; if you're looking at which threads are running, there'd be hundreds of thousands of them. We don't want to expose all that complexity to our standard users, so we have this concept of processes, which is really just a shared address space. But the TL;DR for this talk is that computers run threads; ergo, malware likes threads.

Today's talk is scoped like this: I want to look at how malware goes about creating its own threads, because malware authors are just developers and they want to use these nice constructs that make everyone's lives easier. I want to talk about how we can detect them creating their own threads, and therefore how we can drive that cost into the adversary. MD spoke this morning about burning in-the-wild exploits and bug classes; my job is post-exploitation. I want to burn their in-the-wild post-exploitation techniques and really drive that cost into the adversary.

So, Get-InjectedThreadEx. The original PowerShell script, Get-InjectedThread, was released in 2017 at the SANS Threat Hunting Summit. It was a collaboration between Joe Desimone, who's a tech lead working with me at Elastic, and Jared Atkinson, who is over at SpecterOps, and it's pretty much been a blue team staple for identifying suspicious threads via their start addresses ever since. At a high level, it detects threads that were created with a user start address that is in unbacked executable memory. Unbacked executable memory is very normal in processes that do just-in-time compilation: things like .NET, which we just talked about, and JavaScript engines have lots and lots of unbacked executable memory. But it's not that usual for a thread to start in that unbacked executable memory. Usually the JIT engine or the .NET runtime itself will create the thread on behalf of the jitted code; it's very rare to see the jitted code itself needing to create threads.

To put it more visually, up on the screen I've got a screenshot of Sysinternals VMMap. Down the bottom of the address space you see all the purple areas: that's all of your PE files, all your shared libraries, your original executable, and that's where you're expecting thread start addresses to be. In all that other lovely area up above, the oranges and the yellows and the light blues, you can have some JIT code; there can be some legitimate reasons to see unbacked executable memory in there, depending on the process. But you're really not expecting a thread start address to ever point there, and that was the crux of that initial detection.

So, a bit of Win32 programming, Win32 being Microsoft's lowest-level supported API layer. The CreateThread API is what you use in Windows to say, "I want a new thread because I need to do some good stuff." It takes two things (you can ignore all the other parameters): the address of the entry point of a function, and, since that function has to take exactly one parameter, the address of that parameter. In other words, it is literally a simple shellcode runner. And it has this lovely sibling function called CreateRemoteThread, which is effectively remote process injection by design.

The original detection was based on inspecting that start address and just determining whether or not it's backed by a PE file sitting on disk. We want malware to be forced to put files on disk because it raises the costs, and ideally we want to force them to put signed files on disk because that raises the cost a little bit further. Now, this particular script is an after-the-fact, retrospective detection, based on asking the Windows kernel, "Can you tell me the value of the Win32 start address that you stored in your kernel bookkeeping, in this structure called the ETHREAD, and then can you tell me whether or not it's unbacked?" Even though the script is a retrospective detection, that same information is available inline during the thread creation notify callbacks that your security product should be listening to.

Just on that taxonomy of security products: all good EDR products should be providing you telemetry of suspicious thread creations, and all good EPP (endpoint protection) products should be trying to deny those suspicious thread creations by default, with a mechanism for you to say, "Actually, no, that's legitimate but somewhat weird software that we bought 20 years ago and is now keeping our whole business alive," to let those ones through. You will see these legitimate cases in the wild. For example, you will see a lot of other security products. I won't name names in this talk (I have given other talks where I've named names), and they really should know better. They're a little bit trusting; they think, "I'm the only security product here, and I don't have to worry about detecting myself," but when you've got multiple products, you're just polluting things and it gets a little bit ugly. You will find older copy protection software, because it was an emerging area and they didn't really know better. You will find it in a lot of anti-cheat software, because they're trying to hide from the cheaters, and that's a lot harder than malware: malware is running on someone else's computer, whereas cheats are running on your own computer. I wouldn't want to be an anti-cheat developer, but it's okay, because I don't have to worry about that, because no one plays games on their production systems, right? Just stop doing it; I hate the telemetry, please.

The other major use case I see is Unix software which has just been shimmed to run on Windows, and some of the Java implementations fall in that category. Now look, I will give them a free pass: the JVM is platform independent, Windows isn't its native target and is really a second-class Java citizen, and it's better to have something that works even if it doesn't work natively. But even with this finite set of exceptions to handle, this detection (and/or prevention, depending on whether you're inline in the kernel) remains highly relevant and successful today.

Red teamers, thank you, I love you. A number of bypasses have been demonstrated and published by red teamers, most notably in two excellent blogs: firstly, "Understanding and Evading Get-InjectedThread" by Adam Chester, and secondly, "Avoiding Get-InjectedThread for Internal Thread Creation" by Christopher Paschen. There are other bits and pieces around, but I think those two are probably the most seminal pieces. Today we're going to talk through each of the major classes of bypasses and the forensic traces that they leave behind, so that we can up our detection game.

Jumping back to the original thread creation heuristic: it was basically implemented with a single Win32 API call to ask the kernel, "Tell me about the memory region that this thread start address belongs to." That information includes a flag that says you're a PE file (MEM_IMAGE), you're a mapped file (MEM_MAPPED), or you're private memory (MEM_PRIVATE). If something funny is going on, you might be MEM_FREE, because the code has unloaded itself and you've got a dangling pointer; things could go boom, but malware doesn't mind too much about that. Anyway, if it's not MEM_IMAGE and it's a thread start address, it's suspicious. It was simple, it was quick, and its false positives were deterministic, limited to certain software.

So how did malware authors and red teamers go about bypassing it? You need a Win32 start address that you can provide to the kernel that says, "This is an image-backed location," but you also need execution to very quickly reach your shellcode, which is somewhere else in memory. One way of bypassing this is a thread start address containing instructions that transfer execution almost immediately to your unbacked memory. We call this a trampoline, because you're very quickly catapulted somewhere else.

There are four broad classes of trampolines. You can build your own from scratch; you can use an illusionary trampoline (or, now that we're all talking about GenAI, you can make the kernel hallucinate); you can repurpose something else as a trampoline; or you can find a legitimate, actual trampoline just sitting there waiting for you to jump on it. These are also known as hooking, spoofing, gadgets and wrapper functions, and we'll talk about each of those four today.

Hooking: number one, bring your own trampoline. The simplest trampoline is a small hook: you just write the necessary jump instruction somewhere into existing image-backed memory. You could even restore those bytes to their original values right after you've done the thread creation, which might help with avoiding retrospective detection. But recall that your endpoint security product should be doing inline detection on this, so it should be able to see that hooked entry point at execution time, and you need those bytes to actually be there at that point.

Here's a little example. My favorite function to trampoline by is DbgUiRemoteBreakin in ntdll, because it's actually a legitimate remote thread entry point that you're going to expect when you're debugging software; you're just not really expecting to see it on production systems. But a vendor looking at you from 40,000 ft is not going to know whether or not there's a sysadmin doing some debugging on the live system. That's why you sometimes really need that sysadmin context to enrich these alerts: "Is this good or bad? Please, sysadmin, can you click yes or no on your device?"

One other quick thing to note is that I use WriteProcessMemory here because I'm lazy. Most MEM_IMAGE memory is read-only by default, and WriteProcessMemory actually checks the permissions for you and says, "Oh, read-only? I don't want to fail, so I'll make you writable." Then it writes the code in and sets the page back to its previous protection afterwards. So it's that simple to avoid the detection; not much effort at all for the malware author.

Luckily, we can detect that a hooked start address occurred very easily. The whole point of having dynamic-link libraries (DLLs), shared libraries as opposed to static and monolithic binaries, was to save memory back when memory used to be expensive.
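The "bring your own trampoline" hook described above can be sketched in portable C. This is an illustration under stated assumptions, not code from the talk: it assumes the hook is the common 12-byte x64 absolute jump "mov rax, imm64; jmp rax" (bytes 48 B8 <imm64> FF E0), since the talk later refers to a 12-byte 64-bit hook without showing its bytes, and it models the module page as a plain byte buffer instead of calling WriteProcessMemory. The helpers build_abs_jmp_hook, apply_hook and first_modified_byte are hypothetical names; the last one shows how comparing the start-address bytes against a pristine copy of the module exposes the modification.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ABS_JMP_HOOK_SIZE 12

/* Encodes "mov rax, imm64; jmp rax" (48 B8 <imm64> FF E0), a common
   12-byte x64 absolute-jump hook (assumed; not taken from the talk's slides). */
void build_abs_jmp_hook(uint8_t out[ABS_JMP_HOOK_SIZE], uint64_t target) {
    out[0] = 0x48;                                  /* REX.W prefix */
    out[1] = 0xB8;                                  /* mov rax, imm64 */
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(target >> (8 * i));  /* little-endian imm64 */
    out[10] = 0xFF;                                 /* jmp r/m64 (FF /4) ... */
    out[11] = 0xE0;                                 /* ... ModRM selecting rax */
}

/* Simulates the attacker's write: copies the hook over page[offset], the way
   a write into the process's copy of an image page would. */
void apply_hook(uint8_t *page, size_t offset, uint64_t shellcode_address) {
    uint8_t hook[ABS_JMP_HOOK_SIZE];
    build_abs_jmp_hook(hook, shellcode_address);
    memcpy(page + offset, hook, sizeof hook);
}

/* Inline-detection idea: compare the bytes at the thread start address with a
   pristine copy of the module. Returns the offset of the first differing byte
   in [start, start + len), or -1 if the bytes are unchanged. */
long first_modified_byte(const uint8_t *in_memory, const uint8_t *pristine,
                         size_t start, size_t len) {
    for (size_t i = start; i < start + len; i++) {
        if (in_memory[i] != pristine[i])
            return (long)i;
    }
    return -1;
}
```

For example, after apply_hook(page, 16, target) on a copy of a pristine buffer, first_modified_byte(page, pristine, 16, ABS_JMP_HOOK_SIZE) returns 16, while the untouched range before it still compares equal. A byte comparison alone cannot tell a malicious hook from the legitimate modifications discussed in the talk, so it only narrows the search.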

It's still good to save memory where you can. For shared libraries, Windows makes sure that the same backing physical pages are used wherever possible, and it tags that memory as copy-on-write: the page is shared for as long as possible until somebody writes to it, and then that process gets a private copy and a new page is put into physical memory. So as soon as that hook is inserted, the whole page can no longer be shared; instead, a copy is created. Even if you unhook later, the fact that a modification occurred is still lingering around.

So what we need to check is whether the start address is in the private part of that process's working set: is that memory being accounted against that process, or is it being held as shared memory for the entire system? We can quickly query the kernel memory manager and ask whether or not the page is shared; there's an API for that, QueryWorkingSetEx, which you can see up there.

Now we know that something was modified, which is uncommon, but we don't necessarily know whether that particular address was hooked. As I mentioned, this could be a false positive, because there could be a legitimate hook or some other modification on the image. In particular, you will find that many, many security products still like to hook ntdll like it's 2007 and 64-bit was just a glimmer in Microsoft's eye. There is also a plethora of software products out there that just like to extend functionality. It's not just the game modding communities that we heard about earlier today; there are also a lot of people saying, "We know you don't like to click OK a lot, so we'll mod the software and click those dialog boxes for you, because we just want to save you the time of clicking things that we know are going to pop up." There are all kinds of productivity mods that people make to legitimate software, so you've got to deal with those as well. And, as I mentioned earlier, there's all the copy protection stuff, like those old packers that like to unpack their code in memory before it runs.

So all we really have at the moment is that we know that 4-kilobyte page (4 KB being the default page size on Windows) is private. The inline detection would then additionally have to look at a pristine copy of what should be there, ask whether it has been changed, and alert based on that. And once again, to deploy it at scale, keep a list of what those rare legitimate use cases are.

Number two: spoofing. I mentioned earlier that security products can do inline prevention at the time of the thread notification callback. So you've got a thread notification callback, and you've got the thread's code first running: are they next to each other, or is there a bit of a window there? What I said earlier was true, but it's not quite the whole story. The thread notification callback is called before the thread starts, and it turns out that you can make some modifications to the thread state before the thread is executed. It's not even a race condition; the OS will just let you do it. In particular, Microsoft does not provide security vendors with a mechanism to determine whether a thread was created suspended or not. So there's a mechanism for saying, "You can create a suspended thread, do some stuff to it, and then we'll start it," but, by the way, the callback fired before we let the user do the stuff to it. A lot of these callbacks that security vendors are using weren't initially planned for security use cases, so the security implications weren't always thought out. And on 64-bit Windows, Microsoft does not let security vendors do their own kernel hooking, so we can't easily check what the parameters to that CreateThread call were either.

So malware authors very quickly realized that they could create a suspended thread, which triggers the callback; it looks like it's going into that beautiful purple PE section, and it gets two thumbs up from the security vendor. But then they can alter the thread's context. There's a function called SetThreadContext which lets you change the register state; the registers hold the instruction pointer as well as the parameters that will be passed into various functions. And then you can resume the thread. There's also something known as early bird APC, where you can queue a function to run on a thread, and it turns out that those queued APCs actually run before the initial entry point as well. So this is basically like you've got this amazing passport control system.

And you come up and show your passport, and they check your face, and then after I'm through the gate I open the unlocked door next to it, pass my passport to my mate, and they walk through. Once again, though, it's not the whole story. Thankfully, effective security products can still do this detection inline. There are a few tricks, depending on the OS version, that you need to employ, but that's maybe a topic for another talk; it can still be done.

What about retrospective detection via our scanning tool? As I said earlier, it's like holding up a fake passport: if you're the legitimate person it works at that point in time, but if you then give it to your mate, it's not going to hold up to any kind of retrospective check. In a normal thread, the user-mode start address is going to be the third function in the thread's call stack. When the thread is first allowed to run, the operating system passes control down to ntdll, to a native API bootstrapping mechanism: a function called RtlUserThreadStart runs, then it calls into the Win32 layer, into BaseThreadInitThunk, and after that the Win32 start address, the first user address, is called.

When a thread has been hijacked, you're typically going to see a really odd call stack. Either in that very first frame there's suddenly no ntdll and something is clearly wrong, or, if the hijack was done a little bit smarter and they haven't just overwritten the instruction pointer but have overwritten the user-mode start address, which is passed as a parameter in a different register, then you'll see that unbacked address in the third frame. For early bird APC, you'll actually see an entirely different call stack: in ntdll it'll be in LdrInitializeThunk, which then calls NtTestAlert, which is the way of saying, "Can you check whether there are any APCs pending and, if so, run them?", then you'll see ntdll's KiUserApcDispatcher, and then you'll see the injected code. So the call stack will look really weird if you come back and have a look after the fact.

Because I was learning stuff and having a bit of fun in my original PowerShell implementation, I thought, "Why would I want to do the usual walking down this call stack? That's a little bit easy; there are functions that do it for you."
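The call-stack shapes described above can be expressed as a small classifier. This is a sketch under assumptions, not the author's scanner: it assumes a stack walker has already produced the frames ordered from the bottom (the first code the thread ran) upwards, each with a resolved symbol name without offset (or NULL when unresolved) and a simplified memory type standing in for the MEM_IMAGE, MEM_MAPPED and MEM_PRIVATE values that a memory query reports. The types frame and frame_memory_type and the function classify_thread_stack are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the memory type of each return address. */
typedef enum { FRAME_IMAGE, FRAME_MAPPED, FRAME_PRIVATE } frame_memory_type;

typedef struct {
    const char *symbol;      /* e.g. "ntdll!RtlUserThreadStart", or NULL if unresolved */
    frame_memory_type type;  /* memory type backing this frame's code */
} frame;

typedef enum {
    STACK_NORMAL,          /* RtlUserThreadStart -> BaseThreadInitThunk -> image-backed start */
    STACK_UNBACKED_START,  /* expected bootstrap frames, but the third frame is not image-backed */
    STACK_EARLY_BIRD_APC,  /* KiUserApcDispatcher directly followed by non-image code */
    STACK_ODD              /* anything else, e.g. the first frame is not in ntdll at all */
} stack_verdict;

static int symbol_is(const frame *f, const char *name) {
    return f->symbol != NULL && strcmp(f->symbol, name) == 0;
}

/* frames[0] is the bottom-most frame, i.e. the first code the thread ran. */
stack_verdict classify_thread_stack(const frame *frames, size_t count) {
    /* Early bird APC: the APC dispatcher calling straight into unbacked code. */
    for (size_t i = 0; i + 1 < count; i++) {
        if (symbol_is(&frames[i], "ntdll!KiUserApcDispatcher") &&
            frames[i + 1].type != FRAME_IMAGE)
            return STACK_EARLY_BIRD_APC;
    }
    /* A normal thread bootstraps through ntdll and kernel32 before its start address. */
    if (count < 3 ||
        !symbol_is(&frames[0], "ntdll!RtlUserThreadStart") ||
        !symbol_is(&frames[1], "kernel32!BaseThreadInitThunk"))
        return STACK_ODD;
    /* The third frame is the Win32 start address; it should be image-backed. */
    return frames[2].type == FRAME_IMAGE ? STACK_NORMAL : STACK_UNBACKED_START;
}
```

A real scanner would still need the legitimate exceptions discussed earlier (JIT engines, other security products, legacy software), so any verdict other than STACK_NORMAL is a lead for investigation rather than proof of injection.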

On 32-bit, you're pretty much following saved frame pointers that tell you exactly how big each of the stack frames is. A stack frame is basically a block of memory related to a function: as a function calls another function, another block of memory is put on top of the stack holding all the local variables and data just for that function, and when it returns, that bit is taken off the top. When you're coming from the top down, it's easy to work out how big each frame is: on 64-bit you look up the exception unwind information, and on 32-bit you look at these frame pointers. But I thought, "Ah, that's a little bit boring. Let's see if we can climb up the stack with no further information."

So with PowerShell I did it really rough and ready: I just ran up the stack as quickly as I could and asked every aligned potential return address (it always had to be 64-bit aligned, since I wasn't worrying about the WOW64 case), "Are you executable, and are you MEM_IMAGE?" I was getting pretty good results. Sure, sometimes it would mistake a saved register or a parameter for a return address, but I always had the right set of frames, even if sometimes I got a bit confused about their order. So it kind of worked okay, and I was in PowerShell and wasn't enjoying myself too much, so I figured that was good enough.

I came back in C++ and thought, "Can we do this a little bit better?" I had two things. One, I had a disassembler (there's no library in PowerShell that I'm aware of), so I could now actually look at the bytes that preceded that call site and ask, "Are you actually a valid call site? Was there a call instruction right before where that return address points?" That's an extra data point for whether it was legitimate. Two, I also had all this lovely unwind information. From the top down, you start at a frame and immediately know where the next one is. Instead, I climb up, find a candidate, and ask, "How big is your stack frame?" If that frame size doesn't match where I am on the stack, then, oh nope, you can't be right. With these two things I could actually get really valid, perfect call stacks by climbing up from the bottom as opposed to walking down from the top.

Alas, sometimes I encountered the dreaded UWOP_SET_FPREG unwind opcode. On 64-bit, when working out the size of a stack frame, you have the option to use a frame pointer, because there are a few optimizations you can make with one. And as soon as you use a frame pointer, you're allowed to use a function that should be banned: alloca, which lets you allocate dynamic memory on the stack. So you've now got potentially dynamic stack frame sizes. Some people at DEF CON not that long ago used this fact to try and trick stack walkers, basically saying, "There's a really big stack frame, it's dynamic, and now I just skip over all the bad code"; look up Stack Moonwalk if you want to know more. Anyway, if I encounter one of these functions coming from the top down, I can emulate the state, know the value in that particular register at the time, work out the real frame size, and jump past it. But when I'm cheating and climbing from the bottom up, I don't have that information, so I really only have a lower bound. Still, it's good enough: this improved stack climbing was accurately identifying the initial stack frames in 100% of the very small number of test cases that I ran. In short, dodgy call stack, you're doing something dodgy, and we can find those spoofed start addresses.

Category three: wrapper functions. The third bypass is to find a function that does exactly what you want, and there's actually a multitude of these. For example, there's the C runtime, which a lot of the Unix folks are familiar with; in Windows it's an extra layer of libraries that sits on top of Win32. Just like the Win32 API, the C runtime needs a thread creation API. The C runtime does a little bit of extra bookkeeping, which is why its APIs wrap the CreateThread API: you pass your address into the C runtime, and it then calls CreateThread with an internal function, with your thread start address packed into a little parameter structure. In the case of the C runtime, Microsoft has published the source code, so you could work out exactly what's going on if you wanted to, but for many of these wrapper functions you may not have the code.

One thing about these wrapper functions is that, without additional tricks, they can pretty much only be used for in-process creations. The exception is LoadLibrary, and I'll touch on that a little bit later. Those additional tricks do exist, though I won't talk about them today. But you should not expect to see some of these functions, like the C runtime's _beginthread, in a remote thread creation. Now, when I'm doing retrospective scanning after the fact and asking the kernel about all these threads and their entry points, Windows doesn't do any bookkeeping to tell me whether a thread was created remotely, from a different process, or from inside its own process. So when I'm scanning I don't have that context, and it looks very legitimate. But the inline callbacks that your security products are hopefully using are able to make that distinction, and they can be more aggressive in blocking those remote thread creations.

On my retrospective detection front: we saw earlier that to detect spoofing I just had to climb up the stack, find the first three frames, and if one of them was private, we had a spoofing problem. With a wrapper function, I've just got to climb up yet another frame and see whether we've hit a private address that early in the call stack. Theoretically, yes, this could false positive on some kind of JIT or other legitimate packed code, but I'm yet to encounter a sample of that. All of the JIT engines that I've looked at call into their JIT code further down the call stack, so if I'm just looking at the first four or five frames, it's pretty trustworthy that those are going to point to legitimate PE files and not to unbacked JIT memory. So if you do find it, it's probably not JIT at all; it's probably shellcode, which is just JIT by another name and for a different purpose.

Yes, I can see a few red teamers out there. Yes, you can bypass my stack climbing quite easily by writing some fake data onto the stack, particularly since there's a lot of randomization in where the stack actually starts: Microsoft likes ASLR (address space layout randomization) and stack randomization, because they don't want exploit writers to know where you are and what your offsets are, so they shift you around a little bit. So you could write some false stuff for me to find. But (a) I was having fun with stack climbing, and security vendors are probably walking the stack properly, and (b) I could probably just save what the stack looked like in the pristine copy during my initial callback, and then later say, "That value has changed; that should never change; shenanigans are happening here." Or maybe I can just climb a little higher, because eventually your shellcode needs to run somewhere.

Side note, if any red teamers want to play with RtlDispatchAPC: I mentioned earlier that there was no perfect remote wrapper function, and I thought I'd found one very briefly. RtlDispatchAPC is used in the internals of APCs, where Microsoft doesn't actually queue user APCs directly; they queue this internal function, which has a pointer to the user APC. I thought this should be able to work, but unfortunately it doesn't check that a parameter is non-null, and it seems to blow a few things up. I'm sure there is a way to make it work if people want to play with that.

So for wrapper functions, we just needed to climb up the stack a little bit further.

Gadgets: this is where it gets a little bit interesting, and where malware writers started borrowing techniques from exploit writers. Number four is repurposing something to do something other than what the original author intended: a gadget. You might have heard of ROP and JOP and all the other things that exploit writers do to get from a weird state back to stable execution. Well, even when you've got stable execution, sometimes you want to use these little gadgets to trick the OS into doing different things.

Our earlier 64-bit hook was 12 bytes, and finding an exact 12-byte gadget is fairly unlikely in practice. But thankfully, 64-bit Windows functions use a four-register fastcall calling convention by default, which means the first four parameters are passed in registers. So when the OS calls our gadget, we have control over the RCX register, which is where the first parameter goes: if you remember back to the CreateThread function, we get to say, "This is our function," but we also get to specify one and exactly one parameter. So we've got two things we can control. We can pass a trampoline that jumps to RCX, and then hide our shellcode's final address inside that parameter. The simplest 64-bit gadget is then the 2-byte jmp rcx instruction, also known as FF E1: two bytes, much, much easier to find, and basically trivial. In fact, gadgets don't even have to have originally been instructions; they could just be within an operand or other data in the code section. For example, the FF E1 gadget that I found here in ntdll was part of the relative address of a GUID.

Now we could try to come up with a list of all possible RCX-to-RIP pivot gadgets and try to detect those (maybe more on that soon), but how about we try to actually detect some unknown gadgets too? Because this technique doesn't actually work yet: in all modern Windows software, thread start addresses are protected by Control Flow Guard (CFG). It's an exploit protection. Control Flow Guard has a bitmap of valid indirect call targets that are computed at compile time, so to use this gadget the malware must typically call the SetProcessValidCallTargets function and ask the kernel, "Hey, can you just dynamically flip the bit

corresponding to that address in the bit map so that I can uh use it as a uh valid indirect call Target this is not a CFG bypass it is a CFG feature to support legitimate software doing weird things now remember that CFG it's an exploit protection so being able to call set process thread calls Targets in order to call create thread like it's a chicken and egg problem um so it's it's for for exploit Riders so it doesn't weaken um CFG now like before to save memory the CFG bit map for dlls is also shared between processes so this time to detect whether or not they have modified the CFG bit map we just need to say hey are you on a

shared page or are you now in the private working set and if a CFG bitmap page is now on the private working set and there's a thread start address that corresponds to that page we probably have a problem now each uh two bits in the CFG bit map corresponds to 16 addresses uh two bits is four states um so I'm just going to tangent um I needed some sparkly tangent um slide here so so Microsoft actually did a pretty awesome uh optimization with the CFG bit map so two of those states only correspond to the 16 by aligned address whether it's allowed or whether it's export suppressed um which go read about CFG if you want to know what that means and

then there's two more States and they correspond to all 16 addresses either allowed or denied so what's with this 16bit alignment well modern CPUs they fetch instructions in 16 by C lines so when you're jumping into a function you've probably moved to an entirely different bit of of memory so when you enter that function you want as much of the instructions that you're about to execute to be hot in the case as possible so compiler riters are like hey performance let's have all of our function entries on 16 by alignments when we can so most functions not all most functions are going to be on 16 by aligned addresses so this means that the CFG bit map Microsoft was able to make

it an e 1/8 of the size without any appreciable risk in the increase the risk of valid gadgets being accidentally being made um possible by an overly permissive bit map like a factor of eight Improvement by a little bit of smart understanding of CPUs um is pretty awesome but hang on so two bits corresponds to 16 addresses so we've got a private 4K page of CFG bit map that's 256 kilobytes of code being M oh yep that's a lot of false positive potential so all we need to do for our false positives is just hope that legitimate code will okay yep never hope that legitimate code is not going to do that one weird thing it

always does it um so quick shout out here to the WTF bins project um you find it on GitHub it's an attempt to document software that behaves exactly like malware except somehow it's not so um in the CFG bitmap modification page i' I've come across three scenarios U where it happens so one is the Legacy Edge brow browser um so they started toggling a few bits um uh after the fact um they tried to unset a few bits to basically make certain abusable functions harder to exploit from their from their sandbox um by making them uh not valid uh indirect call targets uh user 32 um it is way too kind to Legacy software Microsoft does this all the time um uh

they are too kind but they have to be kind otherwise no one would use Windows um so user 32 if the developer did it wrong and didn't do this uh uh unsuppress of an export address when they register a call back with user 32 then Microsoft will do it for you so um yeah it should have been compiled in the developer should have compiled it with the right Flags but Microsoft did it for them and messes a few things up uh the other one security products um so some most not mine um security products they'll Block they'll drop a page uh of hook trampolines and they'll drop it too close to a legitimate mod module and as

soon as they drop it in there then they they make that CFG code page have to go private um and so that wastes just a few kilobytes of your memory that didn't need to be wasted except they'll actually often drop it at a different module's preferred per boot load address so then in every process that module needs to be loaded at a non-preferred non-shared address so they actually can waste multiple megabytes of your memory but it's okay you've all got gigabytes um right so we're going to have to deal with false positives because they're going to happen um uh unfortunately so like we did before was like hey let's have a look at what the legitimate bites

should have been and we can do that comparison um when we're looking for hooks we'll say hey look at the file on disk is it what we were expecting this time it's like once again um the CFG bit map it's compiled into the binary and then just kind of mapped into into memory so we could find that in the file of disk except it's already in memory and the copy in that remote process where the thread's been created has been modified what about the copy in my process well that's clean and it's there and it's really quick just to do a little bit of uh bit arithmetic to say hey where in the CFG bit map in my process is the

bits that correspond to that address and have what as the original valid um one thing so all we need to do is find the C bit map um so Microsoft doesn't tell us where it is um but uh it's it's a a well-known uh secret that you can easily find the CFG bit map wherever you want and there's no point Microsoft not telling us um because there's functions that they have to tell us about that have to know where this bit map is so you just like look at the first instruction of the function and it's like load address a bit map and you just read that address out um so it's really easy to find the SG bit map
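A minimal sketch of the bit arithmetic described here, written for illustration rather than taken from the speaker's scanner. It assumes the x64 layout commonly reported in public reverse-engineering write-ups: two bits per 16-byte chunk, packed from the least significant bit, with 01 meaning the aligned address is valid, 10 meaning it is export-suppressed, and 11 meaning all 16 addresses are valid. Those encodings are an assumption, not something stated in the talk. The bitmap argument is a hypothetical snapshot of the clean bitmap indexed from address 0; locating the real bitmap, and asking whether the remote bitmap page went private (for example with QueryWorkingSetEx), is not shown.

```python
# Sketch only: decode a thread start address's entry in a *clean* x64 CFG
# bitmap. Layout and state meanings are ASSUMPTIONS from public
# reverse-engineering write-ups, not from the talk.

CHUNK_SIZE = 16   # bytes of address space described by one entry
ENTRY_BITS = 2    # bits per entry
PAGE_SIZE = 4096

# Bytes of code covered by one 4 KiB page of bitmap (262144 = 256 KiB).
CODE_BYTES_PER_BITMAP_PAGE = PAGE_SIZE * 8 // ENTRY_BITS * CHUNK_SIZE

# Assumed meanings of the four 2-bit states.
INVALID = 0b00
ALIGNED_VALID = 0b01
ALIGNED_EXPORT_SUPPRESSED = 0b10
ALL_VALID = 0b11


def entry_location(address: int) -> tuple[int, int]:
    """Return (byte offset into the bitmap, bit shift within that byte)."""
    bit_index = (address // CHUNK_SIZE) * ENTRY_BITS
    return bit_index // 8, bit_index % 8


def entry_state(bitmap: bytes, address: int) -> int:
    """Read the 2-bit state for address; bitmap is indexed from address 0."""
    byte_offset, shift = entry_location(address)
    return (bitmap[byte_offset] >> shift) & 0b11


def is_valid_call_target(bitmap: bytes, address: int) -> bool:
    state = entry_state(bitmap, address)
    if address % CHUNK_SIZE == 0:
        # Aligned addresses pass with "aligned valid" or "all valid".
        # Export-suppressed targets are treated as invalid (a simplification).
        return state in (ALIGNED_VALID, ALL_VALID)
    # Unaligned addresses only pass when all 16 addresses are allowed.
    return state == ALL_VALID


def start_address_is_suspicious(clean_bitmap: bytes, start_address: int) -> bool:
    """If the unmodified bitmap rejects the start address, the remote copy must
    have been changed at runtime (e.g. via SetProcessValidCallTargets)."""
    return not is_valid_call_target(clean_bitmap, start_address)


# Usage: one bitmap byte describing chunks 0..3 as 01, 10, 00, 11.
example = bytes([0b11_00_10_01])
print(entry_state(example, 0x10))                  # 2 (export-suppressed)
print(start_address_is_suspicious(example, 0x04))  # True
```

Reading the clean copy in the scanner's own process, rather than the remote one, is what keeps this cheap: if the clean bitmap rejects a thread's start address, something in the target process must have flipped the bit at runtime.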

And you know where it is in your process, so you just find the right offset and read the right entry. Typically you're running in a 64-bit process, so obviously you don't have access to the 32-bit bitmap; if you wanted to do this for WoW64, you'd just have to compile a 32-bit version and run the check from there. I haven't implemented that, but feel free to have a look at it if you're interested. Anyway, to detect these gadget trampolines, all we need to do is find the offset in our own process, read those two bits of memory, and ask: is it a valid call target? If it's a valid call target, it's probably good, and if it's not, then it's definitely bad.

Now, the other approach that I mentioned earlier was to check the purported Win32 start address against a list of known gadget instructions. Alternatively, instead of detecting known bad, can we create a model for what good looks like, for what a good entry point is going to be? Theoretically, 64-bit prologues should really be quite constrained, because we need to be able to describe them with the unwind information in our PE file, so that we can do call stack walking, handle exceptions, and all the other goodness that gives us. In theory, that's great; in practice, there are compiler writers.

It's usually just a couple of pushes of some registers, maybe a stack pointer adjustment for the fixed size of the stack frame, and maybe setting a frame pointer because there are a few optimizations you want to do (you're definitely not calling alloca, because that function should be banned). But compiler writers want the speed, and so sometimes they add a few extra things into the prologue. Office is a good example here: they need to patch a lot, and they know that users don't like to reboot, so they had to work out how to do hot patching, which means leaving a little bit of hot-patch space at the beginning of functions so that they can make modifications on the fly. So there's Office. Then there are also jump tables: sometimes you're not actually an entry point, you're just jumping to where the real entry point is. Or sometimes compilers are like, hey, why set up my whole stack space if the first thing I'm going to do is check whether the first parameter is null, and if it's null, just return? Leaf functions, or stub functions that never call anything else, don't need to have a stack frame, but any function that is going to call something else needs one. Compiler writers were like, hey, I'm a function, but maybe I can pretend to be a leaf function: do a quick check on whether the parameter was null, and if it's null, return before bothering to set up the stack frame. So there are a few optimizations they can do just to save a little bit of time, and all those small savings add up in performance.

Or sometimes there are optimizations because there are wrapper functions, like a function with an Ex suffix. I never actually explained earlier why it's called Ex. In the Win32 API, you'll find a lot of functions with an Ex, for extended, or a 2, or sometimes both. It's just Microsoft's way of saying, hey, we wrote the API, we realized it wasn't powerful enough, so we've created a new version, and then another new version: here's a more powerful API, because the first one wasn't quite good enough, but we can't deprecate it because we said we'd support it forever. So sometimes there's a wrapper function, and all it does is call the other function with a few fixed values: a really quick setup of a few things and then a jump to that function, so it's not really a function.

Now, in PowerShell I didn't have a disassembler, so I did this with a really messy regex. It kind of worked, but it was also very fragile and not that nice to read. In C++, I could write what feels like a more complete prologue grammar. I won't talk through it today, but it's not actually that many lines of code; in fact, half the lines are comments about what I'm trying to match. The set of instructions is quite constrained, even when you think about those additional optimization use cases. So identifying code that doesn't follow this prologue convention is useful, though a hit could easily be a rare compiler that I haven't tested against, since I've only tested against the common ones. But Chrome, JIT code, Bash, and a few other things that are non-native to Windows won't trigger this particular detection. So the bytes at the entry point have to be legitimate.
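A heavily simplified sketch of combining the two approaches just described: reject known RCX pivot gadgets first, then accept only a handful of common x64 first instructions. This is not the speaker's C++ prologue grammar; both pattern lists are illustrative assumptions (None acts as a one-byte wildcard), hot-patch padding is not modelled, and a real checker would decode a sequence of instructions rather than match only the first one. The test rcx, rcx entry stands in for the null-check-before-frame-setup optimization, and the jmp rel32 entry for jump-table stubs.

```python
# Sketch only: triage the first bytes at a thread start address.
# Pattern lists are ILLUSTRATIVE ASSUMPTIONS; None is a one-byte wildcard.
from typing import Optional, Sequence

Pattern = Sequence[Optional[int]]

# Known-bad: x64 gadgets that transfer control to RCX (the first parameter).
KNOWN_GADGETS: list[Pattern] = [
    [0xFF, 0xE1],  # jmp rcx
    [0xFF, 0xD1],  # call rcx
    [0x51, 0xC3],  # push rcx ; ret
]

# Known-good-ish: a few common x64 prologue or stub first instructions.
PROLOGUE_STARTS: list[Pattern] = [
    [0x48, 0x83, 0xEC, None],                    # sub rsp, imm8
    [0x48, 0x81, 0xEC, None, None, None, None],  # sub rsp, imm32
    [0x48, 0x89, 0x5C, 0x24, None],              # mov [rsp+disp8], rbx
    [0x48, 0x89, 0x4C, 0x24, None],              # mov [rsp+disp8], rcx
    [0x48, 0x8B, 0xC4],                          # mov rax, rsp
    [0x48, 0x85, 0xC9],                          # test rcx, rcx (null check first)
    [0x40, 0x53],                                # push rbx with REX prefix
    [0xE9, None, None, None, None],              # jmp rel32 (stub / jump table)
]
PROLOGUE_STARTS += [[op] for op in range(0x50, 0x58)]        # push rax..rdi
PROLOGUE_STARTS += [[0x41, op] for op in range(0x50, 0x58)]  # push r8..r15


def matches(code: bytes, pattern: Pattern) -> bool:
    """True if code starts with pattern, treating None as a wildcard byte."""
    return len(code) >= len(pattern) and all(
        want is None or code[i] == want for i, want in enumerate(pattern)
    )


def classify_entry(code: bytes) -> str:
    """Known-bad first, then a model of known-good, else unrecognised."""
    if any(matches(code, g) for g in KNOWN_GADGETS):
        return "known gadget"
    if any(matches(code, p) for p in PROLOGUE_STARTS):
        return "plausible prologue"
    return "unrecognised prologue"


print(classify_entry(bytes.fromhex("ffe1")))          # known gadget
print(classify_entry(bytes.fromhex("40534883ec20")))  # plausible prologue
print(classify_entry(bytes.fromhex("0000c3")))        # unrecognised prologue
```

Checking the gadget list before the prologue list matters because push rcx on its own is a legitimate prologue byte, while push rcx followed by ret is a pivot. As the talk notes, an unrecognised result may still just be a rare compiler.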

What about the bytes right before an entry point, too? Typically, depending on your compiler, this will be a filler byte, because once again, if I want to align my function on a 16-byte boundary, I've got to put something in there. That filler is usually a zero; sometimes it's 0x90, which is the NOP (do-nothing) instruction; and sometimes it's 0xCC, which is the INT3 or breakpoint instruction. Or sometimes it's actually the end of a proper basic block, so it's either a return or a jump. Once again, though, that's only convention. Older compilers would regularly place raw data side by side with code, because smashing code and data together never worked out badly for anyone. (That was an allusion to how stack-based overflows work: you've got your data on your stack alongside your execution state, in the same place, as opposed to having them separate.) And so here we've got data that someone is allowed to treat as code.

But I did some analysis on Microsoft's compiler toolchain. I pulled down all the binaries on their symbol server (I did this back in 2018), and Microsoft seemed to realize that, hey, if we need to eliminate all these gadget opportunities, then we really need to reduce the amount of data in the .text section that's marked as executable, so let's get everything out of the .text section that doesn't need to be there. There was definitely a push at Microsoft at some point, because in Visual Studio 2012 it was rife: there was just data everywhere in the .text section marked as executable. It was gadget city; you could always find what you wanted. In Visual Studio 2013 it was mostly gone; we're talking a small fraction of a percent that would still do it (sometimes the IAT would be left there), but most of the data had been pushed out. And since Visual Studio 2015 Update 2, I have never seen Microsoft do this. So you still might see false positives with this heuristic, but you shouldn't see them for Microsoft software; you should really only see them for non-Microsoft software that is using non-Microsoft compilers.

Or so I thought, until I was updating my slides. I'd rewritten the scanner in C++, and suddenly I was getting hits on Microsoft software. I was like, what's going on? So I had a closer look, and the data sitting right before that function entry point had some very interesting cross-references. I'd completely forgotten about Extended Flow Guard (XFG), which is another exploit mitigation from Microsoft. With XFG, right before valid indirect call targets, they put an approximately 55-bit hash of the function's signature, its parameters and so on. This is harder to bypass than Control Flow Guard, because you've got to match that hash as well as find the right gadget, so it really strengthens the protection of indirect call targets. So I'm like, ah, I'm stumped: there's no way I can write a grammar for what is basically a one-way cryptographic hash function, which for all intents and purposes is going to be random.

But hang on: those values were 8 bytes, 64 bits, so why did Microsoft say "approximately 55-bit hash"? So I did a little bit of web searching and found a nice paper by the folks over at Quarkslab (it's linked on the slide). They noticed that there are two masks being applied to this hash: one for bits that must always be on, and one for bits that must always be off. They're about nine bits each, hence on average 55 bits come from the hash and the other nine bits come from the masks.
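A sketch of the tail-byte check and the XFG fallback described above, under stated assumptions. The filler set (0x00, 0x90, 0xCC) and the ret byte (0xC3) follow the conventions mentioned in the talk; recognising a preceding jmp needs a disassembler and is deliberately omitted. The two XFG masks are caller-supplied placeholders, not Microsoft's real values (those are in the Quarkslab write-up), so the example only shows the shape of the check: test the forced-on and forced-off bits and ignore the rest, which is effectively random.

```python
# Sketch only: sanity-check the bytes immediately before an entry point.
# FILLER_BYTES follows the conventions described in the talk; the XFG masks
# are CALLER-SUPPLIED PLACEHOLDERS, not Microsoft's real mask values.

FILLER_BYTES = {0x00, 0x90, 0xCC}  # zero padding, nop, int3
RET = 0xC3                          # previous function ended with ret


def tail_looks_normal(preceding: bytes) -> bool:
    """preceding[-1] is the byte adjacent to the entry point. A preceding
    jmp would also be acceptable, but spotting one needs a disassembler."""
    return bool(preceding) and preceding[-1] in (FILLER_BYTES | {RET})


def plausible_xfg_hash(preceding: bytes, must_be_set: int, must_be_clear: int) -> bool:
    """Read the 8 bytes before the entry as a little-endian hash and check only
    the bits the masks force on or off."""
    if len(preceding) < 8:
        return False
    value = int.from_bytes(preceding[-8:], "little")
    return (value & must_be_set) == must_be_set and (value & must_be_clear) == 0


def tail_check(preceding: bytes, must_be_set: int, must_be_clear: int) -> bool:
    """Accept conventional filler/ret, otherwise fall back to the XFG shape."""
    return tail_looks_normal(preceding) or plausible_xfg_hash(
        preceding, must_be_set, must_be_clear
    )


# Usage with hypothetical masks (bit 0 forced on, bit 63 forced off).
SET_MASK, CLEAR_MASK = 0x1, 0x8000000000000000
print(tail_check(b"\xcc\xcc", SET_MASK, CLEAR_MASK))                              # True
print(tail_check(bytes.fromhex("48e1cdab9078563412"), SET_MASK, CLEAR_MASK))  # True
print(tail_check(bytes.fromhex("488b"), SET_MASK, CLEAR_MASK))                # False
```

A gadget that sits in the middle of another instruction, like the FF E1 inside a GUID reference, will usually fail both tests, because the byte before it is just part of that instruction.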

Those mask bits are why it's only approximately 55 bits. So, hang on, now I had a few bits that I could check. It's not as nice as it would have been previously, but I can now actually account for those XFG hashes. So a lot of effort goes into working out what a valid thread entry point is: it's got to be like a function, it's got to be in certain spots, and there's all this different hoo-ha going on. I would really love it if Microsoft just required that thread entry points had to be named exports, and then maybe they could even use the name of that export as the name of the thread in the debugger, and everyone's life would be easier because you'd know what's going on in these different threads. But we're after the fact, trying to work it out as best we can.

So how else can we detect gadgets? Well, you'd like to find gadgets in modules that are already there, because if you bring your own module, if you load your favorite module with your favorite gadget into the process, the fact that that library was loaded in the process might be a detection in itself. So really you're going to want to be looking in modules that are already there. Are there thread start addresses in some of these common Win32 modules that you find everywhere? It turns out, as far as I can tell, no. There's a whole lot of Win32 modules that you will find, and I put up an absolutely, definitively non-exhaustive list of modules that should never contain a Win32 start address. You can very easily check, hey, are you on this list, and that just makes it harder for people to find those wrapper functions and those gadgets.

As I mentioned earlier, kernel32 and LoadLibrary are a bit of a special case. It's not technically a valid thread entry point, but it's the one that's spoken about for how to do code injection. It's in Jeffrey Richter's 2005 book on Windows: this is how you inject code into another process. So a CreateRemoteThread where the start address is LoadLibrary, pointing to a signed file on disk: if you want to inject code into another process, and you want to be friendly with the AV product and not get crazy amounts of "you've been blocked", do it that way. Inject your code via a signed DLL on disk, and then we can use the signature of that DLL to give trust to that entry point, as opposed to having to make our best guess with some heuristics. That is the way we, as security vendors, would really love you to do your code injection: do it that way and avoid looking like malware.

Now, the one thing that wasn't on that list, but is the one library that is absolutely everywhere, is ntdll (I often put too many D's or too many L's in it). It implements the native API, which is the layer between the supported Win32 API and the kernel, so it has to be in every process. Because it's absolutely always there, it's the place where I always look first when I'm looking for a gadget. There are actually four valid ntdll entry points that I've come across: there's the ubiquitous thread pool worker thread (too many "threads" in that), there's a thread associated with ETW logging, and then there are two associated with debugging processes (sorry, two thread entry points, not necessarily threads). Other than those four, I haven't come across another valid entry point, so you can very easily say: except for these four in ntdll, everything else is bad and should never be an entry point. That just raises the bar on finding the wrapper functions.
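A sketch of the module-based checks, assuming the start address has already been resolved to a module name and nearest symbol (the talk notes this needed symbols, since not all of these functions are exported). Every module and symbol name is a caller-supplied assumption: the speaker's actual list is on his slide, and the four ntdll entry points are described but not named in the talk. TppWorkerThread in the usage example is assumed to be the thread pool worker. LoadLibrary-style starts are handled first, because the talk's point is that their trust should come from the signature of the DLL passed as the parameter.

```python
# Sketch only: module-based triage of a resolved thread start address.
# All module and symbol names are CALLER-SUPPLIED ASSUMPTIONS.
from dataclasses import dataclass


@dataclass(frozen=True)
class StartAddress:
    module: str  # module containing the start address, e.g. "ntdll.dll"
    symbol: str  # nearest symbol, e.g. resolved via a symbol server


def triage(start: StartAddress,
           never_start_modules: set[str],
           allowed_ntdll_starts: set[str],
           load_library_symbols: set[str]) -> str:
    module = start.module.lower()
    # LoadLibrary-style starts: defer to a signature check on the DLL path
    # passed as the thread parameter instead of judging the module.
    if start.symbol in load_library_symbols:
        return "defer: verify signature of the DLL passed as the parameter"
    if module in never_start_modules:
        return "suspicious: module should never host a thread start"
    if module == "ntdll.dll":
        if start.symbol in allowed_ntdll_starts:
            return "ok: known ntdll entry point"
        return "suspicious: unexpected ntdll entry point"
    return "no verdict from module checks"


# Usage with made-up lists (the real ones are on the speaker's slides).
never = {"user32.dll", "kernel32.dll"}
ntdll_ok = {"TppWorkerThread"}
load_library = {"LoadLibraryA", "LoadLibraryW"}
print(triage(StartAddress("ntdll.dll", "TppWorkerThread"), never, ntdll_ok, load_library))
# ok: known ntdll entry point
```

Keeping the lists as parameters, rather than hard-coding them, reflects that they are non-exhaustive and will grow as new legitimate entry points or abused modules turn up.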

It raises the bar on finding the gadgets, too. Now, in my early PowerShell scripts, I couldn't really rely on having access to symbols, and not all of those entry points are exported, so this was much easier in C++. In PowerShell, I had to jump through some really ugly hoops to get threads created in the right processes that I could then scan, and work it out dynamically. It was much easier doing things in C++, touching the Win32 layer directly. It turns out I wrote about a thousand lines of C++ to reimplement this, while the original script was 2,000 lines of PowerShell. Most of that was actually boilerplate for wrapping all the stuff that you get for free in C because it's in the header files. If you're trying to do interop, you've got to define all these structures yourself, and particularly when I'm hitting some of the native APIs, they're not always available in PowerShell without crazy effort.

So it turns out that with gadgets, there are quite a few ways to detect them: has the Control Flow Guard entry been modified, is there an invalid prologue, are there invalid tail bytes, or is it from a suspicious module? So sometimes you think you're being neat by bypassing today's detection, but you've actually left this whole big trail behind you. This is the continual way that blue teams and red teams help each other, because some blue teamers don't always put on their offensive hat when they write something and say, hey, I've written a detection; okay, now how do I bypass it? You have this conversation with yourself, work out how to keep bypassing your detection, back and forth, improving until you get to something that's fairly stable.

So where do we end up? The old Get-InjectedThread script did not detect any of these four classes, as we knew. And the new one? Well, there is the PowerShell script; it is really slow, and I do not recommend it. Or there is the compiled binary, and it is much faster. In fact, most of its time on first run is spent downloading symbols for just a couple of modules, and after that it's really quite fast. I haven't even bothered trying to make it multi-threaded, which is kind of fun, since my whole premise was that more than one thread makes developers' lives easier. We can detect everything that we know of, but don't expect 100% detection from suspicious thread creations alone. You're always going to need defense in depth. In this particular case, typically there'll be some kind of memory scanning of unbacked memory going on, and also detection of unbacked memory doing suspicious things, like calling out to native APIs or Win32 APIs directly. My whole premise was that JIT code should not be calling a Win32 API, and an Office macro should not be calling Win32 APIs, so if you find unbacked code doing that, it's really suspicious. Or it's some software developer who was trying to work out how to do something weird and found a hacker page, as opposed to the way to do it that doesn't set off alarm bells.

So, that said, even though it's not perfect, I still think it's a valuable layer, just making things a little bit harder. I'd love to hear about any bypasses that people come up with for these thread creations that malware wants to use. It is actually somewhat easy to hijack a single thread after its creation and run just in that hijacked thread. But sometimes you need to give that thread back to the OS, because otherwise you might cause a deadlock in whatever it was that you exploited. Or sometimes you just need more than one thread, because that's how the APIs you're using work, or because you want to be able to call out to your C2 at the same time as you're running multiple payloads. Threads are just nice. And as easy as it is to hijack a thread and steal it, ensuring that all of your malware threads, including any third-party payloads, statically linked libraries, reflective DLLs, BOFs, or whatever else you're borrowing, use exactly the right detection bypass for the security product installed on that host is a maintenance burden for the adversary. So they will make mistakes, and when they make mistakes, we will find their capability and we will burn it. And that's what we want to do.

A few references: the couple of blogs that I mentioned earlier, both the original one for Get-InjectedThread and the two blogs on the classes of bypasses, as well as my early blog on these trampolines from when I released the PowerShell script. And the GitHub repository was updated this morning with a compiled version of my C++ scanner, for those of you who don't want to install Visual Studio, as well as the source code for those of you who do want to look at the code. I've tried to comment it and explain things as much as I can, and that's it for me.

Thank you very much. We might have time for one or two questions. In the audience, if you can wave your hands vigorously... do we have any at the back? We've got one at the front, over in the middle here. As long as it's not Pete; he asked his question at Seas sides.

Thank you for that. I was just wondering: do you do this after an incident, or do you do this in real time during the day, checking for malicious threads and then reporting them, where maybe a human could go in and verify whether it's really malicious?

So, trying not to talk too much about specific products, but most of the products, like mine, depending on what settings you set, will typically have a paranoid mode, which is block everything by default, and then you need to come through after the fact: hey, this new software that we bought and installed isn't working, we need to add an exception for that, and the sysadmin can do that. Or you can run it in permissive mode, where you let new software run by default, and then the SOC will get an alert saying, hey, something's suspicious. So mostly it should be running in those thread callbacks, those inline detections. But if you find that your particular EDR product isn't giving you that visibility, absolutely, you have the option to run this yourself. I've seen a lot of people with the old PowerShell script set up automation where they would just run it, bring the data back, and examine it there. There's no reason you can't do something similar if the particular product you have doesn't have that feature. And I know a lot of people in IR engagements also used to use this kind of thing when they were just trying to find out everything that's happening on the system. Hopefully that answers the question.

We have a speaker gift for you as well; we'll give it to you backstage. But let's thank John one more time.