Reverse Engineering And Control Flow Analysis With Intel Processor Trace - Hagen Paul Pfeifer

BSides Munich · 2025 · 29:37 · 301 views · Published 2024-11

Speakers

Hagen Paul Pfeifer

Tags

Category: Technical

Style: Talk

Mentioned in this talk

Tools used

Ghidra, IDA Pro, perf, Radare2

Hardware

Intel PT

Transcript [en]

Then let's start. We have exactly half an hour, so no questions for now, please; it is a lot of material. Today we are talking about reverse engineering and the Intel PT (Processor Trace) feature, and why it is important. It is probably a new tool for you. It will not be the only reverse engineering tool you use in the future, but it is an add-on to your tool zoo, and I will introduce it because it is a really powerful tool that works where other tools you use will not.

Today we will talk about control flow. What you see here in the picture is the forward path: if you write source code, or have source code, and compile it, this is the control flow path, which you can create with Clang. That is the forward path. In reverse engineering we talk about the reverse path: we have the machine code and go back to the disassembly, and then even further back to decompiled code.

The agenda for the next couple of minutes: first we talk about control flow and why it is important. This section is there to level up and get everybody on board, so that we share the same idea of what we are talking about. Then we talk about Intel Processor Trace and what it is. Third is how we can use it from Linux, where we talk about perf, and how you can really use it for your reverse engineering tasks. Then come some practical tips and also some shortcomings, things that will not work with Intel PT.

Now the first section. When we talk about control flow analysis, we distinguish static analysis and dynamic analysis. Static means you have the binary, you use your tools (we will talk about them later), and you dig into it without really executing anything: you build the control flow graph, look at how things will work, and get a rough idea of what you have in the executable or library in front of you. Dynamic control flow analysis is the opposite: you run the code and observe how it behaves. Normally, when you do reverse engineering, you use both techniques. Some people focus more on the dynamic aspect; it is really a matter of personal experience, but normally you will do it this way.

For all of this, one concept is really important: the basic block. The previous talk was about JavaScript and email and things like that. Now, please close your eyes and think about assembly the whole time, because that is what this talk is about. If there is a function, you have a function prologue, then you have conditions, if/else branches, and then a function epilogue where we return to the caller. Please keep this in mind. A basic block is a block of assembly: once it is started, all the instructions in it are executed, and it branches out at the end. If it is executed, all subsequent instructions are always executed, or retired, by your CPU as one atomic block. This concept is important not only for reverse engineering and control flow analysis, but also for compiler optimization and even for the hardware itself.

We will talk about control flow graphs and call graphs. The control flow graph is local to one function: the branches and if/else statements within it. If you build the control flow graph, it shows what is happening in one function. The call graph is about which functions you call. If you are a radare2 user, for example, you seek to the function, and for the control flow graph you type agf (a for analyze, g for graph, and f for flow) and you get the control flow graph; you will see it later. For the call graph you call agc to get a local call graph, and agC, with a capital C, for the overall call graph. This is built into all the reverse engineering tools. On the right-hand side you see just the disassembled instructions of the text block, and this is what we are talking about: reverse engineering is all about instructions and getting a feeling for how things work.

I mentioned static and dynamic control flow analysis. In static analysis you look at the graphs, created here with radare2 on the right-hand side, and get an idea of what is happening, or, if there is some anti-debugging technique, of how to patch things. You get a rough idea of how things work. What you do there is understand the program logic and identify key parts or obfuscation.

On the opposite side is dynamic control flow analysis, where you really execute, or retire, things. The great thing about this is that if you are analyzing malware, you really see what happens. Static control flow graph generation gives you everything, but what matters is what is really executed. Static analysis can get really complicated in some cases, and you do not get the runtime aspect. That is what you get with dynamic control flow analysis, when you really run things. As the text here on the right-hand side says, indirect jumps and things like that are really tough to resolve with static analysis. So you use both methods.

For dynamic analysis you use debugging tools: GDB as the most powerful debugger, but there are other debuggers built into the reverse engineering tools, for example radare2. A debugger is not the only way to do dynamic control flow analysis, though. You can also use Unicorn, for example, which does not emulate the whole operating system but just an architecture. So you have a lot of tools. On the right-hand side you see GDB with a graphical front end, for example.

But dynamic analysis is not that straightforward; there is a lot of complexity. You have code obfuscation. You have anti-debugging technology built in: the malware checks whether a debugger is running, for instance whether the ptrace call reports that the process is already attached, and then the malware exits and does not execute what it was meant to do. Malware also tries really hard to find out whether it is executed in a virtual environment, and things like that. So it is really hard to get all the debuggers and dynamic tools running for malware analysis.

That was the introduction. Now we switch over to Intel PT. What is Intel PT? It is a hardware feature introduced with the Broadwell CPUs. It is really an Intel feature, not implemented on AMD x86 processors. It is sometimes thought of as if every executed instruction were snapshotted, but that is not how it really works, as you will see. What Intel PT really does is record all the control-flow-influencing instructions. If there is a call, it is recorded. Remember the basic block from the introduction: once its first instruction is started, all the subsequent instructions are always executed, so there is no need to record those instructions. It is only required to record the instructions that influence the control flow, like branches, and these instructions, or rather their results, are what is recorded. Think about a current processor that executes, or retires, billions of instructions every second. If you really recorded everything, it would be a lot of data, and that is not how it works. Intel PT records just the minimum that is required to reconstruct things, and later, on the decoding side, the instructions that were executed are reconstructed. We will see that later.

Intel PT is really resource efficient. If you measure the time for an execution with and without Intel PT, the overhead is just 2 to 15 percent, depending on the workload. A CPU-intensive workload has somewhat more overhead, but it is hardly noticeable. This is also important for malware analysis, where timing analysis is used to detect whether a debugger is present: the influence is hardly there, which is why this is really great.

Intel PT is not visible from the application side. With a debugger, the application can do something to detect it, but Intel PT is nearly undetectable, which is a great advantage. You do get a lot of data volume for what you record; even so, a lot of things are still recorded. And it is supported on all Intel processors since Broadwell.

How is this done? Just to get an idea: every time something like a function call is executed, a so-called packet is recorded in memory, really efficiently. This is handled in the kernel, which manages everything, and later on you see all these packets. Intel PT supports a lot of packet types. A call is recorded as a dedicated packet, and so are branches; it is all about branches in the processor. For a conditional branch, think of if/else, only whether it is taken or not taken is recorded, encoded as one bit in a packet called TNT (taken/not taken). It is really just one bit per branch, so you see how efficiently this is implemented. But there are a lot of other packets as well, for example to do timing measurements in an efficient way.
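
To illustrate why one bit per conditional branch is enough, here is a small sketch that replays a sequence of taken/not-taken decisions over a known control flow graph. The graph, the block names, and the T/N string are hypothetical and heavily simplified; this is not the actual Intel PT packet encoding or a real decoder.

```python
# Hypothetical control flow graph of a small function with one if/else.
# Each block maps to ("cond", taken_successor, fallthrough_successor),
# ("jump", successor) or ("ret",).
CFG = {
    "prologue": ("cond", "else_side", "if_side"),
    "if_side": ("jump", "epilogue"),
    "else_side": ("jump", "epilogue"),
    "epilogue": ("ret",),
}


def replay(cfg, entry, tnt_bits):
    """Reconstruct the executed block sequence from taken/not-taken decisions.

    tnt_bits is a string such as "N" or "TNT": one character per conditional
    branch, in execution order ("T" = taken, "N" = not taken).
    """
    bits = iter(tnt_bits)
    path = []
    block = entry
    while True:
        path.append(block)
        kind, *successors = cfg[block]
        if kind == "ret":
            return path
        if kind == "jump":
            # Unconditional jump: the static graph already says where it goes.
            block = successors[0]
        else:
            # Conditional branch: consume exactly one recorded decision.
            taken = next(bits) == "T"
            block = successors[0] if taken else successors[1]


if __name__ == "__main__":
    print(replay(CFG, "prologue", "N"))
    print(replay(CFG, "prologue", "T"))
```

Everything except the single decision at the conditional branch comes from the static graph, which is why the recorded trace can stay so small compared with the decoded instruction stream.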

The power modes are also detected and recorded, as are the P-states, that is, how fast the processor runs. Everything that is required for analysis is recorded. That is how things are done in the processor.

Let's look at an example. On the left-hand side you see a small main function. It is a little bit small, but you should be able to see it. It has an if/else statement and increments or decrements a counter. On the upper side you see the main function disassembled with objdump, and on the lower part you see the flow graph generated by radare2, with the function prologue, the if/else branch, and the function epilogue. What you see on the right-hand side is what Intel Processor Trace really records. You see the PTWRITE entries: this is a marker you can add to your code. It results in an instruction whose value is recorded if you enable it, and you see it here. In between you see just a taken/not-taken packet. Here it is an N, so not taken. That is all that is recorded, and nothing more is required to reconstruct the function. The call to main is recorded before; this is the function pointer to main. Here, only the "not taken" is recorded, and that is everything required to reconstruct what happened. And you see that a lot is happening: I highlighted that we are already on line 8,893 here, so a lot of PT packets have been recorded already, because a lot of things happen before main. The dynamic loader kicks in, glibc kicks in, the dynamic libraries are loaded, all of this happens here. So a lot is recorded, and you can reconstruct nearly everything based on it.

Now that we have seen how Intel PT works, how can we use it? On Linux there is the perf tool. It was introduced around 15 years back, if I remember correctly, by Ingo Molnár and Thomas Gleixner. It is the framework for performance analysis: a kernel framework, a system call, and a user-space tool. Through this subsystem you can use Intel PT to record things. You just type perf record, say that you want to use Intel PT, and give the workload. You can record one whole CPU, you can record all the CPUs, or you can limit recording to a user-space application and record only in user space and not in kernel space. Check the man page for details. It is important that you limit what you record and do not record too much.

Now that we know how to record, decoding comes into play. We have recorded the packets, but how can we reconstruct the flow, and how can we do this for our reverse engineering needs, where we want to know what was executed? There are two levels you can use. The first one is the call trace. As I mentioned, calls are recorded: if a function is called, that is always recorded, and every branch is recorded. On the high level it is good to start by looking at the call trace, so that you know which functions are really called. Once you have a rough idea of what was executed, you can look at the assembly stream as well, so you then decode with the focus on the assembly stream. There is even more: you can look at M events, which we will talk about a little later, or at branch events or power events. So there are a lot of other things you can do, but for reverse engineering, the call trace as the high-level analysis and, second, the assembly stream are the two commands to keep in mind and work with.

What does it look like? If you take the call trace, on the left-hand side, in the first column, you see the executable that was executed, where the recording was done. Then you see the dynamic shared object, the DSO. When code of an executable runs, it is not always in the text segment of the executable itself; it is also in the libraries. If sprintf is called, it is executed in the DSO, and that is what you see in the second column. In the third column you see the function. If you do not have debug symbols, which you normally do not have, you see addresses here. But this is still great, because if you have malware that makes system calls, calls libc functions, and things like that, you get a rough idea of what is happening, and then you need to analyze and focus on the malware itself.

That is what you see on the right-hand side. Here the main function is just a call, you see it marked with the red arrow: the main function is just one line in the call trace. But if you use the disassembly trace and decode this way, you see what is happening in main. All the instructions that are executed become visible there.

What I skipped here on the left-hand side is the timing. The timing can also be shown, with a two-digit nanosecond resolution. You hopefully remember the xz CVE. It was discovered because SSH had slowed down, and this is exactly that kind of question: where does it slow down, and where does it block? This timing behavior can be analyzed here as well. Just by looking at the timing and where the larger deltas are, you know that it is happening here, and then you can focus there. The xz CVE also contains some anti-debugging measures, so you cannot simply look at what happens with GDB, or you must patch things out or skip instructions. But it just works with Intel PT.

You can also use this framework to build your own tools. Because you know the instruction pointer and all the instructions that are executed, it is not easy, but you can build your own custom tools to get an idea of which instructions or functions are called. We can talk about this offline afterwards. I built some custom tooling for this as well; just get the slides afterwards.

Now some challenges and tips. Often you record too much. There are filters, hardware filters, that are supported by Intel PT. You can start tracing at a particular point, so you can limit things with Intel PT. You can filter so that only a particular function is traced: it is a function start and end address, and you can hand this over.
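
The record, decode, and filter steps described above can be sketched as perf command lines. The snippet below only builds and prints the argument lists; it does not run perf. The workload path ./a.out, the data file name, and the helper names are assumptions for illustration. The options used (the intel_pt//u event, --filter, and perf script with --call-trace or --insn-trace --xed) are taken from the perf documentation, but their availability depends on the perf version, and --xed needs the separate xed disassembler to be installed, so check the man pages on your system.

```python
import shlex


def record_cmd(workload, user_only=True, addr_filter=None, data="perf.data"):
    """Build a 'perf record' command line that traces a workload with Intel PT."""
    # The 'u' modifier limits tracing to user space; without it the kernel
    # side is traced as well (if permissions allow).
    event = "intel_pt//u" if user_only else "intel_pt//"
    cmd = ["perf", "record", "-o", data, "-e", event]
    if addr_filter:
        # Address filter, e.g. "filter main @ ./a.out" to trace only main.
        # The binary name here is hypothetical.
        cmd += ["--filter", addr_filter]
    return cmd + ["--"] + list(workload)


def call_trace_cmd(data="perf.data"):
    """High-level view: which functions were called."""
    return ["perf", "script", "-i", data, "--call-trace"]


def insn_trace_cmd(data="perf.data"):
    """Low-level view: the executed instruction stream, disassembled via xed."""
    return ["perf", "script", "-i", data, "--insn-trace", "--xed"]


if __name__ == "__main__":
    commands = [
        record_cmd(["./a.out"]),
        record_cmd(["./a.out"], addr_filter="filter main @ ./a.out"),
        call_trace_cmd(),
        insn_trace_cmd(),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

Starting with the call trace and only then decoding the instruction trace for the interesting region follows the two-level workflow from the talk and keeps the decoded output manageable.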

You can limit things, and you should limit the recording, because it gets quite large. We already saw the PTWRITE instruction. These are important because you can set markers with them. If you know that something is in an area you want to analyze, just patch something out and write these markers there; then you can analyze the instruction flow in between. The markers are really handy, and you always find something you can patch out to add them.

Then there is what I already mentioned, the amount of data. Here I recorded Intel PT for just 10 milliseconds on all the CPUs of a larger system. The recording is a 2.7 megabyte file, but when you decode it, it becomes nearly 1 gigabyte. So you see how good the compression of the Intel PT packets is, but once decoded it becomes quite huge. It is good to limit things, both in time and in what you want to decode.

Sometimes you get overflows, because a lot of data is still recorded with Intel PT. You can increase the buffer size for the recording. There is also a snapshot mode, in which recording runs continuously, and when you send a SIGUSR2 signal to perf, it just copies the buffer to user space, so you do not experience the overflow. Just some tips here.

Now some limitations, as we come to the end. You see a lot, but what is not visible even to Intel PT are instructions that are not regularly available and therefore cannot be decoded. The decoder needs the ELF binary or ELF object to decode things. If it is shellcode loaded from somewhere on the internet, it cannot be decoded. If the malware uses some kind of interpreter, it cannot be decoded either, because the decoder needs a reference point in the ELF objects to get from the branch information to the decoded instructions. That is where it will not work. It also does not capture data: when you debug, you can look at all the registers, but this is not recorded, so you do not get any data that is transferred. For Arm there is a counterpart, CoreSight ETM, and for AMD there is no counterpart, but you can just execute things on Intel and then it works. And now we have 4 minutes left for questions. Thank you.

Okay, we have a question.

Hi. Looking at the limitations, it makes me think: how well would Intel PT integrate with RE tools other than perf, for example with IDA Pro or Ghidra?

Yes, this is a great question, because I really started trying to integrate things. There is no integration for this. There is a library and a small standalone application provided by Andi Kleen, but there is currently no integration into the larger reverse engineering tools. So I started to build my own tool to get things done and really see what is executed, but it is a toy prototype. I think a lot can be done in the future.

But I am not sure whether there is another program; I have never seen one.

So there are challenges with things like merging this with the static analysis?

Yes. What I do is switch between tools. I get an idea with Intel PT and perf, and then I switch to radare2, which is what I use; I am a radare2 user. Then I combine the results of both tools. That is how I work, but I think there are possibilities to do more with this technique in some high-level reverse engineering tool.

Thank you. For me this is absolutely something new. I have never tried it, but I will definitely take a look.

Okay, any other questions?

Hi. You mentioned you wrote some scripts. (Check, check. Okay, sorry.) You mentioned scripts like this one. I am curious: have you written scripts to monitor low-level code, like firmware code or system management mode code? With Intel PT, the packets should be able to capture all of that as well, right? Which would even be invisible to the operating system. I am just curious whether you explored this space.

No. I analyzed kernel code, but nothing below the kernel. It is mainly user-space code that I analyze. I doubt that recording is even allowed in that ring, but I am not really sure about the firmware side.

Okay, thank you.

A great question, though. Normally everything happens in user space, and then you limit recording to user space. The malware side is what I focus on.

Is there any way to detect that something is running with Intel PT?

Sorry?

Let's say I am malware: is there any way for me to know that I am being instrumented?

As I mentioned, this is what is great about Intel PT: the malware has a problem detecting it. Malware does a lot to try to detect whether it is being debugged: it looks at the ptrace system call to see whether another process is attached, it opens the proc file system to see whether it is traced, it checks whether it is executed in a VM. Intel PT is great here because it cannot be detected, apart from possibly some timing aspects. But that is in the normal range; a slowdown of 5 percent cannot be detected this way.

Okay, one related question. Intel also has Pin, which does instrumentation with your own logic. Are there any significant differences?

It is a completely different stack. Intel PT is really the hardware layer that does the recording, so there is no relation to Intel Pin. They are really distinct technologies.

Okay, thank you.

All right, thank you for your questions. We are now one minute past time, so if you have more questions, please catch them in the hall, and give them one more big round of applause. Thank you so much.
