Evasive Maneuvers: Strategies To Overcome Runtime Detection Tools - Amit Schendel and Nir Levy

BSides Prague · 37:57 · 107 views · Published 2025-04
Category: Technical · Style: Talk

Oh, okay, thank you. Thank you. Okay, so before we start, we need to understand how runtime detection tools even work, because in order to understand how to bypass them we need to understand how they work on the surface. We divided it into three main parts; we think it's the easiest way to understand it. The first part is the telemetry collection of those runtime detection tools. The second part is the rule processing: what do they do with the telemetry collected? And the third part is notifying the user once a detection is found. So, wait, so many technical problems. Sorry. Okay. So the first part is the telemetry collection, and we are going to focus

today on runtime detection tools that run on the host, not on API security and those sorts of things. The main telemetry collected from the operating system is things like processes, files, network activity, system calls, and permissions. And today we're going to focus more on the cloud era of runtime detection, so the events are also enriched with the context they are running in. If in the 2000s the antiviruses would just send telemetry on the processes, now we are running in containerized environments such as Kubernetes, or on EC2 instances in AWS, so the security vendors also enrich the events with some extra data, for example the pod

or the namespace in Kubernetes and the container name. So now that we understand which telemetry is being collected, we need to understand what the security vendors are actually doing with this information. They can do various things, but the newer, more cutting-edge tools are trying to do more anomaly detection, right? Because the traditional detection, hash scanning or byte-pattern scanning and all the other rules that have been used for many years, is good, but we see that it's not enough. Attackers are still breaching in; they are still finding ways to bypass it. Today there was a talk here about bypassing all of those.
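As an illustration of the three parts described above (telemetry collection, rule processing, notification), here is a minimal Python sketch. It is not taken from any of the tools in the talk: the `Event` fields, the `sensitive_file_rule` function, and the sample pod and namespace names are all hypothetical, chosen only to show how an event enriched with Kubernetes context might flow through a rule and into an alert.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """One host event plus the Kubernetes context a vendor might attach."""
    kind: str        # e.g. "open", "exec", "connect"
    process: str
    path: str
    pod: str
    namespace: str
    container: str


# Hypothetical rule input; real tools ship much larger rule sets.
SENSITIVE_PATHS = {"/etc/shadow"}


def sensitive_file_rule(event: Event) -> Optional[str]:
    """Rule processing: return an alert message, or None if nothing matched."""
    if event.kind == "open" and event.path in SENSITIVE_PATHS:
        return (f"{event.process} opened {event.path} in "
                f"{event.namespace}/{event.pod} ({event.container})")
    return None


def run_pipeline(events, rules, notify) -> None:
    """Feed every collected event through every rule and notify on matches."""
    for event in events:
        for rule in rules:
            message = rule(event)
            if message is not None:
                notify(message)


# Notification is modeled as appending to a list instead of Slack or a UI.
alerts: List[str] = []
run_pipeline(
    [
        Event("open", "cat", "/tmp/notes.txt", "web-1", "prod", "nginx"),
        Event("open", "cat", "/etc/shadow", "web-1", "prod", "nginx"),
    ],
    [sensitive_file_rule],
    alerts.append,
)
print(alerts)
```

Only the second event matches the rule, so a single alert carrying the pod, namespace, and container name is produced.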

So the newest players are trying to create something called anomaly detection. For example, assume you're running a Kubernetes cluster and you have a runtime detection tool that wants to monitor for attackers on this cluster. What some vendors are doing is actually building profiles for the applications running in Kubernetes. If you think about it, your operating system, like Windows on your home PC, is a bit unpredictable: every day you can do something new on the computer. But when we're talking about containers, the behavior of a container is pretty predictable. It's an image, it's pre-built, it's running something. If you record it for enough time, you can understand what the container is doing. So for example we can

build a profile for an application. We can record the processes, files, system calls, and network activity, and then enforce anomaly detection on top of that. We can also write some custom rules, as we are going to see. And the last part, after we understand what the tools collect and what they process, is the notification, and this is the easiest part. Obviously we can see the alerts in the UI; we can get notifications in Slack, Teams, PagerDuty, whatever platform you're using. And of course some of the tools also try to respond to the attack or prevent it. So now that we understand how runtime detection tools work, we need to understand how the telemetry is actually

being collected and how we can tamper with that. The most widely used technology today among security vendors is eBPF, and I'm going to pass it to Nir to explain eBPF, and then we can see how we can bypass it. So now, after Amit explained how runtime detection tools actually work, and before we dive into eBPF, let's talk a little bit about the kernel. When you're writing a process, maybe a Python process that needs to open a file, write to the file, then close the file, it actually runs in user space. And when you're writing to a file, you actually want to interact with the storage, with the hardware, or if you are sending packets

you want to interact with the network interface card, and the way you do it is through the kernel. The kernel is a very complicated piece of code which is the heart of the operating system. It manages memory. It has different drivers and kernel modules, which basically allow you to interact with the hardware. So if your Python process wants to open a file, you actually want the kernel to load it into memory and give you back an identifier, which is called a file descriptor. And the way you interact with the kernel is through an API called system calls. So if I would like the kernel to open a file for

me and map it into memory, I use the open system call, writing this code in order to get a file descriptor and be able to write to this file. So the next question you would like to ask is: why do I want to run in the kernel? Every time I make a syscall there is something called a context switch. Context switches happen every time I move from user space into kernel space, and every time I come back with an answer from the kernel into user space, and this context switching takes time. So if I'm running only in the kernel, I'm not wasting time on context switching and I

can run much faster. The second thing is that if you're running in the kernel, you're actually a little bit in god mode. You can do almost whatever you want. You can read from wherever you want, every struct you want. You can write to almost every piece of memory you want. It's pretty cool, and you will see it in the next slides. So we understand what the kernel is. We understand why we want to run in the kernel. But how can we write code that will run in the kernel? The first option is changing the kernel source code. I can open a pull request, for example, to the Linux community. Send

them some mail, and then after a couple of months, if they are willing to merge this commit, they will merge it. And then I will need to wait a couple of years until my clients see this code in their kernel. Ain't nobody got time for that. The second option is writing a kernel module. You write a kernel module and then ship it to your client. But there are a couple of problems with this method. The first problem is that it will probably break every time a new kernel version is released, and the second problem with developing a kernel module is that it might crash your kernel. Who in this audience heard about the CrowdStrike

incident? Lots of people. What happened there is that Falcon, the runtime detection tool of CrowdStrike, is actually a kernel module, and there was a panic exception in the kernel module, and for this reason the whole operating system crashed. Why did it crash? Because when you are running a piece of code in the kernel and you have an exception, there is no other code that will catch it. So for this reason the whole system panics. So what is the solution for this? eBPF. eBPF allows you to extend the capabilities of the kernel in a safe and more efficient way. You can run sandboxed programs in the operating

system. And Brendan Gregg, who is one of the eBPF gurus, said that eBPF does to Linux what JavaScript does to HTML. So if you want to add a feature to observe your app, or maybe make your app more secure, you can just ask one of your friends who knows how to write eBPF, or today ask one of your favorite LLMs to write eBPF code for you, and then just inject it into the kernel, and you will have lots of fun. So what does it look like? We have the user space, we have the storage hardware, and we have the Linux kernel. eBPF allows me to hook in different places. I can

put a hook on a syscall, whether it's the open syscall or the write syscall. I can put it in the IP stack and intercept at different places. I can use a technology called XDP, eXpress Data Path, and manipulate and read packets in a very fast way on the network interface card. And whenever a function, let's take for example the open syscall, is triggered, the code that is actually running is also my code, whether it's at the start of the syscall or at the end of the system call. Let's take a quick look: you can see that in only seven lines of C code I am able to hook the

return from execve. execve is a common syscall that happens every time you are creating a new process. So what will happen here is that I will be able to extract the process name of the process that you just ran and send it, using perf_submit, back to user space. Let's see a quick demonstration. In the top pane you can see IPython. In the bottom pane you can see opensnoop, which is one of the most famous eBPF example programs. opensnoop actually hooks the open syscall and the openat syscall, and every time I open a file, it will trace it. You can see it trace

the process name. I'm getting the path and also the file descriptor. And if I open another file, for example mag, I will be able to see it [Music] down. That's all. So as I just described the CrowdStrike Falcon issue, you might say: okay, so we have eBPF, but why is it safer than kernel modules, right? It can also crash your computer. Well, not exactly. eBPF uses a component called the verifier. The verifier is a piece of code in the kernel, actually a very complex and long piece of code. You can see it in the kernel source code. And what it does is verify the bytecode of the eBPF program

before loading it into the kernel. It checks a lot of things. It checks that all the lines in your code are actually reachable, that you don't have infinite loops, that you do not have null dereferences, et cetera. And this allows you to load eBPF code in a safe way, without the risk of crashing the kernel. So if a couple of years ago you wanted your clients to install a kernel module, I guess they would say no, it's risky, and they were actually right. Today you can ship eBPF code, and as you saw, lots of companies and vendors today build their runtime detection tools on eBPF because it's much, much

safer. So we talked about how runtime detection tools work, and now for the most interesting part of this presentation: let's see how we can bypass the runtime detection tools. And for this, Amit. Thank you, Nir. So now that we understand how everything works, we understand how runtime detection tools monitor us, and we understand the underlying technology, let's see how we can trick it. So let the games begin. The first technique that we are going to talk about is tampering with eBPF maps. This technique has been known for a while, so let's talk for a second about what maps are in the context of eBPF. We said that eBPF is so awesome. It's not crashing. We have

the verifier. We have the different components, the different hook points. And maps are a crucial component in eBPF, because when you write eBPF code, by default all the data is stored on the stack. But as you know from your favorite programming languages, sometimes we want to keep some memory on the heap, right? If we deal with a lot of data, we cannot put it all on the stack. So in eBPF, the way to do it is to put the data in maps. A map is like any key-value store, like a dictionary in Python for example, the same concept, and it's mainly used for two things. The first one is storing a lot of data, which we want

on the heap, and the second is sending the events that we trace, for example the execve that we traced in eBPF, to user mode, to the actual agent controller. This is done in different ways, but most of them rely on maps. So it means that if we can tamper with those maps, we can actually do some interesting things. In this technique we're going to show how, by tampering with the maps, we completely disable most of the famous runtime detection tools. Before kernel version 5.8, all you needed was CAP_SYS_ADMIN, which is basically

root. But after kernel 5.8, since Linux keeps adding more capabilities, we have CAP_BPF, so it's enough that you have CAP_BPF and you can actually tamper with the eBPF tools. The interaction with maps is very simple. For example, here is a simple program that lists all of the eBPF maps on your system. The important thing to understand is that maps are global kernel objects, which means anyone with sufficient permissions can access them, and in this case we listed all the maps on the system. So let's watch a

demo. Okay, so in the top pane we actually have Tetragon, which is a runtime detection tool developed by Cilium. It's a graduated CNCF project and it's being used in many environments. We run it in the top pane, and in the bottom pane we cat /etc/shadow. Okay, a sensitive file. We want to get an alert from our runtime detection tool whenever we touch this file. So let's see it. We cat, and as expected we get the alerts. Okay, what do we see? We see the process execution of cat. We see the read system calls, because we touched a sensitive file; otherwise we wouldn't see that. And we see the exit system call. Cool. Now what we're going to do is: we wrote a

special program that actually deletes all the keys from the maps of Tetragon, and then we cat /etc/shadow again. So let's see what happens. Okay, we cat /etc/shadow, but this time we only got an alert that the process was executed. We didn't get an alert that someone accessed /etc/shadow. And this is because we actually kind of disabled Tetragon by telling it not to monitor us. And this is a very important part, because you need to understand that those runtime detection tools run in many different environments. For example, a lot of them run today in cloud environments like Kubernetes. And if I'm monitoring Kubernetes, I don't want to monitor all the processes that are on the

nodes. I want to monitor only the containers that are related to Kubernetes. Now, a container is just a process, so I will hold a map with all the processes that I want to monitor. And if, for example, an attacker tampers with the maps and deletes the PIDs from the maps, it basically stays unmonitored. And the cool part is that the runtime detection tool doesn't really have a way to know that it happened, because it just relies on the maps. Now we took it even further. I guess you have all used ChatGPT, right, or any other LLM. So we thought: why not make an LLM attack eBPF? So what we did

was this: this time we took Falco, another very well-known runtime detection tool, also a CNCF graduated project, and it's actually being used in many companies. So in the bottom pane we have Falco. We're going to do the same thing in the top pane: we cat /etc/shadow. Okay, we got an alert from Falco as expected: a sensitive file was accessed. And this time what we did was create a Python program. Some of you may have heard of the new MCP protocol by Anthropic that allows us to connect tools to LLMs. So we actually wrote an MCP server that provides eBPF tools. Basically what it means: now imagine ChatGPT having access to the bpf system call to do activities

with eBPF on your system. So for example, you can see here at the top, it's a bit small, maybe you don't see it, but we gave it tools like get map details, delete keys from a map, and enumerate the eBPF programs installed on the computer. So we gave it a query and we said, "Hey, I'm running the Falco eBPF tool to monitor my system. I want you to find all the eBPF maps in the system, find the maps that are related to Falco, and delete all the keys from those eBPF maps." Now we let it run. We can see it's doing some tool calls, actually calling the bpf system call, enumerating the machine. And in the result it said, "Hey, I found

several eBPF maps," and the usual blah blah blah of AI. And it gives us the names of the maps. I can confirm they are actually the maps of Falco. Awesome. Now we're going to do the same thing: we are going to cat /etc/shadow [Music]. Okay. But this time we see Falco is sleeping. Nothing happened. So the same trick that worked on Tetragon worked on Falco, this time with AI. Actually pretty cool; it gives us a hint of the future of offensive security, maybe. And you can take it even further. You can connect tools to, for example, GitHub or Google, and then the LLM can even search the Falco repository, which is open source, and find some more interesting things.
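The map-tampering idea from the two demos above can be modeled in plain Python. This is only a user-space sketch, not real eBPF and not the program shown in the talk: `MonitoredPidMap` and `on_file_open` are hypothetical stand-ins, and the assumption (taken from the speakers' description) is that the agent's hook consults a map of monitored PIDs before reporting anything.

```python
class MonitoredPidMap:
    """Stand-in for a global eBPF hash map keyed by PID (not a kernel object)."""

    def __init__(self):
        self._entries = {}

    def update(self, pid, value=1):
        self._entries[pid] = value

    def delete(self, pid):
        self._entries.pop(pid, None)

    def lookup(self, pid):
        return self._entries.get(pid)

    def keys(self):
        return list(self._entries)


def on_file_open(pid_map, pid, path, alerts):
    """Model of the hook: report only if the caller's PID is in the map."""
    if pid_map.lookup(pid) is None:
        return  # PID is not monitored, so the event is silently skipped
    if path == "/etc/shadow":
        alerts.append((pid, path))


# The agent registers a PID it wants to monitor.
pid_map = MonitoredPidMap()
pid_map.update(4242)

alerts = []
on_file_open(pid_map, 4242, "/etc/shadow", alerts)   # detected

# Anyone who can reach the map (modeled here as plain access) clears it.
for pid in pid_map.keys():
    pid_map.delete(pid)

on_file_open(pid_map, 4242, "/etc/shadow", alerts)   # no longer detected
print(alerts)
```

The hook itself never changes; only the data it trusts does, which is why, as the speakers note, the tool has no obvious way to notice that anything happened.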

But this just demonstrates that there is a bit of a problem here with eBPF. It's actually not that trivial for defenders, for the security vendors that write those tools, to protect against this kind of attack, and we're going to talk about it. Okay. Another cool technique that most of the runtime tools based on eBPF are vulnerable to is time of check, time of use. This bug class is very interesting because it's pretty hard to defend against if you don't have the right hooks in place. It basically happens where you're copying buffers around; it's very common in this type of scenario. So we said that with eBPF we can hook almost any system

call, right? And this is what the vendors are doing. So let's imagine we are putting a kprobe, which is a type of hook, on a system call, let's say open or exec. Now think about the data that is passed to this system call. For example, we want to open a file, so we pass the path to the file, right? At the time of check, the time of the hook, when the eBPF program is executed, let's say the path is actually something legitimate, but at the time of use we will make it be /etc/shadow, our sensitive file. And how can we achieve that? Basically we're going to do a very simple thing: we're going to write a C

program with a buffer, and we're going to create two threads. One thread is going to write /etc/shadow into that buffer, and the second thread is going to write any legitimate path on the system. They are both going to run, and of course it's not 100% guaranteed that it will work, but with pretty high probability it does work. So let's see a demo of that as well. This time again we are abusing Tetragon. In the top pane we have Tetragon; in the bottom pane we're going to create a symlink to /etc/passwd, which is another thing the runtime detection tools alert on. So this time it alerted on us for creating a symlink

over a sensitive file. So what we're going to do is run the program I just told you about, the time of check, time of use attack. This time we only see process execution and exit, but we don't see a sensitive-file alert, because at the time of the check the path was legitimate, but after it, it was a sensitive [Music] one. So let's see another technique, this time using symlinks. In this demonstration we will use Falco again, one of the most famous runtime detection tools today, and as always we will cat /etc/shadow and get an alert as expected. Why do we get an alert? Because Falco hooks the open syscall and sees that we are

opening a sensitive file. Now this time we're going to use a symlink. We will create a symlink to /etc/shadow, just a file in the temp directory, and as expected you can see that Falco identified the creation of the symlink. How did it do it? It actually hooks the symlink creation syscall, checks, and sees that we created a symlink to a sensitive file, and we get an alert. But this time we will cat the file that we created, the symlink to /etc/shadow. We got the data of /etc/shadow, but this time Falco didn't identify us. So you might say: okay, you can read

/etc/shadow as many times as you want and not get caught, but the first time it sent an alert on you. And you are right. The trick is kind of even, because on one side it sent an alert on us, but on the other side we can read the file as many times as we want. And this demonstrates a core problem with eBPF. We said that eBPF is great and safe and efficient, but it is actually very limited, and because of the limitations of eBPF it's very hard to write complex logic. Resolving the symlink back to the original path is considered complex logic, and that's why lots of vendors that

develop runtime detection tools are vulnerable to this kind of method. Another option is to use alternative syscalls. By default, lots of runtime detection tools hook the sys_enter tracepoints for open and openat, and these hooks allow them to know when someone opens a file. But in this demonstration we will use open_by_handle_at, which by default, even today, does not get hooked. You can see that in the top pane I'm catting /etc/shadow and getting an alert, as expected, because we opened a sensitive file. But this time we will use another program, and the program we will use calls another syscall. You can see that we cat /etc/shadow using this

program and do not get an alert. So it's pretty cool, and lots of different vendors are also vulnerable to this. And yes, another option is to not use syscalls at all. I don't mean not using any system calls, but you can implement the syscall that you don't want to be alerted on by other means. For example, you can just allocate memory, load the dynamic linker, map the executable into memory, and jump to the entry point, and this way you will not execute the execve system call. By the way, there are lots of implementations of this on GitHub. And you will not be caught by a runtime detection

tool. Another option is killing the agent. Every eBPF program needs to have a user-mode program that manages it. So as Amit said, if you are a runtime detection tool, you have a list of all the process IDs you want to monitor. You keep it in user space and write it to the map, into the kernel. So if I just kill the user-mode process, the runtime detection tool will actually be asleep. Of course, in order to kill the agent you need to have enough permissions, and you need to have access to the same PID namespace, but after you do it you can do whatever you want on the machine without

getting caught. There are ways to mitigate it. For example, you can use bpf_override_return to stop the killing of your specific process, but it's available only on newer kernel versions, and you can use KRSI, which is Kernel Runtime Security Instrumentation; we will talk about it in a couple of slides. Okay, so another very cool technique. We already showed some techniques that are kind of weird situations where the eBPF hooks don't have the right tools to handle quite simple attacks, and another one of those attacks is event exhaustion. So we talked a lot about maps. Now the thing about maps is that when you create a map

you actually set the size: how many keys can be in that map, how many elements. Usually vendors don't want to set it too high, because then it allocates a lot of memory and their agent consumes a lot of memory, and they run in cloud environments where you don't have a lot of resources. So the maps being allocated are pretty small, which means an attacker can exploit that, again in order to, for example, read sensitive files. So what would an attacker do? He would flood the system with legitimate events. Imagine just writing a program that, in a loop, sends a lot of events, like open events on /tmp/something, and

then I want to read /etc/shadow. So after I bombard the machine with, like, 20,000 events, I'm going to send my one sensitive event. The map is flooded with events, so something in eBPF called event drops will happen, and then you will actually bypass the agent that way. And of course we used the example of opening a sensitive file a lot because it's very simple. But imagine taking that and actually writing your own tool or rootkit with a lot of different types of system calls, and you can actually bypass all of the modern runtime detection tools. And there is another very cool technique that we wanted to show you today, but unfortunately the

responsible disclosure timeline is still not over. So we found a very cool way to bypass most of the runtime detection tools today, a very generic way to make a lot of system calls without being detected at all. So follow us if you want to hear more about it in the coming months. So we showed you a lot of techniques, and as you might see, it's not very complicated. We didn't create any unauthenticated chain of zero-days and RCE and everything. But the main idea we wanted to convey here is that there is a core problem, and someone needs to do something about it. There is a gap between what the

security vendors think their security tools are doing and what they are actually doing. So this brings us to the last part of the talk. We understood how runtime detection tools work. We understood the technology. We saw some very nice bypasses. Now let's talk about this gap. In order to understand the gap, we need to take a look into the past and the future of security, how it all started. We actually wanted to start with Windows, because we feel like Windows has some more mileage when we're talking about creating an antivirus or something to detect malware. Back then, if some of you remember, the way antivirus software like

Avast or Kaspersky or any other antivirus software would monitor your Windows machine was to hook a table called the SSDT. Basically, it's the system call table in your kernel. This is how the operating system knows which function to execute. When, for example, you want to open a file, your operating system makes the system call that we talked about, and the system call in the kernel is resolved through this table. So what all the antivirus software would do is hook all the functions in this table. It was a complete mess: they were hooking each other, and they were creating a lot of CrowdStrike-style episodes. It happened a lot at that time.

So Microsoft decided: let's put an end to that. We don't want everyone doing whatever they want in the kernel; we don't want things to be disorganized. And in the newest Windows versions you cannot really hook this table anymore. It's still there, but you cannot hook it because it's protected by mechanisms like PatchGuard or HVCI. Of course, we're not going to get into those in this talk, because that's a whole lecture just about them. But we need to understand there was a problem here. So what Microsoft did was create a new way for security vendors to monitor. They created a mechanism in the kernel called notification routines. This is basically what all of the

security vendors on Windows do today: they register callbacks with this mechanism. It allows them to see processes, threads, images, objects, and the registry, and to see the activities before and after they happen, and they can inject DLLs and all sorts of things in those routines. In addition, they gave us a platform called minifilter drivers. It basically allows us to view I/O operations and file system operations. It's also running in kernel space. And for networking, we have WFP, and NDIS, which is a bit older. But we can see something interesting, because Microsoft understood that they need to create an organized platform for security. At the end of the day, security is very important both for

Microsoft, which today ships Defender, and for the antivirus software vendors. So they created a pretty good structure, and of course there are more mechanisms, but the problem is that it's still not enough. We can see that the vendors actually really like to hook system calls. For example, you can see that Avast, which is famous antivirus software, has the InfinityHook method, which is very interesting; you can go read about it, but basically it allows them to hook system calls. There are other techniques, like Shadow Walker, which gives you the option to manipulate memory pages, and there are a lot of user-mode hooking techniques; we put here also IAT patching or hot patching. But the main idea

to understand here is that there is still a gap, a gap between what the operating system provider gives to the security vendors and what the security vendors actually want. And this gap creates issues, creates those tiny bypasses that we showed you, which at the end of the day can be built into something much bigger. Now let's talk about Linux, because we talked a lot about Linux. The old way in Linux was kernel modules, as we said. It was netfilter and iptables for networking. It was seccomp to view system calls. We have ptrace, right, the famous ptrace, to do some injections. But we talked a lot about eBPF, right? So actually, even within eBPF

there is something new in eBPF, which is called KRSI. We mentioned this mechanism when we talked about killing the agent. On Windows, for example, the antiviruses contain anti-tampering modules, right? The antivirus software doesn't want to be disabled, even if an attacker has admin permissions. On Linux this doesn't really exist. Of the security vendors, as we showed you today, all the open source tools, none of them implement anti-tampering mechanisms, and the reason is that they don't have a lot of tools to do it, especially with eBPF. So if we look at the diagram on the left side, we can see the flow of checks

that a system call goes through. So for example, the open system call. First of all we do the inode lookup, right? We want to see if the file even exists on the machine. Then we have some more checks until we get to the security module layer. Okay. Security modules you may know as SELinux and AppArmor, but Linux added a very cool mechanism called KRSI, with which you can place eBPF hooks just as you would place them on system calls, but in this layer. And it means that you get two benefits here. The first one is that it's almost guaranteed that your hook is

going to be called. And that's very important, because as we saw with the hooks on system calls, we have a lot of problems, like time of check, time of use and the bypass of not using system calls. But in this case the LSM hook is almost guaranteed to run, and today there is not really a known way to bypass that. The other very important thing that LSM gives us is the ability to actually change or respond to the system calls. If you remember, at the start we showed you how to write a hook on the return of execve. With the other hooks, like kprobes, tracepoints, raw tracepoints, and uprobes, you can't

really do anything. You can only read; you can only monitor the system call, but you cannot really change the values, so you cannot really prevent the system call from happening. With LSM you can, and this is very powerful. So for example, if a vendor wants to protect its tool from someone just killing the process, you can put a hook on the kill system call with LSM, and if you see that the kill is not coming from your agent but from an external process, you can just block it. And up until a few years ago this wasn't really a thing. LSM is always evolving. It's quite a

new mechanism. As of Linux 6.14 you have 270 places where you can put LSM hooks, and it gives us visibility into file systems, processes, sockets, and more. So I think the most important thing is that vendors need to understand that placing hooks is not a universal solution, right? We need to actually see what the vulnerabilities in those hooks are, and with LSM it's pretty interesting. Now the problem: it's not a universal solution either, because first of all it's only enabled on the newest Linux versions, and moreover, for some reason a lot of the Linux distributions are having a hard time adopting this mechanism. Only in the recent Ubuntu releases, for example, did they

enable this mechanism by default. Otherwise you need to actually turn on a flag in the bootloader and restart the machine, which of course is not going to happen, not in production in any way. So there is still a major gap here that security vendors are overlooking. They just accept the situation as is. And as for the responsible disclosure that we wanted to tell you about: if LSM were in place, nothing would happen. So I think we can finish with a quote, again from Brendan Gregg. Thank you for the wonderful quotes. It says that taking eBPF observability tools as is and using them for security monitoring would be like driving your car into the

ocean and expecting it to float. And this is exactly the situation we see in the market today. All of the vendors that provide security for Linux have adopted eBPF, which is great, but they are just overlooking the security gap here, and they just use the observability capabilities of eBPF. eBPF was built for observability and tracing. It wasn't originally built for security, and they just, we can say, abuse it. So we have the kprobes, the tracepoints, the raw tracepoints. The vendors are hooking any system call and they are happy, but so are the attackers. Thank you for listening, and if you have any questions, feel free to ask. [Applause] [Music]
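The contrast the speakers draw between observe-only hooks and LSM hooks can be sketched as a small Python model. This is not kernel code and not any vendor's implementation: `deliver_kill`, `observe_only_hook`, and `lsm_style_hook` are hypothetical names, and the only assumption carried over from the talk is that an LSM-style hook's return value can veto an operation (in a real BPF LSM program this would presumably be attached to a hook such as `task_kill`), while a tracing hook's result is ignored.

```python
import errno
from functools import partial


def observe_only_hook(log, sender_pid, target_pid):
    """Tracing-style hook: it can record the event but cannot stop it."""
    log.append(("kill", sender_pid, target_pid))


def lsm_style_hook(agent_pid, sender_pid, target_pid):
    """LSM-style hook: a non-zero return value denies the operation."""
    if target_pid == agent_pid and sender_pid != agent_pid:
        return -errno.EPERM
    return 0


def deliver_kill(sender_pid, target_pid, alive, observers, lsm_hooks):
    """Toy kill path: run observers, then let LSM hooks veto, then kill."""
    for hook in observers:
        hook(sender_pid, target_pid)        # return value is ignored
    for hook in lsm_hooks:
        rc = hook(sender_pid, target_pid)
        if rc != 0:
            return rc                       # operation blocked
    alive.discard(target_pid)
    return 0


AGENT, ATTACKER = 100, 200
log = []
observer = partial(observe_only_hook, log)

# Observe-only: the kill is logged, but the agent still dies.
alive_unprotected = {AGENT, ATTACKER}
rc_unprotected = deliver_kill(ATTACKER, AGENT, alive_unprotected, [observer], [])

# With an LSM-style hook: the kill is logged and denied.
alive_protected = {AGENT, ATTACKER}
guard = partial(lsm_style_hook, AGENT)
rc_protected = deliver_kill(ATTACKER, AGENT, alive_protected, [observer], [guard])

print(rc_unprotected, AGENT in alive_unprotected)
print(rc_protected, AGENT in alive_protected)
```

In the first run the agent is gone even though the event was seen; in the second the same request returns an error and the agent survives, which is the anti-tampering property the speakers say is missing from tools that rely only on observability hooks.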

Yes. [Music]

[Music] Sorry, can you speak louder? I wanted to ask whether you tried the time of check to time of use method against runtime detection tools that are based on the kernel auditing framework, whether it is applicable, or whether any of the other techniques that you mentioned are applicable. So we actually didn't try it on any vendor using kernel modules. We focused more on the eBPF ones. But even if you're using a kernel module and you're not aware of this kind of attack, you're still vulnerable, because even with kernel modules you choose what you're hooking. So if you're hooking the wrong place, or you're hooking too early on the

stack, then you're going to be vulnerable as well. Okay. Thank you very much. Thank you.
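For readers following the question above, the time of check, time of use race from the talk can be modeled in a few lines of Python. This is a deterministic sketch, not the two-thread C program the speakers describe: `run_syscall`, `pad`, and `decode` are hypothetical helpers, and the interleaving that the real attack has to win by racing is forced here by calling the swap between the hook's read and the kernel's read.

```python
def pad(path, size=32):
    """Encode a path into a fixed-size, NUL-padded user buffer."""
    return path.encode().ljust(size, b"\0")


def decode(raw):
    """Read a NUL-terminated path back out of a buffer."""
    return raw.split(b"\0", 1)[0].decode()


def run_syscall(buffer, hook, between_check_and_use):
    """Toy syscall path with an early hook and a later kernel copy."""
    hook(bytes(buffer))              # time of check: hook copies user memory
    between_check_and_use(buffer)    # the attacker's second thread wins here
    return bytes(buffer)             # time of use: kernel copies it again


seen_by_hook = []
buffer = bytearray(pad("/tmp/harmless"))


def record(raw):
    # What the detection tool believes was opened.
    seen_by_hook.append(decode(raw))


def swap(buf):
    # Overwrite the shared buffer in place with the sensitive path.
    buf[:] = pad("/etc/shadow")


opened = decode(run_syscall(buffer, record, swap))
print(seen_by_hook, opened)
```

The hook records the harmless path while the operation proceeds on the sensitive one, which matches the speakers' point that a hook placed too early in the call path is exposed to this race regardless of whether it is implemented in eBPF or in a kernel module.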