
The Butterfly Effect - Actively Manipulating VMs Through Hypervisor Introspection

BSides TLV · 2019 · 32:20 · 114 views · Published 2019-11
Category: Technical
Style: Talk
About this talk
The Butterfly Effect - Actively Manipulating VMs Through Hypervisor Introspection - Sofia Belikovetsky BSidesTLV 2019 - Tel Aviv University - 24 June 2019
Transcript [en]

Our next talk, by Sofia Belikovetsky, is about another subject that is going to take on more and more significance in our lives: the relationship between virtualization environments and the underlying hardware. The title of the talk is The Butterfly Effect - Actively Manipulating VMs Through Hypervisor Introspection. So a round of applause first of all, and the traditional cheers. Welcome to BSides, Sofia. Let's all have fun with the butterfly effect. Please take your seats. Sofia, the stage is yours. The talk is named after a term in chaos theory that says that even minor changes at the right time can cause big consequences, and the example that is usually given is that even

a butterfly flapping its wings on one side of the world at the right time can cause a storm on the other side of the world. It is very similar to what we are going to show: how, by manipulating single values in the memory of the hypervisor itself, we can get big results in the virtual machines, such as even stopping cyber attacks. So, a bit about myself (a really big photo). I have over 10 years of experience in the cybersecurity and cyber intelligence space, and currently I am in the final stage of finishing my PhD at Ben

Gurion University in Israel. Today I am going to talk about a challenge that we had in one of our recent projects. In our lab we had a huge private cloud environment. Most of the virtual machines running in this environment were ours and fully under our control, but some of them were provided by external vendors and were meant to function as a black box, at least from the security point of view. An example of such a VM is a virtual router: it was meant to be placed as is in our environment, configured correctly of course, and just run. We can't really connect to it directly, we can't force the vendor to install any

of our security agents inside, and practically we are blind to what is really going on inside the virtual machine. This problem is actually very common. It is the same in public cloud infrastructures where, let's say, Amazon provides the infrastructure and the environment, but they do not connect directly to the virtual machines that we are using and they do not force us to install anything inside. What they are doing right now in order to see what is going on is practically only monitoring the network communication: looking at what goes into the machine and what comes out of the machine, and trying to find network anomalies. That is what's going

on, and this was what we did in our network. But we wanted to challenge ourselves. We decided this is not enough, and we wanted to find a solution under the following limitations. One, we cannot connect to the virtual machine.

Okay. The second thing is that we can't install any agent inside, and the third is that we do not want to rely only on the external network communication. So we started reading about it, and pretty much right away we came to a really big body of research on how to understand what is going on inside your virtual machine from the outside only, by monitoring the network and the memory. The basic idea behind this body of work is: do not try to secure your virtual machine the same way you secure your physical workstations. Here you have an additional layer, the layer of virtualization, so let's take advantage of this layer to

enhance our security. This talk is about that: our thinking process, how we took advantage of this ability, and how we built a security strategy of both identifying the security threats inside our virtual machines and remediating them from the outside. Okay, so let's talk about this virtualization layer. The main component of the virtualization layer is the hypervisor, and we have two types of hypervisors. The first type is a native, or bare-metal, hypervisor, which runs directly on the hardware. The hypervisor's job is to manage, run, and stop the virtual machines that are running on top of it. So this is a popular architecture where

the hypervisor is running directly on top of the hardware. Examples of such hypervisors are ESXi and Hyper-V, and this is usually more expensive. The second architecture is the hosted hypervisor, which runs as an application on top of an existing operating system and runs the virtual machines on top of it. This is usually a cheaper setup, and this was the setup in our private cloud environment. The hypervisor that we used is KVM. When looking at this architecture, security researchers found straight away that there is something interesting in the memory: just because the virtual machines are running on top of the hypervisor, the

memory of the virtual machine sits within the memory space of the hypervisor, meaning that by correctly parsing this memory we can understand what is going on inside each of the virtual machines. This is exactly what we wanted. So what are we going to do with this ability? First of all, we started thinking: okay, this is great, we want to see what is going on inside the virtual machines, so maybe someone already did it and we don't need to do the hard work. We looked for commercial tools that actually do hypervisor introspection, and the only thing that we found was

the Bitdefender tool, but it focuses mainly on ESXi, and of course we have the KVM hypervisor, so it is not suitable for us. This means we need to do it ourselves. Then we looked at the research: what are other researchers doing? A lot of the work is focused on monitoring the network and the memory in order to find security threats. This is very interesting, but we wanted to start with something simple. Because all the virtual machines we want to monitor, the really interesting ones that we don't have any access to, are provided by vendors, they are also pretty much predictable, in that there

are services that are running, and we know, or can guess, what is actually going on inside. So the first thing that we wanted to implement was whitelisting of running processes: we have a pre-prepared list of what should run, and we monitor every new process and see if it matches this list. Let's talk about how it is actually done. If I could write an agent, this would be pretty much the logic behind it: I go through all the running processes, find the information that identifies each process, and check it against a predefined list. But by looking only at

the memory, I have this huge blob of memory and I am losing all the capabilities that the operating system provides for me. What I am losing, by looking at this blob rather than running inside the operating system, is of course that I can't call any of the operating system's APIs and I cannot rely on any security events or operating system events. And the most important thing, which you only then really appreciate, is the translation between physical and virtual addresses. As you recall, the addresses stored in memory are virtual, and I really need the physical address in order to get the data. So what am I missing, what do I need to know in order to

implement this? We started working on the Linux operating system. In Linux, the running processes are kept in a circular doubly linked list of task_struct structures. Every such structure has the information that I need in order to identify a process, and the beginning of this list is a kernel symbol called init_task. So I need several things in order to do this. First of all, I need to know the virtual address of init_task. Then I need to translate it to a physical address, to know where in this huge blob of memory init_task actually begins. Then I need to take every task and

cast it into the right structure, according to the right Linux version of course, retrieve the information that I need, find the virtual address of the next task, and do this whole process all over again. Thank God I didn't have to do it by myself. Of course there is a tool, and it has been around for a while, so maybe most of you are already familiar with it. The tool is LibVMI, on GitHub. It is a C library that runs on the host and queries the hypervisor, and its main job is to translate between the physical and the virtual addresses of each virtual

machine. Right now it supports the KVM and Xen hypervisors, though it is still in development, and it supports multiple guest operating systems. This is exactly what I wanted, and we used this library. We need to provide two things to the library in order for it to work. First of all, we need to provide a mapping between the kernel symbols that we rely on and their virtual addresses. There are two ways to do this, the easy way and the hard way of course. The easy way is a bit of cheating: either you run a script once on this

virtual machine to get this mapping, or you ask the vendor. Of course we could not do either, so the easy way is not really an option for us. The hard way is to look at the memory heuristically, find the process list, and from it find the beginning, init_task. This is what we did, of course. The second thing that you need to provide is the right header files for the kernel structures of the specific virtual machine that you are running against. But if you are going to do it the easy way, just so you know, you can take the System.map file from the

inside of the virtual machine, which is just a mapping of kernel symbols to their virtual addresses, and supply it to LibVMI. The way LibVMI works is: it receives the kernel symbol mappings, takes the right virtual address, walks the page directory and the page table, retrieves the page that is needed, and gives it to the application. The application takes this blob of memory, casts it to the right structure that we provided before, takes the information that is needed, and moves on. The way it looks is really like a textbook example of LibVMI. This is from within the virtual machine: it's

running ps, as an agent inside would, and you can see that we get all the information about the processes, including this demo malware process that is running inside. And this is what we get from the outside: without supplying any password for the virtual machine, just by monitoring the memory through the hypervisor, we get the same information, including of course the process ID and the process name. This is very cool, because you can get all the information you want from the outside. For our first security application, we defined something that we called a process identity, and these were the parameters that we used for now.
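The process-list walk described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the LibVMI-based code from the talk. The record layout (`TASK_FMT`), the `PAGE_OFFSET` value, and the helper names are assumptions made for the example: a real `task_struct` has version-specific offsets, and a real translation walks the guest page tables instead of subtracting a constant.

```python
import struct

# Hypothetical layout, NOT the real Linux task_struct: real offsets depend on
# the kernel version and config, which is why matching headers are needed.
TASK_FMT = "<I16sQQ"  # pid, comm[16], next (virtual addr), prev (virtual addr)
TASK_SIZE = struct.calcsize(TASK_FMT)
PAGE_OFFSET = 0xFFFF888000000000  # assumed linear-mapping base for this sketch


def virt_to_phys(vaddr):
    """Toy translation: a fixed linear offset instead of a page-table walk."""
    return vaddr - PAGE_OFFSET


def read_task(memory, vaddr):
    """Cast the bytes at a virtual address into (pid, name, next, prev)."""
    paddr = virt_to_phys(vaddr)
    pid, comm, nxt, prev = struct.unpack_from(TASK_FMT, memory, paddr)
    return pid, comm.split(b"\0", 1)[0].decode(), nxt, prev


def list_processes(memory, init_task_vaddr):
    """Follow next pointers from init_task until the list wraps around."""
    processes = []
    vaddr = init_task_vaddr
    while True:
        pid, name, nxt, _prev = read_task(memory, vaddr)
        processes.append((pid, name))
        vaddr = nxt
        if vaddr == init_task_vaddr:
            return processes


def build_demo_memory(tasks):
    """Build a synthetic guest memory blob holding a circular task list."""
    base = 0x1000
    memory = bytearray(base + TASK_SIZE * len(tasks))
    vaddrs = [PAGE_OFFSET + base + i * TASK_SIZE for i in range(len(tasks))]
    for i, (pid, name) in enumerate(tasks):
        nxt = vaddrs[(i + 1) % len(tasks)]
        prev = vaddrs[(i - 1) % len(tasks)]
        struct.pack_into(TASK_FMT, memory, base + i * TASK_SIZE,
                         pid, name.encode(), nxt, prev)
    return memory, vaddrs[0]


memory, init_task = build_demo_memory(
    [(0, "swapper"), (1, "init"), (1337, "demo_malware")]
)
for pid, name in list_processes(memory, init_task):
    print(pid, name)
```

The only inputs are the memory blob and the virtual address of the list head, which mirrors the two things the talk says must be supplied from outside: the symbol address and the structure layout.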

We defined the identity of every running process as a set of: its name, the file on disk that it is running from, the shared objects that are loaded inside the process, the permission level of the process, and the user that it runs as. We monitored our environment for several weeks and built a baseline of what is really going on inside and what should run. Once we had our baseline, we compared every running process against it. So this was our first security application, and it worked pretty well. Then we decided to move to something

more complicated. While we were digging inside the kernel structures in order to get the shared objects, we started of course from the task_struct, which has a list of the open files that are connected to it. A file can be several things (I should really use the clicker), and one of the things it can be is a socket, which we found really interesting. By digging inside, I know that for a file of socket type, two layers down there is a sock structure, which contains information about the network connections related to this process. The information that we have

in this structure is the source IP, destination IP, source port, and destination port. So what can we do with this information? We started thinking about it. So far the first application focused only on the virtual machine itself, as a separate unit, but we control the whole data center and we can see all the virtual machines, so by seeing all of them we can do bigger things. We decided to create a context-aware firewall. We already had firewalls in our network, but we had several problems with them. First of all, we don't have a firewall between every pair of virtual machines; of course, they are usually on the edges

of the network. Second, from the outside they are very limited: they can only see the network information. And third, if we had been able to put an agent inside, we could have combined the network information with the internal state of the virtual machine in order to create smart rules. This is exactly what we did using hypervisor introspection: we combined our process identity with the network communication information in order to create smart firewall rules. What is an example of a rule that we can create? We can create the following rule: we can make sure

that only an application that is running from a specific place on disk, that has only specific DLLs loaded inside, with a specific user and specific connections, can talk to a specific process identity on the other side of the data center. That is why we call it a context-aware firewall. The way it looks: starting from inside, we get the information about the connection itself, and of course we know the process ID that owns it. And this is what we get from the outside. It is not much to look at, we didn't construct a GUI; it is just so we can see that

from the outside view we can match the network communication to the process that owns it, the client process in this case, and of course we can look at any other information that is relevant in order to construct this firewall. Okay, great. So it worked great and we found a lot of security threats, but what can we do with that? We were thinking about it. So far we had looked only at the reading capability, but there is also a writing capability, writing into memory. So the thing that we were thinking

about was: what if we write into memory, can we manipulate the behavior of the virtual machine? Probably, right? We can always do some kind of damage if we write into memory. Then we said: if we write into memory, can we create a specific, certain outcome? Can we, for example, stop a process? Okay, this is interesting, maybe we can do it, so let's look into that. The first thing we wanted to do was exactly this: we already have a whitelisting application, so can we stop or terminate a process once it does not match our baseline? How we did it was, we said: okay, let's look at the task_struct that we already have and try to

find some suspicious elements that can influence the behavior of the virtual machine. Just by looking at this struct we have several leads. The first one, literally almost the first field, is the state, with a really interesting comment. There are a lot of others, like flags: maybe if we change the flags we can stop the process from running. Maybe we can play with its scheduling, the CPU time, and the priority, and there is also the exit state. We decided to go with the first, which was the easiest, and to see what values we can put inside. If you can read it, everything here looks interesting, and probably

everything you try will do some kind of damage to this process, but we wanted a specific outcome, so we did not just try things. In order to get a specific outcome we started to read about what every value really means, and pretty much right away we got to TASK_UNINTERRUPTIBLE, which means that we can change the value, and even if a signal tells the process to wake up, it will not wake up until we wake it up manually. So this is exactly what we tried. We took our demo malware process that was running, the one we wanted to terminate, and we assigned the TASK_UNINTERRUPTIBLE value into the

state field of the task_struct. What happened was exactly what we wanted: the state of this process became uninterruptible, which means that if we want we can run it again. And while the process is frozen, we can actually retrieve its memory, code and data, and analyze it, and if we want we can even resume its run, all without hurting any of the other processes running inside the same virtual machine. So it was nice, but the thing we wanted to do after that was to kill it entirely, to turn it into

a state from which it would not be resumed. So this is what we did: we played with the exit state as well, the task state and the exit state together, and the value we found interesting was EXIT_DEAD. My time is up, so I will just go through it briefly. We assigned both values, and of course we got where we wanted. Okay, so I will not go through all of those. Yeah, a couple of minutes? Okay, great. I have more to say, but [inaudible]. Okay, so this looks a bit random, but it wasn't [inaudible]; it was by reading the

open source code, of course, and looking at the manuals, and by reading the code and trying to figure out how to get to the state that I want to get to. There is not necessarily any path in the code that goes into this state. Imagine a state machine: usually the code moves only between certain positions, but by manipulating the memory I can take my current position and place it anywhere I want inside this state machine. So by manipulating both of those values we achieved the "dead (should never be seen)" state, so named of course because it should never really happen. Okay, so this is what we did for the

running processes application. The second application that we wanted to create was an active firewall. We already have our context-aware firewall, and the thing we wanted to do is exactly the same: we have a process running and we want to kill it. We used the same way of thinking: we found some suspicious values that we can change, and we changed them to the correct value after reading, of course, the manuals and the code. Some examples of how it works: here we can see that if the SOCK_DEAD flag is raised, then it will not process the data. Another example: if the SOCK_DONE

flag is turned on, then it will not go on and read any further, it will break. This is exactly the way of thinking that we applied for this project. This is not everything that is needed to stop it, but the way of thinking is the same. So we changed the state of the flags, and what happened is not really exciting from the visual point of view, but it is very exciting when it really happens: we kill the process from the outside. Here we have a process running inside each of two virtual machines that are communicating with each

other, and by changing minor values in the hypervisor, not touching the virtual machine at all, we terminated this process communication. The cool thing is that the operating system is now doing the hard work for us: we changed the values, the operating system understands that it is in a state where it needs to close this connection, and it frees all the resources that were used for this network communication and leaves the state clean. So why is this cool? We have done what we set out to do: we built a no-agent, no-touch solution that does not rely only on network communication. And this is really cool also

because for any malware that is running inside it will be a bit harder to evade, because we see all the memory and an attacker doesn't really know what we are monitoring or what in memory we are looking at. And the operating system does all the work for us: once we move it from one state to another, the operating system knows how to do the job and closes everything that needs to be closed. And the cool part here is that with some more effort, especially in KVM because we don't have events, we can also run code. That will probably be for a next talk,

not now. And some caveats, things that I said that are not entirely true, or that are really problematic. One of the problems of hypervisor introspection is pausing the virtual machine: in the current implementation, every time you need to translate a virtual address to a physical address you need to pause the virtual machine. What we did in order to overcome this was to work with some heuristics: we are always monitoring the network and trying to determine heuristically whether the information is fresh or not, whether it is relevant or not. So this is one thing that we did. The second thing is the

mapping between the kernel symbols and their virtual addresses. If KASLR is turned on, then what I said before is not really relevant: the System.map would not be valid after a reboot. In order to overcome this you need to use the heuristic method: every time the system comes up, scan the memory and find the right kernel symbol locations. I will not talk about the last one; if you want, see me after. So, some takeaways. This was a big challenge, and I think that the limitations, even though they were really strict, really helped the solution. I do believe in limitation-based

innovation. And this is a really cool platform: you can get almost anything from the memory of the virtual machine, so the possibilities of what you can do with it are really endless, and it is not really utilized enough right now. Of course, you can always do harm, and we learned that the hard way, so use it with caution. Thank you. [Applause]
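The write-side idea from the talk, freezing a process by overwriting its state field from the outside and restoring the old value later, can be sketched as follows. This is a toy model over a synthetic byte buffer, not the speaker's implementation. The record layout (`TASK_FMT`) and `STATE_OFFSET` are hypothetical, while the two state constants use the values Linux defines for `TASK_RUNNING` and `TASK_UNINTERRUPTIBLE`.

```python
import struct

TASK_RUNNING = 0          # value as defined in the Linux kernel headers
TASK_UNINTERRUPTIBLE = 2  # value as defined in the Linux kernel headers

# Hypothetical record layout for this sketch: state (i64), pid (u32), comm[16].
TASK_FMT = "<qI16s"
STATE_OFFSET = 0  # assumed offset of the state field inside the record


def read_state(memory, task_paddr):
    """Read the state field of the task record at a physical offset."""
    return struct.unpack_from("<q", memory, task_paddr + STATE_OFFSET)[0]


def write_state(memory, task_paddr, new_state):
    """Overwrite only the state field; return the old value for a later resume."""
    old_state = read_state(memory, task_paddr)
    struct.pack_into("<q", memory, task_paddr + STATE_OFFSET, new_state)
    return old_state


# Synthetic "guest memory" holding a single task record.
memory = bytearray(4096)
task_paddr = 0x100
struct.pack_into(TASK_FMT, memory, task_paddr, TASK_RUNNING, 1337, b"demo_malware")

saved = write_state(memory, task_paddr, TASK_UNINTERRUPTIBLE)  # freeze
frozen = read_state(memory, task_paddr)
write_state(memory, task_paddr, saved)                         # resume
resumed = read_state(memory, task_paddr)
print(frozen, resumed)
```

Touching a single field and keeping the previous value is what makes the change reversible. In a live guest the scheduler, not this script, decides what the new value means, so any real write of this kind carries the risks mentioned in the takeaways.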