
ABC to XYZ of Writing System Management Mode (SMM) Drivers

BSides PDX · 2019 · 49:03 · 1.7K views · Published 2019-11
Category: Technical
Style: Talk
About this talk
“SMM is a special-purpose operating mode provided for handling system-wide functions like power management, system hardware control, or proprietary OEM-designed code.” – Intel Software Developer Manual

System Management Mode (SMM) has gotten a lot of attention for being the most privileged processor mode, which raises concerns over how software and firmware manage hardware. This session demystifies designing and writing System Management Interrupt (SMI) handlers, and covers challenges that developers face in the process. Content covers different types of SMI handlers and various methods of invoking them. The session also describes common vulnerabilities that can result from incorrect coding practices or oversights. Debugging is critical to developing quality SMM drivers, so this session also demonstrates debugging using virtual environments (OVMF) and physical platforms.

Brian Delgado (M.S. Computer Science, Portland State University): Brian Delgado is a Security Researcher in Intel’s Platform Armoring and Resiliency team (PAR) where he is currently focused on applying fuzzing on UEFI code to identify code security issues. He has worked extensively in firmware security including Intel’s SMI Transfer Monitor (STM) feature to help protect a hypervisor against malicious BIOS code. Brian is also a Ph.D. student at Portland State University where his research with the EPA-RIMM research group focuses on SMM-based continuous monitoring for rootkit detection. In his free time, Brian enjoys learning about photography.

Tejaswini Vibhute (M.S. Computer Science, Portland State University): Tejaswini Vibhute is a Security Researcher in Intel’s PAR team. She develops security tools for automated firmware validation leveraging fuzzing and virtualization. She has worked extensively with System Management Mode (SMM) security at Intel.
Prior to joining Intel she was a member of the EPA-RIMM research group at Portland State University, where she completed a master's thesis on utilizing Intel’s STM to enable firmware-assisted rootkit detection. As part of this work she published the first publicly available Xen patches to launch Intel’s STM. In her free time, she can be found taking care of her furry friends or traveling around the country.
Transcript [en]

Alright, let's get started. Welcome to ABC to XYZ of Writing System Management Mode (SMM) Drivers. We have today Brian Delgado and Tejaswini Vibhute. Brian is a security researcher in Intel's Platform Armoring and Resiliency team, that's PAR, where he's currently focused on applying fuzzing on UEFI code to identify code security issues. I could actually use that work. He's worked extensively in firmware security, including Intel's SMI Transfer Monitor, and that feature protects the hypervisor against malicious BIOS code. Brian is also a PhD student at Portland State University, where his research with the EPA-RIMM research group focuses on SMM-based continuous monitoring for rootkit

detection. In his free time he enjoys learning about photography. See if you could take a picture of a rootkit, I'd love that. Tejaswini is a security researcher on Intel's PAR team as well. She develops security tools for automated firmware validation, leveraging fuzzing and virtualization. She's worked extensively with SMM security at Intel. Prior to joining Intel she was a member of the EPA-RIMM research group at Portland State University, where she completed a master's thesis on utilizing Intel's STM to enable firmware-assisted rootkit detection. As part of this work she published the first publicly available Xen patches to launch Intel's STM, and in her free time she can be found taking care of her furry friends or traveling around the country. Let's hear a warm welcome for Brian and

Tejaswini.

Thank you. So yeah, we'll be talking today about writing SMM drivers and some of the security concerns around them, so some of the potential vulnerabilities with this very privileged code on our platforms. I think many of us may not have heard of system management mode unless we've been monitoring the news in the firmware space and vulnerabilities. It's one of the most privileged locations on Intel processors, and yet it is not very well understood among the community, so we wanted to have this talk. We also note that there's a number of open source firmware hardware platforms becoming available that allow those in the community to write their own firmware and test it on systems, and so

there's just more people that could have a chance to work on firmware, and so we want to share some best practices on how to write some of this very privileged software and some of the security concerns thereof. I think that even if you're writing other types of software, some of the principles here would also be of relevance to you. System management mode has been used for a wide variety of operations on platforms. Traditionally it's been used for power management and low-level hardware control. For example, if you have a Lenovo ThinkPad like I do, the volume button in Linux will generate an entry into system management mode, where the firmware will actually

increase the volume for you instead of invoking an OS driver. It's also used on high-end servers that have RAS features (reliability, availability, serviceability) to handle memory errors. It can also be used for thermal events: if there's excessive heat in the processor, the firmware can help remediate that. And then UEFI has some usages for variables, and some of that leverages system management mode. So we're going to provide an overview of what system management mode is and how it operates, provide some tips about writing SMM drivers, some of the security considerations around that, and how to debug this code, and also provide some helpful resources that may be useful to you in your journey. We're focusing today on UEFI. UEFI is one of

several different code bases for firmware, and EDK II is the open source implementation of UEFI. So if you're interested in looking at some of the firmware that is on your system, a lot of it is publicly available at the second link below. There's also a community called TianoCore that supports this open source implementation of UEFI. Many systems will be running UEFI, and a large portion of that will be code from EDK II along with platform-specific firmware that your OEM might add. We also provide some helpful links in the third item below. OK, so let's talk about system management mode. Platforms today have become somewhat

more complex. We have firmware that is able to interact with the hardware. We have some firmware that is more special, like system management mode or SMM, that is the most highly privileged firmware on the platform. We also have hypervisors that may be virtualizing privileged guest OSes or non-privileged guest OSes. So there's a lot of different trust assumptions that we need to be aware of as we write software and as we take inputs from different layers of the software. System management mode is entered with a system management interrupt, or SMI, and these are the highest priority interrupts on Intel platforms, which trigger entries into the most privileged context. If you're familiar with non-maskable interrupts (NMIs) or device

driver interrupts, SMIs are higher priority than these and will be processed before them. While the SMI is being processed, the operating system or hypervisor is preempted until the completion of SMI processing. SMM's memory, which is known as SMRAM, is designed to be not readable or modifiable from outside of SMM, and that includes CPU-based accesses or device-based accesses. So the privilege of SMM code is ring 0 plus a little bit extra: there are certain resources that are only accessible inside system management mode, and so it is a little bit beyond what you would be able to access from a ring 0 context. So to give a high-level overview of how these

SMI flows work: in this example we have all of the cores in the operating system, and then in step one an SMI is generated. We'll go into how to generate one in a little bit. Once that happens, all the CPU threads will enter system management mode, and in the process the register state of each of the CPUs will be stored in the SMRAM save state map. That's just a table in memory that stores copies of whatever was in those registers at the time that the SMI occurred. These tables are unique per processor and available to the SMI handler for use in processing that SMI. So then the work of the SMI would be accomplished, and then the

interrupted register state would be restored to the CPU, the RSM instruction would complete, and the cores would go back to whatever they were doing at the time of the SMI. So that's a very high-level flow of how this works, and we'll go into a more detailed flow in upcoming slides. One useful resource on a CPU that helps us locate where SMRAM actually is, is called SMBASE. This is a register that can be read from either the SMRAM save state map or from one of the model-specific registers (MSRs), called IA32_SMBASE. This will provide the base address of SMRAM. Now, each CPU will have its own SMBASE value so that

their independent contexts can be stored and not clobber each other. The way we can access this is not directly through that register, but through looking into the save state map or through the MSR. One thing that's very important is that SMRAM, the SMM memory, be protected from CPU accesses or cache flush accesses from outside of SMM. The way this is done is through the System Management Range Registers, or SMRR. This is basically a base and a mask that are used to denote the location of SMM memory and block accesses to it from outside of that range. So if you take an address and AND it with this mask and you get the base, then that address is

protected by the SMRR. So if you are code in, let's say, the operating system trying to access SMRAM memory, maybe you're trying to read from it, you would get back all Fs, and so that read would not be processed. If you try to write into it, it would not take effect. These SMRR registers are only writable from inside SMM. There's also another mechanism against device DMA access into SMRAM. This is called TSEG, which stands for top segment, and basically SMRAM would reside within the TSEG, and the TSEG covers the same region as the SMRR. There's a useful register called TSEGMB that provides the base address of TSEG.
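The base-and-mask check described above can be modeled in a few lines. This is only a sketch in Python, not firmware code: the base and mask values are made-up assumptions for an 8 MiB window, the function names are hypothetical, and real SMRR MSRs carry additional fields (such as memory type and a valid bit) that are ignored here.

```python
# Simplified model of the SMRR base/mask match described in the talk.
# All constants and names are hypothetical, chosen only for illustration.

SMRR_BASE = 0x7F80_0000   # assumed start of an 8 MiB SMRAM window
SMRR_MASK = 0xFF80_0000   # assumed mask: keeps the address bits above 8 MiB


def smrr_protects(addr: int, base: int = SMRR_BASE, mask: int = SMRR_MASK) -> bool:
    """An address is covered when (addr AND mask) equals the base."""
    return (addr & mask) == (base & mask)


def read_byte_from_outside_smm(addr: int, memory: dict) -> int:
    """Model a CPU read issued while NOT in SMM."""
    if smrr_protects(addr):
        return 0xFF                # protected reads come back as all Fs
    return memory.get(addr, 0x00)


def write_byte_from_outside_smm(addr: int, value: int, memory: dict) -> None:
    """Model a CPU write issued while NOT in SMM."""
    if smrr_protects(addr):
        return                     # protected writes do not take effect
    memory[addr] = value & 0xFF
```

With these assumed values, an address such as 0x7F900000 matches the range and reads back 0xFF from outside SMM, while an address just below the window, 0x7F7FFFFF, is treated as ordinary memory.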

With these two mechanisms we can protect SMM memory from CPU and device accesses. So let's look in a little more detail at the flow of these SMIs. In this example we have CPU threads that are in the operating system. Let's say we have a device driver that's going to trigger an SMI by writing some value, using an outb instruction, to port 0xB2. Port 0xB2 is a special port that's typically armed to fire SMIs when you write to it. So if the device driver does that, it will generate an SMI, and as we enter SMM, interrupts and SMIs will be disabled, or what we call masked, and the interrupted register state is again

stored in the SMRAM save state map. Once those CPUs enter SMM, they go into the dispatcher, and the work of the dispatcher is to identify which SMI handler to invoke for the given write to port 0xB2. In this example, as we wrote the value 81, that would match up with SMI handler B, and one of the CPU threads would be elected to perform the work of the SMI. So there it is transferred to SMI handler B for processing. Only one thread typically does the work of the SMI, and so the other threads will loop inside the rendezvous code. So once thread 0 has finished its work

it returns back to the rendezvous code, and then as we leave SMM through the RSM instruction, interrupts are re-enabled and SMIs would be enabled as well. So that's generally the flow of how this works. OK, so we wanted to talk a little bit about how you actually go and write an SMM driver. We recognize that most people in this audience may not need to write SMM drivers, but they are very privileged software on the system, and learning about how to do that, and perhaps playing with it on some open source code bases, could be of interest to the audience. So I'll hand it over to Tejaswini here. Thank you. So you could take EDK II, and once you have

that firmware you need a place to actually deploy it. Usually there are certain maker boards or dev boards to help you in the process, like the MinnowBoard Max or the UP Squared board, that have their sources available, so you can modify them and add your driver. If you do not want to go into buying a platform itself, then there are options like emulated environments such as QEMU, which have support with OVMF, which allow you to add your driver to them, and you can just run your firmware in an emulated environment. So yeah, here are some nice images of the MinnowBoard Max and the UP Squared board; you can check them out. Yeah, so when you're working with OVMF,

what really is OVMF? It's an open source project [unclear]. All right, SMI handlers [unclear] are a part of the firmware. They have their init routines as well as runtime routines. The init routine is executed during the phase when your firmware starts loading up, and all of this is done before SMM Ready to Lock. At SMM Ready to Lock your SMRAM is completely locked, so that beyond this point it cannot be modified. So post boot, whenever an SMI is generated while you're running your OS, the

corresponding runtime routine will be invoked. So how would you go about writing this driver in the EDK II sources? To provide you with an example: we can start off by creating a package directory inside the main EDK II sources, and every package directory will have its own .dec and .dsc files. In our case [unclear] is the actual package directory. Inside this package you would create your driver directory, and every driver directory will have its own .inf file. The [unclear] app directory is our directory which holds the UEFI application. So the .dec files and .dsc files are part of the package and they are unique per package. The .dec files are

declaration files, and they will contain all the GUIDs that your package defines. The .dsc files will contain the libraries and protocols that are being used by our package, and the .inf files are on a per-driver basis. So

the INF could contain a subset of the libraries from the .dsc files, and the entry point, the init routine entry point, will be mentioned as part of the INF file [unclear]. And every file, the .dec, .dsc and .inf file, will have its own special GUID [unclear].

All right, so the init routine. As I said, the init routine does the work at load time, at boot time, so it has access to [unclear] at this time. At this time it will do the location of whichever protocol you want to register with.
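As a rough model of the registration step, and of the dispatch flow described earlier in the talk, the sketch below keeps a table of handlers keyed by the value written to port 0xB2. It is written in Python purely for illustration: the class, method names, and the 0x81 value are assumptions, and treating the lock as a hard stop on registration is a simplification. In EDK II the real registration is typically done from the driver's init routine through the software SMI dispatch protocol (EFI_SMM_SW_DISPATCH2_PROTOCOL).

```python
from typing import Callable, Dict, Optional

SW_SMI_PORT = 0xB2  # I/O port that is typically armed to raise a software SMI


class SmiDispatcher:
    """Toy dispatcher: maps a software SMI value to one registered handler."""

    def __init__(self) -> None:
        self._handlers: Dict[int, Callable[[dict], str]] = {}
        self.locked = False  # models the state after SMM Ready to Lock

    def register(self, sw_smi_value: int, handler: Callable[[dict], str]) -> None:
        """Called from a driver's init routine, before SMRAM is locked."""
        if self.locked:
            raise RuntimeError("SMRAM is locked; registration is closed")
        if sw_smi_value in self._handlers:
            raise ValueError("a handler is already registered for this value")
        self._handlers[sw_smi_value] = handler

    def lock(self) -> None:
        """Model SMM Ready to Lock: no changes beyond this point."""
        self.locked = True

    def dispatch(self, port: int, value: int, save_state: dict) -> Optional[str]:
        """Runtime path: pick the handler that matches the written value."""
        if port != SW_SMI_PORT:
            return None
        handler = self._handlers.get(value)
        if handler is None:
            return None  # no handler claims this SMI value
        return handler(save_state)


def handler_b(save_state: dict) -> str:
    """Hypothetical runtime routine; a real one would inspect save_state."""
    return "handled by B"
```

A driver's init routine would call `register(0x81, handler_b)` once at boot, and a later write of 0x81 to port 0xB2 would reach `handler_b` through `dispatch`.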

[Music]

You can add the .inf file of it into the platform package's .dsc and .fdf files. These two files will be picked up by the build, and all the drivers mentioned as part of these files will be built, and then the .efi files will be created for them. So this was the build command for the MinnowBoard Max. If you are working in an OVMF environment, you would modify the OVMF package's .dsc and .fdf files to include your handler's .inf file, and that would let OVMF's build command know that it has to pick up this handler. So: overflows, time-of-check to time-of-use, consumption of

memory, and information leaks, among others, so we'll go through those. In these examples [unclear], and this example on the right doesn't do much checking and overwrites perhaps more than it really intends to, so let's see what can happen. One vulnerability we want to start with is confused deputies. This type of vulnerability is one in which a privileged code module can be tricked or coerced into performing operations that maybe it shouldn't be doing on behalf of less privileged code. That is a very key concern in the SMM system: if you have it acting at the bidding of a malicious driver, that would be something very negative. So, you know, one kind of related example

was ThinkPwn, which basically was a situation in which non-privileged code was able to provide a pointer to an SMI handler, and the SMI handler would happily go and process whatever code was at that pointer. But that pointer could reside outside of SMRAM, and it's much easier to get code outside of SMRAM than inside SMRAM. So this is a case where there was no sanity checking over what the address was that was going to be executed, and we've highlighted several others of these. This kind of gets back to: for any operation that is performed in SMM, you need to think very carefully about who is invoking it and what they are providing to you, and make sure you

do proper security checking over that. There have been some architectural improvements in SMM to better armor it in recent years. One of these is applying the principle of least privilege to harden SMM, and this leverages page tables. Some of these features are not new at all, but they're new to being applied in SMM. One aspect is the attributes for pages: properly applying attributes, for example making code sections executable but not writable, and making data pages read-write or read-only, can help reduce the risk of SMM code that has gone awry. This is now part of the EDK II code base. The other feature we want to call out is memory

isolation. SMM page tables now don't map OS or VMM memory into their page tables. If you think about why, it may be because SMM shouldn't have access to those resources if it doesn't need that access. This can help reduce the potential of clobbering memory that it really shouldn't be accessing, whether through a coding error or a maliciously crafted attack. So by using these mechanisms we can better constrain the SMI handler's access to memory and not give it full access to all the memory, as it has traditionally had. A related feature to this is what's called the comm buffer. It's been a high-risk operation providing data to SMM, as there's a risk of either

the SMI handler clobbering non-SMM code, or SMRAM itself being clobbered by the SMI handler if it were to improperly do an operation that was not sanity checked. So what has been added is called the communication buffer, or comm buffer. This is a BIOS-reserved region that is outside of SMRAM and provides a safer way to transfer data between non-SMM and SMM code. By using this sort of secondary memory region, data can be exchanged without risk of clobbering these two entities. So you'll see more SMI handlers using these types of mechanisms, as opposed to providing pointers in registers, which was traditionally done for many years. So in terms of the best practices, you know,

again, carefully consider all the implications of the operations that are performed in SMM, particularly if you're taking inputs or triggers from non-SMM code. We wouldn't want to call into OS or hypervisor code and execute it, because that is done with full SMM privileges. Also, just do very solid buffer checking and input sanitization before we use data. Using the comm buffer can help avoid clobbering memory that should not be clobbered, and there's an API we can call on a pointer to double-check that this pointer is not within the SMRR, not within SMM's memory. The page-table-based security enforcement can certainly help. There's also a feature that will cause a fault if code tries to execute code

outside the SMRR, and this may not be [unclear], but it is one that can keep your CPU from executing code outside SMRAM in an SMM context. A buffer overflow is one in which data is copied to a region that is too small to hold it, and the remaining data can overwrite adjacent bytes in memory. Attackers have traditionally used this to inject executable code. SMM is potentially vulnerable to this type of issue, and there have been issues in SMI handlers where improperly or maliciously constructed inputs could cause the SMI handler to copy data of kind of arbitrary length into a space that was much smaller. So it's not unheard of in SMM. So again, we need to be very careful about, you know, sanity

checking the buffers and the pointers, and looking at the size of the data we're receiving. If we turn on the page-table-based security enforcement, that can help, because if code is not on an executable page it would not be executed. It may trigger, for example, a hang, but that would have less of a security impact than clobbering something that was important. At Intel we're also building automated tools to help detect some of these types of issues. Relatively recently the Host-Based Firmware Analyzer was released in the staging tree; this enables automated issue discovery over your code. We're also building a tool called Excite, which applies virtualization and virtual

platforms to do fuzzing of SMI handlers, so using off-the-shelf fuzzers to go look for issues. In this example, as we take data from the comm buffer, we check to make sure that the size is appropriate given the data you're expecting, returning if that is not the case. We also check the pointers to see if those are in locations that are appropriate to be read from. So this is the flavor of the sanity checking that should be done for your SMI handler, so it doesn't just naively trust whatever is passed to it from outside. Integer overflows are also a possibility: going beyond the range of what you're expecting could result in data corruption or correctness issues. We

have seen, in this example, a maliciously or improperly constructed variable name that was going to be too long, and the SMI handler was not expecting a length of that size, which caused an issue. So good coding practices here: looking at the data sizes you are using, 32 or 64 bit, checking the results of math done on these integers, for example, making sure the page-table-based enforcements are on, and using automated tools where possible to discover these types of situations through fuzzing. Time-of-check to time-of-use: if we think back to how all the CPUs enter SMM, they aren't around to do operations or processing outside, but there is a potential for DMAs to happen to memory still. And so there have been issues with all the cores in SMM doing security checks over data in a comm buffer, but then DMA

comes along and overwrites the comm buffer that the SMI handler code is using. One way to remediate that is to make local copies inside SMRAM of the data you're going to be processing, so that you're not at risk of a DMA-based attack clobbering what you just checked. This example uses some APIs to make local copies.
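The checks discussed in this part of the talk (pointer location, size, integer wrap-around, and copying before checking) can be combined into one small model. This is a Python sketch under stated assumptions, not EDK II code: the SMRAM range, the 4-byte length header, the status strings, and the `simulate_dma` hook are all hypothetical. In EDK II the pointer check is usually delegated to a library routine such as `SmmIsBufferOutsideSmmValid` from SmmMemLib rather than hand-written.

```python
SMRAM_BASE = 0x7F80_0000      # assumed SMRAM window, for illustration only
SMRAM_SIZE = 0x0080_0000
MAX_ADDRESS = 2**64 - 1       # model of a 64-bit physical address space
MAX_PAYLOAD = 64              # largest payload this toy handler accepts


def is_outside_smram(addr: int, length: int) -> bool:
    """True only if [addr, addr + length) does not touch SMRAM and does not wrap."""
    if addr < 0 or length < 0:
        return False
    end = addr + length       # Python ints never wrap; C code must check this add
    if end > MAX_ADDRESS + 1:
        return False          # would overflow a 64-bit address
    return end <= SMRAM_BASE or addr >= SMRAM_BASE + SMRAM_SIZE


def handle_request(comm_addr: int, comm_buffer: bytearray, simulate_dma=None):
    """Return (status, payload). Layout assumed: 4-byte little-endian length, then data."""
    # 1. The caller-supplied buffer must lie entirely outside SMRAM.
    if not is_outside_smram(comm_addr, len(comm_buffer)):
        return ("ACCESS_DENIED", b"")

    # 2. Snapshot the shared buffer into handler-private memory first.
    local = bytes(comm_buffer)
    if simulate_dma is not None:
        simulate_dma(comm_buffer)   # a device rewrites the shared buffer after the copy

    # 3. Validate and use only the private copy.
    if len(local) < 4:
        return ("INVALID_PARAMETER", b"")
    payload_len = int.from_bytes(local[:4], "little")
    if payload_len > MAX_PAYLOAD or payload_len > len(local) - 4:
        return ("INVALID_PARAMETER", b"")
    return ("SUCCESS", local[4:4 + payload_len])
```

Because validation and use both operate on `local`, a later change to the shared buffer cannot alter what was checked, which is the point of making the local copy.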

Memory consumption is an interesting one. Platforms may have tons of memory, but SMRAM is a very constrained resource. SMI handlers can do dynamic memory allocations, so an SMI handler that allocates large amounts of memory could cause problems for other handlers, and how they handle that situation could be interesting. So as a driver developer, think about the amount of memory you need to allocate, since it is a finite resource in this case, and free the memory when it's no longer used. There's also a very helpful memory profiler, so you can turn on this profiler and it will show you the usage of memory inside your operating context, and it will provide logs of all your memory

allocations, for example those done in your SMI handler, and so you can get this helpful report of your memory utilization.

In this example we've allocated memory, but we're also of course freeing it, so it's something to be aware of: be sure to free memory when you're not using it. Lastly, there's a potential issue of information leakage. If you think about SMM being very highly privileged code, if you have secrets in its registers and leak those contents back to the host environment, that would be negative. So if there are secrets that you want to preserve inside SMM, be sure to scrub those before you return data back to non-SMM code; otherwise there's information leakage between contexts. We've talked a little bit about the security challenges with SMM drivers; there are also performance challenges.
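The scrubbing advice above can be illustrated with a small model. This is a Python sketch with hypothetical names: the scratch buffer stands in for reusable SMRAM working memory, the XOR "tag" is a toy computation and not a real integrity check, and the 16-byte response layout is an assumption. In C firmware code the equivalent step would use a zeroing routine that the compiler cannot optimize away.

```python
RESPONSE_SIZE = 16  # assumed fixed size of the data handed back to non-SMM code


def scrub(buf: bytearray) -> None:
    """Overwrite working memory so no secret bytes remain in it."""
    for i in range(len(buf)):
        buf[i] = 0


def build_response(status: int, tag: int) -> bytes:
    """Start from zero-filled memory so unused bytes carry no stale data."""
    resp = bytearray(RESPONSE_SIZE)
    resp[0] = status & 0xFF
    resp[1] = tag & 0xFF
    return bytes(resp)


def handler(request: bytes, key: bytes, scratch: bytearray) -> bytes:
    """Toy handler: mixes the request with a secret key held only inside SMM."""
    n = min(len(key), len(scratch))
    if n == 0:
        raise ValueError("key and scratch must be non-empty")
    scratch[:n] = key[:n]              # secret temporarily lives in scratch
    tag = 0
    for i, b in enumerate(request):
        tag ^= b ^ scratch[i % n]
    scrub(scratch)                     # wipe the secret before returning
    return build_response(0, tag)
```

Two habits are shown: the working copy of the secret is wiped before control returns, and the response is built from zeroed memory so only the intended fields leave the handler.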

You can't take infinite amounts of time in SMM, because you're taking away time from the operating system and hypervisor, for example, so that's important to say. We've done a study looking at different durations of SMIs and what happens as you spend more and more time away from the OS, and basically there are guidelines to spend around 150 microseconds, 0.15 milliseconds. [Music] All right, then if you're on OVMF and [unclear]. For debugging UEFI applications it's a lot simpler, but for an SMM driver, debugging can be a little tricky; we need certain more resources. Yeah, so there were some additional topics we wanted to cover if we had some more time, like

libraries available to help you check if your code is doing something bad, like use-after-free and out-of-bounds accesses of memory. Then UEFI protocols: carefully use them, and don't use anything that's not defined in SMRAM; that's just not good practice. And here is some more info on EDK II trainings, the Intel STM, and some of the EDK II documentation. Thank you.

[Applause] Thank you. We have time for questions, so, any questions? Yes?

So the question was what happens when memory is exhausted in SMRAM. At that point, if you try to allocate more memory, it will fail, and there will be a return code that you can look for to see whether that allocation succeeded or failed, so that can be caught by software. But yeah, at that point, because it is a finite resource, there's nothing to free up additional memory, and so it is...
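The answer above, that allocation fails with a return code once SMRAM is exhausted, can be modeled with a fixed-capacity pool. This is a Python sketch only: the pool class, the handle scheme, and the capacity are hypothetical, and the status strings merely echo the style of EFI status names such as EFI_OUT_OF_RESOURCES.

```python
class SmramPool:
    """Toy fixed-size pool standing in for the limited SMRAM heap."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self._allocs = {}      # handle -> size
        self._next_handle = 1

    def allocate(self, size: int):
        """Return a handle, or None when the request cannot be satisfied."""
        if size <= 0 or size > self.capacity - self.used:
            return None
        handle = self._next_handle
        self._next_handle += 1
        self._allocs[handle] = size
        self.used += size
        return handle

    def free(self, handle: int) -> None:
        """Release an allocation so other handlers can use the space."""
        self.used -= self._allocs.pop(handle)


def smi_handler(pool: SmramPool, work_size: int) -> str:
    """Allocate, check the result, do the work, and always free."""
    handle = pool.allocate(work_size)
    if handle is None:
        return "OUT_OF_RESOURCES"   # fail cleanly instead of using a bad buffer
    try:
        return "SUCCESS"            # the real work would happen here
    finally:
        pool.free(handle)
```

The handler checks the allocation result before using it and frees on every exit path, so repeated SMIs do not slowly drain the pool.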

Yeah, so the question is how do these page tables work if you're entering in real mode, and what mode are handlers entering in. So traditionally, yes, SMI handlers enter in real mode and then transition between modes. 64-bit SMI handlers are quite common now with EDK II, and so it will do some mode transitions as we enter SMM and then turn on paging and access...

A long time ago, like in the old '80s and '90s BIOS, it was very much 16-bit assembly code, and that is not the state of the art today. So I think now the flow is different, and they will transition to 64-bit, at least on UEFI.

The question is what if there's an overflow in the process of turning on the page tables. The trick there is that this is kind of single-entry code that doesn't really branch, and it's hard for an attacker to get into that part of the flow. So I think to the degree that it stays single execution, it doesn't branch, and it doesn't take any input from outside, that should be less common. But yes, those are very early entry points of SMM, and it's somewhat difficult to get that code modified as an attacker.

I mean, I can only speak to what's in EDK II, but [unclear]. What you may find in the industry is that some IBVs or OEMs take snapshots of EDK II and then customize them with their own work, but what they do I'm not sure. If you look at the open source code base [unclear]. You might find that on corporate laptops there might be a higher standard than you might find on some other types of platforms. Clearly one would probably want to sign the BIOS firmware, and that helps security, but there probably are platforms out there that are not using the proper mechanisms. Yes?

So yeah, it's a great question: why use SMM and not do something as a driver? One thing that's unique is that it's platform independent, so you can write code once that can run no matter what. For example, with the volume control, the OEM could write an SMI handler for just the volume button rather than a Windows driver, and so you do get kind of this write-once capability. It's also somewhat useful for handling situations that are hard to do in the OS, so maybe you want to do very low-level control of hardware below the operating system. It has this nice property of being below the operating system, so if there's things you just

want to have access to that, you know...

That's true, yeah. You don't have to write volume drivers in SMM, certainly not. SMM has traditionally been seen as a more privileged context [unclear], for things like passwords for hard drives and things like that, so that was a way to do some of that outside of the range of the operating system. But there have also been talks where there is an effort to see if we can reduce the scale of the code in SMM, to put some things that were there before into OS drivers, just with the goal of reducing the attack surface. So there is a move today to sort of see...

Yeah, I think today UEFI is a big portion of the installed code base, although some of these newer firmwares like coreboot and others are getting more attention. But UEFI has been around for quite a while, so there's a lot of ground to cover. Yeah.

All right. Brian, Tejaswini, great job.