
Modern Linux Kernel Mitigations

BSides Canberra · 2023 · 58:15 · Published 2023-10
About this talk
The Linux kernel community continuously introduces security mitigations to defend against kernel exploits, but adoption varies widely across distributions due to tradeoffs between security, performance, and usability. This talk surveys recent kernel security features, including module signing, kptr_restrict, KASLR, memory allocator hardening, and stack canaries, and examines how major distributions like Ubuntu, Debian, and Red Hat configure these defenses differently.
The Linux kernel has long been an attractive target for attackers aiming to compromise systems, and as a result the kernel community is constantly responding by introducing security mitigations and locking down attack surfaces. Linux distributions will often weigh the improvement these features offer against their impact on the usability and performance of the operating system, resulting in a fragmented approach to adoption of upstream Linux kernel security features. This talk will discuss a range of recently introduced security features in the kernel which attempt to complicate the exploit development process, and provide an overview of the state of adoption in major distributions.
Transcript [en]

It is on Modern Linux Kernel Mitigations, and it is by Ray Vamp and Matt K, so let's please welcome them to the

stage. Hey everyone, my name is Ray and this is Matt. We both work at InfoSect. We're going to be talking about the Linux kernel and some of its more recent security features. Russell gave a really great talk yesterday about the history of security in the Linux kernel, and how it went from basically having no security to something that looks like a modern OS, so we're going to pick up where he left off and talk about some of the more recent things that have gone into the kernel. I'm going to start by talking about how a kernel is actually configured and how the distros build it, and I'll talk a bit about how

things differ between different distributions and how they build their kernels, and then we're just going to go through a bunch of these mitigations and try to relate them back to how widespread they are and who's adopted them. The Linux kernel itself is a binary blob which gets decompressed and loaded into memory at boot time; on the file system it's called vmlinuz. At compile time that blob is built with all of the functionality that you want built into the kernel. It also supports modular functionality, because it's not practical to just throw everything into your kernel binary,

otherwise it would be massive. When you're building your kernel, you configure a feature with the "y" flag if you want it built into the blob, or with the "m" flag if you want it built as a loadable module. Those kernel modules will be located in /lib/modules as .ko files, and at system runtime they can be loaded or unloaded via modprobe or insmod and rmmod, but there are also ways for them to be automatically, dynamically loaded as well. The types of things you can configure include file systems and networking protocols.

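As an illustration, a hypothetical fragment of a kernel .config might look like this. The option names are real kernel config symbols, but the particular choices here are invented for the example:

```text
# Built into the kernel binary itself
CONFIG_EXT4_FS=y
# Built as a loadable module (ships as sctp.ko under /lib/modules)
CONFIG_IP_SCTP=m
# Disabled entirely
# CONFIG_MPTCP is not set
```
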
Even memory management can be configured, and security mitigations can also be turned on or off through this configuration. When a distribution is releasing their OS, they will often weigh up the trade-off between the improvement a security mitigation offers and the runtime performance hit. There might be a really good security mitigation, but if it slows down the OS a lot they get a lot of pushback from the community, so they really need to decide what gets turned on and what doesn't. On a running system, the configuration that was used to build the kernel is usually located in /boot/

config-$(uname -r), and you'll have the config of the running kernel and the last few installed kernels there. Some major Linux distributions you would have all heard of: Ubuntu, CentOS, Debian, Arch. There are hundreds of other, more obscure operating systems out there, but they're usually derivatives of one of these. Each distro will essentially fork the kernel: they'll take a version of the upstream kernel and then build it with their own customized configuration. Most of the distros will have long-term support for certain releases, and what this means is it's a

promise to the users that they'll get security fixes like CVE patches, security features, and other bits of functionality backported for some amount of time, usually two years or five years, or it can be longer with paid support through some companies. An example of how kernels can differ: on Ubuntu, by default, the SCTP networking protocol is included, whereas on CentOS and Red Hat this network protocol isn't there. So you can see that different distributions will have very different attack surfaces. If you're looking for bugs and you're working them up, usually the attack

surface is something you have to consider, so it's not really enough to just say "I'm targeting a specific kernel version"; you have to look at the distribution and how it's been configured as well. An example of a security mitigation being backported was Retbleed, which was last year; I'll talk a bit more about that later, but the Retbleed mitigations were backported to Ubuntu's 5.15 and CentOS's 5.14 kernels last year. On attack surface considerations: the surface will vary significantly between Linux distributions. They will differ in file system types, things like whether an unprivileged user can mount a file

system, for example via a loopback device; in network protocols and drivers; and in user namespaces, which let an unprivileged user do things like add an interface, send raw packets, that kind of stuff. They can differ in memory allocators, as we'll see later. Kernels can be configured differently across distros in things like CPU preemption and read-copy-update (RCU) preemption, and that can affect you: if you've got a bug that depends on a race condition, preemption versus non-preemption can result in very different code that needs to be written to trigger the bug. And of course security mitigations,

which we'll be talking about today. The other thing to consider is that not all support is loaded by default: you have these kernel modules that can be loaded at runtime, so your attack surface depends on what is baked into your kernel binary, but also on what's been loaded at a given time. And finally, most of the bigger OS distributions have different spins, so there's a server-targeted version, or a desktop version, or even an embedded-device version, and these will vary in their attack surface depending on what platform you're targeting. So if you're a distro and

you're building your own kernel, or if you're a hacker and you want to make your own kernel, you usually do it through this menuconfig interface. If you run "make menuconfig" in the root of your kernel source, you'll get this, and it's pretty good; it's organized by subsystem, so you just go and tweak your knobs and your features through here, and you'll get a config file out against which you can then run make. So we're going to dive right in and start talking about some of the security features. Russell mentioned KASLR yesterday, so I'm just going to do a

quick recap of kernel address space layout randomization (KASLR). What this does is randomize the physical and virtual base address of the kernel image in memory. It actually just shifts the binary in memory by a slide, so the symbols will all have a different address at boot time. If you want to look at the addresses of the symbols, and you're root, you can look at /proc/kallsyms. In the image on the left you can see a kernel loaded at its base address without KASLR; when KASLR is enabled, it'll be shifted by a constant amount, so

the important thing here is that because the whole thing is shifted, the offsets between symbols and the base address are constant; it's just the slide that changes. If you've got a good bug and you're writing an exploit, this usually requires a KASLR bypass: if you need to use any functions or know where symbols are, you're going to need to know the base address. This randomization is done per boot, and it's pretty low entropy; there are actually only nine bits that change, so it's fairly straightforward to defeat and recover the KASLR slide if you've got a good info leak.

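Since the whole image shifts by a single slide, recovering the base from one leak is just subtraction. A minimal sketch in Python, with made-up addresses (a real symbol offset would come from the distro's published vmlinuz/System.map, not from these invented values):

```python
def recover_kaslr(leaked_addr, symbol_offset, static_base=0xffffffff81000000):
    """Recover (runtime_base, slide) from one leaked symbol address.

    symbol_offset is the symbol's constant offset from the image base,
    taken from an unrandomized build; static_base is the default x86-64
    kernel text base when KASLR is off.
    """
    runtime_base = leaked_addr - symbol_offset  # offsets survive the slide
    slide = runtime_base - static_base          # the per-boot randomization
    return runtime_base, slide

# Hypothetical leak: a symbol at offset 0x1234560 from base leaks as:
base, slide = recover_kaslr(0xffffffff8d234560, 0x1234560)
# base -> 0xffffffff8c000000, slide -> 0xb000000
```
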
The Linux kernel is huge, millions of lines of code, and there's a history of it leaking data from kernel space to user space in a lot of subsystems. So if you can leak a kernel pointer or a symbol address, then because the offset remains the same between reboots, you can determine the base address of the kernel image. So KASLR is good, but it's not that good, and it's been enabled by default on most major distributions for a long time now. To make things a bit better, they've come up with function-granular KASLR (FGKASLR). This was merged

upstream a while ago, and basically what it does is apply a function reordering at boot time. It's built on top of the existing KASLR implementation, and what this means is that every boot, when the kernel image is decompressed, the functions are all rearranged, every time. So a single info leak isn't going to allow an attacker to predict or know every symbol address. Not all regions are reordered by FGKASLR, though, so some attacks can still use those regions to construct ROP chains. The patch set was merged upstream, but it hasn't been adopted widely by distros yet, and it results

in a small delay in boot time, there's a higher memory requirement for that initial decompression, and there's a runtime hit. So like I mentioned before, distros will weigh up the benefit this offers versus the performance hit, and they'll decide whether or not they're going to turn it on. Next up: CONFIG_INIT_ON_ALLOC_DEFAULT_ON. This mitigation came into mainline kernels last year, I think, and basically all heap-allocated memory will be initialized to zero when it's allocated. So if there are information-leak bugs around, you're just going to leak zeros in your data, so you can't use it

for defeating things like KASLR; or if you've got sensitive information in your buffers, it'll just be zeroed out. In the kernel this affects a lot of network and packet-processing code: the sk_buff (SKB) structs are used for network operations, and there's a set of functions for dealing with SKBs. This stuff usually dynamically allocates memory on the heap, and does things like allocating headers for packets and the payloads for those packets, so this particular mitigation just wipes out a lot of those information-leak-style bugs. This is enabled by default on Ubuntu, and has been since the 5.x series kernels, but

it's not enabled across all distributions. Similarly, we have init-stack-all-zero: objects which are allocated on the stack will be initialized to zero, and that includes any padding as well. So as before, if you've got an info leak, you've got this kind of defence in depth, in that your uninitialized data will be zeroed out. This one's not enabled on many distributions that I've seen, likely because of the performance hit. Next: in Linux, the dmesg command is used for displaying the kernel's message ring buffer. In the kernel or in kernel modules, if

you're printk-ing or logging stuff, that will show up in dmesg. It also displays boot logs, messages from inserting or removing a kernel module, and all the crash logs and stack traces; that'll all show up in dmesg. Some of this stuff can be pretty sensitive: you could leak kernel pointers in some of your logs. So there's this CONFIG_SECURITY_DMESG_RESTRICT option now, and it basically locks access to dmesg down to the root user. This one's also pretty widely adopted. Next, KPTI, kernel page table isolation. This was introduced in response to Meltdown a couple of years

ago, and basically it separates the kernel and user space page tables from each other. The kernel will have access to both sets of page tables, but the user will only be able to access its own pages, plus a very minimal set of kernel addresses needed to do that transition between spaces. Now, there are some bypasses out there, and they involve constructing chains to trampoline through the swapgs_restore_regs_and_return_to_usermode function, the function used with syscalls normally. So there are bypasses, and as with most of these mitigations, they're not silver bullets for locking things down, but they do make the exploit development process a lot more

complicated. This one is enabled by default on most distributions as well. Next, hardened usercopy. If you were at Russell's talk yesterday, he mentioned copy_to_user and copy_from_user: these are the main functions the kernel provides for getting data into the kernel from user space, and reading data back out from the kernel to user space. They're generally used a lot in ioctls and device driver read/write operations, so there's a big history of buggy code and bounds checking, which can lead to unintended reads of too much data, or even writes, which enable attackers to corrupt sensitive information in the kernel. So

this hardened usercopy was introduced in the last year, and basically it has some smarts: if you're dealing with objects that are created on the heap, it'll actually check their size to ensure the copy is within the size allocated, and it will also check against page allocations and any overlap with the kernel text segment. This has eliminated a lot of bug classes, particularly around heap exploitation, and it's been enabled by default in most distros since last year, I think. Now I'm going to talk a little bit about kmalloc. Just like in user space, in kernel space you can dynamically

allocate memory, and the way you do that is by calling kmalloc. This will get you memory that's virtually and physically contiguous, up to a maximum of about 4 megabytes on x86. When you call kmalloc you give it a size, and you can also specify flags; in the majority of cases you'll use the GFP_KERNEL flag, but there are flags for atomic access and other things. When you call kmalloc you'll get an allocation, and depending on the size it will come from a cache; there are caches which are organized by the size of the memory chunk kmalloc is going to return,

and these are mostly power-of-two size increments. In the kernel you can create caches with kmem_cache_create and analogous functions, and these exist for creating special-purpose caches; for example, struct cred is a particularly sensitive structure, so it has its own cache, cred_jar. So objects of the same size, up to the next power of two, will get allocated from the same cache, given they're on the same CPU core. The reason I'm talking about kmalloc is that there's been some movement in how these caches work in the last couple of years. As of the 5.14 kernel,

memory-accounting allocations started using their own set of special kmalloc caches, denoted with a kmalloc-cg prefix. The way to configure that in the kernel is with the CONFIG_MEMCG_KMEM flag. What happens here is, if this is enabled, suppose you've got two kmallocs for size 128: if one's called with the GFP_KERNEL flag and one's called with the GFP_KERNEL_ACCOUNT flag, they'll be allocated from different caches. So if you're doing heap exploitation, you're grooming your heap and doing some spraying, this will impact the choice of object that you can use for that spray.

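As a rough mental model (not the kernel's actual code), you can think of generic kmalloc cache selection like this. The flag value below is invented for illustration, but the cache names match what you see in /proc/slabinfo:

```python
# Simplified model: sizes round up to a small set of cache sizes
# (powers of two, plus the 96 and 192 caches), and allocations with
# GFP_KERNEL_ACCOUNT come from the separate kmalloc-cg-* caches when
# memcg kmem accounting is enabled.
KMALLOC_SIZES = sorted([96, 192] + [2**n for n in range(3, 13)])  # 8..4096

GFP_KERNEL = 0
GFP_KERNEL_ACCOUNT = 1  # illustrative flag bit, not the real kernel value

def kmalloc_cache(size, flags=GFP_KERNEL):
    """Return the name of the cache a kmalloc(size, flags) would use."""
    for cache_size in KMALLOC_SIZES:
        if size <= cache_size:
            prefix = "kmalloc-cg" if flags & GFP_KERNEL_ACCOUNT else "kmalloc"
            return f"{prefix}-{cache_size}"
    raise ValueError("large allocations fall through to the page allocator")

# Two 100-byte requests land in different caches depending on accounting:
kmalloc_cache(100)                      # -> 'kmalloc-128'
kmalloc_cache(100, GFP_KERNEL_ACCOUNT)  # -> 'kmalloc-cg-128'
```
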
struct msg_msg, for example, is regularly used in heap sprays, but this is now accounted, so it's allocated from the kmalloc-cg caches; it just places further restrictions on the use of spray objects. This might be a bit hard to read, but if you cat /proc/slabinfo, you can see there at the top the kmalloc-cg caches listed by their size, and then the regular kmalloc caches. Next is a new feature that got merged just a couple of weeks ago: CONFIG_RANDOM_KMALLOC_CACHES. When this feature is enabled, multiple copies of the slab caches are created as non-mergeable, and a random one is selected

every time you allocate an object, and that selection is based on the address of the caller of kmalloc. So allocations on different code paths, from different subsystems, will likely come from different slab caches. This random selection is done based on a per-boot random seed, and this is done to prevent using that seed to predict the random allocation order. This has been merged into mainline, but it hasn't reached distros yet; from the git commit messages there's some initial benchmarking, and it looks like the performance impact is quite minimal, so we expect this to land in distros at some point soon.

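A toy model of the idea: the real kernel hashes the allocation call site together with a boot-time seed, but the hash function and the copy count here are stand-ins, not the kernel's actual choices:

```python
import hashlib

N_COPIES = 16  # illustrative copy count per size class

def pick_cache_copy(caller_addr, boot_seed):
    """Pick which copy of a size cache an allocation call site uses."""
    digest = hashlib.sha256(boot_seed + caller_addr.to_bytes(8, "little")).digest()
    return digest[0] % N_COPIES

seed = b"per-boot-random-seed"
# The same call site always maps to the same copy within one boot,
# but a different boot seed gives an unrelated, unpredictable mapping.
copy_a = pick_cache_copy(0xffffffff81234567, seed)
copy_b = pick_cache_copy(0xffffffff81234567, seed)
assert copy_a == copy_b
```
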
Now I'm going to talk about a couple of freelist mitigations that have come up recently. When heap memory chunks are freed, they get stored in linked lists called freelists, and a common attack is to overwrite a pointer in the freelist, i.e. in the chunk's metadata, and get kmalloc to return an arbitrary address. Freelist hardening tries to stop this by obfuscating the freelist pointer: it XORs it with a secret random value and the address that it's stored at. This is pretty widespread; it's enabled in all the major distros now.

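A simplified model of that obfuscation (the real SLUB code also byte-swaps the storage address before XORing, but the XOR idea is the same; all constants here are invented):

```python
# Model of SLUB freelist pointer hardening: the next-free pointer stored
# inside a free chunk is XORed with a per-cache secret and the address
# it is stored at, so a blind overwrite corrupts it to a useless value.
def obfuscate(next_free, secret, stored_at):
    return next_free ^ secret ^ stored_at

def deobfuscate(value, secret, stored_at):
    return value ^ secret ^ stored_at  # XOR is its own inverse

secret = 0xdeadbeefcafef00d              # per-cache random value from boot
next_free = 0xffff888003a0b100           # hypothetical heap addresses
slot = 0xffff888003a0b000
stored = obfuscate(next_free, secret, slot)
assert deobfuscate(stored, secret, slot) == next_free
```

Without knowing the secret, an attacker overwriting `stored` can't choose what address kmalloc will hand back.
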
We also have freelist randomization. When a new slab is allocated, allocations from it are sequential, which is nice and predictable. So freelist randomization generates a random sequence of indexes: the first object will be allocated from a random starting position, and all allocations after that will follow that random sequence. This aims to complicate heap overflow attacks; there are workarounds for it, it just complicates the exploit development process. Next I'm going to talk about structs. As in user space, when you compile a struct with multiple members, they'll be contiguous in memory, and their ordering will be as per how the

struct is declared in the source code, and the offset of each field depends on the sizes of the fields before it. Structs with sensitive information, things like capability sets, user IDs, or function pointers, are an attractive target for corruption if an attacker has an arbitrary-write capability. There's a history, with struct cred and a bunch of others like task_struct, of corrupting fields for the purpose of privilege escalation: given you know the offset of the sensitive field in the struct, corrupting it could give you increased privileges. So they came up with this GCC plugin, randstruct, and what it does is allow a

developer to annotate a structure so that the layout of that structure will be randomized at compile time. When this is enabled, structs with only function pointers in them are automatically randomized, and this is designed to protect structures with sensitive fields, like I mentioned on the last slide. This isn't amazing, because the randomization is done at compile time. It will add exploit development complexity, because as an attacker you'll need to get the kernel image, figure out the offsets, and then maintain that across versions, so it's not super effective for the main distributions, where the kernel binaries are used by

millions of people, but it's likely more useful for targets who don't publish their kernel builds. This one hasn't been widely adopted yet. You can hopefully see here, this is the implementation of struct cred as of a few months ago, and the definition of the struct has an annotation down the bottom, in red: __randomize_layout. That marks it as a candidate for this randomization at compile time. Now I'll talk quickly about Retbleed. I'm not going to go into too many details, but in 2022 Retbleed happened. It's a speculative execution attack; this one exploited return instructions, and specifically retpoline,

which was developed as a mitigation for the Spectre v2 issues. A proof of concept was demonstrated against AMD and Intel CPUs, and they were able to leak sensitive things like /etc/shadow and keyrings and stuff like that. So there was a lot of work put into building mitigations for Retbleed; there were thousands of lines of code, and they were merged late last year, and they were quickly backported to distro kernels all the way back to the 4.4 series. We'll take a look at one of these, because it affects current attack techniques: CONFIG_RETHUNK. GCC was updated to include a feature

that lets you specify an external return thunk. What this does, basically, is take functions that have return statements and replace those returns with a jump to this return thunk. This feature was added specifically for the Retbleed mitigation development in the kernel: ret instructions are replaced with a jump to the address of that return thunk, so the surface of the return is located entirely in a single function. The reason this is a bit painful is that it impacts kernel ROP chain development for targets which enable this mitigation, and this was introduced upstream in the 6.x series of the

kernel late last year, and it was quickly backported to Ubuntu, CentOS, all the main distro kernels. I took a look at the number of ROP gadgets in the kernel before this was backported, and then the ROP gadgets after. You can see in this 5.15.25 kernel there were about 100,000 ROP gadgets ending in a ret (you want your gadgets to end in a ret when you're building a chain), and then in the kernel after this mitigation was implemented there were only 3,000. So there are significantly fewer gadgets in the newer kernel, or so it seemed. This is really hard to see, but a lot of these gadgets, this is just

output from the standard ROPgadget tool; most of these gadgets end in a jump to an address which ends in 2480. If you look at /proc/kallsyms, we can see that this address, which comes up a lot, actually corresponds to a symbol called x86_return_thunk, and if you disassemble this symbol, you can see the disassembly is just a single ret and an int3. So provided you know this symbol (if you have the kernel image you can look at System.map or whatever), you can still construct kernel ROP chains; you just have to modify your ROP gadget output.

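You can script that symbol lookup. Here's a small parser for the /proc/kallsyms line format ("address type name [module]"), run on canned input with invented addresses so it doesn't need a live system or root:

```python
def find_symbol(kallsyms_text, name):
    """Return the address of a named symbol from kallsyms-format text."""
    for line in kallsyms_text.splitlines():
        parts = line.split()
        # Each line: hex address, symbol type, symbol name, optional [module]
        if len(parts) >= 3 and parts[2] == name:
            return int(parts[0], 16)
    raise KeyError(name)

# Canned sample; on a real system you'd read /proc/kallsyms as root.
sample = """ffffffff81e02480 T x86_return_thunk
ffffffff81000000 T _text"""

find_symbol(sample, "x86_return_thunk")  # -> 0xffffffff81e02480
```

With that address in hand, gadgets ending in `jmp x86_return_thunk` can be treated as if they ended in a plain ret.
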
You use that jump to the return thunk as if it were a ret. So I'm going to hand over to Matt now; he's going to talk a bit more about kernel modules. Hello. For the second half of the talk, I'm going to take a slightly different tack: we're going to look at some historic attacks against the kernel, and then the mitigations that have been added in to prevent them, and I'm going to start by expanding on kernel modules. Ray spoke a little bit about kernel modules before, and what they really allow you to do is add functionality to a running kernel. Like Ray said, the reason why these are so important is they allow

distributions to ship kernels that support the general use case, and then anyone who requires something in extension of that, so for example if you want to run non-standard networking like multipath TCP or SCTP, or you have a graphics chipset that requires a special driver, they can ship that as a kernel module. So if, say, I have an AMD graphics card, I don't have to have the Nvidia code in my kernel. Kernel modules can be loaded into a kernel in a variety of ways. You have the standard administrative tools that Ray mentioned, modprobe and insmod, and to use these tools you have to have the CAP_SYS_MODULE capability,

otherwise you will get a permission-denied error. Separately to those, there are also two syscalls, init_module and finit_module, that can be used to load a kernel module as well, and they also require that capability too. The final option is that it is possible to have the kernel automatically load modules for you, subject to some constraints, and we'll go through an example of that later. Kernel modules provide an attractive mechanism for an attacker, because they allow a root user to dynamically insert functionality into a running kernel. So as part of, say, a post-exploitation chain, they provide an attacker quite a lot of functionality to be able to bootstrap things like a

backdoor or a rootkit. The general way something like this may occur is: an attacker gets a foothold on a system with user privileges, and then they do a privilege escalation to root, and at that point what they can start doing is dropping kernel modules and loading them into the kernel. The reason why they might want to do something like that is process stealthing, that is, hiding their activities from legitimate tools, or hooking netfilter to look at packets, etc. An attacker is going to use the same kinds of techniques to do this that I mentioned on the previous slide, so

you've got modprobe or insmod; they're probably more likely to use init_module or finit_module, as it's easier to do a fileless attack using those two; or, if they really wanted to, they could use the kernel autoload functionality, but it would not be pretty. To defeat these kinds of attacks, something called module signing enforcement was introduced, and module signing enforcement is simply the application of code signing to kernel modules. There are two configuration options that turn this on. The first one is really just a halfway measure, and it's CONFIG_MODULE_SIG: what it will do is say, OK, I'm going to allow the load of this unsigned

kernel module, but I'm going to log it in the kernel message buffer and I'm going to taint the kernel, so that there is a record that an unsigned module has been loaded. The second one is basically the full mitigation, and what it does is just straight-up prohibit the load of any unsigned module. Now, when we talk about code signing, obviously we have certificates and a root of trust as well, and these things are no different: module signing enforcement has a chain of trust, and it will only load kernel modules that are signed by trusted keys. These keys get stored in the built-in trusted keys keyring in the kernel, and they're usually baked into the kernel

when it's built. So all of the distributions that distribute these kernels will be baking their keys in, such that when they ship their modules to you, those modules will actually load. It is possible to add new keys to this keyring, say for example if you want to ship your own modules to systems that you control. Obviously you can't sign with the distribution keys, because you won't have the private key for them, so what you can do is add a key in, but only if you can have your key signed by a key that's already in there, so that's probably not actually going to work unless you build the kernel as well.

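For reference, the relevant config symbols look like this in a hypothetical hardened build (the symbol names are real kernel Kconfig options; whether a given distro sets them this way varies):

```text
# Enable signature checking; alone, unsigned modules still load but
# get logged and taint the kernel
CONFIG_MODULE_SIG=y
# The full mitigation: refuse to load unsigned modules outright
CONFIG_MODULE_SIG_FORCE=y
# Sign all modules produced by the kernel build
CONFIG_MODULE_SIG_ALL=y
# Hash algorithm used when signing
CONFIG_MODULE_SIG_SHA512=y
```
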
Modules, on the other hand, are signed at build time, but really what that means is they're signed at some point prior to load. The next area I'm going to talk about is something called kptr_restrict. Historically, and Ray mentioned this as well, the Linux kernel has leaked kernel pointers through multiple interfaces: we're talking procfs, sysfs, dmesg, etc. It's usually been possible in the past to, say, cat a particular file in /proc and get the kernel address for a symbol you care about, ditto for sysfs, and ditto for dmesg. And so when KASLR was introduced, this was a really big problem, because it leads to very trivial KASLR

bypasses in the way that Ray mentioned earlier. So kptr_restrict was introduced circa 2010 (so this is an older mitigation) to mitigate this, and it consists of two components. The first is a procfs pseudo-file called kptr_restrict, and it holds a tri-state, so it's either 0, 1, or 2, and the value of that controls what the kernel does when it is requested to print a kernel pointer. The second aspect of this is a printk format specifier. If you're not familiar with printk: printk is to the kernel what printf is to user space; it's simply a way to print a formatted

piece of data to some endpoint, which could be procfs, could be sysfs, could be the kernel message log. And so a special format specifier was added, which is %pK, and that was for kernel developers to use when they wanted to print pointers. What happens when that is used in the kernel is, when it's executed, the kernel will consult this kptr_restrict value, and if it's set to zero, the pointer gets hashed, unless of course the no_hash_pointers kernel argument is enabled, which it will never be on any distro you'll ever see. With kptr_restrict equal to one, the

pointer will be printed as all zeros unless the reading user has the CAP_SYSLOG capability, and if kptr_restrict equals 2, the pointer is going to be printed as all zeros regardless of the privilege of the reading user. So root could be trying to read these things and it would still show zeros, which is a really good improvement over what was there, especially bolstering the improvement of KASLR. Any modern distribution you see is going to have kptr_restrict set to two; there's really no real reason why it would be either of the other values.

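The tri-state behaviour can be summarised as a small decision function. This is a simplified model of %pK, not the kernel's code; the "hashed" output is a stand-in for the kernel's actual pointer hashing:

```python
# Model of how %pK behaves under kptr_restrict (simplified):
#   0 -> pointer is hashed (unless the no_hash_pointers boot arg is set)
#   1 -> printed as zeros unless the reader has CAP_SYSLOG
#   2 -> printed as zeros for everyone, root included
def format_pK(ptr, kptr_restrict, has_cap_syslog):
    if kptr_restrict == 0:
        return f"hashed({ptr:#x})"  # stand-in for the real hashed output
    if kptr_restrict == 1 and has_cap_syslog:
        return f"{ptr:#018x}"       # privileged reader sees the real value
    return "0x0000000000000000"

# Even a fully privileged reader gets zeros at level 2:
format_pK(0xffffffff81234567, 2, has_cap_syslog=True)  # -> '0x0000000000000000'
```
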
Generally speaking, we're moving into an age where nothing in user space should be able to leak a kernel pointer, so two is probably the setting that you'd really like there. The next area that I'd like to talk about is usermode helpers. In normal operation there are a variety of scenarios where the kernel needs assistance from user space to accomplish some operation: things like loading modules; core dumping, so if a process hits an error condition and needs to dump core; binfmt handlers; etc. The kernel refers to these binaries on disk that assist with these processes as usermode helpers. There's a whole heap of infrastructure in the kernel for these things to be called and invoked, but the really salient point that is going to

hold true through the next few slides is that these usermode helpers end up being run as root with full capabilities. When I talk about capabilities, I'm talking about things like CAP_SYSLOG on that previous slide: basically fine-grained abilities to perform particular privileged actions. Over their whole time in the kernel, usermode helpers have been really attractive targets for attackers, and the reason is that it's possible to get these things running as root with a full capability set. We'll talk about an example of how this happens and how it is attacked; you'll see this in CTF

writeups, and you'll see it in CVE writeups as well; it gets targeted quite a lot because it provides a really cool primitive to attackers. For now we're going to consider a specific example that occurs when a user space process creates a socket. I spoke earlier about how it's possible to have the kernel automatically load kernel modules for you, and this is one of those cases where that can happen. If you've never done any network programming before, a socket is simply a mechanism through which a user space process can communicate with a network. The C library call for creating a socket is simply socket(); it takes three arguments, but the one we really care about is the domain argument. The domain can be one of many things: it can be AF_INET if you want an IPv4 socket, AF_INET6 if you want an IPv6 socket, but it can also be less common things like AF_CAN if you want to talk on a CAN network, or AF_BLUETOOTH if you'd like to talk Bluetooth. The thing is, there's such a multitude of these domains that the distribution kernel you have may not support all of them.

Ray spoke earlier about how these kernels are configured when menuconfig is run: you can opt to have functionality compiled into the kernel if you specify y, you can have it shipped as a module if you specify m, and you can just not support it if you specify n. So this hits a problem when we try to create a socket. We might say, hey, I'd like you to create me a Bluetooth socket, and the kernel needs to figure out whether it can support a Bluetooth socket, and the cases we have are: the functionality is compiled into the kernel, the functionality is sitting in a module on disk, or it isn't supported at all. How the kernel figures this out is it first looks through the families that are compiled into the kernel, and if it finds it, that's great, it basically hands the request off. If not, it needs to figure out whether there's a module sitting on disk that can support this particular domain, and to go out and look on disk it invokes modprobe. So the next problem we have is that the kernel doesn't know where modprobe is on disk, and so what it has is a global variable called modprobe_path that purely exists to hold the path to the modprobe binary on disk. This modprobe_path variable is controllable through the modprobe pseudo-file in /proc/sys/kernel, and when the kernel wishes to

invoke modprobe, it basically just runs whatever this global variable is set to. It's usually set to something like /sbin/modprobe; however, if you as an attacker were to change it to, say, /tmp/bad.sh, the kernel would run that as root instead, and that's essentially the attack that usually gets levied against user mode helpers. So to elaborate: if we want to attack this, we're going to assume we are an attacker and we have a sufficient primitive in the kernel such that we can overwrite where this modprobe_path variable is. For that to happen we're talking about something like a write-what-where, where we can write a value we control to a location we control. What we're going to do is overwrite modprobe_path to point at something else we control. On the previous slide I threw out the example of /tmp/bad.sh, but it can really be anything, as long as the kernel can run it: if it's a shell script you want it to have the shebang line at the top, #!/bin/bash, and then whatever you want in it; it can be an ELF executable as long as it has a registered binfmt handler; and if it needs to be marked executable, mark it executable, and you're good to go. The next thing we're going to do is

cause the kernel's module autoload functionality to be invoked, and we're going to use the socket example from before. So I'm going to go in and say, hey, I want to create a socket, and I want to talk AF_CAN, for example, but the real trick here is that I have to pick something that isn't compiled into the kernel. There are a ton of ways you can figure out what's compiled in and what's not at runtime, and I won't go into them; just pick something exotic, or something that doesn't exist. What will happen is the kernel will go, well, I don't support this, let's invoke modprobe, and by modprobe I really mean whatever modprobe_path points at. Then our /tmp/bad.sh will get executed, and it will get executed as root with a full capability set. What that means is we can do absolutely anything we'd like at that point, with the caveat that we don't have interactive access: you could point it at /bin/bash but you'd never be able to give it input. So one option available to you is to do something like copy bash to /tmp and mark it setuid root, and then once the modprobe invocation fails you can just go and execute it and you have a root shell.

If you wanted to clean up after that, you could just reset modprobe_path via /proc/sys/kernel/modprobe. For those of you that care about the details, this is a really cool attack because it can be done from interrupt context, which is really nice. I said up front this gets used a lot, and so there is a mitigation designed to prevent it: the CONFIG_STATIC_USERMODEHELPER configuration option. What that does is basically say that all of these user mode helper calls have to go through a single binary, and that binary is specified by the CONFIG_STATIC_USERMODEHELPER_PATH configuration option. Now, CONFIG_STATIC_USERMODEHELPER is unlikely to be enabled on any distribution you'll see today because it requires user space support: it requires that the distribution ship a binary that can do modprobe, but can also be a registered binfmt handler, can also be a core dumper, and so on. While that support is actively being worked on, it's unlikely to be enabled today, but give it two or three years and it's probably going to be a thing. The next area I'd like to talk about is actually kernel process stacks, but to do so I want to take a quick

detour through vmalloc. Ray spoke about kmalloc earlier, which is really the kernel's general purpose allocator; it's going to be the first thing you reach for if you want to allocate memory in a kernel context. But it is limited by the fact that it can really only allocate you about 4 megabytes of memory, and the reason why is that it gives you physically contiguous memory, and 4 megabytes of physically contiguous pages on any kind of running system with memory fragmentation is going to be a tough ask. So if the kernel needs larger allocations than this, for example to load a bigger kernel module, or for kernel process stacks or graphics buffers, or it doesn't require the physically contiguous memory, it can use a separate allocator called vmalloc. Unlike kmalloc, vmalloc allocations aren't physically contiguous, that is, the pages aren't necessarily sequential in memory. It obviously comes with the caveat that you can't use APIs that expect physically contiguous allocations, such as multi-page DMA, but the really cool thing is that vmalloc will very nicely give you a guard page after your allocation, which gives you some base level of security against linear overflows. So how does vmalloc actually allocate memory? What it does is go through memory and say, give me basically any page you have, I

don't care if they're contiguous, if they're free I want them. What will then happen is it takes those pages and changes the kernel page tables to make them virtually contiguous in memory. I'm not sure how visible that diagram is, but on the left we have our kernel virtual address space, and we're going to assume we've asked vmalloc to give us three pages' worth of memory. On the right-hand side of that image we have a very basic view of physical memory: we're assuming we have five pages of memory, A through E, and pages B and D are already allocated off to somewhere else. We've asked for three pages, so what vmalloc is going to do is go and grab A, C and E, which are the three free pages in our example. You can note they're not physically contiguous, they're actually quite sparse, separated in physical memory, but what vmalloc will do is take them and stitch the page tables together such that they appear virtually contiguous in the address space. As I mentioned before, there's a very nice security property in which vmalloc gives you a guard page at the end of the allocation.

So if there's a buffer overflow in that allocation, and as an attacker you write straight off the end of the allocation, you will hit the guard page and that will basically stop the process that does it. I don't have an example of an attack against vmalloc, but I will say that if, as an attacker, you are trying to groom vmalloc into an advantageous layout, it is possible. I'll briefly cover this: vmalloc will give you back the lowest virtual address in the vmalloc range that can hold the allocation, which makes allocations semi-predictable, and that means you can spray vmalloc allocations to get an advantageous layout. The really big caveat in doing this is that when a vmalloc allocation is freed, it's not truly freed. When these things are set up, the page tables have to be modified, and the same is true when the allocations are freed, and that's a really costly operation, because you end up changing the page tables and flushing the TLBs, which is quite resource intensive. So as an optimization against having to do that every time, what will happen is vmalloc, when an allocation is freed, will hold on to it, and when enough freed allocations build up it will purge them all. What that means is refilling a hole to

exploit a use-after-free isn't immediately obvious, because vmalloc will hold and purge these allocations at different times. If this is something you need to do, there are public materials out there you can Google; for example, here's one from Project Zero that had to do a similar thing. So that moves us on to kernel process stacks. Every process that runs in user space has a corresponding kernel stack, and this stack gets used when the process transitions to kernel mode, which usually happens when the process does a system call. Kernel process stacks actually get allocated in vmalloc space when the process is created, as long as a particular configuration

option, CONFIG_VMAP_STACK, is set, and any distribution that you run today is going to have that set. It actually gives you quite a bit of security having your kernel process stacks in vmalloc space. Similarly to buffers in user space, a buffer on a kernel process stack can also be overflowed, so I guess you would refer to this as a kernel stack buffer overflow. Obviously, because the stack is vmalloc'd, it is bracketed by guard pages, which gives us some protection, but there's also a static canary placed at the top of the stack. This canary is not there for security, because it is a fixed value; it's mainly there for debugging. However, these guard page mitigations don't protect against things like an out-of-bounds write: if you can write at some offset from a location on your kernel process stack, you can skip that guard page altogether, and there's no protection there. That specific scenario, along with some others, resulted in the mitigations called randomized kstack offset. There are two configuration options: CONFIG_RANDOMIZE_KSTACK_OFFSET, which is like that halfway measure I spoke about earlier, where it compiles the code in and then consults a kernel boot parameter as to whether to turn it on; and CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, and what that does is it turns it all

the way on. How this mitigation works is it aims to defeat attacks that require kernel stack determinism, that is, attacks that require a particular buffer to be a known distance from another particular buffer. When the process transitions to kernel execution via a syscall, a random offset is added to that kernel process stack, and that offset changes on each transition; this adds about five bits of entropy on x86-64. With no mitigations it looks a little bit like this (right down the bottom there, it's probably really hard to see, I'm sorry): the first thing that happens is the user space registers get pushed to the bottom of the stack, and the stack continues from there. What this means to an attacker is that the layout is deterministic, and the offset to the end or the beginning of the vmalloc allocation is known, which really also means the offset to other objects in vmalloc space is known, so doing those out-of-bounds writes and reads becomes way easier. The mitigation here is to add that random padding down the bottom, which, once again, is probably hard to see: it will allocate this random size that it never actually uses, which moves the kernel stack around on each entry.

So there's no longer that deterministic way to figure out how far you are from the end of the stack, or how far you are from adjacent objects, which really helps in defeating these types of attacks that require kernel stack determinism. You will likely find this turned on on any current distro because it is quite effective. The next thing we'll talk about is reference counting. If you've never heard of it before, reference counting is a very common pattern in software engineering for controlling the lifetime of allocated objects, and the real question it answers is: if I have a dynamically allocated object that is being accessed concurrently by multiple CPUs, when can I actually free it? We don't want to free it while something is still using it, because that leads to vulnerabilities or crashes, and it actually ends up being quite a tricky problem. Reference counting solves it, and it works quite simply: as a consumer of an object, if you want to use that object you do something called getting a reference, and all that really does is atomically increment a count; when you're done with the object you put the reference, which decrements the count; and when the reference count equals zero, the object can be freed. Reference counting is a really big deal across the kernel because it is used

almost everywhere, and it's really used in a lot of things that have security relevance: things like struct file, which is the object in the kernel that represents an open file, cred for credentials, sk_buff, et cetera. Like I said, it's historically implemented as an atomic_t, and really all that is is an indication to the programmer that they need to use atomic-safe functions to access it, because the count can be incremented and decremented by multiple CPUs concurrently. Historically, reference counts have been a source of bugs; it is really tricky to get this right sometimes, and incorrect reference counting would, in my opinion, be the biggest case where this arises.

So, things like: if I have a dynamically allocated object that holds a pointer to a reference-counted object, and I duplicate that pointer somewhere else without grabbing a reference to the reference-counted object, that's a bug, and these tend to lead to use-after-frees; if you think about how complex the kernel can be, these things arise quite a lot. Reference counts also present an interesting target for attackers with the correct primitive: if I as an attacker have the ability to overwrite or influence the reference count of another object, what I can do is make that count lower than it actually is, and that allows me to synthesize a use-after-free that I could then possibly exploit. To mitigate these sorts of attacks, the refcount_t type was added to the kernel, and the real mitigation that comes with this is the functions you use surrounding this type: how the mitigation works is basically that they automatically detect when the reference count is in a bad state. A bad state may be something like: I, as a consumer, am asking to get a reference to an object, but that reference count is already zero; that is a use-after-free. Something else might be: I'd like to get a reference to

the object, however its reference count is already at the maximum value that an integer can hold. All of these states are erroneous, and when the refcount_t infrastructure detects something like this, it does something called saturating the refcount: it basically sets it to a static sentinel value, and once it hits that value nothing else is allowed to happen on that object, so it won't be freed, but you can't get a reference to it either. It does mean that the memory will be leaked, and by leaked I don't mean leaked to user space, I just mean it won't be able to be used by the allocator anymore.

But it tends to be a pretty good mitigation: it was designed to stop those scenarios where you could overflow or underflow a refcount, and it's quite effective at doing so. refcount_t is in quite a lot of areas of the kernel now; however, it's not wholesale across the kernel, so there are still quite a few security-relevant areas that haven't been converted across. It's worth knowing that you can still play games with plain reference counts, as opposed to refcount_t, if you have the appropriate primitive. And that's us, thank you.

Thanks, great talk. Do we have any questions in the audience? Just raise your hands. Do I see any in the audience? Cool, one up there on the left.

Hey guys, thanks, great talk. Just wondering if you had any insights on how what you've talked about today impacts things like security contexts and RHEL. (Sorry, can I get that last bit again?) How do the kernel mitigations you've talked about today factor into the overall hardening of security contexts, and how is that used in operating systems like RHEL? Yeah, basically security hardening for a distribution all comes down to risk profile, right? As an end user you figure out what your risk profile is: if I'm storing a lot of sensitive material on my laptop I might want a higher security profile, and I might want to turn on more of these. One of the things we see with distributions is that they cater to different things, and they still have to be usable as well, so each distribution tends to take a different approach. If we think of all of these mitigations as kind of a grab bag of things we could use to enhance security, possibly at the cost of usability, they all pick and choose different ones to achieve their goals. Do you want to jump in? Does that roughly answer your question? Cool, are there any other questions in the audience? One in the middle at the front

here. How well adopted is the signing of kernel modules so far? If I turn it on, is it going to break my drivers? I wouldn't expect to see it on your standard distro, no, and the reason is that a ton of people still have NVIDIA drivers, which NVIDIA ship out of tree, so that's going to be the biggest problem you see. Generally speaking, I'd say it's probably not on; if you're running RHEL it might be on, and a lot more of your enterprise-type distributions turn it on, but your Ubuntu doesn't have it, and I don't think Fedora has it either. Any other questions? Yep, just there. Are you seeing much of a difference between desktop distributions and server distributions in that regard, is there a divide between the security options that are enabled? Not so much security, but all the other stuff I talked about, like networking protocols and filesystem types, that stuff can vary wildly between the two. And maybe one last question, if there is any in the audience.

I've got one. Silvia again. Hello, over here. So, for those of us who aren't kernel developers, how do we keep track of what's being introduced into the mainline kernel? How do you recommend security practitioners, especially blue teamers, keep on top of kernel functionality? That's an excellent question. Generally speaking, a lot of these things will transparently filter through to end users via the distributions. If you're doing blue teaming for an organization that has the budget for it, a lot of them will actually be using distributions from vendors that have security teams that do this for you: Red Hat has security teams, Canonical has security teams, so via the transitive property, that's one way it could happen. If you're on a long-term support kernel, like with Ubuntu, there'll be detailed changelogs; they have a six-month release cycle, and so when any big things like the stuff we've talked about today land, there will usually be blogs, write-ups, and changelogs about it, so you don't need to dig into the kernel binary or anything like that. Failing that, if you as an individual would prefer to stay on top of it, the kernel mailing list is probably the good one, and the security hardening mailing lists, and then by keeping up with those you can turn on those configuration options as you see fit, though obviously that comes with the side effect of having to maintain your own kernels. Yeah, so we went through the whole Meltdown saga as blue team, actually responding to it but not really knowing what was going on under the hood, so thank you very much for that. No worries. We have some speaker gifts for you backstage, but let's thank Ray and Matt one last time.