
"How much can commodity hardware help on closing microarchitecture timing channels?" - Qian Ge

BSides Canberra · 2018 · 43:46 · 45 views · Published 2019-02
About this talk
BSides Canberra 2018. Slide deck: https://drive.google.com/open?id=1xz1fWXGwy4c6ilW7SEuA7BeKmq-YyhPZ
Transcript [en]

The talk we have now is by Qian Ge, and it is about how much commodity hardware can help in closing microarchitectural timing channels. So let's welcome Qian. Hello, I'm Qian. Today I would like to share with you some experience we had trying to close microarchitectural timing channels with hardware-provided operations. This work was done together with Yuval, Frank, and Gernot. So what will this talk be about? It will be about timing channels, about building timing channels on shared hardware, and lastly about our experience closing these channels. After the talk, if you are interested in the content, please feel free to

download our paper, which is currently available as a preprint, titled "Your Processor Leaks Information - and There's Nothing You Can Do About It". Okay, so as I mentioned, this talk will be about timing channels. What is a timing channel? A timing channel works by exploiting timing variations for communication. To build one, we need a timer, a time source to measure something, and the information is leaked through either fast or slow progress of a task. Later in this talk we will see how these timing channels are built on shared hardware components. Timing channels and timing attacks normally take two forms: the first is called a covert

channel, and the second is a side channel. In a covert channel, a Trojan tries to transmit information to a spy through its own execution, by forcing the hardware into a particular state; here we show a Trojan trying to leak information to a spy through a covert channel. A side channel is similar, but the main difference is that the spy tries to recover sensitive information from an unwitting victim; here we show an example where a spy tries to steal a private key used by a cryptographic algorithm. So we say that a side channel is a special case of a covert channel, and the existence of a covert channel represents the potential for a side channel.

We say this because they use the same technique, but demonstrating a covert channel is much simpler than using the same technique to build a side channel that steals a secret. Our research tries to answer this question: how effective is existing architectural support for closing microarchitectural timing channels? What we actually discovered is that timing channels based on shared hardware components cannot be closed with all the available hardware flushing operations. As for the microcode update Intel recently provided, a temporary microcode fix for the Spectre attack that is not intended for mitigating attacks based on the

branch prediction unit: it was fairly effective on our older cores, but not on the more recent ones. Okay, so the threat model we use in this study is a covert channel, where a Trojan and a spy try to leak information through a shared piece of hardware. More specifically, the Trojan executes within a restricted domain, the spy executes on the same piece of hardware but outside the restricted domain, and they have time-shared use of a core. Our research tries to close timing channels on shared hardware components, but how do you build timing

channels? The foundation of building these timing channels is creating resource contention on shared hardware components, and we target the components within a core: for example, the level-1 instruction cache, the level-1 data cache, the branch target buffer, and the branch prediction unit. We say that contention can be used to transmit information, but how exactly? To understand that, we need to understand how a cache works. Assume we have an N-way set-associative cache. What is a cache used for? It is used for shortening the gap, the

frequency gap between the CPU and memory. It is fast, and it contains the memory contents that have been most recently accessed. If a program tries to access some data, it first searches the cache. If the content is in the cache, that is a cache hit, and the access is served directly from the cache. If it is not, the access has to go all the way down to memory to fetch the data, which is much slower than a cache hit; that is a cache miss. Because this fast and slow access can be measured with a timer, it can

actually be used for building a timing channel. Before getting into timing channel techniques, I want to briefly introduce how cache misses happen. There are three types of cache misses. The first is the compulsory miss, which means you will always miss in the cache on the first access. The second is the capacity miss: the cache size is limited, so everything you need may not fit in the cache at the same time. The third, and the most important for timing channels, is the conflict miss, which means that the same cache set happens to be used by different workloads. In the timing channel scenario, or covert channel scenario, the Trojan tries to

create conflict misses, and the spy can detect them using a time source. So here we come to a concrete example of how to build this timing channel attack: the Prime+Probe technique. Here the spy and the Trojan use it to communicate through a shared cache, and it consists of three steps. First, prime: the spy fills a cache set with its own cache lines. Then wait: the Trojan has the chance to evict cache lines, or not touch any. Finally, the spy probes again, timing accesses to the cache

lines it primed previously. The probe time tells the spy something. If the probe time is longer, the Trojan evicted some cache lines, which could mean the Trojan was trying to send a one. If the spy doesn't detect any change in its probe time, the Trojan didn't touch any lines, which could mean it was trying to send a zero, for example. This particular attack technique has been demonstrated before in a cloud scenario, where a spy hosted in one Linux virtual machine successfully

conducted a cryptographic attack that broke an ElGamal private key in use inside another Linux virtual machine. So this previous work demonstrates that the attack is practical in the cloud as well. Now that we know the technique and how these timing channels are built, we can see that timing attacks are possible everywhere: wherever resource contention is possible, a timing attack could be possible. That could be an attack between applications authored by different developers running on a mobile phone, or an attack between the workloads of mutually distrusting users hosted by a cloud

provider. More seriously, it could be an attack between untrusted and trusted components on the same platform, where those components share a piece of hardware. So what have we learned so far? A microarchitectural timing attack needs two factors: a timer, that is, a time source from the hardware or elsewhere, and the ability to create resource contention. In summary, microarchitectural timing attacks exist because the execution of a previously running program can affect the performance of the currently running program; that is what makes timing channels work. But just building the channel is not enough. For a research study, we

also need to visualize and represent the channels we build, and in this work we use a channel matrix graph to show the resource contention between the spy and the Trojan, translating the contention into colors. Here is an example of a channel matrix: the color in this graph indicates the probability of observing a particular output y given an input x, following the scale on the right. What does that mean? For example, we are more likely to observe an output of 24,000 given an input x of 250 than given any other x. In this work, most

of our experiments are set up this way: a covert timing channel on a shared piece of hardware. Here we have a covert timing channel built on the shared cache using the Prime+Probe technique. Again, the Trojan and the spy execute in different security domains and have time-shared use of a core, so they take turns rather than running at the same time. Whenever the Trojan runs, it changes the number of cache sets it uses, and we regard that as the input, x. Whenever the spy runs, it measures the total cost of probing a buffer, and that is regarded as the output, y.
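The Prime+Probe setup just described can be illustrated with a small simulation. This is a minimal sketch, not the authors' code. It assumes a 64-set, 8-way cache with 64-byte lines (roughly the size of a 32 KB L1), true LRU replacement, and made-up hit and miss latencies of 4 and 40 cycles. It also assumes the Trojan fills whole sets. The names `SetAssocCache` and `prime_probe_round` are hypothetical. The Trojan's input x is the number of cache sets it fills, and the spy's output y is the simulated time to probe its buffer.

```python
from collections import OrderedDict

# Illustrative latencies in cycles (assumptions, not measurements from the talk).
HIT_CYCLES = 4
MISS_CYCLES = 40


class SetAssocCache:
    """Toy N-way set-associative cache with per-set LRU replacement."""

    def __init__(self, num_sets=64, ways=8, line_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One OrderedDict per set: keys are tags, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Touch one address and return its simulated latency in cycles."""
        line = addr // self.line_size
        cache_set = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)  # hit: mark as most recently used
            return HIT_CYCLES
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # miss in a full set: evict the LRU line
        cache_set[tag] = True
        return MISS_CYCLES


def prime_probe_round(cache, x):
    """One round: spy primes, Trojan fills x sets, spy probes; returns y in cycles."""
    line, sets, ways = cache.line_size, cache.num_sets, cache.ways
    # Spy buffer: sets * ways consecutive lines, i.e. `ways` lines in every set.
    spy = [i * line for i in range(sets * ways)]
    # Trojan buffer starts at a set-aligned address beyond the spy buffer,
    # so its lines map to the same sets but carry different tags.
    trojan_base = 2 * sets * ways * line
    # Prime: the spy fills every set with its own lines.
    for addr in spy:
        cache.access(addr)
    # Wait: the Trojan touches `ways` lines in each of the first x sets (input x).
    for s in range(x):
        for w in range(ways):
            cache.access(trojan_base + (w * sets + s) * line)
    # Probe: the spy times re-accessing its whole buffer (output y).
    return sum(cache.access(addr) for addr in spy)
```

With these parameters, `[prime_probe_round(SetAssocCache(), x) for x in range(65)]` grows by exactly 288 cycles per extra set: each set the Trojan fills turns 8 spy hits into misses, at 36 extra cycles each. Real hardware adds noise and typically uses pseudo-LRU replacement. That is why the talk plots a probability distribution over y for each x (the channel matrix) rather than a single value.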

Now we know what x is, the number of cache sets used, and what y is: the total time, in CPU cycles, to probe the buffer, as measured by the spy. Now that we have set up this experiment, let's bring meaning to this graph. We can see that as x increases, y is highly likely to increase, which means the heavier the contention created by the Trojan, the longer the probe time detected by the spy. Another way to say it: the probe time measured by the spy is strongly influenced by the cache contention created by the Trojan, and that is actually, you

know, demonstrated in this heat map. But only demonstrating the contention is not enough: any study needs a number to represent the result, and in this study we chose channel capacity. In this particular example, the graph we show here, the channel capacity is 4.0 bits per use, which means that each time the Trojan and the spy run, they can transmit at most four bits of information. The channel bandwidth is calculated by multiplying the capacity by the symbol rate. In this experimental setup, the Trojan and the spy can exchange 500 symbols per second, and

therefore the channel bandwidth would be 2,000 bits per second. Hardware cache flushing operations have long been believed to be a very reliable method for mitigating cache-based timing channels, because they are believed to reliably reset the hardware to a defined state, and they have been mentioned and referenced in a lot of previous studies and literature. The nice thing is that it doesn't cost much: if you think about flushing a small cache, like the L1 instruction and data caches, on most recent x86 platforms that is only 32 KB, and the contents left in the cache after a

context switch would be cold anyway, so flushing costs little. The downside is that you rely on the hardware manufacturer's support for these operations. So this sounds very simple and easy: you just do a hardware cache flush on a context switch and make sure the caches are always cold when the kernel exits. But so far no one has asked the question: does this method actually work? That defines the scope of our work. We examine the hardware support for eliminating microarchitectural timing channels, hoping to answer this question: would a channel still be observable when all the hardware

mitigation operations are engaged? Therefore we designed this work as follows. First, we implement multiple covert channels. Second, we identify the processor instructions that can be used to mitigate these channels. Third, we measure the channel capacity with and without mitigation. So it sounds very straightforward. In terms of microarchitectural timing channels, we implemented a form of Prime+Probe on all the intra-core components: the L1 instruction cache, the L1 data cache, the translation lookaside buffer (TLB), the branch target buffer (BTB), and the branch history buffer. But remember, the Prime+Probe technique is

all about creating conflict misses, as we mentioned before. How we create conflict misses on these different hardware components depends on how each component works. The L1 instruction cache caches instructions, so we create conflict misses by executing chained jump instructions. The L1 data cache, as most people know, caches memory contents used for reading or writing, so we create conflicts by reading memory. The TLB works as a cache for paging, covering the memory pages you visit, so we create conflicts by visiting pages. The branch target buffer caches the destinations of the

jump instructions, so we create conflicts by executing jumps. Lastly, the branch history buffer is used for caching the results, or history, of conditional branch instructions, so we create conflicts by taking or not taking conditional branches. Okay, so now we know which channels we are going to run, but where are we going to run them? We chose to run these covert channels on both x86 and ARM: three x86 platforms and two ARM platforms. For x86 we chose Sandy Bridge, Haswell, and Skylake, which represent three different generations of x86 processors, and for ARM we chose to

use the ARM Cortex-A9, which is an implementation of ARMv7, and the Cortex-A53, which is an implementation of ARMv8. If we look at the release dates of these platforms, we can see that we selected them across generations, in the hope that our results are representative of different microarchitectures. As for what we use to flush the hardware state left by a previously running workload, we spent a lot of time digging through hardware programmer manuals, trying to find what the hardware manufacturers offer. This table lists

what we actually found and what we eventually used. Rather than going into the details of this table, I would like to give the audience 20 seconds to look at it and see what is inside.

While you're looking at it, I will ask a small favor: could you come up with a one-sentence summary of what is inside?

I see people shaking their heads.

Okay, so I'll try to summarize, because I've spent so much time trying to understand this in order to present my work. This table lists all the information we gathered about the operations offered by both x86 and ARM platforms for flushing different parts of the hardware: the upper-level caches, the TLB, the branch predictor, the prefetcher, and so on. It is a very long list without a clear message. The message I actually want to deliver is this: after you spend so much time digging through the programmer manuals, you find there is no systematic way to describe

and summarize the content, so you don't really know what the hardware offers, and you cannot easily confirm it unless you make sure you have read every piece of information in the relevant sections. On x86 this is easier, because you only need to understand one big manual, say 2,000 pages, and you know where things are. ARM is a little more difficult because, for example, the ARM architecture has its own programmer manual and each implementation has its own manual as well: the ARM Cortex-A9 has its own manual, but anything external, like how to flush the L2 cache, is documented in another separate

manual, the one for the L2 cache controller. So the manuals are very specific to each architecture, and we can see a huge gap between the instruction set architectures: ARM and x86 offer different things, and you don't know what is missing until you do the research to understand and confirm it. What is more interesting is that there is no hardware support for either disabling or flushing the instruction prefetcher. In this table we can see that some architectures let you disable or flush the data prefetcher, but nothing is provided for the instruction prefetcher, because instruction prefetching is highly related to

performance, which needs to be protected. Okay, so in this work, our system setup is the separation kernel configuration of the seL4 microkernel. What does the separation kernel configuration mean? It means that we use a round-robin scheduler, only the timer interrupt is allowed to trigger a switch between the Trojan and the spy, and the Trojan and the spy use physically disjoint memory. This setup provides as much separation as a system can offer, which makes it an ideal setting for a timing channel study. And we perform the cache flush operations we

previously mentioned, all of those in the big table, during the context switch. Before we look at the results, I would like to spend a little time introducing their format. For every test machine and every implemented channel, we list the channel capacity with and without mitigation in a single cell. From top to bottom, the channels are the L1 data cache, L1 instruction cache, TLB, BTB, and branch history buffer. In each cell, the left-hand value is the unmitigated channel capacity, so if we see a channel there, that means

we successfully implemented a channel. The right-hand value is the maximally mitigated channel capacity, that is, the capacity after we engage all the hardware-provided cache flushing or hardware resetting operations. The number in parentheses is the statistical bound, obtained by a simulation test, for a zero-capacity channel. Because this is an experimental study, there is some chance of sampling error, so we want to know whether the channel we see is due to noise or whether there really is a channel. So we give you a specific test of

what a zero-capacity channel would look like, and that is the number in parentheses. If the mitigated channel capacity is larger than the stated bound, the channel is highly likely to be real. So here are our results. We successfully implemented all the channels, but what is most interesting is that none of the mitigated channel capacities on the right-hand side of each cell is smaller than the statistical bound. So it is highly likely that each channel is still real even after engaging all the mitigation methods provided by the hardware manufacturers. And all these

right-hand numbers highlighted here are cases where the visual representation, the heat map we introduced before, shows a definite channel. I'm not going to give you all the details of our findings, but I would like to highlight what we found when using cache flush operations to mitigate the L1 instruction cache channel. Remember the big table: we enable all of those mechanisms, on every test platform, during a context switch, but we are still able to observe an L1 instruction cache channel on all of them. We have seen some, you know,

examples of what a channel looks like in the heat map; here is an example of a close-to-zero-capacity channel, before we look at the bad cases. When we look at a zero-capacity channel, we see y evenly distributed for all x: no matter how wide the bar is, the spread is even, which means we cannot guess, just by looking at y, which x the Trojan sent.
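The channel capacity and the zero-capacity bound discussed above can be estimated from (x, y) samples. The sketch below is an illustration under stated assumptions, not the authors' tooling. It builds an empirical channel matrix and computes Shannon capacity (the maximum mutual information over input distributions) with the Blahut-Arimoto iteration. It then approximates a zero-capacity bound by shuffling the outputs, which destroys any dependence between x and y, and taking the 95th percentile of the resulting capacities. The names `channel_matrix`, `capacity_bits` and `zero_capacity_bound` are hypothetical, and the talk does not say which exact statistical test produced its bounds.

```python
import math
import random
from collections import Counter


def channel_matrix(pairs):
    """Empirical channel matrix P[i][j] = Pr(y_j | x_i) from (x, y) samples."""
    xs = sorted({x for x, _ in pairs})
    ys = sorted({y for _, y in pairs})
    joint = Counter(pairs)
    per_x = Counter(x for x, _ in pairs)
    return [[joint[(x, y)] / per_x[x] for y in ys] for x in xs]


def capacity_bits(P, iters=500, tol=1e-9):
    """Shannon capacity in bits per channel use, via Blahut-Arimoto."""
    n, m = len(P), len(P[0])
    p = [1.0 / n] * n  # start from a uniform input distribution
    lower = 0.0
    for _ in range(iters):
        # Output distribution induced by the current input distribution.
        q = [sum(p[i] * P[i][j] for i in range(n)) for j in range(m)]
        # KL divergence (in nats) of each row of P from q.
        d = [sum(P[i][j] * math.log(P[i][j] / q[j])
                 for j in range(m) if P[i][j] > 0)
             for i in range(n)]
        w = [p[i] * math.exp(d[i]) for i in range(n)]
        total = sum(w)
        # The capacity lies between these two bounds; stop once they meet.
        lower, upper = math.log(total), max(d)
        p = [wi / total for wi in w]
        if upper - lower < tol:
            break
    return lower / math.log(2)


def zero_capacity_bound(pairs, trials=100, quantile=0.95, seed=0):
    """Capacity a zero-leakage channel with the same samples would likely show."""
    rng = random.Random(seed)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    caps = []
    for _ in range(trials):
        rng.shuffle(ys)  # break any dependence between inputs and outputs
        caps.append(capacity_bits(channel_matrix(list(zip(xs, ys)))))
    caps.sort()
    return caps[min(int(quantile * trials), trials - 1)]
```

A channel like the zero-capacity heat map above, where y is spread evenly for every x, gives an estimated capacity near zero. That estimate should not exceed the shuffled bound, whereas a perfectly distinguishable 16-symbol channel gives 4 bits per use. Multiplying a capacity by the symbol rate gives the bandwidth: 4.0 bits times 500 symbols per second is the 2,000 bits per second quoted earlier in the talk.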

Okay, so previously we mentioned that people normally believe flushing caches can mitigate all cache-based timing channels. On Sandy Bridge, we used the cache flush operation provided by x86 to mitigate the L1 instruction cache channel, and we see that the operation is only moderately effective: it reduces the capacity by only 64%. In other words, we cannot fully mitigate the channel. Why do we say that? Look at the visual representation here: when x is between one and five, the observed y values clearly have a higher average than all the other ys,

which means y is influenced by x. As a result, we can conclude that the cache flushing operation doesn't work: it cannot mitigate the L1 instruction cache channel. So, since the cache flush operation doesn't work, what about engaging everything provided on x86, including TLB flushing, disabling data prefetching, and the cache flush operations? That is the big hammer, as much as you can ask from the hardware manufacturer on x86 these days, and we are still able to observe the channel with all of those engaged. Here again, when x is between one

and five, the average y is clearly higher than for the rest, which means y is influenced by x, so we say the channel is still there. Okay, that is interesting. What else did we find? We found a similar story for the branch target buffer and branch history buffer channels, because there is no specific mitigation available for resetting the state left in those components. What's more, the TLB channel cannot be mitigated with TLB flush operations; I refer you to the paper for the details. So people may ask: what about the

Spectre mitigation provided by Intel? Spectre was such a big surprise around New Year; it was like dropping an information bomb in front of people and saying "Happy New Year". So what about the mitigation Intel introduced for Spectre? Intel introduced a set of mechanisms called indirect branch control, enabled by a microcode update, which you can use to mitigate that particular attack, but Intel doesn't claim that it clears the branch predictor state. Even so, we still applied the temporary microcode fix for Spectre to all our test machines to see how it works. Why do we call it temporary? Because shortly

after its release, Intel recalled the microcode update because people found severe rebooting problems on certain microarchitectures. After we applied that temporary microcode fix, we discovered that the branch target buffer channel capacity dropped by more than 50% on the Haswell machine, but not by much on Skylake. More interestingly, the branch history buffer channel dropped to almost zero on Haswell, but didn't change on Skylake. What we learned from this is that the indirect branch control mechanism offered by Intel can close the channel, but only on certain test platforms. So, just to summarize what we found

out: cache flushing operations cannot fully mitigate intra-core timing channels, contrary to the common belief that they work. We demonstrate that many microarchitectural features aimed at improving average-case performance produce high-capacity timing channels that cannot be closed with any available hardware-provided operation, including those often suggested for closing these particular channels, even though some of them are very expensive. So we show that the big hammer offered by hardware manufacturers doesn't work as people expect. Some people ask me: what about SGX, can SGX do better? SGX, by

definition, provides logical data-flow isolation and cleans up by encrypting memory and scrubbing registers on an enclave exit, so no architectural state is directly observable. In other words, timing channels are outside its threat model, so SGX doesn't prevent timing channels. As a result, a lot of research literature has recently been published demonstrating timing channel attacks inside SGX. Another question people may ask: we see that hardware flushing operations don't work, so what about software scrubbing? Can I write some piece of software to erase the hardware state? The problem with software scrubbing is that it requires a

correct understanding of the hardware state, and that is very hard without official documentation. Anyone who has tried to understand how caching works inside a CPU, whatever CPU they are using, will know that this is basically impossible nowadays, and the details are not documented by the manufacturer. Reverse engineering how these hardware components work is very hard, as current hardware platforms are too complicated to understand: they have so many components, including the prefetchers and the replacement policies used by the caches. Every tiny detail of how these hardware components work is highly

related to how reliable software scrubbing might be, and these details may change from one chip to the next. Therefore I personally believe software scrubbing operations cannot be validated and should not be relied on as an effective defense. So we have seen what the current situation looks like; what about the future? What can we offer? We argue that hardware manufacturers must provide architectural mechanisms for flushing all on-core hardware state, and, more importantly, these mechanisms must not affect performance where they are not needed. For example, on x86 there is no way to selectively flush one level of the cache, as it only

offers instructions for flushing cache lines from all cache levels. What we see from the Spectre attack and the Spectre defenses provided by Intel is that channels can be closed if the hardware manufacturer works on it: the branch history buffer channel was closed almost down to zero on the Haswell platform with a rapid fix from Intel. What else can we do? I think we can offer dedicated hardware designs, for example a sandbox for running very secure software or software that needs to be highly trusted, or any other kind of hardware design that gives

you isolation from the rest of the system. Okay, that brings me to the end of my talk. What I am trying to tell everyone here is this: timing channels exist, you cannot stop them, and we need new hardware. If you are interested in this topic and related work, please take a look at our previously published work. We also have a website with the current status of this research, the timing channel page maintained by the Trustworthy Systems group. [Applause]