BG - Virtual Breakpoints for x86/64 - Gregory Price

Name: BG - Virtual Breakpoints for x86/64 - Gregory Price
Uploaded: 2019-10-19
Duration: 42 min 12 s
Description: BG - Virtual Breakpoints for x86/64 - Gregory Price Breaking Ground BSidesLV 2019 - Tuscany Hotel - Aug 07, 2019


Good afternoon everyone, and welcome to BSides Las Vegas Breaking Ground. This talk is Virtualizing Breakpoints on x86-64, and it's presented by Gregory Price. A couple of announcements before we begin. I want to say thank you to our sponsors, especially our inner circle sponsors, Critical Stack and Valimail, and our stellar sponsors, Amazon, BlackBerry Cylance, and Robinhood. Without our sponsors and our donors and our volunteers this would not be possible, so a big thank you to them. Another announcement before we get started is that this talk is being recorded and streamed to YouTube, so if you have any questions please raise your hand and I'll bring the microphone over so that YouTube can hear you. Also, please be

sure to silence your cell phones out of courtesy for others. So yeah, let's get started. Thank you. Thank you very much. So yeah, this is my talk on virtualizing breakpoints for x86. A little bit about me to begin with: I'm a hypervisor and emulation developer at Raytheon CODEX. I've got a background in computer science, I got my BS and MS from Northeastern, and I spent about six years in the U.S. Navy doing network analysis before I eventually went on to the position that I have now. I only say that for any vets that are in the audience: if I did it, you can do it too. You can make the

transition, and it works out well. A special thanks to the Northeastern University HPC lab for helping me to develop the solution that I'm going to talk about today, and to develop this talk as well. The first thing I'm going to start off with is a little demonstration. On this screen I have this really compelling piece of software that does basically nothing: if you can read it, it does a simple comparison, checking to see if 0 equals hex CC, so we should never, ever see this print statement occur. But if I set a breakpoint on this program at a special place, 53f, and I run the program, we run into two things here that

are kind of interesting. One, I have printed out the thing I am not supposed to be able to print, and two, we didn't actually break in this program. So the question is, why does this occur? The answer is actually pretty simple. On x86, and on most platforms, we use a form of binary replacement breakpoint. What this means is that we're actually replacing the instruction in memory with, in this case, an int 3 instruction, which is one single byte, hex CC. The problem with this, though, is that if you choose the wrong spot in the instruction to breakpoint, you simply corrupt the instruction. That's what I've done here on this

program: I've actually modified this instruction here by using an offset breakpoint, such that I replace that zero with hex CC, and that's why we get the result that we have here. Now this is kind of an interesting problem, and it was one that we ran into as I was developing a hypervisor platform that we write. We had kind of a small slip-up where we had loaded some wrong symbols and we were causing the guest OS to crash. Now, we were looking for bugs in the OS, so we were like, whoo, we got it to crash, but then we realized what the problem was, and we were really concerned about solving this for the future. It

took a couple of months, but after a couple of months we realized you can't actually solve the problem of where it is safe in memory to breakpoint, because you have no idea what the program is attempting to do. So that's what we're gonna talk about a little bit today. What I call this problem is the critical byte problem: basically, there's a critical byte in an instruction that we have to hit, and can we solve this critical byte problem? One quick caveat to this talk: what I'm going to talk about, the design for this solution, is not, quote unquote, the solution. There are many different ways we could implement this

solution, so I don't want you to walk away saying that this is the only way to do this. My hope is that you walk away thinking, ultimately, that we're being ruled by some debugger rules of the past and we need to change the underlying hardware to create new forms of debuggers. But before we can get into that, I want to prove to you a little bit that we can't just use other forms of breakpointing methods, so we're gonna do a quick review of how we do breakpointing across all different types of debuggers. The most common one on most architectures is a form of binary replacement, so we already went over that.
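The binary replacement failure from the demo can be illustrated with a minimal sketch. It assumes a hypothetical instruction, `mov dword ptr [rbp-4], 0`, encoded as `C7 45 FC 00 00 00 00`; the speaker's actual program and its bytes are not shown in the talk, and the helper names `set_breakpoint` and `immediate` are made up for illustration.

```python
INT3 = 0xCC  # single-byte x86 breakpoint opcode (int 3)

# Assumed encoding of: mov dword ptr [rbp-4], 0
# opcode C7, ModRM 45, disp8 FC, then a 4-byte little-endian immediate.
MOV_ZERO = bytes([0xC7, 0x45, 0xFC, 0x00, 0x00, 0x00, 0x00])
IMM_OFFSET = 3  # the immediate starts 3 bytes into the instruction


def set_breakpoint(code: bytearray, offset: int) -> int:
    """Binary replacement: overwrite one byte with int 3 and return the saved byte."""
    saved = code[offset]
    code[offset] = INT3
    return saved


def immediate(code: bytes) -> int:
    """Decode the 32-bit immediate of the assumed mov instruction."""
    return int.from_bytes(code[IMM_OFFSET:IMM_OFFSET + 4], "little")


# Breakpoint on the first byte: the CPU decodes int 3 as an opcode and traps.
aligned = bytearray(MOV_ZERO)
set_breakpoint(aligned, 0)

# Breakpoint in the middle of the instruction: the 0xCC byte is never decoded
# as an opcode. It silently becomes part of the operand instead.
misaligned = bytearray(MOV_ZERO)
set_breakpoint(misaligned, IMM_OFFSET)

print(hex(aligned[0]))             # 0xcc: a real breakpoint
print(hex(immediate(misaligned)))  # 0xcc: the program now stores 0xCC, not 0
```

In the second case no trap ever fires and the program's data has changed, which matches the two symptoms in the demo: the breakpoint is missed and the "impossible" comparison succeeds.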

But there's also this idea of single stepping and branching. This is basically implemented via a special flag, for example in the RFLAGS register, or a special MSR that lets you do single branching instead of single stepping. The idea is you set this trap flag, you hand the processor back to the program, it runs a single instruction, or runs up to a single branch, and then it will break and give you a chance to interpose once again. So that's one example of another form. Another thing that most people are familiar with, if you're an embedded developer using JTAG: usually most architectures provide some form of breakpoint registers. It's these

simple comparison registers in memory that, when you enable them, allow you to break on an address, whether it is on execute or on read or on read/write, etc. Usually this is limited in scope or limited in availability. Most architectures provide somewhere between four and eight debug registers. On x86 we get 8, and the configurability is rather limited: you can only set breakpoints on up to 8 bytes in memory, and the configuration doesn't let you set a breakpoint on execute, read, and write; you have to pick just a couple of those features. So some limitations there. Another form of debugger, or another form of system that we utilize to debug programs, is emulation, and this is

basically, we're going to interpose on every single instruction that runs. This is just a sample list of things I can imagine wanting to do in the process of emulating an instruction. I might want to break in and do something on instruction fetch, I might want to break in and do something on memory write, or maybe I have some data-flow type analysis going on and I want to see, if we're jumping to RAX, whether that RAX register contains user-controlled data, for example. So we can probably come up with a list that's like 100 or 200 times longer than this of

ways that I would want to interpose. And finally, kind of the hot area of research in recent years: you'll see a lot of papers talking about dynamic binary instrumentation, or DBI. This is kind of an amalgam of all of these different techniques. It's basically: we're gonna use binary replacement here, we're gonna use emulation there when that doesn't work for us, we're gonna use breakpoint registers to optimize things, maybe we're gonna add some special trampolines to the code. You know, it's a combination. One example to drive this home: there was a paper from like 2012 or 2013 called SPIDER, stealth breakpointing, and the goal of this paper was to make sure that a

program couldn't see when a breakpoint is being set, basically to prevent detection of a debugger. The way they did this, they built a rather complex system. Not that it's bad, it does a very good job, but they have to have this code modification handler, so that if you have a piece of malware that's trying to relocate itself in memory, it's no longer safe to breakpoint those areas of memory, so you have to detect that. You have these different cache views, a data view and an instruction view, based on whether that page is being used to execute or you're trying to read from that page, and you actually have to

use special page table management in order to implement this whole solution. So it's very complex and very low-level, it has to be done in a kernel driver, and it's not something that we can necessarily expose to a userland debugger, for example. So there are some limitations there. Other people have tried to solve this problem that I'm proposing here, this critical byte problem, this idea that I don't want my breakpoints to be detectable and I don't want my breakpoints to corrupt the program. I have a list of papers here that I've done some background reading on, but really the problem that they're all trying to solve is the same one, and all of this reduces to the

halting problem. You can't actually determine where in memory it is safe to breakpoint if you're using binary replacement. If you kind of abstract it out, what you're really talking about is, how do I determine something behavioral about a generic program? If you have any computer science background, you're running straight into the halting problem. You shouldn't even attempt to solve this problem, because it's an undecidable problem. So, throwaway slide there. If we look at these breakpoint constructs that we have here, we can kind of pick a lot of these apart as to why this one's not quite appropriate to use here, this one's not

quite appropriate to use there, and the reality is they really do all suck in some way. Either they're corruption prone, or they're slow, or we have a limited number of breakpoint registers, or they're slow and complex, or they're extraordinarily complex and they'll never make it into modern-day tools because the adoption is very limited in scope. So the question that I'm kind of asking is: is there something, a new extension to the system, that we can implement that will allow us to both solve this problem, or do away with this problem, and build better debuggers for the future? One extra thing here is the question that I

started with: when did someone last review breakpoint technology? When did advancement in this area kind of stop? There was this paper from 1990 called A Survey of Support for Implementing Debuggers, and I picked out some choice quotes from it, where they talked about replacing the instruction, and single stepping, and instruction emulation, and page protection, and hardware comparators. You can see that we're using the same technology today to do malware analysis that we used in 1990, and malware analysis didn't really exist in 1990. So why are we handcuffing ourselves to the techniques of 30, 40, 50 years ago? They did talk about one interesting technique that's not new, that we're

starting to see echoes of today, this idea of a tagged architecture. Data tagging in the present day kind of looks like what Arm is implementing, what they call the Memory Tagging Extension. You allocate something with malloc, and the top bits of the pointer carry some tag bits, and if you ever smash that area of memory, or you try to access it through a pointer that doesn't have the appropriate bits set, then you trap and you're able to detect that. But that doesn't really solve our problem of how we set execution breakpoints. That maybe solves the problem of mitigating some memory corruption, but we're talking about something a little deeper here. And so

if you take nothing else away from this talk, what I want you to remember is that debuggers today are built on hardware techniques that really stopped iterating in the 90s. So let's start to design what a solution to this problem might look like. I'm gonna borrow some goals from the SPIDER breakpoint paper that I mentioned previously, because I don't want to try and start from scratch. I do want to stand on the shoulders of the giants who came before me, so I'm going to take their goals directly and examine where they kind of fail to solve this problem. We want our breakpoint system to be flexible; in their paper they called this

trap on any instruction. We want it to be efficient, which means we should run as close to real-time execution as possible, and that pretty much throws emulation out the window; we don't want to have to emulate anything. We want it to be transparent, which means we don't want a program to be able to detect our breakpoints. We don't want it to be able to see that we have changed memory in some way, or we want to avoid having to change memory at all. And we want this to be reliable, which means that if a program is resetting itself somewhere in memory or trying to evict breakpoints because it's a piece

of malware, we basically don't want the program to be able to interact with the breakpoints that we've set in any way. I'm only going to change one goal out of this whole thing: I'm gonna change the flexibility goal. The idea that I'm kind of hitting here is that instead of trapping on any instruction, I want to trap on any address in particular without corrupting the program. And I'm gonna add a bonus goal here of trying to make it work for virtual machines, because if we can scale this, that means there's high adoptability as well. So before we can really jump into the solution, we need to do a quick refresher of some MMU

internals and talk about how VMs manage their memory as well. The x86 MMU works like this: basically there's a table, and we're going to convert this address at the top. That address is chunked off into different offsets, and eventually we're going to drill down to a physical frame. You can kind of see there's this separation: there are operating system or debugger resources, and there are program resources, the actual data in the eventual physical frame. The problem that binary replacement breakpoints have is that we're jamming operating system or debugger resources into program space, and there's a trust violation here that I'll talk about in just a moment. Now if you look

at this in terms of virtual machines, in a kind of simplified view here, we have two different page tables: one is owned entirely by the guest and one is owned entirely by the host. You can see these as kind of your debuggee and debugger, but your data frame is still owned by kind of both, because we're jamming host material into the program's data space. If we look at the SPIDER breakpoints or DBI breakpoints, we see the same pattern re-emerge again and again. We have this abstraction where we have guest data and host data, or debuggee and debugger data, but we still have a frame that is shared

and it's seen as property of both. So this is the abstraction failure that we're dealing with: basically we have debugger data and debuggee data living in the same physical frame, and this is why the critical byte problem is present. With the design that we're going to talk about today, we're gonna just eliminate this problem altogether, and that will let us solve this critical byte problem from the get-go. The trust failure I was talking about just a moment ago is the fact that the debugger is depending on the debuggee to play nice, which is never a good sign, and on the other side the program is expecting the debugger not to

corrupt its data. So if we come back and look at the existing DBI solutions, you'll tend to see that all of them have this kind of code modification handler, and this is their attempt to solve that where-is-it-safe-to-breakpoint problem, because they're dealing with this abstraction problem. It's not a problem that their solution created; it's that they're fighting a problem that has existed since the 90s, and no one has really taken it up to solve it in hardware. Okay, so you get it at this point: every other form of breakpointing is awful, and I should just get on with it. So let's talk about the system that we're going to

design here. First we need to talk about the design limitations. If we want this solution to be efficient, and we want the solution to be adoptable, which means that, you know, the Linux kernel will take the design in after it's been produced in hardware and actually do something with it, then one, we have to retain old behavior. We can't just throw the old system out the window altogether; we need to retain that behavior, and furthermore the new behavior needs to be totally optional. We can't just force this new thing on everyone and say this is the way it's gonna be now. Two, we have to be able to implement this in hardware, for two reasons. If we're gonna do

additional lookups or add some additional resources, then in order to maintain efficiency the translation has to be simple and it needs to be fast. And we can't rely on operating system constructs, because that basically means it'll never be adopted at scale. It has to be fully supportable purely in hardware, and every operating system should just be able to utilize that interface in a way that's straightforward. And three, it has to be compatible with the TLB. This one is the result of a few months of looking into different designs that we ultimately decided failed. A quick look at the TLB: when you're doing a translation of a piece of virtual memory to a physical

frame, you basically have this additional buffer where, if you've done that lookup before, you don't have to walk the whole page table. But there's a limited amount of information in that TLB entry that we're going to be forced to work with. Basically, we have the virtual address when it comes in, we have the physical address when it goes out of the TLB, we have the permission bits, the read/write/execute bits on the page table entry, and then we have the actual page size, which we can infer from other constructs. So these are our design limitations, and this is a pretty constrained system. We're not working with very much information here. We're pretty much forced to sit

down at the hardware and see what we can accomplish, and obviously bonus points if we can keep virtual machines in mind the whole way. The approach that we eventually landed on, after a few months of talking this through, is the idea of a breakpoint frame. The high-level abstraction of what you're looking at here is that we started by drawing this line, where we say never shall debugger data cross this line into program data, and never shall programs being debugged be able to read across this line. In the same way that extended page tables provide that line between the guest page tables and the extended page tables, that's what we're going to apply

to breakpoint data. That naturally means that we need to allocate a second frame for breakpoint data, and we'll get into how we're going to accomplish that in a moment. But this is where the name virtual breakpoints comes from: we're taking the concept of virtual memory and extending it to breakpoints, hence virtual breakpoints. Now, we have a mystery virtualization layer that I'm going to talk about in a moment, but before we get there let's talk a little bit about the buddy frame that we're gonna allocate. Basically, we're going to allocate two physically contiguous frames per instrumented page. This means it is completely optional: if you don't want to allocate the frame

you don't have to. We're going to accomplish that by placing a breakpoint bit in the page table entry, down at the bottom here. If that breakpoint bit is set, then we're going to do this additional lookup; if the breakpoint bit is not set, then we don't have to do that lookup. And since we're able to apply this to the EPT, the extended page table permissions, that also means that a guest operating system can't view the breakpoint data either. So in this manner we've actually solved that flexibility problem already. But what are the actual contents of a breakpoint frame? Basically our data frame is going to contain our program data, and then our

breakpoint frame is going to contain, for every byte in the program data, eight bits of configurable data, so byte-for-byte parity with program data. This seems like a lot of overhead, and we're going to go into that a little bit later, but it actually turns out to be no worse than existing systems. So what do we get to do with eight bits of breakpoint data? We tend to think of breakpoints as being on read or write or execute; we want to break on instruction fetch, or maybe on instruction execution. But what do we do with the four additional bits? This is kind of an open question. We can start to think about new forms of

breakpoints that we can implement in a debugger. Maybe we have a taint bit, so that when you access that memory the first time, you mark it as tainted by some other area of memory that you tagged previously. Maybe you have a simple coverage bit, where you just want to be able to collect code coverage on a program you've instrumented. Or maybe you implement a specialty trap, where if that bit is set we're gonna jump directly into a special interrupt handler that implements a very easy logging structure. So there's some creativity here that this opens up, and this is all implementable in hardware. So what is this mystery virtualization layer that I talked about?
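As a rough software model of the breakpoint frame described so far, the sketch below pairs a data frame with a second frame holding eight bits per program byte. The bit assignments in `Bp`, the class `InstrumentedPage`, and the helper `bp_byte_address` are all hypothetical names and layouts chosen for illustration; the talk does not specify an encoding. The helper also assumes that the breakpoint frame is the physically contiguous frame directly after the data frame, and that pages are 4 KiB.

```python
from enum import IntFlag

PAGE_SIZE = 4096  # assumed 4 KiB pages


class Bp(IntFlag):
    """Hypothetical assignment of the eight per-byte breakpoint bits."""
    READ = 0x01
    WRITE = 0x02
    FETCH = 0x04      # trap when the byte is fetched as part of an instruction
    EXECUTE = 0x08
    TAINT = 0x10      # illustrative use of a spare bit
    COVERAGE = 0x20   # illustrative use of a spare bit
    FAST_TRAP = 0x40  # illustrative "specialty trap" bit


class InstrumentedPage:
    """A data frame plus a breakpoint frame with byte-for-byte parity."""

    def __init__(self, data: bytes):
        assert len(data) == PAGE_SIZE
        self.data_frame = bytearray(data)     # what the program sees
        self.bp_frame = bytearray(PAGE_SIZE)  # what only the debugger sees

    def set_bp(self, offset: int, flags: Bp) -> None:
        # Only the breakpoint frame changes; program bytes are never touched.
        self.bp_frame[offset] |= int(flags)

    def access(self, offset: int, kind: Bp) -> bool:
        """Return True if this kind of access to this byte should trap."""
        return bool(self.bp_frame[offset] & kind)


def bp_byte_address(data_phys_addr: int) -> int:
    """Assumption: the breakpoint frame directly follows the data frame,
    so the matching breakpoint byte lives exactly one page higher."""
    return data_phys_addr + PAGE_SIZE


page = InstrumentedPage(bytes(PAGE_SIZE))
snapshot = bytes(page.data_frame)
page.set_bp(0x53F, Bp.FETCH)

print(page.access(0x53F, Bp.FETCH))          # True: fetching this byte traps
print(page.access(0x53E, Bp.FETCH))          # False: neighbouring byte is clean
print(bytes(page.data_frame) == snapshot)    # True: program data is unchanged
```

Because the flags live in a separate frame, a breakpoint can sit on any byte, including the middle of an instruction, without altering what the program reads or executes.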

I kind of said that the construct that we're dealing with here limits us to a small amount of information, so we have to talk about what the translation looks like. Here I have circled the key points. This virtualization layer is basically gonna say: whenever this breakpoint bit is set, we're gonna take our virtual address, we're gonna convert it through the page table or the TLB, and we're gonna get the physical address, and then we're gonna simply add the page size. Utilizing that, we've now identified the appropriate place in memory for our buddy frame. So if you remember back to those limitations we had, we've limited

ourselves in this translation to just those small bits of information that are available in a TLB lookup. So at this point, congratulations, we did it, we designed the whole system. It kind of feels like it should be more complicated than this, right? But if you remember, I said we're borrowing the idea of virtual memory and applying that to a breakpoint frame. Fundamentally, nothing I have discussed up here is complicated enough that it requires years and years of research to determine whether or not it's going to work. It already works with virtual memory. The problem is that the requirements in the 80s and 90s, when debuggers kind of stopped iterating, didn't include things like, what do I do if I

need to reverse engineer something that's malicious? Now we have that requirement, and now we can start to look at how we iterate on existing structures this way. One last mention here: a common question that people come up with is, how do we actually do the lookup in the breakpoint frame? In a virtual address, the very last chunk of the address is used for the offset in the frame. So where normally we would use that offset to access the program data, we use the same offset into the breakpoint frame, and that's how we get the byte-for-byte parity in the lookup. So, kind of circling back around and looking at our goals: we set some goals for ourselves and we set some

system design limitations. Have we actually met our goals at this point? Let's start with flexibility: can we trap on any address without corrupting the program? Under the old system of binary replacement, we only had the ability to interrupt on execute, and the reason for that is you actually had to execute the instruction itself, the int 3 instruction, to produce the interrupt. That's what causes the critical byte problem: you're trusting that that instruction will be there for you. With the virtual breakpoint system, we have the ability to either interrupt on execute or interrupt on instruction read, and that's the difference. With interrupt on instruction read we no longer have this critical byte problem, because we don't

care about what's in the data frame. We're gonna maybe use the microcode on an x86 processor to do this additional lookup and fire an interrupt before that instruction ever gets a chance to execute, or retire in the case of speculative execution. So we can look at the system and say, at least academically, we've met our goal in that regard. But is it efficient? This is one that most people kind of get stuck on a bit, so we need to look at two things: memory usage and execution speed. Memory usage first. Straight off the bat, most people go, you're sacrificing half of memory, why, you know, who would use

that system? Remember, we designed this with flexibility of use as a primary goal, so we only sacrifice a page of memory for a page that we've actually instrumented. In the worst case, where you breakpoint every page on the system, yes, you will sacrifice half of memory. But in the case of typical debugging sessions, you're gonna sacrifice a couple of megs at most if you're highly, highly instrumenting your program. That's nothing. On the topic of execution speed, the additional memory lookup per memory access is a pretty high overhead, but again, this only happens on pages with breakpoints. On prior systems, this code modification handler and this different view of memory forced you to exit the program

for every access on a page already. Now, instead of exiting the program and introducing all of that additional overhead, we're simply adding an additional memory lookup and bypassing that altogether. So it's a matter of amortizing the cost in this regard, and you're getting rid of a whole lot of old instrumentation that we have had to hack around. Okay, so academically speaking we've at least addressed the efficiency issue, but have we solved the transparency and reliability issues? They're kind of interrelated. On the transparency of virtual breakpoints: the problem that they solved with that code modification handler was the fact that you were sharing that data frame. Now we no longer share that

data frame, and if we look at this abstraction and think about it in terms of a hypervisor, there's no crossing that line, in the same way as with the page table construct. So now the debuggee only has access to its own data. The debugger could still implement normal binary breakpoints this way, but it also has the construct to be able to completely remove its data from the data frame, which means that the debuggee can't interact with or see the breakpoints in any way. So we've addressed the transparency problem. Now, are these reliable? If we look at this critical byte problem one more time, we can kind of see that if we breakpoint a place

in memory, and then let's say this program is malicious and it shifts itself in memory just by a couple of bytes, we have to evict the breakpoint and fall back to emulation, because we're dealing with that critical byte problem. If you just programmatically set the breakpoint right where it was again, you corrupt the program, so that's not very reliable. But with the new construct that this virtual breakpoint system provides, we end up being able to sidestep this issue by doing a breakpoint on instruction fetch instead of instruction execute. That means we can breakpoint in the middle of an instruction, and when that data is fetched we'll just break anyway. Remember, the demo I showed you had

two problems: one, we changed the program, and two, we didn't break into the program. So now we've solved that problem here, and that's really the huge kicker. All right, and have we also addressed making this work for virtual machines? Yes, because we discussed that pretty much the whole way through; we designed it with extended page tables in mind the whole way. So some of you are probably sitting in the crowd going, wait a second, there's a problem here: you're giving a talk where you have essentially nothing to show us, because there's no proof of concept. That's the big fat problem here, and the

question here is: ultimately I'm proposing an extension to the x86 MMU, so how does a lonely engineer go about producing an example to show that it works? There are ways that you can do that. For example, we could prototype this system in a simulator or an emulator. I could spend a couple of years implementing all the debugger constructs that I talked about, and implementing the support in the Linux kernel for the underlying stuff, all built on top of an emulator. But ultimately that's not very useful. The way I would put it is that it's kind of lame, because ultimately I've just spent a couple of years building something that no one's ever gonna use. It's not

efficient enough to do real research, you're better off just using a normal hypervisor, and that means it just won't be adopted. I could, and I've gotten this suggestion a number of times, implement the system on RISC-V, or maybe Arm, or any other form of RISC. I started this talk talking about x86, and I want to solve this for x86, but okay, I take the point, I could do that. But there are a few problems here. One, it's a RISC, so there's no critical byte issue: you can just set your breakpoints on aligned data and know that they're safe to set, at least theoretically anyway. Two, most RISC-V chips lack a real MMU, and

the reason why I pick on RISC-V here is that it's really the only open source thing that I can work on, as opposed to Arm, which is not necessarily as available. And three, the major point: it's not x86. I really want to solve this for x86, and the reality is I can't. There's actually kind of an interesting program that I ran up against in the past few months at another conference, called Intel HARP. It's this Hardware Accelerator Research Program where Intel actually provides certain institutions a preview of their upcoming hardware with FPGAs attached to it, so that you can implement new kinds of instructions and maybe implement extensions to the

system, like a barrier. I was introduced to this through a paper I found at a conference that was recently published. But there are problems with this too, and the big one is there's no access to the MMU or TLB hardware at all. So the one thing that you would need in order to produce a prototype of this using the Intel HARP program is not available, because it's very closed source; it's, you know, IP for them. There are also only a few installations that have this available to them. I believe there's a way to buy into this program, for anyone that's curious, but there is limited accessibility based on that cost. The last one is kind of going

out and seeking buy-in from Intel and AMD, and I kind of posit that question to the audience here. If you're a researcher who finds a problem like this and comes up with a solution, but the solution is only possible to implement in hardware, how do you pitch that to Intel or AMD? Maybe I get the CEO's phone number, and I call them and say, I have a really cool invention for you, and he's probably gonna be like, who are you and how did you get my number? So how does an engineer who's doing research on this and finds this problem solve this issue?

I don't have an answer to that, and so that's kind of a community problem. It's a problem that we don't discuss very often. When we find problems, you know, we think about Meltdown and Spectre: it has to take that level of a problem for anything to get addressed at that level, and that's a problem. We need a better, more agile solution in that regard. One last thing. As I was going through the development of this, I had a few people say to me, hey, you should actually patent this design, because while it borrows some ideas from other places, it's definitely novel, in the sense that

I couldn't find a single implementation going back to the 70s that implemented a system like this by doing it this way. And so I wanted to bring up two things. One, I could have submitted a patent on this about two years ago, before I decided to publish it. And there's two sections of the patent law to be concerned with. Basically, one, you can do any amount of private research you want, and until you decide to go public you can file a patent on it; there's no bar date in that regard. But once you go public with it, you have one year to file a patent. It doesn't matter if it's you or someone else at that point; it is

first to file. So I published this paper originally on arXiv.org in about 2017, and I did nothing with it, and I just waited. After about a year I was like, okay, now we'll go talk about it, because now no one can take this design, patent it, and turn it into their IP. And so if anyone at Intel or AMD is watching, just take it, please. I just want to solve the problem. I just want to not deal with this headache anymore and make new, fun debuggers. And so at this point that's the end of my talk, if you guys have any questions. [Applause]

you'll be able to implement what you're talking about instead of having the big

guns make the hardware for you, like a card you could put on a bus or something?

Yeah, so I presented one possible solution here. I kind of stopped the research after we came to the conclusion that there was going to be a hardware component. Once we decided that probably the most efficient way of implementing this would be as an MMU extension, there isn't really much you can do external to the MMU and still make it efficient. You have to have a software component involved if there's an external card, right? Because what happens when the instruction retires, for example, and then your external card decides to fire an interrupt? Well, you've already passed the breakpoint

spot. So I'm not claiming that there isn't another way to do this. If someone could come up with another way of doing it, I'd be all ears, especially if it doesn't involve having to get highly embedded with Intel. But that's the limitation that I see. Someone else have

Um, have you considered any security implications of adding this kind of hardware into a...

So I have thought about it a little bit, but the reality is, until you have an implementation or a prototype, it's kind of hard; you're speculating. As a process of the hardware design, as a process of the extension design, there's probably a lot of room to talk about that, but that goes beyond the immediate topic of this talk. So it's a very good question, but no, basically I haven't. It kind of goes back to the abstract of the talk, which said that there's this cat-and-mouse game between debuggers and malware, and that goes to that: what

happens when we produce a new construct that malware does not even have an idea about? That's where I think we would kick that back into high gear. But that's kind of the limit of where I'm at. All right.

So you said there was no value in implementing some kind of emulator version of this, but if you were to build something like this in QEMU, and then you could show results of how much... yeah, wouldn't that make a much stronger case to Intel and AMD?

Sure. I would actually argue that it's relatively simple to implement this in QEMU: you simply, on any instruction fetch, check to see if you have a breakpoint set in some other page, right? It's not an interesting solution there, and I'm pretty sure someone has probably done exactly that. But implementing the whole construct of the MMU extension doesn't really make sense in an emulator, if that sort of makes sense.
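The naive emulator approach described in this answer can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not QEMU code: `ToyMachine`, `BreakpointHit`, the per-page `shadow` table, and the `shadow_checks` counter are all hypothetical names invented for the example. It only shows the idea of keeping breakpoints in a secondary page that is consulted on instruction fetch, so the code bytes themselves are never patched with 0xCC.

```python
PAGE_SIZE = 4096


class BreakpointHit(Exception):
    """Raised when an instruction fetch lands on a virtual breakpoint."""

    def __init__(self, addr):
        super().__init__(f"breakpoint at {addr:#x}")
        self.addr = addr


class ToyMachine:
    """Toy byte-addressed memory with an optional shadow page per page.

    Hypothetical sketch: breakpoints live in the shadow page, never in
    the real memory, so data reads always see unmodified bytes.
    """

    def __init__(self, size):
        self.memory = bytearray(size)
        self.shadow = {}        # page number -> bytearray of per-byte flags
        self.shadow_checks = 0  # counts the extra lookups done on fetch

    def set_breakpoint(self, addr):
        page, offset = divmod(addr, PAGE_SIZE)
        # Allocate the secondary page lazily, the first time it is needed.
        self.shadow.setdefault(page, bytearray(PAGE_SIZE))[offset] = 1

    def read_data(self, addr):
        # Data accesses bypass the shadow page entirely.
        return self.memory[addr]

    def fetch(self, addr):
        # Every instruction fetch pays for one shadow lookup.
        self.shadow_checks += 1
        page, offset = divmod(addr, PAGE_SIZE)
        flags = self.shadow.get(page)
        if flags is not None and flags[offset]:
            raise BreakpointHit(addr)
        return self.memory[addr]
```

Because the check runs on every single fetch, the cost grows with the number of instructions executed, which is the overhead concern raised in the answer.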

Right, so like, I can just allocate a secondary page for every single page in the system, and on every access to memory I can just check that. Like, I've just designed the whole solution in QEMU in a really naive sense, but the overhead is going to be too high for it to be useful to anyone. I think there was a question here.

So how has no one run into this before, right? You see in this...

That is a great question, and I kind of go back to this slide that I had that talks about the history of breakpoints. It's a question that I get pretty commonly when I talk about this issue. Basically, when did someone last review

this technology? As far as I can tell, everyone in the industry has just kind of said, well, these are the tools we have, so that's what we're gonna use; that's what's available to us. And that's how you end up with these kinds of systems. It's not that these systems are bad; it's that they recognize the problem is there, but they don't think that that's a problem they should try to solve, or that they should even talk about. That's the intent of this talk: to say, here's this problem, it's been going on for 50 years, people are acutely aware of it because they implement the code modification handler in the system, but no one ever talks about, hey, should we

actually try to solve that issue? They say, well, yeah. The way I would put it is, we all kind of have Stockholm syndrome with regards to the x86 platforms, right? Whatever they say will exist is what exists, unless you have the resources of a Microsoft or an Apple. Any other questions? Did that answer your question?

So I'm not sure how it works, like, upgrading x86. So suppose Intel decides that they want to implement this or something like it. How does backward compatibility work? Would it only be working in new generations of Intel CPUs, or would it be backwards compatible?

Yes, so backwards compatibility in the sense that being able to provide... where's the design? Sorry, just a moment, I'll pull this up. It kind of comes back to this picture here, in the sense that moving forward, for any chips that had this construct available to them, you need that additional virtualization layer here that does the translation, basically in microcode, because if you

try to solve it in software, you have the existing constructs that caused this problem. So older chipsets that didn't have that extension and weren't able to solve this in microcode wouldn't be able to implement this, but we retain all the old behavior, right? So any software that was built expecting the old behavior would still be able to operate. That's one of the design constraints that we had to apply to thinking about the system: we can't just tell people that the old way of doing things is dead, because that's how you kill your adoptability from the get-go.

Do you have any evidence that a microcode update could or could not implement this?

I don't, and that's why I say I think that's kind of where my knowledge of that low level, in terms of AMD or Intel, is limited. I'm by and large a hypervisor developer, not a hardware developer. So the idea that they could implement this in microcode is entirely possible.

Cool. Thanks very much. [Applause]
