
Library Sandboxing with SAPI

BSides Lisbon · 2024 · 19:25 · 154 views · Published 2024-11
Category: Technical
Style: Talk
About this talk
SAPI is Google's open-source framework for sandboxing untrusted third-party libraries in isolation, rather than sandboxing entire binaries. Built on Sandbox2 and using Linux kernel features (namespaces, seccomp-bpf), SAPI shrinks the sandbox footprint to only the syscalls needed by the library of concern. This talk covers SAPI's architecture, code generation, and practical workflow for sandboxing libraries in C and C++.
Original YouTube description
The world is full of sandboxing solutions which serve different user scenarios. These solutions have in common that untrusted code should be executed in a contained environment. What if your code depends on a third-party library that you want to contain? Instead of sandboxing the entire binary, with SAPI the sandbox can be shrunk down to the specific library. SAPI is an open-source sandboxing solution by Google. The underlying sandboxing layer is Sandbox2 (also open source). It uses Linux kernel features to create the sandbox (namespaces, seccomp-bpf syscall filter). While Sandbox2 will execute the entire binary inside the sandbox, with SAPI we have designed a solution which shrinks the sandbox footprint to only the library of concern. The benefit of SAPI is that the many syscalls which would need to be allowlisted when executing the entire binary in Sandbox2 are now reduced to only the syscalls needed by the library of concern. This makes it easier to reduce the kernel's interface to the minimum, and it also makes the syscall filter easier to manage. This talk will give the attendees an introduction to SAPI, explain how it works, and ultimately show how a sandboxed API is used.

About the Speaker: Oliver is a Security Engineer with over 10 years of experience in consultancy, penetration testing, and security design. He is currently working at Google, focusing on securing third-party libraries used by developers.
Transcript [en]

Hello everyone, my name is Oliver K. At the moment I work at Google in the IC sandboxing team; it's part of the security team. I will be talking about library sandboxing with Sandboxed API, or SAPI for short. So this is the situation that we're talking about: we have a library that we don't trust. Usually that's a third-party library, but maybe it's also a first-party library that you know will be processing some random data from a user that you don't trust, and you just want to have a secure boundary around it. We slap an RPC client around this, wrap it in a binary, and then eventually we run this in Sandbox2,

which is the sandbox driver for SAPI, and around this we add some more logic for the sending and requesting of the calls. Sorry, I need to make this a bit bigger. So I first need to introduce you to Sandbox2, because Sandbox2 is the vital part of sandboxing in Sandboxed API: it's the underlying sandboxing driver. It was developed by my team pre-2019, I can't recall when exactly; I joined the team in 2021. We open-sourced the tool in 2019, before COVID, and it's open source as part of Sandboxed API. The use case of Sandbox2 is full-binary sandboxing. You can still use it as an individual

sandbox tool, but it's just part of the Sandboxed API source code repository. It's good for any random third-party binary, pre-compiled or not. It's good for any binary that you write yourself: you can defer the sandboxing and call SandboxMeHere anywhere in your binary, so that sandboxing only starts at that moment. You can write your custom fork server to have CPU-intensive initialization covered by a fork server. The fork server is used to start the sandboxee process, so this is the binary that we always fork first. It's a trusted binary: basically all the initialization that you do there is trusted, before you then go into the sandboxing. The building blocks of

Sandbox2 are well known. It's basically namespaces, and we enable all of them; it's seccomp-bpf for the syscall filtering; it's ptrace for monitoring (we also have an alternative monitor by now); and we use sockets for IPC. We provide you a policy builder API, so unlike many other sandboxing technologies out there, we don't configure the sandbox policy through a configuration file; we configure it in code. That's pretty neat, because you can decide on a per-situation basis which policy should run. It provides you an API to map files and directories into the sandbox. We have some convenience functions that wrap around multiple syscalls, like AllowExit, allow

open (AllowOpen), so you don't have to allow individual syscalls, and then down the road something changes in a dependency and suddenly your policy breaks. The risk is pretty much the same whether you use exit or exit_group; that doesn't really matter much. You can allowlist individual syscalls, you can allowlist groups of syscalls, and most importantly you can create policies on syscall arguments. Sandbox2 has basically two main components: one is the executor, the other one is the sandboxee. The executor is also where the monitor is started. It lives in the host code, so the trusted code. It controls the resources that the sandboxee will be given, the sandboxee process that

is. It configures the policy, and eventually it starts the sandboxee, which means we fork from our fork server binary and then we go into sandboxee phase one. There we are trusted, because that's our binary: we know it, we control it, so nothing is really sandboxed. You're running in namespaces, but that's about it. Then, either because you call SandboxMeHere or because you directly want to sandbox, the executor sends over the policy. The RPC stub in the sandboxee, in our fork binary, takes the policy and applies it, so the syscall filter is now active, and now we're ready to replace the binary with the untrusted binary that we want to have sandboxed. That's phase two: now we

are executing fully in the sandbox. So that was the primer on Sandbox2; now let's take a look at Sandboxed API. Again: open-sourced in 2019 by the team. The use case here is individual library sandboxing. This framework came about when the team at the time realized that lots of teams at Google took Sandbox2, created a smaller binary just around certain library calls, added this binary into the sandbox, wrapped it in a loop, and added some request and response logic to it. So my team thought: that's actually a smart idea, we can provide a framework for that, so that they don't have to write that much code anymore. And that's how SAPI

was essentially created. The advantages: before, every team basically created their own sandbox. They were sandboxing the same libraries, but they all had to manage their own code and write pretty much the same code multiple times. That's wasted SWE hours, and SWEs cost a lot. So we went to a model where we basically have a sandbox-once, use-anywhere approach. We also reduced the syscall footprint of the binary that we need to allowlist. Before, with full-binary sandboxing, we had to allowlist every syscall that the benign operation of this binary needs. That's a lot, so we expose more of the kernel than we actually want to, or

than we need to. With this approach we shrink that footprint to just the syscalls that the code we are concerned about actually needs for benign operation. It's good for untrusted libraries, it's good for wrapper libraries around multiple untrusted libraries, and it's good for first-party libraries themselves. So again, if you have a first-party library and you just want to add this security in depth, a sandbox because you're not trusting whatever data is coming in, it works for that as well. Disadvantages: I said we save some SWE hours, but there is still some overhead for the actual sandbox creation. In the ideal case you

just give it a target, do some configuration, build the target, and you have your sandbox. That works for simple C libraries. For more complex C libraries it doesn't work, and for C++ libraries it usually doesn't work either. The issue there is that we have either callback functions that need to be passed as pointers into a function, or we have classes, or we have templates: all things our code generator can't really handle. So in the worst case you need to write a C wrapper. That's the effort you now have to put in, and then you're actually sandboxing this C wrapper and not the actual target library. Another disadvantage with this

approach is that every call to the library is now an RPC round trip. There are use cases where you might not be able to stomach the round-trip latency that this introduces. And that's the best case; in the worst case, when you have to synchronize lots of data into the sandbox, it's more than one round trip. This is the architecture. The blue part you have already seen at the beginning, so I'm just focusing on the other part. That's the host code, and what changes is that you now have this API object. That's an object that you create, and it's the object through which you will make all the API calls into the sandbox. You create your variables

(I will come to that later), you set them up, you pass them over to the API, and you make the call. Then everything gets sent over through IPC. The RPC stub in the sandboxee is waiting; it picks all this up, sets up the memory that you sent it, makes the actual call to the target library, and then returns the return value back to you in an RPC response. So for SAPI there is runtime and there is build time. Again, in the best-case scenario all you have to do is basically this: define the build target and then build it, and that's it. So what we do with the build target is

we build this purple box that you've seen before, where we take the library and our RPC stub and mash them together into a binary. Then we create some more intermediary artifacts, eventually a library, and we produce a header that you can include. We run the SAPI code generator, which I will be talking about on the next slide. And, importantly: I'm a Google engineer, so we use Bazel, or the internal version of it. We also have CMake, because we know the world doesn't really run just on Bazel. Apologies that these slides are just in Bazel; my knowledge of CMake is fairly limited. So the SAPI code generator: what it does is basically take the target

library, analyze it, extract type information and function information, translate some types to SAPI types, and in the end emit the SAPI header. The SAPI header is what you include in your host code in order to use the sandboxed library. We have two generators, a Python one and a Clang one (LibTooling, actually). The Python one should eventually be replaced by the LibTooling one. We are not yet at feature parity, so that's why both exist, but we hope to eventually really replace the Python generator. I already mentioned SAPI types. With sandboxing we have the case where we have a host process that runs in its own address space, and we have the sandboxee running in its own

memory space, but we need to have data in both of them, so we need to be able to synchronize data in and out of the sandbox. For that we created these SAPI variable types. Native C types are just supported as is, they don't really need much, but for the rest we created these C++ classes. Some of them are here; I didn't list all of them, there are way too many. Once we make the call, we then just have to synchronize the pointers over. So we create these objects, right? These SAPI types are just objects, so memory, and we now need to tell the sandbox what it needs to do with that

memory. We don't want to synchronize the data in and out every time, so we came up with these four synchronization modes. It's PtrNone for when no synchronization needs to happen. It's PtrBefore for "please synchronize the data into the sandbox before you actually call the underlying target library", and PtrAfter, which is the opposite. And PtrBoth obviously does both. So that's the basics: that's Sandbox2, that's Sandboxed API. Let's get into how you actually go about sandboxing a library; at least that's the title of my talk, right? The basic workflow is as simple as: analyze the library. Ideally you're the SWE, you already know what this

library is used for, how it works, and so on. I'm not; I'm a security engineer, so when I do this at Google I first have to learn how the SWEs at Google use the library. I need to identify what the APIs are, which APIs are critical, and what needs to be sandboxed. Sometimes I play it wild: I don't define any functions and try to have the generator generate the whole API footprint for me. That usually doesn't work. As I said, it's rare that the library is that simple. With C libraries you're lucky sometimes, but with C++ libraries very often you're not. So very often I just go directly to writing the

C wrapper, because by now I know where the generator will fail. So I write the C wrapper, define the C wrapper as my target, and run it. This is the example that we're looking at now. It was written by my colleague Christian Blichmann; there is a link in the slides, and the slides will be provided after the talk on the conference website. This is a very good example because he created it when we had lots of interns during the COVID years, doing internships and sandboxing libraries for open source. If you go through the commit history of this, you can actually see how to build up a sandboxed library step by

step. So this is really cool. Basically what you see here is the target library: the rot13 file function that we want to have sandboxed. We see we have an int as return value and we pass two pointers in. That's about as simple as it gets. For this function, the return value of -1 is what we are really concerned about: then we know something went wrong. Otherwise the value is not something we really have to care about that much. So in our build file (and those targets are now defined in the same build file) we first define the library for this rot13. That's fairly basic if you know

Bazel, and with CMake it's probably very simple as well. And this is our SAPI library target now in use. The name is the target name: if you want to build a target or depend on a target, that's what you need. The functions list: again, if it's empty we try to generate the SAPI API for all of the functions. Here we say no, we just want to have the rot13 file function, please. We pass the input files so that the generator knows where the function is defined. With the lib parameter we have the dependency on the target library; because it's in the same file, it just starts with a colon. And the lib name

is what we use to prefix the classes that we generate with the header generator; you'll see it in a second. It's as simple as this: once we have built this, we have a sandbox ready to go into use. So that's the header. It's actually not that bad, but I made it a bit bigger. We have the sandbox; as I said, under the hood it's Sandbox2, so we create a class which basically starts Sandbox2 for you. You see here again Rot13: the lib name is what defines that part. Then we have the API class, which you will be using to create the API object in your code. It

takes the pointer to the sandbox, because that's what it needs. And that's the generated function for the rot13 file function that we want. In the header, for every function that gets generated there is a comment with the unsandboxed function declaration. That's pretty neat: you can just look in the header file to see whether or not everything that you need was generated. The build output also tells you whether or not there is an error, but the build output is quite verbose, so you might miss it. That's why I usually just check the output file. We see here that the generator created an absl::StatusOr of int. Every time we have a return value that is not

void it's an absl::StatusOr; if it's void, it's just an absl::Status. This status is there because we are doing an RPC call into the sandbox and out, so you need to check whether or not that call failed. That's the first thing you now have to check after an API call. You see the pointer types for the input and output file, and here we are calling the underlying library, the unsandboxed library. This is how we instruct the RPC client inside of the sandbox: please make the call now. That means everything that is linked into the sandboxee binary you can call with that as well,

whether or not you expose it; that doesn't really matter. As I said, we have a policy builder to create policies. SAPI provides you with a standard default policy; you can just use it. The issue here is that we would like to share the input file, and we would like to have a place to write the output file to. So we are extending the policy, but you could also just overwrite it. For that, what we usually do in my team is create a new class deriving from the Rot13 sandbox, the sandbox that was generated, and it's usual for us to add "Sapi" in between so that we

know this is a policy that we have modified. Then we override the ModifyPolicy function. This is the function that gets called during the setup of the sandbox. You can also ignore the builder that is passed in here and just create your own builder object, really create your own sandbox policy, and make it stricter than what we already have. But here we just go on and add a file and add a directory. The directory of course needs to be writable; that's why read-only is set to false. Then we BuildOrDie: if it dies, we crash; if it builds, we continue with the execution. That's it, we are ready to use

our sandbox now. First we create the sandbox object, we initialize the sandbox, and then we pass the pointer over to our SAPI object. We have to define the SAPI type variables. There are two strings, the input and output file, so we create them and prepare them as well. And now we are making our SAPI call. You see here the API object that we're using; the rot13 file function is exactly as it is defined in our header file. And you see the two SAPI types: we are using the PtrBefore synchronization. Why is it PtrBefore and not anything else? Well, we need to have these two strings inside of the sandbox, ready to

be used with the call of the underlying library. We don't need any information afterwards, because we write directly out of the sandbox into the file or the directory that we mapped in. That's why it would be a waste to make this a PtrBoth, because we would basically synchronize data out of the sandbox that we don't need. Then we check the result for the status. If the status was OK, we check whether or not the return value of the function was -1. If it was not, we are basically good to go. And that's it; that's how simple sandboxing with Sandboxed API is.
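Editor's sketch: the reasoning for choosing PtrBefore over PtrBoth can be illustrated with a small self-contained C++ model. This is not the SAPI API. `SyncMode`, `RemoteBuffer`, `SyncStats`, `CallWithSync`, and `UpperCase` are hypothetical names, and the "sandbox" is only a second buffer in the same process. The model just makes visible which copies each of the four modes implies.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy model of the four synchronization modes named in the talk.
// All names here are illustrative assumptions, not SAPI identifiers.
enum class SyncMode { kNone, kBefore, kAfter, kBoth };

// Stands in for memory that would live in the sandboxee's address space.
struct RemoteBuffer {
  std::vector<char> bytes;
};

// Counts copies so that the cost of each mode can be inspected.
struct SyncStats {
  int copies_in = 0;
  int copies_out = 0;
};

// Runs `fn` on the remote copy, copying host data in and/or out
// depending on `mode`.
template <typename Fn>
void CallWithSync(std::string& host, RemoteBuffer& remote, SyncMode mode,
                  Fn fn, SyncStats& stats) {
  // Remote memory is allocated in every mode; only the copies are optional.
  if (remote.bytes.size() != host.size()) {
    remote.bytes.assign(host.size(), '\0');
  }
  if (mode == SyncMode::kBefore || mode == SyncMode::kBoth) {
    std::copy(host.begin(), host.end(), remote.bytes.begin());
    ++stats.copies_in;
  }
  fn(remote.bytes);  // the "library call" only ever sees the remote copy
  if (mode == SyncMode::kAfter || mode == SyncMode::kBoth) {
    host.assign(remote.bytes.begin(), remote.bytes.end());
    ++stats.copies_out;
  }
}

// Example "library function": upper-cases ASCII letters in place.
inline void UpperCase(std::vector<char>& bytes) {
  for (char& c : bytes) {
    if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
  }
}
```

In this model, calling `CallWithSync` with `SyncMode::kBefore` leaves the host string untouched and performs one copy in, which matches the talk's case of passing file names that are only read. `SyncMode::kBoth` adds a copy out that is wasted whenever the host never looks at the buffer again.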

[Applause]
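Editor's sketch: the two-step check from the end of the talk (first the RPC status, then the library's own return value) can be modeled without SAPI. Everything below is an assumption made for illustration. `rot13_file` is a hypothetical stand-in for the target function (the talk only says it takes two pointers and returns an int, with -1 signaling an error; returning a byte count on success is a guess). `CallResult` is a toy substitute for `absl::StatusOr<int>`, and `FakeRot13Api` imitates a generated API class without any real sandbox behind it.

```cpp
#include <cstdio>
#include <string>

// ROT13 for one character; non-letters pass through unchanged.
inline int Rot13Char(int c) {
  if (c >= 'a' && c <= 'z') return 'a' + (c - 'a' + 13) % 26;
  if (c >= 'A' && c <= 'Z') return 'A' + (c - 'A' + 13) % 26;
  return c;
}

// Hypothetical target library function: -1 on failure, otherwise the
// number of bytes written (the success value is an assumption).
inline int rot13_file(const char* in_path, const char* out_path) {
  std::FILE* in = std::fopen(in_path, "rb");
  if (in == nullptr) return -1;
  std::FILE* out = std::fopen(out_path, "wb");
  if (out == nullptr) {
    std::fclose(in);
    return -1;
  }
  int count = 0;
  bool write_failed = false;
  for (int c = std::fgetc(in); c != EOF; c = std::fgetc(in)) {
    if (std::fputc(Rot13Char(c), out) == EOF) {
      write_failed = true;
      break;
    }
    ++count;
  }
  std::fclose(in);
  if (std::fclose(out) != 0) write_failed = true;
  return write_failed ? -1 : count;
}

// Toy substitute for absl::StatusOr<int>.
struct CallResult {
  bool ok;            // did the RPC round trip itself succeed?
  int value;          // library return value, meaningful only if ok
  std::string error;  // transport error text, empty if ok
};

// Imitates a generated API class; `sandbox_alive` fakes the transport state.
class FakeRot13Api {
 public:
  explicit FakeRot13Api(bool sandbox_alive) : sandbox_alive_(sandbox_alive) {}

  CallResult rot13_file(const std::string& in, const std::string& out) const {
    if (!sandbox_alive_) return {false, 0, "sandboxee not running"};
    return {true, ::rot13_file(in.c_str(), out.c_str()), ""};
  }

 private:
  bool sandbox_alive_;
};

// Step 1: check the transport status. Step 2: check the library's result.
inline bool Rot13Succeeded(const CallResult& r) {
  if (!r.ok) return false;
  return r.value != -1;
}
```

The point of keeping the two checks separate is that they fail for different reasons: a failed transport says nothing about the input file, while a -1 from a healthy sandbox means the library itself rejected the request.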