Protecting data vs systems: practicality, performance, and problems solved

Name: Protecting data vs systems: practicality, performance, and problems solved
Uploaded: 2024-07-09
Duration: 29 min 1 s
Description: Can encryption protect data directly, not just the systems storing it? Draper explores record-level encryption trade-offs, envelope encryption with key commitment, and searchable encryption schemes that enable practical policy enforcement across databases without revealing plaintext values.

BSidesSF · 2024 · 29:01 · 382 views · Published 2024-07

Speakers

Dan Draper

Tags

Category: Technical

Style: Talk

Mentioned in this talk

Tools used

MySQL, PostgreSQL

Platforms

Snowflake, SQL Server


Original YouTube description

Protecting data vs systems: practicality, performance, and problems solved. Dan Draper. Is it possible to protect data directly and not just the systems in which it is stored? Encryption covers a range of options. What should you encrypt & how? What are the trade-offs? Can we get record-level protection without unreasonable overhead? This talk will discuss all of these things & more! https://bsidessf2024.sched.com/event/70768ad6b2a317d934e925fa17dbe06c

Transcript [en]

Thank you. Hey everyone, how's everyone going? How are you all doing? All right. I'm an Aussie, but I've co-opted the term "y'all" because it's so useful. So today we're talking about protecting data versus systems, as we mentioned, but first I want to acknowledge a very important religious holiday today: Happy Star Wars Day to anyone who celebrates. May the 4th be with you. So, a little bit about me: I'm the founder and CEO at CipherStash, an American and Australian cryptography and data protection startup. I'm a longtime software engineer and cryptography engineer. I take my coffee with milk, no sugar, please. I'm an Aussie, as I mentioned, but I hate the

beach, so now you know that about me. So let's talk about protecting systems. This is a thing that many people in SecOps and security professions think about, but what does it actually mean? Well, when we think about protecting, say, a database, what we think about is applying controls to that database to prevent unwanted access, or data leaking or being exfiltrated in a way we clearly don't want. But the problem is, when we apply policies or controls to that database, it only works for that database. If data moves to another database, say a data warehouse, whatever controls we've set up on the first

database don't do anything to protect the data in the second database, so we have to make sure that they're applied to both systems independently, even though they may be saying effectively the same thing. When we think about protecting data, we think about protecting the data directly. So how can we protect an individual value or record? Even though, when that moves from one system to another, the policies and controls on the system may be different, we can still apply the same controls, the same policy, to the data itself: a universal protection, if you will. So our goal, to achieve this, is a granular, deny-by-default access mechanism that works everywhere, that's universal. How might we achieve it? Well,

we have some prior art to lean on. Probably one of the oldest permission systems, which I'm sure you're all familiar with, is file system permissions. On a Linux box or a Mac, for example, I can set permissions based on the owner and the group and what things I want that person, or the principal if you like, to be able to do. Thinking about the database, I might want to protect data directly by using something like row-level security. Hands up, and I can't really see very well, but hands up who's ever actually implemented row-level security in a transactional database like

Postgres? I can't see any hands, if there's any up. Maybe one or two, not very many, a handful. It's really hard. It's hard to get right, it's tedious, it's fiddly, and what's worse is it's different from system to system: Postgres does it differently to MySQL or SQL Server. And of course we have our old favorite, IAM, and its equivalents in the other major clouds. We can police all the ARNs. I'm sure most of you are familiar with IAM; if you haven't spent much time with it, lucky you. It's an effective system but comes with a number of challenges as well. So, our first challenge: in

order to achieve these policies, these controls, the systems must be identity-aware. We have to know who the principal is in order to apply the right policies. To give a really simple example of applying a row-level permission in a database (this is using PostgreSQL, and it's a naive example; I wouldn't necessarily have usernames in this form, but it gives you a sense of it): if I wanted to limit the queries that the current user in the database can perform to only see their own records, then I might use a policy like this. But of course, in order to do that, I need to

know who the current user is, and the problem is that most modern applications don't think about identity on the database in that way. Most of the time, the application is the thing that connects to the database, and that has a set of credentials, so from the database's point of view it only knows about the application identity and not about the end-user identity. So row-level permissions in the database, in that sense, are actually not very useful. We have to have kind of a wide-open control set, because we have to be able to support in the database all the potential things the application can do. The second challenge, as I mentioned

previously, is that controls are not universal, and that means quite a few different things depending on how you interpret it. Number one is that different systems may have different syntaxes, different mechanisms, maybe even different capabilities in terms of how to apply policies or controls to their own data. So, for example, the Postgres syntax is quite different to the Snowflake syntax, even though it has the same effect. So you've got to learn the two different ways of doing it, multiplied by however many systems you've got to protect. It's operationally inefficient, it's tedious, and of course there's always the risk that there's another system that you're not

aware of: shadow IT. Or, in the worst case, if the data ends up exfiltrated, then of course none of those policies are going to help. So my question, which I hope to answer today: can encryption be used to protect data directly? We could do something like this: we could encrypt every individual field. Column, field, row; I call it a cell, thinking about the very smallest quantum of data inside a database. How might that help us to achieve our goal? Well, before I try to answer that, I need to do a quick primer on how AES encryption works. Some of you may be familiar already, so it'll be a recap for you; some of you may not

be familiar with it, so hopefully you'll learn something from this. So, AES, and in particular AES running in what's called GCM mode, which is probably the most prevalent and is NIST-ratified. Think what you will of NIST, it certainly plays an important role in this. We will take an initialization vector, typically 12 bytes; a plaintext, in this case "love coffee", 10 bytes long; and a 32-byte key, so that's AES-256. We're going to drop those values into an encryption function, the AES-GCM encryption function in this case, and it pops out a ciphertext C and a tag. And the tag you can think of as, well, the

cryptographers will hate me for saying this, but it's kind of like a hash. It's technically a message authentication code, but it allows us to prove the authenticity of the message and make sure that it hasn't been tampered with. Then we're going to store the initialization vector, the ciphertext and the tag together, and that collectively is our output. Note that for GCM mode the ciphertext output is the same length as the input. Decryption is pretty straightforward; it's just the other way around. We're going to take the IV, the ciphertext and the tag, we're going to put them into a decryption function, and if the Force is with me then I'll get "love coffee" coming out the other side: my

plaintext. But AES-GCM is what's called an authenticated mode of operation with AES, and that means that we can provide what's called authenticated associated data (AAD). This is additional information that we include in the computation of that tag, which allows us to prove the authenticity or the validity of that additional data. It's not sensitive and it's not encrypted, so if you ever start to dig into this, you should never put sensitive data into the AAD. But it's very useful to provide additional context that we can rely upon. In this case, let's say I've encrypted a Social Security number, nine bytes long, and I want to include in the AAD to say

that this is a Social Security number, so I'm just going to set that to the characters "SSN". AAD can be any number of bytes long. What's important about the AAD is that it not only proves the authenticity of the message, it is also required to perform decryption. So if I provide the correct AAD, "SSN" in this case, the one that was used to encrypt the value, then when I provide that to decrypt, everything's fine and it decrypts correctly. If I provide another AAD, the incorrect AAD, the tag is not correct, and so the decryption fails. That's a really important point; we're going to use that in a moment. Okay, so we've talked about AES-GCM.
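The shape of that interface (IV, ciphertext and tag out; the same AAD required on the way back in) can be sketched with the Python standard library. This is a minimal illustration and not AES-GCM itself: the standard library has no AES, so the hypothetical `encrypt` and `decrypt` helpers below swap in an HMAC-SHA256 keystream plus an HMAC tag (encrypt-then-MAC). The 12-byte IV and "SSN" label come from the talk; the 16-byte tag length and the sample value are assumptions. A real system would call AES-GCM from a vetted library instead.

```python
import hashlib
import hmac
import os

IV_LEN = 12   # IV size quoted in the talk for GCM
TAG_LEN = 16  # assumed tag size for this sketch


def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive separate encryption and MAC keys from the one supplied key.
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(enc_key: bytes, iv: bytes, length: int) -> bytes:
    # Toy stream cipher: HMAC in counter mode (stand-in for AES in GCM's CTR mode).
    blocks = [
        hmac.new(enc_key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def _tag(mac_key: bytes, iv: bytes, aad: bytes, ciphertext: bytes) -> bytes:
    # The tag covers the IV, the (unencrypted) AAD and the ciphertext.
    msg = iv + len(aad).to_bytes(8, "big") + aad + ciphertext
    return hmac.new(mac_key, msg, hashlib.sha256).digest()[:TAG_LEN]


def encrypt(key: bytes, plaintext: bytes, aad: bytes = b""):
    iv = os.urandom(IV_LEN)
    stream = _keystream(_subkey(key, b"enc"), iv, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return iv, ciphertext, _tag(_subkey(key, b"mac"), iv, aad, ciphertext)


def decrypt(key: bytes, iv: bytes, ciphertext: bytes, tag: bytes, aad: bytes = b"") -> bytes:
    expected = _tag(_subkey(key, b"mac"), iv, aad, ciphertext)
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: wrong key, wrong AAD, or tampered data")
    stream = _keystream(_subkey(key, b"enc"), iv, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


key = os.urandom(32)                      # 32-byte key, as in the talk
iv, ciphertext, tag = encrypt(key, b"123456789", aad=b"SSN")
assert len(ciphertext) == 9               # ciphertext is as long as the plaintext
assert decrypt(key, iv, ciphertext, tag, aad=b"SSN") == b"123456789"
```

Calling `decrypt` with `aad=b"DOB"` instead raises `ValueError`, which is the property the rest of the talk builds on.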

Coming back to our database: encrypting every value, does it answer the problem? Well, we've got a little way to go yet, because now what we've done is turn this into a key management problem. Anyone who's worked with encryption will realize immediately, or within a very short amount of time, that most of the problems with encryption come down to key management. Thankfully we have some prior art: we have a few technologies that we can rely on, and one of these is this idea of envelope encryption. Envelope encryption is used by many of the major key providers, it's something we use at CipherStash as well, and it looks something like this.

So let's say you've got a cloud-based key management service; that might be Amazon's KMS, Azure Key Vault or CipherStash's ZeroKMS, for example. The client that wants to perform the decryption is going to send a data key request to that key service, and the key service is going to use its own internal key, which no one ever sees externally, to encrypt a randomly generated data key, and that's what we call a wrapped data key. So it's going to return both of those keys. To get into that in a little bit more detail and explain what's going on: the key server has a random number generator and it has a managed key. In most cases

that's probably going to be stored in a hardware security module or something like that, and it's going to use that RNG to generate a random IV and a random data key that we're going to use to encrypt our data. Then it's going to use its managed key to encrypt that, and it's going to pop out a wrapped data key. But think back to a couple of slides previously: this is exactly the same structure as our encryption, it's just that now we're using the encryption to encrypt a data key itself. We're going to return both of those things, the data key and the wrapped data key. So hopefully everyone's following along with me so far. Now, to reverse the process, to

unwrap the data key, we take a wrapped data key, send it to the key service, and get the unwrapped version returned. We can just send it over, follow the decryption process like what we went through before (I'm not going to cover it again), and get that data key back. So now we have essentially a distribution of trust. You can think of a key service like this as a separation of concerns when it comes to how the key is being managed. So now we can enhance the encryption that we're using in the database table by storing not only the ciphertext from the encryption, but also the wrapped data key for

the unique data key that's used for every record. It does increase storage space, and we'll talk about that, but we now have a unique representation of how each of those records could be decrypted. One last piece of the puzzle is this idea of key commitment. Key commitment is when we combine envelope encryption with this idea of authenticated associated data. I am going to tie all this together at the end, I promise. So we can add one more layer. Let's imagine that when we send this data key request to the key service, we're going to include some context. We're going to say this is for record ID 1, and for the field, the field

name, and the key service is actually going to use that context, perhaps encoded into a more efficient format, and include it in the authenticated associated data associated with the wrapped data key. It's going to output the result, and now what we have is what I would call a wrapped and committed data key. That means the data key can effectively only be used when the correct context is provided. We're going to use that to our advantage in a moment. So now, when we do the data key request, we're going to send the context and we get back the wrapped, committed data key. So how does this play out?
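The wrap-with-context flow described so far can be sketched as follows. This is an illustration under stated assumptions, not the speaker's implementation or any real KMS API: `KeyService`, `encrypt_cell`, `decrypt_cell` and the JSON encoding of the context are hypothetical, and `seal`/`unseal` are a toy HMAC-based stand-in for AES-GCM because the Python standard library has no AES.

```python
import hashlib
import hmac
import json
import os


def _stream(key: bytes, iv: bytes, length: int) -> bytes:
    blocks = (hmac.new(key, b"enc" + iv + i.to_bytes(4, "big"), hashlib.sha256).digest()
              for i in range((length + 31) // 32))
    return b"".join(blocks)[:length]


def _mac(key: bytes, iv: bytes, aad: bytes, ciphertext: bytes) -> bytes:
    msg = b"mac" + iv + len(aad).to_bytes(8, "big") + aad + ciphertext
    return hmac.new(key, msg, hashlib.sha256).digest()[:16]


def seal(key: bytes, plaintext: bytes, aad: bytes) -> bytes:
    """Toy AEAD (NOT AES-GCM): returns iv || ciphertext || tag."""
    iv = os.urandom(12)
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, _stream(key, iv, len(plaintext))))
    return iv + ciphertext + _mac(key, iv, aad, ciphertext)


def unseal(key: bytes, blob: bytes, aad: bytes) -> bytes:
    iv, ciphertext, tag = blob[:12], blob[12:-16], blob[-16:]
    if not hmac.compare_digest(tag, _mac(key, iv, aad, ciphertext)):
        raise ValueError("authentication failed")
    return bytes(c ^ s for c, s in zip(ciphertext, _stream(key, iv, len(ciphertext))))


def encode_context(context: dict) -> bytes:
    # Assumed encoding: canonical JSON, so the same context always yields the same AAD.
    return json.dumps(context, sort_keys=True).encode()


class KeyService:
    """Hypothetical key service holding a managed key that never leaves it."""

    def __init__(self) -> None:
        self._managed_key = os.urandom(32)

    def generate_data_key(self, context: dict):
        data_key = os.urandom(32)
        # Key commitment: the context is bound in as AAD when wrapping the data key.
        wrapped = seal(self._managed_key, data_key, encode_context(context))
        return data_key, wrapped

    def unwrap_data_key(self, wrapped: bytes, context: dict) -> bytes:
        return unseal(self._managed_key, wrapped, encode_context(context))


def encrypt_cell(service: KeyService, context: dict, value: bytes):
    data_key, wrapped = service.generate_data_key(context)
    # Store the wrapped key next to the ciphertext; discard the plaintext data key.
    return wrapped, seal(data_key, value, encode_context(context))


def decrypt_cell(service: KeyService, context: dict, wrapped: bytes, sealed: bytes) -> bytes:
    data_key = service.unwrap_data_key(wrapped, context)
    return unseal(data_key, sealed, encode_context(context))


service = KeyService()
context = {"record_id": 1, "field": "name"}
wrapped_key, sealed_value = encrypt_cell(service, context, b"Dan")
assert decrypt_cell(service, context, wrapped_key, sealed_value) == b"Dan"
```

Asking for the same cell with `{"record_id": 2, "field": "name"}` makes `unwrap_data_key` raise `ValueError`, so the value cannot be decrypted without the context it was committed to.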

What this means is we have to provide the correct context in order to get the data key that we want. You send through the AAD, the context that was used to encrypt the data key initially, and that means the data key decrypts correctly. Then we can use that data key to decrypt the value, so we now have the ability to read the underlying plaintext. However, if anything is incorrect, if the record ID is not correct or the field name is not correct, then the data key decryption will fail and thus we can't decrypt the actual value. So hopefully by looking at this diagram you can start

to see something: it looks a little bit like a policy list. This is how we can achieve, effectively, a policy- or attribute-based control using encryption and a key service. So, access control using selective data key decryption: can we achieve our goal? Let's take this one step further. Let's now say that our key service is running as a web service. We want to send through some authorization, say a JWT bearer token, standard stuff, and include our context as a JSON payload. And actually the key service is now going to perform an additional function: it's going to answer the question, can I, the principal, the subject of the JWT,

decrypt the name field of record ID 1? Well, because we now have these context values, we can use an ACL built into the key service to answer the question. It's just a lookup table: can this user access this record and this field? If they can't, the key server will just send back a 401. It doesn't have the correct context, so it can't decrypt the data key; the only thing it can do is send back "sorry, denied". However, if it does have the correct context, it can decrypt the data key, and now it will return a 200 OK along with the data key that is required in order to decrypt the

underlying value. So we've achieved a few things here. We've not only got a centralized policy management system, we have a universal interface, and we also have a way to separate the access: notice that the key service doesn't ever need to know anything about the underlying values. It doesn't know anything about the plaintext. This comes with another side benefit, which is some level of guarantee around logging, or recording that an access was made. If the key service records that a data key was successfully decrypted and returned to the client, we can say that, more than likely, a value was accessed. So there's a potential for a false positive, but there's no false

negative, and we will not return the data key if we can't write to the log. So there's some level of guarantee associated with it, which is nice; we can't often get that in systems. So you might be wondering now, if you've been paying attention: where do we actually do the decryption? What part of the system do we process that in? Well, the cool thing is, because there's a universal interface, it gives us lots of options. Some of the things that we think about: you could do it in the browser. We'll talk more about that in a moment; I suspect there may be some questions around that. If you've got questions, look

forward to them. You could do it in an edge worker, say Cloudflare or Fastly or something. Of course, you can do it in the application, this idea of application-level encryption, or in, say, a database proxy: a proxy that sits in front of your database and performs decryption and encryption operations transparently. So we have a few advantages, and I argue that this in many ways achieves the goal. We have a universal interface: there's one way to check permissions, and we don't have to worry about how it's done in Snowflake versus how it's done in Postgres versus how it's done in some other system. It is deny-by-default, because the data

is encrypted. If there is an unsupported system, nobody's going to be able to access the data, but at least it's not leaked. And of course, if there is a data breach, if somebody takes a dump of the database or they've got access to the database directly, they're not going to be able to reveal any information. And a third point, which may not be super obvious and is possibly more nuanced than we have time to cover today, is this idea that the decryption can take place in a semi-untrusted setting, like in a browser client. If you had data stored in local storage in a browser, and then you

had a traditional policy control like RBAC or something like that, it doesn't make any sense to do that, because the client could just read the data directly from the local storage. Whereas if it was encrypted in local storage, and there was a process that had to involve communication with a key service, then that actually allows you to entertain a model like that. So it opens up other interesting patterns and architectures that were not previously possible. So, one more problem: what about queries? If we've got all our data encrypted in a table like this, what if I want to run a search over my records? I want to look up my name, for example, or

find a partial match using the LIKE operator in SQL, or for that matter find all people who were born after the 1st of January 2000, or even sorting. Actually, sorting would work, but it would just give you the wrong sorting, which is not very helpful. So I argue it doesn't work. So how do we get around this? How do we solve this problem? Well, some of you may have heard of homomorphic encryption. In fact, by show of hands, who here has heard of homomorphic encryption? Quite a lot of people. Okay, cool. So for those who haven't heard of homomorphic encryption, it's this idea

that you can take an encrypted value and another encrypted value and perform operations on them, say addition, multiplication or comparisons. It's very, very interesting technology, it's really cool, but as I'm about to show you, it has one major problem. So this is a benchmark. We're going to share a link to these benchmarks at the end of the presentation, so if you want, you can go and have a play yourself. This is for a Rust library, TFHE-rs, which is a fairly recent library built on a modern homomorphic encryption scheme. It gives you a sense of some operations that we might care about in a database. So, for example, taking a look at the more-than

comparison between two values, the average time is 191 milliseconds. That's a fifth of a second for one comparison, which is pretty brutal. What sort of impact does that have on the database? We've got indexes in databases, right, so it shouldn't be too bad. Well, if you have this query, date of birth more than the 1st of January 2000, and let's say my table has 100,000 records in it (it's not a huge number of records, but it's a non-trivial amount), and I have a B-tree configured correctly using, say, Postgres, and the planner decides to use that B-tree, well, it'll take around 3 seconds to run that query. However, if I don't have an index,

or for some reason there's a more complicated set of conditions and I'm unable to use an index, and we have to do a sequential scan, that's going to take 5 and a half hours. I've never had an SQL query run that long, except one time when I was trying to run a very complicated database on a Pentium III, I think, many years ago. It doesn't happen. You can forget about the concept of real-time queries; it doesn't make any sense when you're using homomorphic encryption. Which is tragic, because homomorphic encryption is really cool; it's just way too slow. But there is some silver lining, there's a light at the end of

the tunnel, if you want to use that euphemism. There are lots of interesting functional and searchable encryption schemes available now, and there's been a lot of research even in the last few years. One example of that is this idea of order-revealing encryption (ORE). For those that are interested, I'll share links to papers at the end. But ORE is quite cool. It does one thing: it literally just does a comparison, more than, less than, or equal to. And we've done a benchmark of this. You look at the numbers and you go, actually, that looks pretty similar, but then you realize these are n's and those were m's (nanoseconds, not milliseconds). So actually doing a comparison using ORE

over two 64-bit integers in Rust takes roughly 200 nanoseconds, so that's 800,000 times faster than homomorphic. So how does that look when we're using it in our queries, in our table? Using a B-tree, the same comparison as before takes about four microseconds. For a sequential scan, obviously the situation we want to avoid in practical databases, but it does happen, we're now looking at 24 milliseconds. So that's pretty reasonable, that's quite fast. So now we have a practical mechanism to be able to provide the kinds of query capability that we need, but over encrypted data as well. So this allows us to consider the idea that

encryption can be used as a universal policy control mechanism. ORE is one example of a searchable encryption scheme, or a functional encryption scheme. There are quite a lot of others, and in fact there are whole categories of different encryption schemes, and if you're really interested, once again, I've shared some papers at the end. There's this idea of symmetric searchable encryption, or SSE, structured encryption, things like encrypted Bloom filters, and many others. It's actually quite a ripe field of research right now. So that's it, really. I argue that we've achieved our goal: we have a universal encryption system that allows us to apply policies in a single,

using a single method that works anywhere, and through the use of searchable encryption ensures that we can keep it practical. It works in any database. So I've shared some resources here. If you go to github.com bsides SF, we'll have a bunch of links there, and I've shared my email here as well. So if you feel like getting in touch and you want to have a chat, ask me questions, tell me where I'm wrong (I'm always open to hearing where I'm wrong), then please reach out. And thank you very much for your

time.

Thank you, Dan. And again, I'd like to remind you, if you'd like to ask a question, you can post it anonymously on Slido. However, we do not have anything right now on Slido. If anybody has a question, raise your hand as well. One back

there. Sorry, you're going to have to speak up. I'm sorry, I can't hear. I'll come to you. Thank you. I'm an old man now, I can't hear very well.

Can you hear me now? Yeah, that's great, thank you. So if you're using traditional homomorphic schemes, not the ORE stuff which you just mentioned, with like 100k records, I'm assuming the noise level would be really high. Do you bootstrap after a certain time, or how do you manage noise?

Yeah, it's a really good question. To be frank, I avoid the use of homomorphic encryption in these settings for performance reasons, but you've touched on some other reasons why they can be pretty challenging. So, for example, one scheme, learning with errors, which is very

similar to some of the post-quantum schemes we're seeing come through now, has this idea of noise, as you suggest. One thing to remember, though, is that that's only really a problem when you're encrypting lots of values in the table. It's not so much a problem when you're doing a query, because you're only comparing one value to another value, then that same value to a different value. So the noise is not really a problem when you're doing a comparison, but certainly when you're encrypting lots of values it is an issue, so you would have to bootstrap regularly. Yeah, but that's just another nail

in the coffin for FHE, in my view. No worries. Over

here.

[inaudible audience question]

Yeah, it's something I've spent a lot of time experimenting with and trying to make work. There's a few gotchas. It's achievable, we've actually done it, and the good thing about something like PostgreSQL is you can create custom operator classes and things like that, so it's actually not too bad. The scheme that we use, for example: the comparison operation relies on nothing more than HMAC, and so we actually just use pgcrypto and HMAC, create a stored procedure and a bunch of custom Postgres classes. It's pretty horrendous, but it works well and it's been very well vetted. But it took us

quite a long time to work out how to do that. Yeah, no worries. I can't see anyone else. So, I can't see you but I can hear you. I'm up here. Cool.

If you're using ORE, how does that not end up just being another way to reveal the data? Like, I could say, show me all of the records where the date of birth is between January 1 and January 2, and now I've got all those people's birthdays.

Yeah, totally. So you have to think about what permissions, or what capabilities, you're giving to each kind of actor in the system, right? One thing that's really important about some of the different ORE

schemes, in particular one by David Wu and Kevin Lewi from a few years ago, is that you actually need a key in order to perform a query. So running queries is a trusted operation, and you definitely don't want anybody that's just got arbitrary access to the database to be able to run those queries. Some of the older schemes, like the predecessor to ORE, the idea of order-preserving encryption, would definitely allow an attacker to run arbitrary queries even though they don't have key access, which is bad, exactly to your point. That whole problem is what's called an inference attack, and there's a

whole big body of literature on that. The idea of having to have a key in order to run those queries is a really big step forward, but you also need to remember that just the act of being able to run a query is going to reveal some information. Do I get results? Do I not get results? That's information. And actually, some researchers have proven that there is a sort of theoretical lower bound to the amount of information that can be revealed just by the fact that we want to run a query. But all this stuff comes down to whatever your risk model

is and how you think about the security of your systems. But certainly the state of the art when it comes to ORE addresses the vast majority of those problems. Yeah, no worries.

Seems like this is all the questions we have for now. Cool. Thanks so much, everybody. Cheers. Thank you, Dan.
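The answers above mention a comparison that relies on nothing more than HMAC. As a rough illustration of how an order-revealing ciphertext can be built from HMAC alone, here is a sketch in the style of the bitwise scheme by Chenette, Lewi, Weis and Wu. It is a simpler construction than the Lewi and Wu scheme named in the Q&A and is not presented as the speaker's implementation; the names `ore_encrypt` and `ore_compare` are hypothetical. Two limitations of this simple variant are worth stating: comparing two ciphertexts needs no key, and a pair of ciphertexts leaks the position of the first bit where the plaintexts differ.

```python
import hashlib
import hmac
import os

BITS = 64  # the talk's benchmark compares 64-bit integers


def _prf(key: bytes, index: int, prefix: int) -> int:
    # Pseudorandom value in {0, 1, 2} derived from the bit position and the
    # higher-order bits seen so far (zero-padded to 64 bits).
    msg = index.to_bytes(1, "big") + prefix.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % 3


def ore_encrypt(key: bytes, value: int) -> list:
    if not 0 <= value < 2 ** BITS:
        raise ValueError("value must fit in an unsigned 64-bit integer")
    symbols = []
    for i in range(BITS):                       # i = 0 is the most significant bit
        shift = BITS - i
        prefix = (value >> shift) << shift      # first i bits, remaining bits zeroed
        bit = (value >> (BITS - 1 - i)) & 1
        symbols.append((_prf(key, i, prefix) + bit) % 3)
    return symbols


def ore_compare(left: list, right: list) -> int:
    """Return -1, 0 or 1 for left < right, left == right, left > right."""
    for x, y in zip(left, right):
        if x != y:
            # At the first differing bit the PRF inputs match, so the symbols
            # differ by exactly the plaintext bit (mod 3).
            return -1 if y == (x + 1) % 3 else 1
    return 0


key = os.urandom(32)
born_1999 = ore_encrypt(key, 19991231)
born_2000 = ore_encrypt(key, 20000102)
assert ore_compare(born_1999, born_2000) == -1
assert ore_compare(born_2000, born_1999) == 1
assert ore_compare(born_2000, ore_encrypt(key, 20000102)) == 0
```

Because the comparison works on ciphertexts only, a function like `ore_compare` is the kind of logic that could back a custom database comparison operator, which is the integration style described in the Q&A.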
