
Greetings, BSiders. My name is Rick Troth. Officially I'm a data protection specialist. I'd rather think of myself as a rogue programmer, a.k.a. hacker, but don't tell my employer: we want to be so squeaky clean for the rest of the world, and the H-word has all that baggage. I work for Voltage Security, which is a division of Micro Focus.

Now, who the heck is Micro Focus? Micro Focus owns a number of properties, including SUSE. We recently spun off SUSE, but many of you will recognize SUSE, of course, as a big player in the Linux distro space. Micro Focus also owns Novell, Attachmate, and some other companies you've probably heard of. In the 80s the hotness was the
Micro Focus COBOL compiler. That's still going strong: COBOL is huge business even in 2020, and we have one of the more popular compilers out there for that language. Good stuff, Visual COBOL. This is not a vendor pitch; this is about the technology.

Specifically, I work in the division known as Voltage. Our main product is SecureData, for data security. We also have a product called SecureMail that was our historical flagship product, but SecureData got really, really hot about five or six years ago with what we call format-preserving encryption.

The title of this talk is "Risk Grows Where Your Data Goes, and Nobody Knows Like You." Some of you might remember the 1970
hit "Love Grows (Where My Rosemary Goes)," so the title is a hat tip to that. Ha, but I thought it was cute.

Looking at you, the audience, as developers, admins, and architects, whether you are with the end consumers, like maybe Nationwide or Huntington, or with a vendor like me: my goal is that you'll understand data-centric security, data-centric data protection, specifically format-preserving encryption, and that at the end of this you'll come away with "we just gotta have it," like the big cup at Cold Stone Creamery.

A little bit about me. I've been doing Unix for so long, and Linux since the early 90s, even before it was 1.x. I remember using
Linux on a Compaq laptop, riding on the bus in Houston and banging out code. That was a lot of fun. Linux was this great, wonderful Unix work-alike that was sneaking in the back door at the big data centers.

Most of my career, though, has been spent with virtualization, even before VMware, which has been around since 1998. I remember the VMware 1.0 beta. In those days it didn't run on Windows yet, and I thought, this is great, somebody's doing virtualization on something other than the IBM mainframe. Later I used Xen, and now I heavily use KVM at home, even on my laptop, regularly. But mainframe virtualization goes back even further.
Personally, I'm passionate about open source. I spent some time working in academia. In my most recent previous job we had an in-house SSL stack, and I got to work with that. That's where I popped the hood on crypto stuff, got bitten, and I'm never going back. I just absolutely loved working with it, tying together symmetric and asymmetric and hashes and all kinds of stuff.

In the lower left is the ARRL logo. That's the American Radio Relay League; I am a ham radio operator. I'm taking call signs, either by email or in the question-and-answer session which follows this. Up at the top, the logo there is for Texas A&M. Two things about Texas A&M, other than
the fact that I went there: Tux was created there (the logo for the Linux mascot was done at A&M), and A&M is one of fewer than two dozen universities in the US with an NSA-sponsored Center of Academic Excellence for cybersecurity. Another one is Cedarville, down outside of Dayton, and I'm hoping to work with them later this year, if things all work out, and present this technology to those students as well.

Data-centric data protection: the goal is ease of use of the data, so that you'll have a reduced impact when you're integrating your protection solution with your applications and existing systems and databases. Compare this to the traditional methods, which I
would call the easy button: transparent encryption, where the encryption is transparent to the applications. Format-preserving encryption is also transparent to the applications, but provides a different mode of encryption of the data. It allows the data to be processed in the protected state, and this is some really slick stuff. I get really geeked about the technology.

A big performance benefit is that you don't have to decrypt the data to use it. With some of the point solutions, the transparent solutions, there's overhead that you just don't realize is there. It also winds up making things more secure, because you don't have to cross gaps when you jump from one transparent solution to another, and
we're talking about an industry standard. NIST has blessed this as the AES FF1 algorithm. Anybody can use it, not just us. So again, not a vendor pitch; this is just about the technology.

This is what it looks like. I'm going to get into more detail later, but I wanted to show you as early as possible how this FPE concept works. The ciphertext, the encrypted version, has the same character set and length as the original cleartext. So if I've got a Social Security number, 123456789, it still comes out as digits. Rather than bits in and bits out, which you would get with straight AES, we have digits in and digits out, or
letters in and letters out, or any combination of that. You could encrypt my name and it comes out in the unpronounceable, yet still quite printable, form that you see here below. You don't have to make any database schema changes. You don't have to make application changes. Often only those applications which are actually ingesting the data would need to be modified, or those which are handing it off out the back door, like a credit card number to a bank for reconciliation. The data can stay in this protected form throughout your operation, across all of your applications that need to use that particular data. Good stuff.
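
To make the digits-in, digits-out idea concrete, here is a minimal sketch in Python. It is not FF1 and not anything taken from a vendor product: it is a toy Feistel network over decimal strings, with HMAC-SHA256 standing in for the AES-based round function that FF1 specifies, and it should not be used to protect real data. The function names, the demo key, and the lack of a tweak input are all assumptions made for illustration.

```python
import hashlib
import hmac

ROUNDS = 10  # toy choice; this construction is NOT the NIST FF1 algorithm


def _round_value(key: bytes, round_no: int, half: str, modulus: int) -> int:
    """Keyed pseudo-random integer in [0, modulus), derived from one half."""
    msg = bytes([round_no]) + half.encode("ascii")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _check(digits: str) -> None:
    if not (digits.isascii() and digits.isdigit() and len(digits) >= 2):
        raise ValueError("expected a string of at least two decimal digits")


def toy_fpe_encrypt(key: bytes, digits: str) -> str:
    """Map a decimal string to another decimal string of the same length."""
    _check(digits)
    split = len(digits) // 2
    a, b = digits[:split], digits[split:]
    for i in range(ROUNDS):
        m = len(a)
        # Add a keyed value to the left half, modulo 10**m, so the result
        # is still exactly m decimal digits.
        c = (int(a) + _round_value(key, i, b, 10 ** m)) % (10 ** m)
        # Swap halves: the old right half moves left, the new value moves right.
        a, b = b, str(c).zfill(m)
    return a + b


def toy_fpe_decrypt(key: bytes, digits: str) -> str:
    """Undo toy_fpe_encrypt by running the rounds backwards with subtraction."""
    _check(digits)
    split = len(digits) // 2  # halves are back to their original lengths
    a, b = digits[:split], digits[split:]
    for i in reversed(range(ROUNDS)):
        m = len(b)
        c = (int(b) - _round_value(key, i, a, 10 ** m)) % (10 ** m)
        a, b = str(c).zfill(m), a
    return a + b


if __name__ == "__main__":
    demo_key = b"demo key, not a real secret"
    ssn = "123456789"
    protected = toy_fpe_encrypt(demo_key, ssn)
    print(protected, toy_fpe_decrypt(demo_key, protected) == ssn)
```

The protected value is still a string of nine decimal digits, so a column or validation rule that expects nine digits would accept it unchanged. The same structure works for letters by swapping base 10 for base 26.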
Someone said that I should add this statement here: the data could be lying on the sidewalk and still be safe. Imagine that you have a 16-digit credit card number that somebody left on a Post-it note, but it's not the actual credit card number. That's the goal here, and it's a huge boon to you and me, trying to defend the privacy and security of our users and customers, because when the bad guys lift it, what they have is just useless. And always, of course, assume that somebody's going to get it. So if somebody's going to get it, what can we do to secure it? Try this. Try FPE.

Obligatory headline thrown in: this is
from earlier this year. There are so many headlines I could have stolen, and in other presentations I've used different ones. You've heard this; it's not anything new to you. These are the headlines that you want to avoid, and I want to avoid, and we want our customers to avoid.

Data protection and privacy is a growing problem because of the increased demand for more and faster development of applications. Especially with the cloud, more and more stuff is getting automated, and it's just natural that there will be risks: applications with gaps and vulnerabilities. It's going to happen, so the more we can do to protect things, the better. So this is in a context of definitely still trying
to keep your applications hardened. In fact, my employer sells a terrific product for that: you might have heard of the Fortify product. That's a great way to have your source code scanned, and then you'll see a lot of vulnerabilities that you can quash right away. But again, I'm not just trying to pitch our products. You do want to continue hardening your apps, but also harden the data itself. Look at it this way: we're dealing with securing identities and access controls, and we're also hardening the applications themselves. Here we want to harden the data itself, in a way that it's still hardened even when it pops out of a traditional, transparent type of solution.
Look at it in the development life cycle; that's fine too. Either way, that's what we're trying to do: shore up the whole process. Where we're already hardening identities and applications, let's also harden the data itself.

I mention GDPR as the first of several regulations. It's been in force for a couple of years now. We've had some companies already pinched by GDPR for not being compliant, or for not shoring up their customers' data as much as they should, and there are countless others. CCPA is a big one in the US, because so many companies deal with businesses in the state of California. We've got other states presenting their regulations, and every
continent has something, unless you want to go to Antarctica. The point of this, of course, is the phrase "data is the new oil." We keep it, we use it, we grow it, we expand it, and we collect it. Especially when you think about big data: what is that other than a perpetual collection of useful information for your business? You need to make sure that the data itself is safe so that it doesn't walk off. We want to be able to support full scalability and still have mitigation and defense against breaches and all of that. The bigger it gets,
the more the risk grows. Threats are all over the place, not just in online shopping. It's bizarre to me to realize that many of us are doing banking from our smartphones. I'm not, though I do plenty of other things from my smartphone that make me nervous. I get even more nervous when things go into the cloud. With format-preserving encryption you could have the data protected, push it to the cloud, and, ah, no problem. I just wish more people knew about it. Of course social media is a whole other thing, because you wind up having the relationships themselves. Facebook is constantly trying to tie you together with people. It's a
little scary, so that database is something I'd really love to see secured.
The regular transparent methods are sort of siloed. They're not really well connected with each other, and that leads to scalability problems. More significantly, there's so much manual work that would need to be done to eliminate any gaps, and sometimes you just kind of live with the gaps. What I mean is this: here is a picture of a half dozen environments, or sample systems, within your enterprise where you've got something securing the data, but the data then has to be decrypted to be usable, and has to be decrypted when you hand it off to the next phase. Think particularly about transparent database encryption. The underlying storage is encrypted, and it's really solid.
I mean, it's really good stuff, and the data can't walk off and be of use to anybody. But for it to be used by the applications it has to be decrypted, and at that point it's exposed. When you need to hand it off from a record into a file, you're crossing from the database itself down into the file system. The file system might be encrypted, great, but at the handoff the data is in the clear. The handoff is where the attackers are going to hit you. If the data at the handoff is format-preserving encrypted, then it's still usable but also still secured. That's what we want to do. Then your
process flow looks something like this, where you still have all those subsystems and you may still be using all of the transparent solutions. You could maybe turn off your database encryption. I wouldn't recommend that, but some of our customers have, because they don't feel the need: they have other tools in place as far as access, and they have format-preserving encryption to keep the data safe when it's at rest and when it's in use.
Breaches continue to grow. We hear about them, but the goal of data-centric protection is that when the data leaks (when, not if) it's of no value to the attacker. It has zero street value. Then the headline would be kind of funny: so-and-so just had five million credit card numbers stolen, but the joke's on the bad guys, because those credit card numbers were protected with some sort of surrogate scheme like format-preserving encryption.
The promise is that the protection follows the data. Since the data itself is encrypted everywhere it goes and everywhere it needs to be used, the protection is in the same place as the data. The data being usable in that protected form is what lets this happen. That's obviously safer, but it's also faster, because you're not having to decrypt the data to use it. Compare that to the transparent solutions.

IBM, by the way, is doing very much the same thing. They now have a solution, Data Privacy Passports. Neat idea, where for a given field they can represent that field by, well, I'm not sure what to really call the object. Call it an object.
But that object can be carried from one system to another. The problem in their case is that the object itself is not something simple like format-preserving encrypted data. It is more of a blob that has to be processed by compliant applications, databases, and subsystems. It does allow that object to walk around from one thing to another, but they've all got to be part of that ecosystem. With format-preserving encryption you don't have to do that; you can just use the data. You wind up with a lot of existing applications that might not need to change, and not having to rewrite code is a huge boost. This is, of course, most effective for
structured data. Start there. We can do unstructured data; that gets a little bit more typical, as with whole-file encryption. But start with the structured data and let it flow from there, and you'll get a lot of benefit.

Here are a couple of slides on before and after. You've already seen this, but this is where I get into some more detail. Let's just look at the credit card number here in the middle. It's obviously cleartext here, so everybody's using it and everybody's doing fine, except you might have malware sneak in and pilfer some data, or a rogue user
or rogue employee, heaven forbid, who gets access to that and sneaks off with it. But if the data is format-preserving encrypted, then, as you can see in this example, the credit card number leaves the leading six digits in the clear and the last four in the clear. That's a common mode that our customers use. It's not mandatory; it's an option that people take. You have the flexibility to encrypt the whole credit card number or parts of it, and we find that people have to make adjustments. So one of the things you want to look for, when you're looking for a solution like
this, is to make sure that the formats are flexible. So here the data is protected, and the malware hits and the rogue users attack, and they don't get anything that's worthwhile.

In fact, simplifying the picture a little bit: malware and bad guys we obviously want to protect against, but also the DBAs. DBAs aren't bad guys, but in their day-to-day work they have to touch the data. With format-preserving encryption and similar methods they can do their job without having to look away, and they don't have to worry that they might have seen something. When they go home at night, they can sleep.

Format-preserving encryption, or tokenization, or surrogate solutions: that's not the only thing you
want. Look also for stateless key management, where keys can be pulled in on demand from a key server. Also look for keys that are derived rather than pre-generated. If you have pre-generated keys, that amounts to a key vault, which becomes yet another high-value target for the bad guys. You don't want that. Tokenization is even worse, because token vaults get to be huge, so they represent a big deployment problem and don't scale well at all. So look for on-demand access and on-demand key derivation. Of course the keys can be cached at the client end for reuse, but you
want your clients to be disciplined enough: you want a library in the data protection solution that will discard cached keys after a period of time, just to keep things safe, because they can always be re-fetched by an authorized user.
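
The derived-key and expiring-cache ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the use of HMAC-SHA256 as the derivation function, and the five-minute default lifetime are hypothetical and are not how any particular product does it. A real key server would also authenticate the caller before answering, which is left out here.

```python
import hashlib
import hmac
import time


class ToyKeyServer:
    """Derives keys on request from one master secret; stores no per-key state."""

    def __init__(self, master_secret: bytes):
        self._master = master_secret
        self.requests = 0  # number of derivations served, for illustration only

    def derive(self, key_name: str) -> bytes:
        # The same name always yields the same 32-byte key, so there is
        # no vault of pre-generated keys to store or steal.
        self.requests += 1
        return hmac.new(self._master, key_name.encode("utf-8"),
                        hashlib.sha256).digest()


class CachingKeyClient:
    """Caches fetched keys and discards them after ttl_seconds."""

    def __init__(self, server: ToyKeyServer, ttl_seconds: float = 300.0,
                 clock=time.monotonic):
        self._server = server
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key name -> (key bytes, expiry time)

    def get_key(self, key_name: str) -> bytes:
        now = self._clock()
        entry = self._cache.get(key_name)
        if entry is not None and entry[1] > now:
            return entry[0]  # still fresh, reuse the cached key
        # Missing or expired: fetch again from the server and restart the timer.
        key = self._server.derive(key_name)
        self._cache[key_name] = (key, now + self._ttl)
        return key


if __name__ == "__main__":
    server = ToyKeyServer(b"master secret for the demo only")
    client = CachingKeyClient(server, ttl_seconds=300.0)
    first = client.get_key("ssn")
    second = client.get_key("ssn")  # served from the cache
    print(first == second, server.requests)
```

Because the key is a function of the master secret and the key name, two servers holding different master secrets return different keys for the same name. The injectable `clock` parameter is there so that expiry can be exercised in a test without waiting.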
I like the term identity-based encryption. In Voltage land we started with our SecureMail product, and that's where identity-based encryption started for us. It was an attempt to address the lack of scalability of straight PGP. PGP is wonderful, and I'm a huge PGP user, but it doesn't scale so well in the enterprise, so we came up with a solution where the encryption was identity-based. I still personally think of it as identity-based, because we use the term identity in my world to mean a key name. That gets a little confusing, because you say identity and somebody might think you mean my username.
No, but the reason I like it is that you can align key names with functions, accounts, users, groups, and things like that. So the term identity-based encryption is really a great thing if you use an identity to refer to the key. Also, when you're dealing with a key name, or key identity, and you move from one security district to another, you can still use that same identity. It won't represent the same key, because a different key server will present to you a different bit pattern for the key, even though you're using the same identity. Think about having a security district for production data, and it's got its own key server. Then
you'd have a different security district for QA or for development. I would recommend multiple security districts. Within any given security district, the identity will return a different value for the actual key bit pattern. This means that you can do things like take production protected data and drag it over to QA, so that you have a large quantity of test data, but it's not at risk, because the QA security district will not have the same actual keys. We have used the same name so that, as the code is flowing from dev to test and then forward into prod, the code doesn't change, and that's
kind of essential: you don't want to change the code after you've tested it. So with a given identity for the key, when you go from QA to prod, you're talking about a different key. With symmetric cryptography the data would decrypt, but it wouldn't return the original cleartext. So that ugly-looking mix of letters for my name would not return "Rick Troth" in QA.

I had mentioned something about the formats, that you want them to be flexible. I showed the example where we have the leading six in the clear and the trailing four in the clear. You might want all digits encrypted; that's great. So look for flexible protection, flexible
formats. We have identities for key names, and then formats for the actual kind of processing that gets done.

In the payment card industry they also want things tokenized. At the 30,000-foot level, tokenization and format-preserving encryption are really kind of the same thing, but there are specific rules that people need to adhere to. So work with the people you're getting your solutions from to find ways to meet the definitions and yet still get the protection that you're looking for. Whether it's called tokenization or FPE, what you're looking for is something where the encrypted version has the same character set, or alphabet (we like the term alphabet), the same alphabet as the
input, and the same length.

There is also a variation on that: embedded format-preserving encryption. People need to do key rotation, and you have probably already heard about that. So how do you do it? You wind up having to decrypt the whole shooting match and then re-encrypt it, and that gets impractical. But you can embed the index of the key that was used. Let's say you have a group of keys in a rotation group. There's an index that refers to which one was used, and these rotate on something like two-year or five-year cycles, so they don't rotate that often. So
typically the key groups are fairly small, maybe a dozen keys or so. That's not a big number to carry alongside the protected data, and you can actually embed it in the ciphertext if you expand the alphabet. So another thing to look for would be embedded format-preserving encryption.

Then there's the concept of a format-preserving hash. A hash, of course, is one-way, so it can't be decrypted, because mathematically it just doesn't reverse. This is great for the right to be forgotten, where you've got a customer who says, "I really, really, really want you to pretend like we never met," but their data is in your database. You
want it non-reversible, and a format-preserving hash would do that for you. There are lots of different things that you want to do here, so look for solutions that are flexible. If you use the algorithm that NIST published as FF1, you'll get a long way down that road, and look for these other features as well.

Essentially, FPE is radix math. Just to explain radix math: it's permutations, where you have sets of things. The letters of the alphabet would be a set of 26 things, so the radix would be 26. With digits, the radix would be 10. Things like that, but it's just radix math. So the output alphabet or, rather, the input
alphabet is going to have a radix, and the output alphabet has the same one.

Another thing to look for is what I call page-integrated encryption, just to cover all the bases. Think about a banking customer who's at home, typing in something really sensitive at the browser. Hey, we're on HTTPS, everything's cool. Yeah, but when that sensitive data hits the front-end web server, it's in the clear again. With page-integrated encryption you can inject some JavaScript into the web page that will encrypt the sensitive fields, account numbers and whatever else you deem sensitive, before he even hits the go button on the web form. That way, when the sensitive data
hits the web server, hits the landing zone, it remains protected. It remains protected after it pops out of the HTTPS tunnel.

I got to give a talk similar to this at OWASP earlier this year. OWASP, interestingly enough, has their top 10 security issues, one being protecting sensitive data. So this is it, guys: format-preserving encryption is the way to do this. Encrypt as early as possible, decrypt as late as possible, and leave the data protected as long as possible. Really, you want to encrypt the data at ingestion and decrypt it only when you need to hand it off to a third party. Leave it protected even for your big data stuff that just sits there for
years and years and years, where you need to do correlation. Again, format-preserving encryption retains referential integrity, so you can do joins and searches and lookups and things like that. I'll mention lookups again in just a minute.

And look for platform-agnostic mechanisms. One thing that bugs me is when you have a tool or an application or a solution that runs on Windows but not Linux, or maybe it runs on the Unix systems but not on the mainframe, or it runs on the mainframe and doesn't run on Windows, or you need to use it on the Mac and it's just not available. Look for
platform-agnostic solutions that just run everywhere, simple and easy to integrate and recompile for the different platforms.

About performance, coming back to this, since I've mentioned it already: the traditional transparent solutions often do the whole file, the whole database, or at least the whole record. That means your decryption gets a lot heavier. You have to decrypt just to access the data, and the decryption winds up covering more than just the individual fields. With the data-centric solutions, though, 90% of what we've encountered can work on format-preserved data. Now, searches in particular: I talked about indexing and joins and things like that, but if you need to do a search, don't decrypt
the whole database to do a search. Just encrypt the search term and look for that. Easy enough. I'm not trying to oversimplify things; there are definitely some details you'd want to get into about that. But it's really not that hard to have format-preserving encrypted data and still do searches. You've got to be able to do searches; that's just obvious.

So, regulatory compliance. These are some top reasons for using this. Regulatory compliance is a big reason for using it, because you've seen just a handful of the regulations that are coming at us. The payment card industry folks are pretty tight and largely self-policing. They run regular audits, and audits cost money. Any systems which don't have the cleartext can
be deemed out of scope. You can have a QSA confirm this, and if a system is out of scope, you don't have to audit it. That is real dollar savings right there.

Obviously we're talking about breach protection as well, and scalability, particularly when you get into big data. Think about a data lake where the data is there, and it's going to be there for years, but it's protected in itself. And cloud workloads: I mentioned how sensitive I am about the cloud. If you have the data itself secured and you push it to the cloud, then I know I would sleep better at night, not worrying about
somebody sneaking in and getting some stuff.

So, about the NIST standard: we worked with them. This is just reviewing some of the concepts of FPE. It preserves referential integrity, and it is a mode of AES. Long before I joined the company, we had our cryptographers work with NIST. Lots of people were working on this at the time. We submitted FF1, somebody else submitted FF2, and there was also FF3. FF1 and FF3 made the cut; FF2 dropped out. FF3 was later found to be flawed and has now been withdrawn, but FF1 is still out there. It's been vetted by lots and lots of
much smarter people than myself. I'm not a cryptographer; the math there is just a little more than I can handle. But I know cryptographers, and I know them well enough that I trust them, and they tell us this stuff really does work.

You want to use an open standard. You don't want to be locked into one particular vendor. This is an algorithm that the Voltage team supplied, but it's available to anyone, and several of our competitors are already using it, so it's not just about us. Use standards. Some of our competitors have their own algorithm, and it may be great. It may even be
better than AES FF1, but nobody knows, because it's wholly proprietary. So you're better off using something that is a published standard. I should also say that what we originally submitted was slightly tweaked, so we do have a traditional mode: some of our older customers use an older algorithm called FFSEM. There's no reason in the world why a given solution cannot handle multiple algorithms. Our solution does, and whatever you employ should also, but it should include FF1, because that's the really juicy standard.

So, coming back around to where we originally started: we want the data to be easily used in a protected form, to minimize
the impact of integrating the solution, and you want something that's going to be more enterprise-wide than just the transparent methods. That's what format-preserving encryption is all about. Again, if the data can stay protected and not have to be decrypted, so much the better in terms of your performance, and so much the better in terms of your security. And of course we're talking about a standard.

So the lyrics to the song could be expanded, and I hope that kind of rings with you, and maybe I've put an earworm in your heads. Thanks again for listening, and we'll take questions.