
Hello, good afternoon, and welcome to Engineering Privacy from the Get-Go. Again, my name is Christina Lou. Who am I? Well, I am a senior security engineer at Cisco Meraki, and I am also a Certified Information Privacy Technologist. I'm @lulu on the platform formerly known as Twitter, and I have a website, christinali.com, where I have a post about this topic with a little bit more information as well, since this is a short talk.

All righty, what will we cover? First, we're going to talk about why privacy is important. Then we'll get into what personally identifiable data, or information, is. Then we're going to get into what re-identification is and why it's dangerous. And then, number four, we're going to talk about some practical takeaways that you can go and try to implement.

To better understand the power of personally identifiable information, I want to play an imagination game, and this one is going to be for all the burrito lovers out there, because I'm from San Francisco and it's all burritos out there. So I'm here to talk about Burrito Match. This app is the hottest thing in burrito recommendation engines. Basically, you give it some information and then it will find you your perfect burrito based on all the flavors that you like and don't like. In order to do so, you have to provide it basic information about your diet: are you pescatarian, vegan, omnivore? Do you have any dietary restrictions, like allergies? Do you need your food to be gluten-free? Do you need your food to be halal or kosher? With this information, it processes all of that, and then Burrito Match can not only find you your perfect burrito, it will find the perfect burrito closest to you right now, because time is of the essence when you're hungry and angry, or hungover.

Okay, well, now Burrito Match is so freaking good and so accurate with its recommendations that you basically use it every day for six months, because ain't nobody got time to cook. Which is great. But what if this app was not forthcoming about its data-sharing policies? What if the information that you like all of your burritos with extra sour cream and two Modelos, which exceeds the doctor-recommended intake for foods like this, gets passed up to insurance carriers, and for some reason your health insurance premiums go up? Or worse: what if the information from this app gets passed up to organizations that do religious surveillance, which is possible because of the location data on the phone and the halal or kosher filters? Now this app goes from being whimsical and fun to disturbing and dangerous.

So thank goodness this app is completely imaginary; it only exists in the minute that we talked about it. But there were, and are, apps that are personal data nightmares.
Does anybody remember the super, super old iPhones, the ones where the light only turned on for flash photography? Which was terrible; what we actually wanted was the light to stay on in a steady beam so we could use it as a flashlight. Because of this user-driven demand, there was a proliferation of third-party flashlight apps on the App Store, and I'm going to talk about one in particular, which is the flashlight app by iHandy. In an analysis conducted by Appthority, a mobile security software company, they found that this flashlight app had the ability to access the user's phone's location data, could read their calendar, could use their camera, and also had access to the unique ID number of the device itself. With that, it also had the potential ability to pass that data on to advertising networks without user consent.

That is a problem, because users actually care about what happens to their data. In a 2022 consumer privacy survey done by Cisco, 76% of those surveyed said that they would not buy from a company they do not trust with their data. And not only is this a trust issue, this is really a user respect issue, because whatever code you write, whatever software you work on, it will ultimately impact people, whether it's a burrito app or an app that does software deployment orchestration.
It impacts people, and you want that impact to be positive. You want to be building better products, whatever that product may be, because you don't want these unintended consequences hiding in the code or the architecture of the software. When privacy and security are mishandled, the consequences can affect people in very real ways.

This is a chart from Experian. It's a little outdated because it's from 2017, but it shows the value of data on the dark web at that time. Data that is sold like this has a price: for example, Social Security numbers, surprisingly, were only worth about a buck, but passport information was worth $1,000 to $2,000. So there is significant monetary value in having or selling this data.

It should be pretty obvious that privacy is important, but what is it? Privacy is usually talked about in buzzwords and rants on the internet, and we usually hear about it in terms of damages: how many millions of dollars were lost, or how many millions of dollars a company was fined for mishandling or leaking data. But really, at its core, privacy is an individual's right to maintain control over their personal information, because privacy allows people to be themselves by giving them the ability to control what to share, where to share it, and with whom to share it (grammatically correct, I know).
Privacy can be achieved through policy, both legal policy and corporate policy, and through technical engineering controls.

Hand in hand with privacy comes security, and we should all know the answer to this one. But again, our industry is full of rants and buzzwords, popular buzzwords being things like phishing and threat actors and hackers, hackers, hackers. At its core, security is really the systems and the controls we build to protect information, and that information is things like proprietary code, credit card information, and personally identifiable information, often referred to as PII. So security can help achieve privacy, but it alone is not enough to protect privacy or PII.

When we talk about PII, it's usually lumped into two categories: what's considered sensitive PII and non-sensitive PII. A note here: what types of information count as sensitive or non-sensitive PII can differ by state, by country, by region, so you need to be very careful when you're trying to bucket information into what is sensitive and non-sensitive. Sensitive PII, as defined by the Department of Homeland Security, is data that, if lost, compromised, or disclosed without authorization, could result in substantial harm, embarrassment, inconvenience, or unfairness to an individual. The TL;DR for that one: if the information can quickly and accurately identify an individual, it is sensitive information, or sensitive PII.
Examples of sensitive PII are things like a Social Security number: that is a number that follows us through our lives; we can't work without it, we can't get housing without it. Driver's license numbers: generally that number doesn't change unless we move to another state. And biometric information, because I can't change my fingerprints unless I lose some fingers.

Non-sensitive PII is really information that, by itself, is generally not considered to be a risk to an individual's privacy or security. This information is very commonly collected for things like marketing, customer service, and research, but care is still needed to ensure that this data is still protected from unauthorized access, use, disclosure, destruction, all that good stuff, because it can become sensitive if you're able to take multiple pieces of non-sensitive PII, put them together, and then quickly and accurately identify an individual. For example, if you just have an address, you might not be able to quickly and accurately identify someone, especially if they live in a large house or an apartment building; but if you have an address, a gender, and an age, then suddenly you may be able to find this person quickly and accurately.
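As a tiny illustration of that point, here is a hedged sketch in Python with completely made-up records: it just counts how many people share each (address, gender, age) combination, and any combination that only one person has is effectively an identifier.

```python
from collections import Counter

# Invented example records; each field alone looks fairly harmless.
people = [
    {"address": "123 Main St", "gender": "F", "age": 34},
    {"address": "123 Main St", "gender": "M", "age": 34},
    {"address": "123 Main St", "gender": "F", "age": 71},
    {"address": "45 Oak Ave",  "gender": "M", "age": 29},
]

# Count how many records share each (address, gender, age) combination.
groups = Counter((p["address"], p["gender"], p["age"]) for p in people)

for combo, count in groups.items():
    if count == 1:
        # A group of one means this "non-sensitive" combination points at
        # exactly one person, which is what makes it sensitive in practice.
        print("uniquely identifying combination:", combo)
```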
So, to protect data, and to be able to use and store this type of collected data, we can use a concept called de-identification, which is the set of tools and techniques that organizations use to minimize the privacy risk of storing and publishing data containing PII. Here are some common de-identification methods. A special note: they may be called different things across different industries, but the ideas are very similar.

The first one I'm going to talk about is redaction, which is removing PII from a data set. If you have a data set with people's names and Social Security numbers, can you remove them and still make that data set do what you need it to do? Another one is masking, also known as pseudonymization, which is the idea of obscuring the PII so that you can't read it in cleartext. If you have a data set with Social Security numbers, can you, instead of having those numbers in cleartext, replace them with all stars, or run all those fields through an algorithm that turns them into random numbers and strings? The third one is generalization, which is the idea of grouping that PII together. For example, if you have a data set with specific ages of people, instead of having the actual ages, can you say that these people are under 18 or over 18, under 65 or over 65? And then finally there's obfuscation, which is the idea of adding noise into the data. Again, using the data set with ages as an example: instead of using the actual age, can we just round the values off to the nearest or lowest decade? Or can you take an average of all these ages and say everyone here is between 35 and 45? Now, obfuscation can be aggressive and can make that data harder to use, but if you have data sets with very sensitive data, like healthcare records, this may be a good way to go for you.
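To make those four methods a little more concrete, here is a minimal sketch in Python; the record, the salt, and the bucket boundaries are all invented for illustration, and a real pipeline would need much more care (for example, a properly managed secret for the pseudonymization step).

```python
import hashlib
import random

record = {"name": "Ada Example", "ssn": "123-45-6789", "age": 37, "zip": "94110"}

# Redaction: drop the PII fields entirely if the use case doesn't need them.
redacted = {k: v for k, v in record.items() if k not in ("name", "ssn")}

# Masking / pseudonymization: keep the field, but make it unreadable, either
# with stars or by running it through a keyed hash so it becomes a token.
masked_ssn = "***-**-" + record["ssn"][-4:]
pseudonym = hashlib.sha256(("example-secret-salt" + record["ssn"]).encode()).hexdigest()[:12]

# Generalization: collapse a precise value into a coarse bucket.
age_bucket = "under 18" if record["age"] < 18 else "18-64" if record["age"] < 65 else "65+"

# Obfuscation / noise addition: perturb the value so a single record is fuzzy
# but aggregate statistics stay roughly useful.
noisy_age = record["age"] + random.randint(-3, 3)

print(redacted, masked_ssn, pseudonym, age_bucket, noisy_age)
```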
Safe data handling and disclosure is more important now than ever before, because as recently as May 22nd of 2023, Ireland's Data Protection Commission fined Meta 1.2 billion, with a B, euros, because they found Meta had been sending EU users' information to the USA. Meta is appealing, so we don't know if they're going to have to pay this, or when they'll pay it, because it's working its way through the legal system right now, and the legal system is not fast.

And money aside, protecting PII is important because we're really not anonymous anymore on the internet, and that is because of re-identification. Re-identification can happen from de-identified data sets, and this was actually proven in 2006 by two researchers at the University of Texas, Arvind Narayanan and Vitaly Shmatikov.
(They're not actually Lego people; I couldn't find any good pictures of them.) In 2006, Netflix had a contest called the Netflix Prize: a million dollars awarded to whatever software team or engineer could come up with a better movie recommendation algorithm for them. In order to facilitate this contest, Netflix released a data set that had over 100 million movie ratings from almost half a million subscribers, covering six years' worth of data. Our researchers found that they could use the public records on IMDb to re-identify users in the de-identified Netflix data set, by matching whether a user liked or disliked specific movies, together with a posting date that could have a 14-day error. They found that they only needed eight movie ratings that matched, again with that 14-day error, and even with that tiny bit of information, our researchers were 99% confident that they could re-identify the user in the Netflix data set. They also published that, because movies are ranked very specifically to our own personal interests, what we like, and our own lives, other traits like sexual preference and political party can actually be inferred based on how people ranked these movies.
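That matching step is simpler than it sounds. Here is a hedged toy sketch of the idea (not the researchers' actual code, and with invented data): a de-identified rating is treated as matching a public one when the movie and score agree and the dates fall within the 14-day window.

```python
from datetime import date, timedelta

TOLERANCE = timedelta(days=14)

# Invented example data: (movie title, star rating, date of the rating).
anonymized_ratings = [("Movie A", 5, date(2005, 3, 1)), ("Movie B", 1, date(2005, 4, 2))]
public_ratings     = [("Movie A", 5, date(2005, 3, 10)), ("Movie B", 1, date(2005, 4, 5))]

# Count how many of the "anonymized" ratings line up with the public profile.
matches = sum(
    1
    for title_a, score_a, date_a in anonymized_ratings
    for title_p, score_p, date_p in public_ratings
    if title_a == title_p and score_a == score_p and abs(date_a - date_p) <= TOLERANCE
)

print(f"{matches} of {len(anonymized_ratings)} ratings match within the 14-day window")
```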
And here's another example of re-identification happening from unlikely data sets, if you need to convince your friends, family, or managers that privacy is important. This one is an experiment done by Dr. Latanya Sweeney, the director of the Data Privacy Lab at Harvard. Her experiment showed that you can match hospital records to newspaper articles to re-identify people. What she did was pay the state of Washington 50 bucks, and she got a data set that contained patient demographics, clinical diagnoses, and procedures. This data set was de-identified: names and addresses were removed, but some of the records still had the zip codes. She then went to LexisNexis, which is a newspaper database, and searched for newspaper articles with the term "hospitalization" in the Washington area. She found 66 articles, and newspapers are in the business of informing the public of current events, so they do publish specific identifiable details like name, age, treatment hospital, and other information. So she was able to match newspaper articles to specific records in the hospital data set.

Here is an example of one of them: the long one is the hospital record and the short one is the newspaper article. In yellow we can see that it's a 60-year-old man, which matches back to the age in the hospital data set. The teal is a location, a Soap Lake man, which matches back in the form of the zip code. Then there's when this person was hospitalized, because in the newspaper article it was a Saturday afternoon, which matched the hospital record. Then there's the reason this person is in the hospital, which is a motorcycle accident, and the treatment hospital in orange, which is Sacred Heart Hospital, which matches back to that data set. Now, in pink in the newspaper article, we find that the person affected is Ronald Jameson (for this purpose they changed the poor guy's name; he's already having a bad day, so no need to make it worse). Because we've successfully re-identified Ronald Jameson, we can find more information about this person: how his hospital stay cost him $71,000, and how he also has a slew of other health issues now because of this accident. Due to her work, the state of Washington did make changes to increase the anonymization protocols of their public health records, so this is a feel-good story, which is awesome.

Now, in addition to these human consequences of mishandling user identity data, there are a lot of legal challenges and consequences that we need to be aware of. Special note here: I am not a lawyer, so I want to let you know there are many, many different laws that companies can be sued under or fined under for data breaches and mishandling data, and different industries have their own specific privacy laws, so work with someone that is a lawyer. Now, at the time of this talk, in the United States there is no comprehensive federal law that standardizes how PII should be handled. We had one going through, I believe it was the Senate, but nothing happened, and I think they're trying to reintroduce it now. So all the existing laws that we have are patchwork and rely on the individual states to enforce them, and here is a current map of what states have laws and what states have laws working through them; this map is current as of September. And a special note for Oregon.
Oregon actually has one coming online: it's called the Oregon Consumer Privacy Act, and it is effective July 1st of 2024, so be aware, things are coming for Oregon. Now, there are a lot of laws and, it seems, not a lot of hope here, but what can we do? Well, here are five things we can do to start with.

The very first one: the first rule of PII Club is don't collect or store unnecessary data. The second rule of PII Club is don't collect or store unnecessary data. If you literally don't remember anything else from this talk: don't collect or store unnecessary data. This is such a big deal that I actually made stickers that you can come get later (especially if you have a question, I'll give you a sticker). Don't collect or store unnecessary data; now you can forget the rest of this talk.

Number two: if you do have to store this data, create a schedule for when that data is going away. That's usually called a data retention policy, and it makes it less stressful for those who have to maintain the system. Also, modern cloud storage systems like AWS actually have configurations to make this data deletion automatic, so you don't have to deal with it.
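As one concrete sketch of that (assuming AWS S3 and the boto3 library; the bucket name, prefix, and 90-day window are placeholders, not recommendations), a lifecycle rule can expire objects automatically so the retention policy doesn't depend on anyone remembering to clean up.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and prefix; the rule deletes objects automatically
# after 90 days, which is the "schedule for when that data is going away".
s3.put_bucket_lifecycle_configuration(
    Bucket="example-user-data-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-collected-user-data",
                "Filter": {"Prefix": "user-data/"},
                "Status": "Enabled",
                "Expiration": {"Days": 90},
            }
        ]
    },
)
```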
Three: use only the data needed to get the job done. Be an advocate for being incredibly selective about the data that will be processed and stored. Don't collect it just because you might need it later, and don't be afraid to ask the person who wants this data whether they can get the job done with a smaller, more limited data set, because we want to make it harder for those re-identification attacks to succeed.
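A hedged sketch of what that selectivity can look like in code, with made-up field names: make the fields a feature actually needs explicit, and drop everything else before it is ever stored.

```python
# Hypothetical example: the recommendation feature only needs diet and zip
# code, so only those fields are persisted even if the incoming payload
# happens to contain more.
NEEDED_FIELDS = {"diet", "zip"}

incoming = {
    "diet": "vegetarian",
    "zip": "94110",
    "full_name": "Ada Example",   # not needed, so never stored
    "birthdate": "1990-01-01",    # not needed, so never stored
}

stored = {k: v for k, v in incoming.items() if k in NEEDED_FIELDS}
print(stored)  # {'diet': 'vegetarian', 'zip': '94110'}
```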
Four: build for privacy and security in the beginning, because it's never cheaper, less effort, or faster to bolt it on later. If you try to force it in later, you may end up building mission-critical systems that then have to be materially changed or just retired due to privacy law violations. And remember, those privacy laws are changing very fast, so be careful: you want to be designing and building to the strictest standard; at the moment that's GDPR for most of us, so use that as your guide.

Finally, five: work with a privacy lawyer. Again, because privacy law is complicated, varied, and quickly changing, you want someone who actually knows all the intricacies and the legal stuff.

So remember: the code that you write and the software that you work on has a human impact, even if at the very surface level it doesn't seem that way. We as security engineers are also stewards of our users' data, and it's important to know how our users are expecting us to protect their identity, because it's the right thing to do, even if it takes a little bit more time and effort to build that in. After all, I know that you want the company responsible for your PII to also be taking the utmost care and consideration with your data and doing the right thing too.

So again, my name is Christina Lou. A big thank-you to everyone here, to BSides, the people on the cameras, our AV staff, and our volunteers. Oh, and I have this: the QR code goes to my LinkedIn, it's not nefarious.
Do I have time for questions? Oh, yeah? Well, let's do it. What's up?

Yeah, so I think the question there (I'm repeating it because I'm assuming there are camera people) is: how can one help mitigate the risk of using these free apps and things like that, where we're the generators of the data they make money off of, and how do we mitigate damages from that? Right, so that's kind of the million-dollar question. Basically, with the landscape right now, there's no good answer for that one, unfortunately, unless we as an industry really bring privacy up to the top and say, hey, we shouldn't be doing this, we shouldn't be making people's data the product. But there are a few things you can do. See if you can limit any of the data that's being shared: if you go into the apps, you can see what information they're sharing, so turn those off if you can. Same with browsers: there's always that little thing on the bottom that says "accept the cookies." I know it's a pain in the butt, but if you want to do it, just say no, don't accept the cookies, so they can't track you.
And I think it's very, very recent that the White House is actually starting a conversation and thinking about data brokers, because that's where all this information ultimately goes: data brokers that have your information and sell it to companies that want to target you with ads, essentially. So the White House is thinking about potentially putting some limits on data brokers, or at least monitoring that somehow, but it's very new; I think it was about two months ago that the White House came out with that statement.

So, this is more of a comment: in my experience working with application developers, the biggest privacy problem they accidentally get into is logging everything just in case they need it for debugging. They'll log the whole web request and response to a service, and not only will that take up a lot of space if it gets hit a lot; last week I saw that they were doing that for a credit rating or credit application service, so there was a lot of PII being logged, and they hadn't thought about that. They were just like, you know, we might need a couple of these pieces of data, so we'll collect everything. And now the logs are full of Social Security numbers.

Yeah, that's a tricky thing, right, because that means people have to start thinking about the fact that you can't log all the things just in case. Especially with a request like this, if you're working with an app to debug, maybe you don't need everyone's information; you just need the pure request information, and maybe that's what's breaking.
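Here is a minimal sketch of that kind of guardrail, assuming Python's standard logging module; the SSN regex and logger name are just illustrative, and real scrubbing has to cover whatever PII your requests actually carry.

```python
import logging
import re

# Scrub obvious PII patterns (here, just US-style SSNs) out of log messages
# before they are written, instead of logging whole requests "just in case".
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

class RedactSSNFilter(logging.Filter):
    def filter(self, record):
        record.msg = SSN_PATTERN.sub("[REDACTED-SSN]", str(record.msg))
        return True

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")
logger.addFilter(RedactSSNFilter())

logger.info("credit application received, ssn 123-45-6789")
# -> INFO:app:credit application received, ssn [REDACTED-SSN]
```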
Also, another thing: can they log things for a very short amount of time and then delete them after they've found the problem? So yeah, that's a good comment. Okay, I'm getting cut off. Cool, all right. You two that asked questions, come get your stickers. Okay, bye.