
Beyond Labels: Evolving Data Classification

BSidesSF · 2024 · 30:17 · 194 views · Published 2024-07
Style: Talk
About this talk
Beyond Labels: Evolving Data Classification. Rob Oden. "Beyond Labels" delves into the intricate art of data classification, expertly balancing security, privacy, and business needs. Uncover strategies to craft a flexible yet robust program that navigates the complexities of departmental demands while ensuring standardized compliance and protection. https://bsidessf2024.sched.com/event/6033b9967d538f747afaf64745df3513
Transcript [en]

Thank you, everyone, for coming on a Sunday afternoon, right at lunchtime, to listen to the Beyond Labels: Evolving Data Classification talk. My name is Rob Oden. I am the senior data classification specialist on the Roblox information security team, and I'm going to be speaking, surprise surprise, about data classification. More than just speaking like a consultant, or "here's what Gartner says you should do," I hope to integrate my last 20 years of experience supporting the United States Air Force, multiple three-letter agencies within the intelligence community, consulting, aerospace and defense, and my last couple of roles in tech, especially what I'm doing here at Roblox now. A little bit about Roblox and why data classification is actually

important to us: we have more than 72 million daily active users on our platform, but we also have over 2.5 million developers who are creating amazing experiences. That's why we have almost 4.5 million unique experiences. As you can imagine, the types of data that we hold, and the need to protect that data while sharing it appropriately, are critical and core to our business. So with that, we wanted to share how to look at data classification. But I have found that when you dive right into a topic, it's really hard to contextualize it, so we want to use something like the McKinsey method,

right, of going very broad at first. That's section one, picking the right framework. Then section two will deep-dive into a framework that I recommend and believe is really impactful for creating an actual holistic data classification program. I will take a moment to read a question or two between sections one and two, to give time to digest that broad level before we dive in. So, going right in: picking the right framework. Now, I know that with a bunch of security engineers, at a security conference, it wouldn't really be a security conference unless someone from GRC came and told you how important GRC is, right? Like, yes, we're real boys too,

right? But as I see it, GRC, or governance, risk, and compliance, is how the infosec program (information security, cybersecurity) communicates to the rest of the business as well as to the rest of the world, from our regulators to our users. It drives our program. GRC is not monolithic; it has three components: governance, meaning how we should do security; risk management, meaning understanding the operational drivers of our business, the things that keep us from accomplishing those goals, and right-sizing our approach to those risks; and finally compliance, meaning understanding the regulatory and industry requirements, whether PCI DSS or GDPR, and the appropriate level of compliance your organization needs to meet. Now, not all GRC programs are the

same, right? Usually there is a primary driver, and from that primary driver you can see what kind of information security program you have. I call these the big letters. First we start off with the big G GRC program, governance-led: security best practice. This is very common in the US. If you see a security program that says "we have firewalls, we have EDR," these are the things that we should do for security. What we've been seeing grow, especially with the changes from the SEC announcement, as businesses try to quantify and qualify security risk in more financial terms, is big R GRC programs, led by risk management: right-sizing

security. And then finally, you usually see this either in heavily regulated industries within the US, or a lot in European agencies: a compliance-led, big C program. It's "you must do these things" for ISO 27001 or GDPR, or NIST 800-53 if you're a government agency. At Roblox, we leverage the big R GRC program. So we look to our business partners and the board, from our hiring to the tools we spend on: are we approaching the right problems, and are we appropriately dedicating people, resources, and time towards those things? Now, why am I talking about GRC when we came to talk about data classification? Well, you can have that same perspective

on data classification as on GRC. So let's first start with the thing we're most familiar with, the big G, governance-led. If you put "how do I do data classification" into ChatGPT, this is what you're going to get, or maybe a consultant says here are the things you should do. They're going to tell you to identify your data, classify it, put some protections around it, educate your team members, and do monitoring. I'm not going to talk about that; there's a lot of material on it. If you're interested, NIST is actually trying to codify these best practices in its data classification work. But we're not going to

focus on that, nor are we going to focus on compliance. There are frameworks that are already very detailed, whether that's PCI DSS for the 16-digit PANs in your cardholder data environment, very explicit rules for user PII, or ISO 27001 using the 27002 controls. They're very specific to very specific data types, and you see this a lot in departments or teams that hold that data, but it doesn't really expand out. There, you don't have to determine your data; you just classify what you're already told to protect, you put the protections in place, you show compliance to an outside organization, and then you're constantly doing plans of action

and milestones to show that you're continually meeting it. We're not going to go into that either; I do think it is important to know, but it is one component of the program, especially when you're looking at it from the enterprise level. Instead, here's how we approach this. Again, at Roblox we're a big R cybersecurity program. Breaking it out into a simpler form: scope, defining why you are actually trying to classify data. What are your drivers, who are your actual stakeholders, what's the impact you're trying to get to? Then stratify: how do I prioritize, classify, categorize, segment? What's more important in one situation or another? How do I understand that not just from where I

stand, but from where my partners stand? And finally, secure: am I putting in the appropriate measures for my organization? Am I right-sizing security based on the sensitivity of the data and the impact to my operations? Now, like I said, this is the broad focus: we looked at GRC and we looked at the different frameworks. One piece of feedback was to give people a moment to chew on that. Do we want to see if there's a question I can answer, or should we dive in? Are there any in the Q&A? Not yet? Not yet, perfect. Cool. Okay, great, here's the first question: sometimes GRC has to interface with enterprise or

financial GRC teams. How do you effectively communicate security and privacy priorities to the stakeholders? One part is to understand. Every single team is going to have its own priorities, whether you're talking about engineering, legal, privacy, or an internal department. You have to understand that they have that priority, and for them it is their burning platform. But you also have to understand and communicate what the organization as a whole is facing. So maybe for you as an individual, or for the company, it's not a P0; the priority might be a little bit lower. But you have to ensure they know that in your roadmapping, in your

prioritization, and in your communication plans, you are taking into account their main drivers, and that they have the assurance that those are actually being addressed. That's how I see it. Great. So with that, we went broad; now let's drill in with scope, stratify, and secure. I have this as a linear path, but keep in the back of your mind that although you will usually start with scope, you're going to do some of these in parallel and somewhat circularly. So first and foremost, like all things, it's about asking the right questions, and this comes down to why

are you doing this in the first place? I had one person say, "You know, I thought I was protecting this, and then my CEO said this other thing is the most important." We've had these conversations: I know what my CEO thinks is the most important, but I also know some of the directors and VPs strongly disagree about what's important, what our sensitive data is, and what can break our business. So ask the questions. It's really easy to start with "I think I already know the answer," but take a moment to do that due diligence and look at it from multiple perspectives: why am I doing this in the first place? Which leans into some of the

common issues that I have faced in my career. First and foremost: what is your burning platform? This seems simple, but I promise you, not only is this something you should be asking at the beginning, you should be asking it continuously, because as you evolve, as you grow, as you drive toward a solution, you can lose track of it. So understand the operational need your organization is trying to solve with the ability to categorize and classify sensitive information. Next is leveraging what already exists. Crown jewels and data classification are not a single circle; they are a Venn diagram. You might have conflicting

priorities, or they might be really aligned. Sometimes when we get brought in, the ask is "we need to relook at how we're classifying data," and it's really easy to throw the baby out with the bathwater. But you can leverage components, pieces of what exists. Again, if you have a department such as payments, it might do PCI DSS because it maintains a CDE, or parts of one. There are really good things you can take from that. You don't have to be directed or limited by it, but it's good to take from it. The thing that I have found most impactful is understanding that we all have unconscious discipline biases, and you stand where you sit: how

you prioritize something is usually going to be based on the role and the priorities that you have. Understand that legal might have very different priorities, and compliance might have very different ones; they're going to be very opinionated. And if you approach this as a security professional, know that your first instinct might not be right-sized for the organization, because of its operational needs. So take into account your own biases, take in these other perspectives, and really try to drive, again, toward why we recommend starting from risk: what actually enables business operations while reducing the risk of doing so. And then finally, something that I

have struggled with, especially moving from government to aerospace and defense to tech: tech moves really fast. I don't like putting out subpar work. I'm very proud of my work, and I'm surrounded by people I learn from every day at Roblox, so I want that perfect product. But you're never going to have perfect, and you can get analysis paralysis with this problem. So have something good enough, have the drive to land it, and push it forward. With that, understand that you're not alone. Something I've heard that's really good is the democratization of vulnerability management: making owners, pushing it down. If security becomes the owner of

the rest of the organization, we have failed. Same thing with data classification: make them responsible. Ask your business partners, how do you define what's sensitive in your organization? Why is it sensitive? What's the operational need? Bring other people in, and the more people you bring in, the more you can drive your impact. So now that we've scoped out the problem, how do we actually define, categorize, segment, and prioritize all that? A fun little thing about these slides: every image in here was generated by ChatGPT. These are not Roblox images; these were things I told it to make, either ChatGPT or some other tool. So with that, spelling is still a work in

progress, so you're going to see a lot of misspellings in here, but you know, we're dealing with AI. The most common data categorization is the industry standard, your four levels: public, internal, confidential, restricted. Organizations might use slightly different words, and you might go to three or five levels, but the idea is to bucketize your information to set a minimum level of protection, so the entire enterprise understands "if it's this, I do this with it." The problem is that this is very broad and doesn't really meet a lot of use cases. So then we go to something like this, where we're being

very specific about how I actually identify data. For PII, for example, I'm going to handle someone's email very differently from someone's Social Security number. Maybe I have their gameplay and no direct identifiers: how do I mark that, how do I identify that? That is something you're going to have to work through, but with the understanding that, just like I mentioned with McKinsey, you go wide and then you go deep. It's the same thing with data classification: have a common lexicon for how you define sensitive data for the enterprise, but also understand when you need to drill in, whether you have the capability to, and that those two can marry.
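As a minimal sketch of the wide-then-deep idea described here, the enterprise-wide levels can be kept as one shared lexicon while specific data types are mapped onto them. The level names come from the talk; the data-type names, their assigned levels, and the `classify` helper are hypothetical and only illustrate the shape, not any organization's actual taxonomy.

```python
from enum import IntEnum


class Level(IntEnum):
    """Enterprise-wide levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical drill-down: specific data types mapped onto the shared levels.
# These assignments are illustrative assumptions, not recommendations.
DATA_TYPE_LEVELS = {
    "email_address": Level.CONFIDENTIAL,
    "social_security_number": Level.RESTRICTED,
    "gameplay_without_direct_identifiers": Level.INTERNAL,
}


def classify(data_types, default=Level.INTERNAL):
    """Return the highest level among the data types present in a dataset.

    Unknown data types fall back to `default` so nothing goes unlabeled.
    """
    levels = [DATA_TYPE_LEVELS.get(t, default) for t in data_types]
    return max(levels, default=default)
```

A dataset that mixes an email address with a Social Security number would take the stricter of the two levels, which is one simple way for the broad lexicon and the granular data types to agree.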

Because there is rarely a department that only holds data for its own use, you're going to have to share it, either externally to the organization or with other departments, and they're going to need to know how to handle that data. So right-sizing that classification is going to be a significant effort, and again, you're not going to be perfect. Having something that works, that you can ship and then constantly iterate on, is much more impactful than trying to figure out the perfect solution. And again, just like with scoping, there are some common issues. Thinking past the initial needs: this was a thing that we ran into

when I was at Facebook, or Meta. I don't know, maybe some of you heard, in 2016 we had a little bit of a hiccup with how we were doing user privacy. So we had the FTC, we had a big compliance effort: privacy, privacy, privacy. And so how we looked at data classification was from a privacy standpoint. The problem is, if you build your entire program on one specific department's need, then when you look to expand it to the enterprise, your guidance, your direction, and your handling are going to have that bent, and in places where it would be easier to move, you're really going to be

restricted. So keep in mind, again, the operational need you are trying to meet as an organization, the risk you're trying to buy down. Keep that burning platform in the back of your mind while accounting for these department needs to be really granular and really protective. Sometimes that's supplemental guidance; we can talk about that in the secure section. But really keep that in mind. Again, if this were a college lecture, this would be my foot-stomp moment: leverage what exists. The reason I mention this twice is that I have done this wrong so many times, because a lot of us come into a new role and we want to show

how smart we are; we want to show that we are the ones producing the great things, and so we come in with "great ideas." Well, even if you have a program that says "look, we didn't do data classification right," there's always something that can work, and there are a lot of frameworks out there. Don't be constrained by the frameworks, but leverage them: leverage the language, leverage the lexicon. You can be really impactful that way. But also understand that some people are so tied to those lexicons and frameworks that there might be a little bit of massaging in how you communicate this. Focus on the enterprise, not the

team. Again, whatever you build, even if it's, let's say, credit card or payment information, rarely if ever is it handled by just the payments team. It might be customer service or support that's going to be passing it around your organization, so the classification has to be able to communicate across teams. And then finally, something we do here at Roblox that I think is really good: if there is no action, there is no category. Why would I put a label on a piece of data if I don't want people to do something specific with that category that they're not doing with another? As security professionals we fall into this trap, because we want contextualized

data. Security is about context, so if I can get more context about who, what, when, and where at the data level, yay. But the problem is that from a usability, adoptability, and actual enhancement standpoint, this is going to hurt you. We had this at L3Harris, where our first data categorization had 48 different categories of data. As you can imagine, it wasn't well received. Security loved it; our engineering operation didn't. So then we went down to six. You have to right-size that. But there's something else: semantics are actually important in this world, what you call something. So if you are in an organization, let's say one in the US government space, even using the words

confidential, secret, and top secret is a no-go, because those are US government classifications, and if you have an organization that holds classified data, they don't want to see on your corporate or prod systems that you have "confidential" or "secret" information. Understand that, and understand the industry language. The word "sensitive," from a data classification, data protection, and data security standpoint, means that this data is more important than that other data from an impact standpoint. But privacy has adopted the language of sensitive, and they think sensitive only means sensitive PII, even though they took "sensitive" from the industry standard to say this is PII that is more important than the rest of the PII. So

understand that how you describe things, the language and the terminology, might ruffle some feathers; people might have different views. Getting that right is a fun little exercise. When you start saying "here is your prioritization, this is more important than that," understand that people might get tripped up on words. With that in the back of your mind, say: okay, cool, let's roll back; maybe there's a different word I can use, or let's set a definition internally in a taxonomy. Okay, great. Now I've told you how we're going to scope it and break it out; now let's do what we actually care about: securing. A lot of what we were talking about

was a taxonomy: how we describe something and why we describe it. And most people think, hey, let's label it. That might be a metadata tag, which is machine-readable; or a marking, like an attorney-client privilege header or footer, or, if you're doing export control, your control numbers, whatever, where I want a human being to know and do something; or an index, meaning I have a database that says in this data store, in this database, these columns are these data types, and I can query it for identification. Great, I'm done? No, because you're not actually getting any security from that. You have awareness. I worked for a lawyer who said you never want to

go from ignorance to negligence overnight. If you're just tagging and saying "hey, I have data, and it's not being protected," great, now you have compliance findings. You actually need to protect the data. Data classification in and of itself is an enablement. Again, we've been talking about risk and right-sizing: this allows us to say where we prioritize our funding and our resources. So, starting with the thing most people think about for data security: access management. We've had some phenomenal talks about right-sizing who should access data. Classification is an amazing resource for saying: this repository or this service accesses this level of sensitive data, so maybe we should be a little bit

more restrictive. Or the inverse, which I think is really good: hey, 80% of our data is just internal data, so maybe we default access to those repositories so that if you ask for it, you get it. Make it really simple and really easy. Next is your data tagging and discovery, especially if you have compliance regulations or sensitive data types that can bleed over into other parts of the business. PCI is a good example: if you hold any credit card information, they always want you to look for credit card information. Classification allows you not just to rely on your people, but to use technology to start searching for

this. That goes into your data loss prevention. Data loss prevention is a transportation control: I am moving data outside of authorized storage, to external parties, or to other teams; can I control that? Whether you know what your data is, or you're doing content inspection, inspecting the data as it's moving, you need to know what that data should be and why it's categorized that way. Next is policy enforcement, where we go and say: this is the direction I've given you, you should do this. And then obviously, for data security, our second favorite control is encryption, whether that's full-disk encryption for your authorized storage, whether it's file-based encryption,

whether it's IRM, or whether it's application-level encryption, like in your databases, for specific data types. This analysis tells you where to put this very high-intensity resource. There's a cost to applying encryption; now I'm right-sizing it. And then finally you have the audit, because a lot of people are saying, "hey, we've done this": how do we prove that we've actually done it, and do I have the tools and capability to go and ask, are my tools effective, and answer that question? And then the thing in the center: I cannot expect people to do something if I do not tell them to do it and how to do it.
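One way to write that guidance down, sketched here under stated assumptions, is a table of minimum handling per classification level covering the controls just listed (access, encryption, DLP inspection). The `Handling` fields, the specific values in `BASELINE`, and the `redundant_levels` check are hypothetical; the check simply encodes the earlier "if there is no action, there is no category" rule by flagging any level whose handling is identical to the level below it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handling:
    access: str        # how access is granted
    encryption: str    # minimum encryption expectation
    dlp_inspect: bool  # whether outbound transfers are content-inspected


# Hypothetical baseline, ordered from least to most sensitive.
# The values are illustrative assumptions, not a recommended policy.
BASELINE = {
    "public": Handling("open", "none", False),
    "internal": Handling("granted on request", "full-disk", False),
    "confidential": Handling("manager approval", "full-disk", True),
    "restricted": Handling("named individuals only", "application-level", True),
}


def redundant_levels(baseline):
    """Return levels whose handling is identical to the previous level.

    A level that asks for nothing new gives people no action to take,
    so it is a candidate for merging.
    """
    names = list(baseline)
    return [b for a, b in zip(names, names[1:]) if baseline[a] == baseline[b]]
```

Keeping the baseline as data rather than prose also makes it easier to audit later, since the same table can be compared against what the tools actually enforce.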

Right? A big complaint I got when I was at Booz Allen was that we would tell people what to do, but not how to do it or in which situations to do it. Like, great, I'll come back in six months and give you the same advice. The point is that people need actionable guidance. And with that comes your phases. A lot of people, especially as we are looking at staff reductions and efficiencies, want to go straight to automation. But if SOAR (security orchestration, automation, and response) has taught us anything, it's that SOAR made a lot of promises that it hasn't really delivered. But you know what it did do? It

forced a lot of teams to actually write down what their processes should be. That orchestration has been pivotal, and this is what I say for data classification: first and foremost, if you want people to do something, write it down. Make it guidance, make it available, make it easily understandable. Then you can tie in manual controls around what you wrote down; when there's a change or a modification, we write it down. And finally, once we start seeing the activities and behaviors we expect, we can start automating to relieve the load. We can abstract some of this away from our engineers, our users, and our teams. We can make it easier, and then we

reinforce that with guidance and manual controls so we can right-size. But all this being said, let's say you do this, and tomorrow, boom, you're successful: data classification is in place. Just like access management, key management, and certificate management, it's the management that's the hard part. The initial rollout is easy, for the most part. Your uniqueness, your situations, and your regulatory environment keep changing how you prioritize data. Maybe you've gotten the low-hanging fruit and now you need to really dig into your business processes. That is going to be an ongoing effort. And this goes back to the conversation at the beginning, your risk management approach: what's important to the business, not

what's important to cybersecurity. We would all prefer people to do the most secure thing, but it causes a lot of friction. We really need to go with what we're trying to do. At Roblox, we want to build to one billion users. That is our goal: we want to connect one billion users through unique experiences. If I'm causing friction in that growth, I'm going to stop us from doing that. But if I don't protect our reputation, do the data stewardship, and honor the trust that our users and creators have given us with their sensitive data, then we're not going to get to one billion users either. So it's

that right balance, and that management, that interaction, is really key. Tying it all together, the main key points I hope everyone is able to walk away with: every organization is going to have different business and regulatory needs, and your classification should reflect that. It should reflect your business; your business shouldn't have to change to fit your classification program, which is a trap I have fallen into: "here's how you do data classification, we should build it this way." It should be diverse. Ensure you have a clear roadmap, because you're going to be challenged and you're going to hit new situations. Where are you going, and why?

What is that story? If that story is not aligned with the business, with where the organization is going, you're going to get diverted, you're going to focus on the wrong things, and you're not going to get the buy-in you need from key partners. Take a holistic approach and right-size your classification strategy accordingly. Sometimes you're going to start a little bit smaller just so you can get impact and roll with it, and then you add and add, making sure you are prioritizing security in the right way, not just because "this is what we should do." And finally, just like security by design and privacy by design, shift left:

the more you can integrate your data classification (how we identify, why we identify, and how we're prioritizing data) at the earliest point, from controls back to the collection of data, the more impactful we can be as a program as a whole. With that, I want to thank everyone for your time. We are going to go to questions, but find me on LinkedIn, find me on Roblox, and we are hiring for a number of positions, so if you are looking for a role, please come find us at Roblox. Okay, I'll be reading off the questions now. Thank you, everyone who has attended so far. Do you

see value in decision trees to help people make decisions about which data is classified where? Absolutely. My intent is to default to what makes adoption the simplest, because then your average user is able to incorporate this. Now the problem is how you make that decision tree. Do you make it very broad and then go in? You're going to have to tailor it based on the type of data. Take PII, for example: if you're doing a decision tree, you're going to need supporting documentation for it to streamline things for people, with the idea that you have very clear definitions of what

the categories are, as well as what the expected protections are for each. So yeah, I think decision trees are great. How do you handle people who don't want to take ownership of their data? Ah, this is a spicy one, because we are struggling with this right now. Define an owner for me: is it a service owner, a system owner, a process owner, a data owner? For us, we are working on data ownership and data stewardship, because responsibility is kind of okay, but accountability is a real struggle. So we are working with our senior leadership and our key XFN partners to say: this type of

data should be owned by this group. And we're looking to default that at the VP level, at the pillar level: okay, great, no one on your team wants to own this? You own this now. "Oh no, this shouldn't be me; how do I push that down?" Again, it's pushing that decision, that integration, to the business, saying: look, we are trying to right-size security for business operations, but that means that when you say you are accepting a risk, you are taking ownership of it. Certain risks you can accept at your level; others you can't. So it's that communication. Hopefully I've answered that question. Next: do you see value in decision trees to help... okay, we already

did that. How do you align the business risk tolerance with employee incentives? This one's a fun one. I would see employee incentives not as stock or compensation, but as "I'm going to make your job easier." I'm a big fan of the golden path, if you're familiar with that: the thing I want you to do, I make easy, and the thing I don't want you to do, I make hard. I should be meeting 80 to 90% of your needs on the easy path. On the hard path, we're going to do an exception process, you have to really justify it, and we go up a level, not just to your manager but

a little bit higher for that exception. If we're going to go outside the bounds, there's a reason why. "Sometimes GRC has to interface..." oh, I already got that one. What metric is there for measuring program management over time? For this one, hit me up afterward, because I would probably spend a lot more time on it than I should. You mentioned having an appropriate roadmap for the journey; what's the biggest challenge in putting together the roadmap? That the priorities are going to change in your organization. A lot of times people say, "hey, here's our H1, here's our H2." I've had three

different views on what data classification should be in my last 18 months here at Roblox, and sometimes those priorities change. I think the biggest thing is what we're trying to drive and achieve. And then there are the unknown unknowns: say I'm starting to go down some path, I get agreement from this pillar and that pillar, and then, let's say, Creator says "actually, this completely contradicts what we need," or "I need X, Y, and Z." That can significantly change your roadmap, how you prioritize, and what you push forward. I think I have time for one more question. Can you walk through a sample data tag and how to create one? Can someone hit me up after for that one? I'd

love to walk through that, but it's going to take a little bit more time. Is there a relationship between data size and classification type? Personally, I think it's really easy to go and prioritize based on petabytes and zettabytes, but it should be the data type first, and then you can look at the size. Take data exfiltration: if you look and see that gigabytes left, but it's mostly public and internal data, and a couple of megabytes are your most restricted data, those megabytes are what I care about more. So start with the data type and the sensitivity; then the size comes into play. That's how I approach it. And with that,

I'm over time. Thank you, everyone, for your time.
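The answer to the last question, data type and sensitivity first and size second, can be sketched as a simple sort order for exfiltration events. The `RANK` values, the event fields, and the sample events below are hypothetical and exist only to illustrate the ordering.

```python
# Hypothetical sensitivity ranks; higher means more sensitive.
RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}


def triage(events):
    """Order events by sensitivity first, then by bytes moved, both descending."""
    return sorted(events, key=lambda e: (RANK[e["level"]], e["bytes"]), reverse=True)


events = [
    {"id": "a", "level": "public", "bytes": 40 * 10**9},     # 40 GB of public data
    {"id": "b", "level": "internal", "bytes": 12 * 10**9},   # 12 GB of internal data
    {"id": "c", "level": "restricted", "bytes": 3 * 10**6},  # 3 MB of restricted data
]

# The few megabytes of restricted data come first despite being the smallest.
print([e["id"] for e in triage(events)])  # ['c', 'b', 'a']
```

Size only breaks ties within the same level, so a large transfer of public data never outranks a small transfer of restricted data.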