
Good evening, everybody. The real troopers are in the room. This is the last presentation of the evening for BSides San Francisco 2024, woohoo! We have Mr. Gupta, a senior security engineer, and Mr. Parala, a senior software engineer, both at Snowflake. We love participation, so if you're not familiar, there's something called Slido (sli.do). You can find it on any QR code outside the theaters, or you can sign in at bsidessf.org, Q-N-A spelled out, that's Quebec November Alpha. Their topic is about security data. All right, thank you, gentlemen, take it away.

Thank you. Hi everyone, Rishabh and Rishik here, and we're here to talk about how persisting, storing, and structuring your security review data can help you tackle security review challenges and questions you might face in the future. Both of us work on anti-abuse, and given that, we inevitably found ourselves having to answer the question: which systems are affected by abuse threats? On face value this is a daunting question to answer. You have to think about all the different systems in your organization, and about all the different kinds of abuse threats that might apply to them. Where do you even start? As it turns out, some very smart people chose to structure and store all of our security review data, and we were able to use that data to answer this question, with some NLP along the way, because that's what everybody does these days. Before I get into the solution we built, the tool that came along with it, and how it identified abuse-related threats, I want to start
with the groundwork: the security review program, and the challenges and choices that went into it, all of which led to gathering the data we were later able to use to answer the abuse question.

So, the challenges of scaling a security review program. One challenge was that security review had to be developer-driven: it had to scale so that developers could do security reviews themselves. But that comes with problems. First, if developers are doing security reviews, security teams still want to know how a change (a new feature, an architecture change, whatever it may be) affects the security of a system. That can be difficult to answer, because security reviews generally tend to be unstructured. They might live in a Confluence page or a design doc; they tend to be informal, and they are very rarely stored. So it can be difficult for a security engineer to go back later and see how a particular security review a developer did affects the security of a system. Second, security review, and threat modeling in general, can be very non-deterministic. It tends to be more of an art than a science, which becomes challenging when you hand it to developers, because doing a good job of threat modeling depends heavily on an individual's security knowledge. If you want consistent, reproducible results from developers, that non-determinism is a big challenge. Third, the developers themselves: they don't want to take on additional burden. They want something fast and low-friction, and they don't necessarily
want to spend a day, or two, or a week thinking about security reviews; they want to get them done as quickly as possible. These are the challenges we set out to solve, and they drove all the decisions that followed.

First, we set out to solve the problem of persisting and storing the data. The first security review data we stored went into GitHub, and that was a very conscious choice: we wanted the security review data to live right next to the source code it was about. We stayed away from Confluence pages and design docs because those tend to have a very short life; they get misplaced, lost in folders, and forgotten, whereas source code and test cases tend to live forever. So we decided to persist the data in GitHub: Markdown for the risk assessments and the threat modeling data, and XML for the DFD, the data flow diagram, which came from the draw.io shapes. One of the goals for the security review program was a repeatable, reproducible process that could output security test cases for developers, and to that end the threats and mitigations, which are the outputs of threat modeling, are stored as Gherkin test cases. We chose these formats because they are easy for machines to parse (you can put them in code, query them, and do all kinds of things with them) but also reasonably readable by a human who just wants to take a quick look in GitHub and make some kind of determination. That solved our problem of structuring and storing the data.
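To make the storage format concrete, here is a sketch of what a threat stored as a Gherkin scenario might look like, together with a toy parser. The feature name, tags, and step wording are hypothetical; the talk does not show Snowflake's actual scenario format.

```python
# Illustrative only: a threat stored as a Gherkin scenario, plus a toy parser.
# The scenario text and tag names are hypothetical, not Snowflake's real format.

GHERKIN_THREAT = """\
Feature: Threat model for the upload service

  @spoofing @node-api-gateway
  Scenario: Unauthenticated caller reaches the API gateway
    Given a request arrives from the open internet
    When the caller presents no valid credentials
    Then the API gateway must reject the request
"""

def parse_scenarios(text):
    """Extract {tags, name, steps} records from Gherkin source."""
    scenarios, tags, current = [], [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("@"):
            tags = line.split()
        elif line.startswith("Scenario:"):
            current = {"tags": tags, "name": line[len("Scenario:"):].strip(), "steps": []}
            scenarios.append(current)
            tags = []
        elif current and line.split(" ", 1)[0] in ("Given", "When", "Then", "And"):
            current["steps"].append(line)
    return scenarios

scenarios = parse_scenarios(GHERKIN_THREAT)
print(scenarios[0]["name"])  # Unauthenticated caller reaches the API gateway
print(scenarios[0]["tags"])  # ['@spoofing', '@node-api-gateway']
```

Because the scenario is plain text with a rigid keyword structure, the same file is diffable in GitHub, readable by a human reviewer, and parseable by tooling, which is exactly the trade-off described here.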
But there was still the problem of developers spending a lot of time on security reviews, because at the time we were following an approach called STRIDE-per-element, which for developers tends to be very time-consuming and inefficient. For anyone not familiar, STRIDE is an acronym for general classes of threats: S for spoofing, T for tampering, R for repudiation, I for information disclosure, D for denial of service, and E for elevation of privilege. The problem is that when you ask developers to go to every node, every element, in the data flow diagram, think about all of those threat classes, and then narrow them down to the subset actually applicable to the DFD, it becomes very complex for them, and also very subjective, because it relies on the individual developer's experience with security. What we wanted was to take the non-determinism out of threat modeling, to make it less of an art and more of a science. To do that, we adopted an approach called Rapid Threat Modeling Prototyping, RTMP for short. The only input it needs from developers is a data flow diagram, and in exchange we can give them about 80% of the threats that might be applicable to that DFD or feature. That lets developers do their threat modeling in a repeatable, reproducible way, with consistent output.

This is an example of what a data flow diagram a developer submits at Snowflake might look like, and you can start applying the RTMP notations to it. The first notation you apply is trust zones, and the rules for assigning trust zones to all these elements are very simple: anything untrusted, out on the open internet, gets a trust zone of zero; any node connected to a zero node gets a trust zone of one; data sinks are always a nine; and everything in between gets a subjective trust score based on its position and importance. Something closer to the data sink gets a higher trust score, while something closer to the open internet, or on an edge connected to it, gets a lower one.
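The trust-zone rules just described can be sketched in a few lines of Python. This is a minimal illustration under the stated rules, not the actual implementation; the node names and the hand-assigned middle score are made up.

```python
# A minimal sketch of the RTMP trust-zone rules:
#   open-internet nodes -> 0, direct neighbours of a 0 node -> 1,
#   data sinks -> 9, everything else -> a subjective middle score.

def assign_trust_zones(nodes, edges, internet_nodes, sinks, middle_scores):
    """nodes: iterable of names; edges: (src, dst) pairs.
    Returns {node: trust_zone}."""
    zone = {}
    for n in internet_nodes:
        zone[n] = 0
    for src, dst in edges:
        # any node directly connected to a zone-0 node gets zone 1
        if src in internet_nodes and dst not in internet_nodes:
            zone.setdefault(dst, 1)
        if dst in internet_nodes and src not in internet_nodes:
            zone.setdefault(src, 1)
    for n in sinks:
        zone[n] = 9  # data sinks are always a nine
    for n in nodes:
        zone.setdefault(n, middle_scores.get(n, 5))  # subjective middle score
    return zone

nodes = ["browser", "api_gateway", "app_server", "database"]
edges = [("browser", "api_gateway"), ("api_gateway", "app_server"),
         ("app_server", "database")]
zones = assign_trust_zones(nodes, edges, {"browser"}, {"database"},
                           {"app_server": 5})
print(zones)  # {'browser': 0, 'api_gateway': 1, 'database': 9, 'app_server': 5}
```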
Here is an example of what that trust score assignment might look like. The nodes on the far left have a trust zone of zero, because that's a browser coming in from the open internet, while the databases have a trust zone of nine, because they are data sinks, and everything in between gets a subjective score.

Once you have your trust zones, all that's left is to apply the STRIDE rules. The five rules applicable here are also more or less simple, and you can apply them consistently to whatever threat model you have in hand. For elevation of privilege, you are looking at any data flow from a node in a lower trust zone to a node in a higher trust zone. For spoofing and denial of service, you are looking at any node with a trust zone of zero connected to a node with a non-zero trust zone; in that case the destination node gets a spoofing threat, an S, and the connecting edge gets a D, denial of service. For tampering, when data flows from a lower trust zone to a higher trust zone, the edge connecting them gets a tampering threat. For repudiation, if a node has a spoofing threat and the edge into it has a tampering threat, the destination node gets a repudiation threat. And for information disclosure, if data is flowing from a higher trust zone to a lower trust zone, it gets an information disclosure threat, which makes sense, because it is flowing from higher trust to lower trust. With all of that done, you can produce a threat model very easily.
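A minimal sketch of those five rules applied to a zoned DFD. Where the talk does not say which element a threat attaches to (elevation of privilege, information disclosure), the placement below is an assumption, flagged in the comments.

```python
# Sketch of the STRIDE-per-flow rules over a DFD whose nodes carry trust zones.

def apply_stride(edges, zone):
    """edges: (src, dst) pairs; zone: {node: trust_zone}.
    Returns per-node and per-edge threat sets."""
    node_threats = {n: set() for n in zone}
    edge_threats = {e: set() for e in edges}
    for src, dst in edges:
        e = (src, dst)
        if zone[src] < zone[dst]:
            node_threats[dst].add("E")  # elevation of privilege (placement assumed)
            edge_threats[e].add("T")    # tampering on the connecting edge
        if zone[src] == 0 and zone[dst] != 0:
            node_threats[dst].add("S")  # spoofing on the destination node
            edge_threats[e].add("D")    # denial of service on the edge
        if zone[src] > zone[dst]:
            edge_threats[e].add("I")    # information disclosure (placement assumed)
    for src, dst in edges:
        # repudiation: spoofable destination reached over a tamperable edge
        if "S" in node_threats[dst] and "T" in edge_threats[(src, dst)]:
            node_threats[dst].add("R")
    return node_threats, edge_threats

zone = {"browser": 0, "api_gateway": 1, "database": 9}
edges = [("browser", "api_gateway"), ("api_gateway", "database"),
         ("database", "api_gateway")]
nt, et = apply_stride(edges, zone)
print(sorted(nt["api_gateway"]))                # ['E', 'R', 'S']
print(sorted(et[("database", "api_gateway")]))  # ['I']
```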
So with our threat model and our DFDs figured out, what we have is essentially a directed graph: nodes and edges, each with a flow direction, and each with properties attached. Using these properties and flow directions, we can ask a simple question of any flow, namely what threats are applicable to it, and expand that across the whole diagram to ask what threats are applicable to this feature, given all these nodes and edges. The other choice we made in order to answer that question was to persist this node and edge data as a graph, in the form of two tables, one for nodes and one for edges, stored with all of their properties. These tables became the basis on which we later built the abuse detection tool and answered the questions we had to answer. But before I get to that, a quick note: once the data is stored in tables, it is very easy to then apply
simple SQL queries to come up with all the STRIDE-related threats. Take the examples at the bottom here. For spoofing, as I explained, you are looking at data flowing from trust zone zero to a non-zero destination, one in this case. For tampering, you are looking for flows where the source zone is less than the destination zone. For repudiation, you check whether a particular flow has both tampering and spoofing. For information disclosure, you look for flows from a higher trust zone to a lower one. You can boil all of these down to SQL queries with WHERE conditions, and together they give you the threat model for all the individual nodes. The point of all this is that we have gathered this data, the risk-related nodes and edges, the Gherkins, and the test cases, for years now, and that is the part that made it easy for us to build what we built.
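Here is a runnable sketch of the two-table layout and two of the STRIDE queries, using an in-memory SQLite database. The schema, column names, and rows are illustrative; the real tables would carry many more properties.

```python
# Persisting DFD nodes/edges as two tables and deriving STRIDE threats in SQL.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE nodes (name TEXT PRIMARY KEY, trust_zone INTEGER);
    CREATE TABLE edges (src TEXT, dst TEXT);
""")
con.executemany("INSERT INTO nodes VALUES (?, ?)",
                [("browser", 0), ("api_gateway", 1), ("database", 9)])
con.executemany("INSERT INTO edges VALUES (?, ?)",
                [("browser", "api_gateway"), ("api_gateway", "database"),
                 ("database", "api_gateway")])

# Spoofing: flow from a zone-0 node into a non-zero node.
spoofed = con.execute("""
    SELECT e.dst FROM edges e
    JOIN nodes s ON s.name = e.src
    JOIN nodes d ON d.name = e.dst
    WHERE s.trust_zone = 0 AND d.trust_zone != 0
    ORDER BY e.dst
""").fetchall()
print(spoofed)  # [('api_gateway',)]

# Tampering: source zone strictly lower than destination zone.
tampered = con.execute("""
    SELECT e.src, e.dst FROM edges e
    JOIN nodes s ON s.name = e.src
    JOIN nodes d ON d.name = e.dst
    WHERE s.trust_zone < d.trust_zone
    ORDER BY e.src
""").fetchall()
print(tampered)  # [('api_gateway', 'database'), ('browser', 'api_gateway')]
```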
So at long last we come to the question we started with: how do abuse threats affect my system? Armed with those tables of nodes and edges, and with the directed graphs, it becomes a slightly easier question to answer. For the sake of this conversation, we will keep the abuse scenarios limited to compute and network abuse. A quick definition: compute abuse is running malicious code on any compute node, such as a virtual warehouse at Snowflake, and network abuse is running any code that can carry out network attacks or facilitate incoming attacks, such as a reverse shell of some kind.

With that said, what we built was retro threat modeling. We have our nodes, we have our edges, and the problem boils down to coming up with SQL queries that we can apply to these tables to surface the threats. For compute abuse, for example, we want to select all threat models so far where a node can execute code and an incoming edge to that node indicates code execution; you want to see that something can trigger code execution there, and you want to see if
the code is shared between people, or between different organizations. For network abuse, you select all threat models where a node can execute code and there is an outgoing edge to the open internet, or an incoming edge from the open internet; both of those will have a trust zone of zero. But the problem then is: how do you define, in SQL, a node that can execute code, or an edge that indicates code execution? SQL doesn't understand plain English, but LLMs do. So, LLMs to the rescue. We boiled the problem down so that an LLM can go through all the nodes and edges and answer those questions for us: which nodes can execute code, which edges indicate code execution, and so on. We had to be careful and do a lot of massaging of these prompts, because LLMs tend to produce verbose output, and you cannot account for verbose output when you are testing WHERE conditions. So the answer had to be boiled down to a yes or no, true or false, your preference. This is what the queries look like: given the name of a node, answer only in yes or no, and tell me whether the node indicates code execution. Similarly for nodes or edges that show whether code is shared on the Marketplace, or for network connections. Boiling the answer down to a yes or no turned out to be the key, because then we could directly check the condition in a WHERE clause.
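The "answer only yes or no" pattern might look like the sketch below. `ask_llm` is a placeholder for whatever LLM client is actually in use (stubbed out here); the point is the constrained prompt plus strict normalisation of the reply before it feeds a boolean condition.

```python
# Hypothetical sketch of the constrained yes/no prompting pattern.
# `ask_llm` is a stand-in for a real LLM client call.

PROMPT = ("Given the name of a DFD node, answer only 'yes' or 'no': "
          "can this node execute user-supplied code?\n\nNode: {node}")

def normalize_yes_no(reply):
    """Collapse a possibly verbose LLM reply to True/False, or raise."""
    token = reply.strip().lower().rstrip(".!")
    if token.startswith("yes"):
        return True
    if token.startswith("no"):
        return False
    raise ValueError(f"non-conforming LLM reply: {reply!r}")

def node_can_execute_code(node, ask_llm):
    return normalize_yes_no(ask_llm(PROMPT.format(node=node)))

# With a stubbed model, the boolean can feed a WHERE-style filter directly:
fake_llm = lambda prompt: "Yes." if "virtual warehouse" in prompt else "no"
print(node_can_execute_code("virtual warehouse", fake_llm))    # True
print(node_can_execute_code("static asset bucket", fake_llm))  # False
```

Raising on a non-conforming reply, rather than guessing, keeps verbose model output from silently polluting the downstream checks.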
So that is, more or less, how we were able to use all that data to build a system developers can use to surface the abuse threats that might be applicable to their feature, and then try to solve them or build test cases around them. I'll now hand it over to Rishik, who will walk you through a developer's perspective on all of this.

Hey everyone. Before I start, can I get a show of hands: who's a pure software engineer here? Cool, okay, seems like you might be able to save me if I say bad stuff around all these people. So I'm going to add on to what Rishabh said, from a developer's perspective, and how all of this process helps me.

I'm working on a feature, and it seems I have to do a security review. There's no way around it; I just have to get it done. First, I have to revise what the threat modeling process looks like. It's been a while since I did one, and I'm not really sure what all I need to complete. So I go through all the docs and the training videos, I identify my entry points, my assets, and my trust zones, and I create the DFD that we talked about. That was fairly simple, but it involved a lot of work, and I wasn't really sure whether I had labeled things properly or created the right structure. The diagram is one thing, but the trust zones seemed a little complex to me. Still, I'm getting the hang of it, so let's proceed. Next I determine the threats, using the STRIDE model. Repudiation? I haven't used that word in a conversation ever in my life. I don't know what it means, and every few months I'm searching for what it means, every single time. After a lot of searching through the STRIDE model, gathering a bunch of references, and looking at old threat models to
understand things, here I am: I've generated a few threats, and I'm still not sure I've done the right thing, but okay, somebody's going to help me with it eventually. I'm getting close to done, and I really hope to finish today so that I can move on to the coding I'd planned. Now I determine the mitigations, and here I'm really struggling. Some of them are fairly straightforward, so I've got the hang of those, but for most of them I don't have a clue how to even mitigate the problem, so I'm going to need some help on this one as well. And the last part, which generally doesn't get addressed: the whole threat modeling process needs to be iterative. We need to change it along the way when the design changes and get in touch with the security team for consultations. But once I've got the original approval, I'm sure I'm not going to revisit it, because of other pressures, timelines, and deadlines.

Coming back to the review process: I'm sure you've heard some or all of these things developers say when you interact with them. "It works on my machine; how is it going to be any different in the production environment, or any other environment? Do I really need to worry about it? Maybe I don't even surface this to the security engineers." "I trust the third-party library; even though it's maintained by a single person and the last commit was two years ago, it satisfies my functionality, so I'm going to use it." "Why do you need to review this? It seems like a very small feature. Come on, this is a lot of process for me to ship something." This one is understandable: "I don't really understand how to fix this particular security issue." Just from lack of familiarity with the domain, I'm struggling: should I address this, should I surface this? It tends to get ignored. And the best one: "we'll address this later." I really do intend to address it later, but then again, because of other pressures, timelines, and design changes I need to accommodate, plus a constant lack of resources and PMs asking for more, maybe it gets overlooked as well.

So what are the real reasons behind all of this? The first is a lack of security knowledge; it's just that
developers are not really trained in security. Even in universities you're primarily focused on getting your projects done; nobody focuses on how secure the project is. Second, difficulty in thinking about mitigations: you need to be really creative, and you have to have knowledge of the space to understand mitigations. And now there's this whole abuse threat scenario I need to worry about. How do I know I need to consider it, and why am I not thinking about it? Primarily because I'm focused on the technical threats that traditional threat modeling processes tackle; I'll probably identify a SQL injection. But I don't have an attacker's perspective. We're primarily trained to take a defensive approach, which is why we end up identifying SQL injections, buffer overflows, things like those, but really fail to understand how an attacker might misuse the system. Third, because I have to think about this whole new class of threats, there's a lot more time and complexity involved, and it always results in a lot more documentation, which I find overwhelming when I'm trying to get stuff done, so I tend to stay away from it. Then there's a generic lack of awareness: I'm not even sure whether my feature can be misused, so should I even be worrying about it? Who's going to help me understand the cases, or points of view, where it can be misused? And the last one, the concern we've been trying to address: the tools today, the STRIDE model and the traditional threat modeling tools, don't really address these new classes of threats that have emerged. They don't help me during the threat modeling process; they don't prompt me or guide me through this. So let's look at how the retro threat modeling approach we came up with helps us identify all of these abuse threats and fix this.
All right, so I'm going to look at all of my old threat models to identify which of them might have an abuse risk. I select the kind of abuse, let's say compute in this case, and in the background the tool processes all of the property graphs with a combination of directional flow analysis and NLP to generate candidates for abuse. In this case we have a threat model that contains a provider, a consumer who's installing applications the provider built, and a compute node. The tool has highlighted in red the compute node that might be vulnerable to compute abuse; it's the virtual warehouse in this instance. On the right-hand side, the tool has generated a tooltip for us to understand what the threat actually means, and it has also provided the different classes of threats there. Since it's a provider and consumer scenario, we would at least want to select the crypto miner as a use case that we need to address. You click "show mitigations", which shows a possible list of mitigations; in this case we would want to implement all of them, and then create Gherkins for them. Once we create the Gherkins, they can become test cases to be validated later, which means the graph turns blue to indicate that the issue has been resolved. With that, we can show that using your existing security data you can identify latent abuse cases, suggest risks, suggest mitigations (and this guy's laughing at me), and resolve the issues. Thank you. With that, we open up for
questions.

Q: What was the most surprising thing about this project?
A (Rishabh): Given how daunting it looked in the beginning, just how easy it turned out to be. Not to say that it was easy, but it was definitely easier than I thought, given all the data we had. And second, I don't think it's shown here, but we were also able to correct some of the data flow diagrams. One of the theories we came up with was that if a node is susceptible to compute abuse, it also has to have an elevation of privilege threat. So when the tool identified something with compute abuse but no elevation of privilege, and I think we found three such cases, in 100% of them the data flow diagram was wrong and had to be corrected for the elevation of privilege to appear. That correlation was also very surprising.
A (Rishik): I can add one more point from a different perspective. No matter how much structure you try to add, the users of the system will tend to use it in so many different ways. That's how we ended up having LLMs help us, because there are so many different descriptions that come up, and it was surprising to find that use case and use it to discover a new class of threats.

Q: Have you had to describe the data security model in any particular standards, for example something from MITRE or NIST?
A: We haven't had to deal with MITRE or NIST, at least in the context of this exercise. Maybe something will show up in the future.

Q: Do you use the Gherkin scenarios only for the specifications, or do you also use them for reporting?
A: Only for this specific use case.

Q: Could you please explain the RTMP acronym used when you describe the threat model?
A: It's Rapid Threat Modeling Prototyping, an approach that can rapidly help you come up with threat models.

Q: Final question: can you please repeat what STRIDE stood for?
A: STRIDE: S is spoofing, T is tampering, R is repudiation, I is information disclosure, D is denial of service, and E is elevation of privilege.

Very good, gentlemen, thank you so much; you did an amazing job, you did. Snowflake crowd! This concludes the presentation; thank you so much for being troopers. Right next door in 13 will be the closing ceremonies from 6:00 to 6:30. I want to thank Marco in the room, our AV specialist; he's been my right hand all day taking care of audio and video, so we appreciate him specifically. And all of our volunteers: yes, thank you so much.