
A New Architecture for Data Security to Free Incident Responders from False Positives - Rob Quiros

BSides SATX · 40:09 · 21 views · Published 2024-06 · Watch on YouTube ↗
Category: Technical
Style: Talk
About this talk
A New Architecture for Data Security to Free Incident Responders from False Positives - Rob Quiros. 2024-06-08, 16:00–16:45, Track 1 (UC Conference Rm A). Incident detection today is based on protecting the conduit to protect the data inside. Lack of correlation between APIs and data has led to high false positive rates that are inundating incident response teams. We present a new approach to data security incident detection and response that avoids correlation, with the goal of maximizing the ratio of real incidents to false positives. This approach is particularly useful with APIs that feed RAG and fine-tuning models in generative AI. If you're not already numb to the statistics: high false positive rates in today's security tools are killing incident response teams. Many true positive alerts go unseen. Security job vacancies can't be filled. OWASP Top 10 authorization and access control failures often are not detected at all. Today's tools share a common doctrine dating back to the first network firewall 30 years ago: protect the conduit to protect the data it carries. But in today's applications, APIs are the conduit. Does a trustworthy correlation between API and data even exist? Over the last four years we architected and built a new solution for incident detection and response free from parameter interpretation, pattern matching, keyword searches, and other correlation-based techniques. In this talk we'll detail the approach and the tradeoffs made to maximize detection of real incidents while optimizing for near-zero false positives.
Transcript [en]

All right, good afternoon, folks. Thank you for joining me here at BSides, and thank you for welcoming me. My name is Rob Quiros. I've been in the IT and security world for the last 30 years or so, at companies like Cisco, Akamai, and Riverbed. Most recently I was at a company called Soha Systems, where we pioneered the SASE space, secure access service edge, as Gartner calls it, but basically application security. One of the things I realized there was that we're protecting applications with a lot of the tools that we have, but our goal is to protect the data. So I want to talk

a bit about what it means to be protecting data. If we could just take a moment to think about the nature of data: it's the thing that we value, our customers value, our businesses value, and yet as we try to secure and protect it, we have to remember that it's just a bunch of ones and zeros. It's ephemeral. Data singular is the same as data plural, and my data and your data and sensitive data are all just a bunch of ones and zeros. So in our quest to protect this data, how do we do it? We think

about a data lake. We talk about data warehouses and data lakes, but here you just have this big pool of data, and you dip your hand into it, you look at it, and what have you got? It's indistinguishable from the rest. And yet this is what we have to protect. We have to make sure that this handful of data doesn't go to the wrong person or to an attacker. So how do we do that? We have to do it in a deterministic way; we can't just guess. And this is where I believe most of the false positive problems we're having within our applications and the security controls we use are coming from,

because we can't deterministically say this is my data or yours and who should get it. You all know what deterministic means: if A, then B. If Bob is allowed to access the data, then give him access to the data. The funny thing is that we only have that for one of the three states of digital data: data at rest. So how do we take the determinism we have for data stored in our file systems, our S3 buckets, our databases, etc., and apply it to data in use and data in motion? That's really the majority of what I want to talk about today, along with different ways of doing it.
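That "if A, then B" determinism can be sketched as a plain lookup against explicitly stored permissions. The object paths, users, and table below are invented for illustration, not from the talk:

```python
# Minimal sketch of a deterministic access decision: the answer depends only
# on an explicit permissions table, never on pattern matching or guesses.
PERMISSIONS = {
    "s3://finance/q3-report.pdf": {"amy"},        # object -> principals allowed to read
    "s3://shared/logo.png": {"amy", "bob"},
}

def can_read(user: str, obj: str) -> bool:
    """Deterministic: the same inputs always yield the same allow/deny answer."""
    return user in PERMISSIONS.get(obj, set())
```

This is exactly what storage systems already give us for data at rest; the talk's question is how to carry the same property into data in motion.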

So let's look at a typical microservices application. You've got multiple services: a front-end service that authenticates the user, and a back-end service that actually goes and reads the data from a storage system. This is typically what we're doing with our storage controls. You can think about partitioning the network, service-to-service access controls, CNAPPs (cloud-native application protection platforms), and service meshes; they're all doing basically the same thing. The front end knows who the user is and makes requests on behalf of that user to the back end. So what do we pass in

between the service account of the front end and the back end? The back end says, okay, I'm authorized to talk to the front end, makes the request to the storage, reads an object, and sends the data back to the front end. But it didn't send the permissions that go along with the data, and the front end didn't send the user's credentials up to the back end. So what do we have? We have no way to determine whether that user is allowed to access that data. And that's a real problem. In the computer science world we'd call this the confused deputy problem: we have intermediaries that are

acting on behalf of the user and the data, but they don't actually have the set of credentials and permissions to do what they're supposed to do. So it's something we should fix, right? But we don't, in the name of agility, because we've built our microservices independently. We have isolated teams working without having to communicate with other teams, or we can pull services off of GitHub and deploy them pretty seamlessly into our applications, and we don't actually know what data they're going to handle until they go into production. So we could fix that, but then we're back to the monolithic world that we were in before.
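The confused-deputy gap described above can be made concrete in a few lines. Everything here (service names, object owner, the trust check) is invented for illustration:

```python
# Sketch of the confused-deputy gap: the back end authorizes the *front-end
# service*, not the end user, so the user's identity never reaches the
# permission check that sits next to the data.
OBJECT_OWNER = {"doc-1": "amy"}  # data at rest has an owner...

def backend_read(calling_service: str, obj: str) -> str:
    # ...but the back end only checks service-to-service trust.
    if calling_service != "frontend":
        raise PermissionError("unknown service")
    return f"contents of {obj} (owner: {OBJECT_OWNER[obj]})"

def frontend_get(user: str, obj: str) -> str:
    # The front end authenticated `user`, then forwards the request under
    # its own identity -- the user credential is dropped right here.
    return backend_read("frontend", obj)

# Bob receives Amy's data: no hop ever compared `user` against the owner.
leaked = frontend_get("bob", "doc-1")
```

Every hop's check passes, yet the end-to-end flow is an authorization failure, which is why conduit-level controls alone can't see it.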

Then every service has to be coordinated with every other service, to either pass the credentials or somehow evaluate the policy and the credentials of the entities that get passed the data they move. So we use all of these firewalls and controls that we think are deterministic to protect data, when in fact we're not actually doing that. We're failing because we're not protecting the data, and, I'll talk about this in a second, the correlation between what we are protecting and the data itself is tenuous at best. If you go to the OWASP website, of course, you can pull down their Top 10

and see that the top problems we have are broken access control and broken authorization: bad people are getting access to data they shouldn't. So basically this is how things work today. With zero trust network access we're looking at API parameters, we're looking at HTTP headers, and we're trying to figure out whether this person, this user, this entity should get access to an API. But that contains no information at all about what's inside that API, what that API carries. We can apply DLP, and a lot of companies are taking basic firewall-type controls and adding DLP on top, which is

good, but the DLP is telling us what that data looks like. It doesn't tell us if it's yours or mine or somebody else's. It says it looks kind of like money; maybe it is, maybe it's not, but we'll classify it as sensitive. But it's not an access control. It's not going to keep an attacker from getting access to everybody's data within an application. Great for compliance and data governance, however. But if we start asking ourselves whether there is any real correlation between these API parameters in our cloud applications and the data that they carry, can anybody answer that? No, we have no idea.
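The DLP limitation above is easy to demonstrate: a classifier pattern can say a payload *looks* sensitive, but it carries no notion of ownership, so it cannot make an access decision. The regex below is a made-up toy rule, not a real DLP signature:

```python
import re

# Toy "looks like money" classifier: flags currency-shaped strings.
LOOKS_LIKE_MONEY = re.compile(r"\$\d[\d,]*(\.\d{2})?")

def dlp_classify(payload: str) -> bool:
    """True if the payload matches the sensitive-data pattern."""
    return bool(LOOKS_LIKE_MONEY.search(payload))

# Amy's and Bob's payroll records classify identically -- the classifier
# alone cannot say which user may receive which record.
```

Both `"amy salary: $120,000.00"` and `"bob salary: $95,000.00"` come back "sensitive" with no way to route them correctly.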

So it's basically our expectations of what data an API will carry that we're putting controls on, and our expectations are often wrong. It's no longer a deterministic process; it's statistical. Our controls aren't related to the data, they're not deterministic, we're not seeing a correlation between them, and so we end up with all these problems. And it's interesting: you've heard of defense in depth, right? We just keep layering on more controls to act as backstops between things we can't detect deterministically. This is from an article earlier this year: $2.6 million on average that

companies are spending on 11 different API security tools, and every one of those is spewing out false positives, spewing out alerts, because if you're a vendor you want to throw out alerts even if they're not all that relevant. Why? Because you can't have a screen that shows nothing if you're a vendor. So unfortunately we end up in a situation where almost half of the alerts we get are false positives. We've got to fix this. So what if you could directly control access to the data in motion? Just look at the bits flowing in APIs and be able to identify what those bits are, where they came

from, who the owner is, and what their permissions are, and then make a decision based solely on the identity of those bits. What could we do with that? Attackers are still going to be stealing credentials; we can't fix that problem. But we can fix a lot of other problems. If an attacker is able to get access to that back-end server and force it to do things on a user's behalf, it's going to pass data, whether via privilege escalation or a SQL injection attack or other sorts of attacks, to a user that shouldn't be getting

that data. So we can detect that. We can detect certain misconfigurations. We can detect software bugs that pass data they shouldn't. There's a plethora of things we can directly control based on this idea of being able to identify what the data is. And there's a new problem that's come up with generative AI. Is anybody familiar with the term RAG, retrieval-augmented generation? This is something a lot of enterprises are adopting as a way to store their enterprise data in a vector database, in a format that can be used to add information to a user's question. So I could ask: analyze the company's

financials for the last three quarters and project what our sales will be for the next quarter, and it would pull that information directly out of the RAG store. The problem is that all of those vectors are stored as chunks. They're not objects; they're chunks. And if you think about the problem of connecting chunks to permissions, it's a completely different world. I'll talk about that in a second, but first let's look at how we can fix this, how we can identify this data flowing in APIs, data in motion or data in use. Well, EDRM, or enterprise digital rights management, is a technology that was developed a decade or so ago, and it's

basically cryptography. We encrypt the objects we put into storage systems, and then we have a central manager that determines whether it can pass out a key to a service or a user to decrypt and use that data. So it's relatively straightforward; sounds good. But if we go from the world where we're dealing with objects to the world where we're dealing with chunks of data from objects, we get into a whole lot of other problems. Do you put the same permissions on a chunk that you do on the object? And if you have a chunk and you want to pass it on to another service, well, in this world you

should encrypt it, right? So how do you do that? You go ask the central manager for the private key so you can encrypt it? That compromises security in and of itself. So instead you pass the chunk up to the central manager, have it encrypted, get it back, and pass it along, and the next service does essentially the same thing down the chain. Who wants to take the performance hit for that? Nobody. So it's a good idea, but isolated to specific use cases where we really only need to look at the objects themselves. That's, as I was just saying, the messy picture we would end up with. Google

came up with a different approach, and this is what I'd call the most technically correct way to solve the problem. You can go look up the paper; the slides will be available to you after the event. Basically the idea is: let's not be stupid about not passing the users' credentials in the APIs, and let's not be stupid about not passing permissions; let's do both. We'll send the credentials down and the permissions up, and every place where we can evaluate whether the entity is authorized to access the data, we'll do it. And they use a central

authorization system called Zanzibar, and you can read about that one as well. The interesting thing is that this solves the problem we had with encryption: we don't have to send the data anywhere. We keep the same permissions and move them along with the data, even as we chunk it or transform it. One little problem with this: you've got to change all of your services in order to pass the permissions. If you're Google, that's not a problem. If you're not Google, it's a big problem, because your services aren't built to do that.
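The pass-permissions-with-the-data idea can be sketched as follows. This is written in the spirit of the approach described above, not Zanzibar's actual API; the classes and flow are invented for illustration:

```python
from dataclasses import dataclass

# Sketch: the ACL travels alongside the bytes, so *any* hop can re-check
# authorization, even after chunking -- no central key escrow needed.
@dataclass(frozen=True)
class LabeledData:
    payload: bytes
    acl: frozenset  # principals allowed to receive this payload

def deliver(data: LabeledData, recipient: str) -> bytes:
    # Every intermediary can run this same deterministic check.
    if recipient not in data.acl:
        raise PermissionError(f"{recipient} is not authorized")
    return data.payload

def chunk(data: LabeledData, size: int) -> list:
    # Chunks inherit the parent's ACL, so the check survives transformation.
    return [LabeledData(data.payload[i:i + size], data.acl)
            for i in range(0, len(data.payload), size)]
```

The catch the speaker names is visible in the sketch: every service in the chain must be rewritten to produce and consume `LabeledData` instead of raw payloads.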

There's another problem, though, when it comes to dealing with chunks, and I alluded to this earlier. When you're dealing with chunks, think about it from the perspective of data deduplication; you're all familiar with data dedup? Okay. In data dedup we use content-defined chunking: we find common chunks of data that come from many different places, and we store them once instead of once per object they occur in, or we send them across the wire just once. I worked at Riverbed; we were a WAN optimization company doing data deduplication, and we got 99% of the bytes off the wire with dedup. In storage they can get 80 to

90%. So what does that say about the number of duplicate chunks flowing around in enterprises? It's a huge amount. And this ends up being a real problem: what if that blue chunk that came from two different objects has permissions that conflict with each other? Which one do you pick? That's the problem with just passing the permissions straight through. So in this world where we have to deal with chunks, we have to deal with this problem specifically and figure out a way to solve it.
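The shared-chunk conflict above can be reproduced in miniature. Fixed-size chunking stands in for content-defined chunking here, and the documents and ACL labels are made up; the point is that a deduplicated chunk can inherit contradictory permissions:

```python
import hashlib

def chunks(blob: bytes, size: int = 4) -> list:
    """Split a blob into fixed-size chunks (a stand-in for CDC)."""
    return [blob[i:i + size] for i in range(0, len(blob), size)]

def chunk_id(c: bytes) -> str:
    """Content-addressed identity: identical chunks dedup to one ID."""
    return hashlib.sha256(c).hexdigest()

doc_amy    = b"HEADERsecret-amy"   # Amy-only object
doc_public = b"HEADERpublic-data"  # world-readable object

seen_acls = {}  # chunk id -> set of ACLs it was observed under
for blob, acl in [(doc_amy, "amy-only"), (doc_public, "everyone")]:
    for c in chunks(blob):
        seen_acls.setdefault(chunk_id(c), set()).add(acl)

# The shared "HEAD" chunk now carries *both* ACLs -- which one applies?
conflicted = [cid for cid, acls in seen_acls.items() if len(acls) > 1]
```

With real content-defined chunking and enterprise-scale dedup ratios, these conflicted chunks are the common case, not the exception.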

What I looked at for a long time, thinking about this problem as I was at Riverbed, then at Soha, and then at Akamai, was reverse deduplication: how can we look at the chunks and figure out where they came from? Because really, that's the problem you have to solve: to determine whether this is a unique chunk that came from just one object, a chunk that belongs to multiple objects, or noise, like the company logo that sits on every PDF in the company or the boilerplate on every contract. And in order to do that, we have to be able to inspect all of the different objects that might exist to figure

that out, and that's a difficult problem. So instead of trying to boil the ocean and solve it completely, we can constrain the problem to just the data we see moving within a given time window. Think about the way an application works, around the time of an API call: how long after that API call is made, say it's a GET operation, would a read of an object happen? And how long after a PUT operation would the write happen? If we expand that window a bit to cover those operations, we can focus ourselves

just on the objects being read and written within that time window, and over time we can build up our knowledge of the different sets of objects and chunks that exist within the environment. So we don't have to boil the ocean from the start; we can grow this knowledge of what's going on in the environment incrementally. When we reach the point where we have enough information about where these chunks come from, and we can start distinguishing whether they're junk or unique or not, then we can throw it all into a big graph database and come up with something that looks like this.
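The time-window constraint described above can be sketched as a simple filter: rather than indexing every object in the enterprise up front, only correlate storage reads and writes that land within a small window around each API call. All timestamps and names below are invented:

```python
# Only storage events close in time to an API call are treated as candidate
# matches; everything else is deferred until later windows observe it.
WINDOW_SECONDS = 2.0

storage_events = [
    ("read",  "obj-116", 10.7),   # near the GET below -> candidate
    ("write", "obj-89",  51.2),   # near the PUT below -> candidate
    ("read",  "obj-999", 300.0),  # far from every call -> ignored for now
]

def candidates(api_call_ts: float) -> list:
    """Storage events close enough in time to plausibly belong to this call."""
    return [e for e in storage_events
            if abs(e[2] - api_call_ts) <= WINDOW_SECONDS]
```

A GET at t=10.0 only has to be compared against `obj-116`, not the whole data lake, which is what makes the incremental approach tractable.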

Basically this is matching chunks that are in common between APIs and objects: the chunks are the little blue dots, the APIs are the larger light blue dots, the objects are the green dots over there, and then we have the users that called the APIs and the services that moved those objects. This graph can be built basically by looking at log events and having, in this case, a single plugin to an API gateway that could see the traffic on the front end. But this in and of itself isn't enough to give a complete picture of what's going on. We can certainly connect the chunks to objects; we can figure

out if those chunks are in multiple objects; we can figure out statistically, oh, we've seen this in 80% of the objects, so it's probably just noise and we shouldn't use it. But this isn't quite enough to do the authorization. The reason is that we don't have any rational way of connecting these events together, and that's really what we need. The user makes the request to the front end; that's an event. The front end makes the request to the back end; that's another event. And the back end makes the request to the storage to

get the data. We have to connect all three of those together in an environment where we really have no idea what the connection might be. So think about OpenTelemetry and OpenTracing; have you heard of these technologies? Good. OpenTracing is basically the idea that you put a trace ID in your API call and copy it into any API calls made on behalf of the first one, so you can recover the whole sequence of events that happened. So what if we do that with the data? We'll just look at the data itself; we'll treat it as a trace

ID, and we'll follow that data as it moves through these API calls and connect them together into a time series, and that will give us kind of the overall picture of what happened. I say "kind of" because we have to eliminate things like impossible paths: this read of the object happened after the user got the data? Okay, we can toss that one out. We also have to be able to make some level of interpolation about events that we don't see.
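The "data as trace ID" idea above can be sketched like this: events carrying the same chunk hash are linked into a time-ordered path, and physically impossible orderings, such as the storage read happening after the user already received the bytes, are discarded. Event tuples and stage names are invented for illustration:

```python
# Causal order a real delivery must follow.
STAGE_ORDER = ["storage_read", "backend_to_frontend", "frontend_to_user"]

def link_by_chunk(events) -> dict:
    """events: iterable of (timestamp, stage, chunk_hash) tuples.
    Groups stages per chunk hash, in time order -- the hash is the trace ID."""
    paths = {}
    for ts, stage, h in sorted(events):
        paths.setdefault(h, []).append(stage)
    return paths

def plausible(stages) -> bool:
    """Reject impossible paths, e.g. a read that follows delivery."""
    idx = [STAGE_ORDER.index(s) for s in stages if s in STAGE_ORDER]
    return idx == sorted(idx)
```

No trace headers need to be injected anywhere; the payload bytes themselves correlate the hops, which is why no service code has to change.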

We saw the read from the storage, we saw the chunk go to the user, but something happened in between that we didn't see. So we need some parameters around how much missing information we'll tolerate when determining whether there's a connection between two services through events we don't see. Obviously you can put in more sensors and try to see everything, which would be great; put a plugin into an Envoy proxy on a service mesh and you have full visibility of every event that's happening, but it isn't strictly necessary. So I built this, by the way. Doing all of this time series analysis, and it's difficult to see there, but we can find all of these

authorization violations based on the data moving in the APIs, put them together in the time series, and come up with a deterministic detection of the authorization failures happening within the system. In this example, user Bob gets seven chunks of data out of about 10,000 from one object, and we were able to determine, based on the object name and a lookup into a database, what the permissions and ownership are on that object, and then apply them to the chunks moving to Bob. And we found out what this is: Amy's data going to Bob, and we can flag that. Like I was saying before, this is deterministic. We can look at this

strictly from an access control perspective, not from something like intrusion detection or anomaly detection within the system. And we can do this without having to change any code, because we're just inspecting payloads in APIs and looking at log events. When it comes to false positives, I can't say there are none, but I can say that within the probability of a hash collision in the given environment we'll have no false positives, and the hash collision probability is much, much less than one. This turns out to be something that's, at least in the current prototype, very successful at tracking and identifying these sorts of failures.
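The hash-collision claim above can be checked with a back-of-envelope birthday bound: for n chunks and a b-bit hash, the collision probability is roughly n(n-1)/2^(b+1). The chunk count and hash width below are assumptions for illustration, not figures from the talk:

```python
# Birthday-bound estimate of the false-positive ceiling when detections key
# off content hashes of chunks.
def collision_bound(n_chunks: int, hash_bits: int = 256) -> float:
    """Upper bound on probability of any collision among n_chunks hashes."""
    return n_chunks * (n_chunks - 1) / 2 / 2 ** hash_bits

p = collision_bound(10 ** 12)  # even a trillion distinct chunks...
# ...gives a collision probability on the order of 1e-54:
# "much, much less than one", as claimed.
```

So if the detection logic itself is deterministic, hash collisions are the only residual false-positive source, and they are negligible at any realistic scale.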

But there's a big side benefit, because all of that tracing of the data through the application, as we saw the events and pieced them together, lets us build a complete picture: where did the data come from, how did it move, when was it first stored, who went and got it later, and how did it ultimately end up at Bob? So here you see the initial write: Amy PUTs to this WebDAV API on the Nextcloud service, it goes into cache or memory, and then the service account writes it, PUT three, as a put-object into an S3 bucket, and

writes it, in this case PUT four, to another object, and then we can see it retrieved by another service, following steps five and six, and this service does something to the data. The fat line means a lot of data; the thin line means a little. So it took a bunch of text that was in this PDF, wrote it to an extracted-text file over here, and put it in a shared bucket. This isn't necessarily a real attack, but you can imagine an attacker doing this: you could exfiltrate from the shared bucket that everybody has access

to, and it would never be detected. If you're not following the data, not looking at the fact that here's a subset of data that came from this larger object that belonged to Amy, you won't see this. And that's where Bob gets it from: he does this POST, and his request here isn't a GET; it doesn't pull down the data. He does a preview of the data, and the Nextcloud service actually reformatted the data and put it in a JSON structure, but it was the same text, so we could see it. But none of this is possible if you're

looking at data from an object perspective. If you're just tracking hashes of objects moving from one place to another, you'd never see this. Even if you're hashing API payloads, as that data is transformed by APIs, in this case, like I said, from a plain string of text into a JSON structure, you wouldn't see it there either. So we have to think about treating chunked data differently than we treat objects. And like I was saying before, as we head into the world of generative AI, this isn't the exception anymore; it's going to be the norm, at least

in my opinion. What we're going to see is a significant shift in the way we build applications going forward, with applications fundamentally leveraging generative AI at their core. For example, do you want somebody to check a bunch of boxes in a configuration, or do you want them to just say: hey, I've got an S3 bucket that contains data that belongs to finance, go find it? I'd much rather have that in my application. So it's things like that, I think, and also the analysis of the data, that we're going to see in our next generation of application architectures. So this is really the

necessary way to secure how that data moves. So that's kind of it for the talk; I'd be happy to take any questions and get your thoughts. [Audience question about a threat actor gaining access to the tool itself and seeing the topology and who has access to what.] Yeah, so the question is: if a threat actor gets access to this application, which is trying to secure your application, isn't it a gold mine of data for them? Well, if it's a gold mine of data for you as the security engineer and the security team, then yes, it is for them as well. So it does need significant protections around it.

In this case it was designed to run in your own application environment, with your own protections around it, so no data is actually being exfiltrated, and you can look at the results directly within your environment. Beyond that, and putting all the levels of protection on it that you would on any software: yes, it has vulnerabilities; we can't account for all of them. Anyone else? [Audience question suggesting that a model which learns how data moves before enforcing policy seems like a natural application for AI.] That's actually a great question. So if we go back to this picture: yes, why don't

we just turn an AI loose on this picture and let it figure out where the problems are? Well, you do want to do that, to be able to say: how does this application work at a high level, how does it move data, how is the data transformed, and what overall is going on within this application? That actually opens up a lot of opportunities for things like debugging the application in production across all of the services, rather than debugging service by service; think of black-box testing, where you're just probing the inputs and outputs of services. The problem, though, from a security perspective, is: what's the opposite of deterministic? AI, you know, hallucinations, and

you type in the same prompt twice, you get different data, you get different results. So, excuse me, just a second, there's a lot of hope in the ability of AI to improve our security, to improve how we do data classification and DLP, and to detect problems within applications. All of that is great, but it's not deterministic. If what we need is a way to do access control specifically, and we're looking to bring down the false positive rates we have, those non-deterministic approaches are going to be an issue.

Any other questions? Well, we have 10 minutes, so I can show you a little bit of a demo if you'd like.

Awesome. So basically this is the screen I was showing you before; I showed you a screenshot of this. This is a canned demo that we have, deployed into an AWS account. It has the, oops, you're not seeing this. Is there a way I can show this? Oh, I need to mirror the screens.

[screen-sharing setup]

There we go. Like I was saying before, this application is running inside an AWS account right now, and we're just looking at the management portal. Here is basically the list of violations it's finding, and I'll admit some guilt about what I was talking about before: we're flagging a lot of violations. It may not be a big problem, but a lot of these, service accounts getting data from documents, are flagged because there's no explicit permission saying that service account should be able to read Amy's data. We can fix that pretty simply. If you wanted to, you could click on, for example, this one; this actually

isn't fully coded at this point, but you could click on this, add a policy exception, and say, nope, that's okay, and then we keep track of that policy. For these other ones, like I was saying before, it's all a Selenium script that's running, uploading and downloading data and so on. When we want to analyze a particular failure, like the one we had before, we can find a good one, like that one, and get the call graph of all of the events that led up to it. In this case we have no data, because it might have shut down in the

back end; there's an automatic undeploy function, so I apologize for that, but you were able to see it before. There's another thing we can do, and this one probably won't work either because it needs to pull the data, but in this case we basically use AI, in the way we were talking about a moment ago, to analyze the graph and say: how did that data move, and what is the likely source of the problem? In this case, if you looked at it, you'd say: wait a minute, this is the service that extracted data and wrote it someplace else; that's probably the problem. So we can use AI to do that as

well. And in this case, I don't think it's going to

run, but we'll give it a shot, and we can have AI analyze that graph and figure out where we should put a policy to remediate the issue. So we can say: one thing we can do is not allow that service to read that particular object, or we could say don't allow it to read any object that belongs to Amy, and then we could push out, hey, it worked! We can push out a policy, if we want, to go and fix it. In this case we get the summary back with the explanation, and here's a new bucket policy based on recognizing that the objects we want to

control access to are tagged, and the system can tag those objects as well. So that's it for the demo. Any questions on that in particular, or is it time to go join the party? All right, well, thank you very much for your time; it really was a pleasure talking with you. Thank you.