Data Splicing Attacks: Breaking Enterprise DLP from the Inside Out

Uploaded: 2025-10-30
Duration: 44 min 3 s

BSidesSF · 2025 · 44:03 · 117 views · Published 2025-10

Speakers

Vivek Ramachandran, Audrey Adeline

Tags

Category: Technical

Team: Red

Research: Technical Deep-dives

Style: Demo Talk

Mentioned in this talk

Service

ChatGPT, Claude

About this talk

Researchers demonstrate data splicing attacks that bypass major DLP solutions by exploiting architectural flaws in endpoint and proxy-based systems. The talk covers five techniques—including data sharding, encryption, encoding, and alternate channels—and releases Angry Magpie, an open-source toolkit for red teams to replicate these attacks.

Original YouTube description

Data Splicing Attacks: Breaking Enterprise DLP from the Inside Out
Vivek Ramachandran, Audrey Adeline
We uncovered a data exfiltration technique capable of bypassing all major DLP vendors listed by Gartner. We will dissect the architectural flaws in endpoint and proxy-based DLP, showcase live bypass demos, and launch Angry Magpie, an open-source toolkit for red teams to replicate these attacks.
https://bsidessf2025.sched.com/event/bb5e71bf516d4e546d4de8bd21d2ed85

Transcript [en]

Well, welcome everyone to BSidesSF 2025. You find yourself in Theater 6. We would love to introduce our next presenters. We're going to be talking about data splicing attacks, and here to tell us all about that are Vivek Ramachandran and Audrey Adeline.

Okay. All right. Much better. Hi everyone. Today we're going to be talking about a new class of attacks that we discovered called data splicing attacks, and have some fun breaking enterprise DLP. So just a quick introduction. My name is Audrey. I'm a security researcher at SquareX. There I lead a project called Year of Browser Bugs. It's really a tribute to the earlier Month of Bugs projects in the early and mid 2000s. Since the start of this year we've been releasing architectural vulnerabilities in the browser and attacks that leverage them, and we'll continue to do so. The other thing is we've also published

a book called the Browser Security Field Manual. We actually have a couple of copies on hand, and my colleague Sasha, do you want to wave? We have a couple of interesting questions, so if anyone can answer them we'll give you a free copy as well. Prior to that I spent a couple of years at Sequoia Capital, where I was advising on cybersecurity investments, and I actually came from quite an unconventional background. So a quick fun fact on why we call it data splicing: it's because splicing is a step that removes noise from your mRNA before it can be read meaningfully and translated into proteins. So that kind of reflects

my biochemistry background, and I hope you'll see why we named it that throughout the talk. Someone else who's really instrumental to the data splicing research is my colleague Jeswin. He's the chief architect at SquareX. Unfortunately, due to some last-minute issues he was unable to join us today, but he did record a quick video to say hi to all of you. So just go ahead and play that. Hello everyone. This is Jeswin. I'm the chief architect at SquareX, and in this session we'll take a look at some very interesting techniques on enterprise DLP solutions. Unfortunately, due to a last-minute visa issue I could not make it to present there, but regardless you are in

the best hands, and I hope everyone will enjoy the session. Hello. All right. I'd also like to give a shout-out to our research team; without their help this research wouldn't have been possible. So a huge shout-out to Dakshita, Pankaj, Arpit, and Teswar as well. And before we get into the meat of it, I'd like to introduce our founder. He can introduce himself as well. I'm the manager who funded the project and the travel and accommodation. That's why you see me here. So I'm Vivek. Been in cybersecurity 20, 25 years. Started on the offensive side, founded a bunch of companies, Pentester Academy. Now I'm doing SquareX. Spoken at

DEF CON, Black Hat, I think 20, 25 times. The key thing, of course, which I've loved doing, and the rest of the team at SquareX as well, is spending a lot of time looking at bleeding-edge security, and one of the main things of interest to us recently has been enterprise security products. So over the past two years we've spent a lot of time breaking enterprise security. We went ahead and put out research on secure web gateways and Google's MV3 architecture, much of it presented at hacker conferences around the world. Now, carrying on that tradition, this year we wanted to take a look at DLP, and specifically in-browser DLP. The reason why DLP is so

interesting is that almost every organization has DLP incorporated in some shape or form. You have endpoint DLP and you have cloud DLP, and what we started asking is, given that the browser is the most used application, whether existing DLP solutions are able to cover the browser properly. That was the whole thing which led to the research. We ended up looking at a lot of enterprise solutions, trying to see if attackers, insider threats, all of that, could subvert DLP at this point in time using the browser. Right. So I've been told that security people like structure. So quickly running through the agenda for today: we've already done the introductions, but before we go into

the demos and the meat of the conversation, I thought it would be helpful for us to go over a quick recap on how enterprise DLP works, how the architectures are built, and what some of the fundamental limitations are that we're exploiting here. Then we're going to move on and show you five techniques within data splicing and how we can use them to bypass DLP solutions. We're also going to be releasing an open-source toolkit called Angry Magpie. So don't worry about taking any videos; we will make sure that you have access to all the attacks that we're releasing today. And last but not least, we'll save up some time

for key takeaways and questions as well. So let's quickly look at enterprise DLP. What do we really mean by enterprise DLP here? Primarily, there are two kinds of DLP solutions when it comes to enterprise user workflows. The first is endpoint DLP and the second is SASE/SSE proxy DLP. Both of them primarily identify sensitive data in two ways. The first is they look at tags, you know, files that are tagged as sensitive. The second is they use policies built on regular expressions, and we'll go into a little bit of how regex works later. In terms of where they sit, endpoint DLP typically sits at the interface where data enters and leaves the device.
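To make the two identification paths concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's policy format: the `Document` type, the `sensitive` tag name, and the 16-digit card pattern are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for a file as a DLP engine might see it.
@dataclass
class Document:
    content: str
    tags: frozenset = frozenset()

# Illustrative pattern only: 16 digits in groups of four, optionally
# separated by a space or a dash. Real policies are more elaborate.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

def is_sensitive(doc: Document) -> bool:
    """Flag a document that carries a sensitivity tag or matches a regex policy."""
    if "sensitive" in doc.tags:  # path 1: the file was explicitly tagged
        return True
    # path 2: the content matches a regular-expression policy
    return CARD_PATTERN.search(doc.content) is not None
```

Both paths depend on the data reaching the engine in a recognizable form, which is the property the techniques later in the talk go after.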

So think of when you upload or download files to and from the browser, while SASE/SSE proxy DLP inspects network traffic to see if there are any abnormal requests or sensitive data being requested by sites that are unauthorized, and some of the more advanced ones can also check for unauthorized users. But if you take a deeper look at how these endpoint DLP solutions work, they don't actually have any direct visibility into the browser. Most of them rely on browser APIs that are opened up by vendors like Chrome. So for example, the clipboard API allows them to inspect the clipboard contents in the browser. Similarly, the file

upload and download APIs allow them to inspect data in files that are uploaded and downloaded. So essentially, if you think about it, they don't have any real context on the user workflow, and the user workflow is really a black box for these endpoint DLP solutions. They don't have the ability to differentiate between identities. Let's say you are downloading a file from Gmail: they have no idea if it's your personal account or your work account. Similarly, they have no context on page content or network requests, and can't protect against data leakage vectors specific to the browser, such as infostealers, browser extensions, credential leakage, and so on. And I think the key thing to

highlight here is that not all endpoint DLP solutions actually have access to these APIs. Only if you're whitelisted by Chrome can you get access to these. So, does anyone have a guess on how many vendors out there are actually whitelisted and have access to these APIs? Anyone? You get a free book if you do. 17. Any other guesses? It's lower than that. So, it was five, 17, and less than 10. Two. That's right. So today there are only two vendors, to our knowledge, that have access to these APIs. Sasha can share one of the books with this lady. The only vendors that have it are actually Trellix and Symantec, which was acquired by Broadcom.

So that's kind of insane, right? Imagine, out of all the vendors out there, only two have any form of visibility into the browser. And even if they do, it's through very, very limited APIs. So just for the fun of it, all the demos that we show you today for endpoint DLP will actually use Chrome Enterprise DLP instead. The reason is that, as Chrome, they have the fullest set of APIs, right? And I hope this will convince you that even if one day they open up all the APIs, the data splicing attacks that we show you today will still work. So what about SASE/SSE proxy DLP solutions? They work by

inspecting network traffic at the proxy layer. So, similar to endpoint DLP, without direct access to rich browser metrics they don't have any understanding of what web app the users are using and how they're interacting with it. And to inspect any data, it actually has to be sent to the provider's cloud. As many of you know, this leads to a lot of latency issues. So the way that a lot of SASE/SSE solutions solve this is they limit the size of the files. I believe most of the large vendors, and this is all public information, only scan files up to 120 megabytes. So for anything above that, it's either a blanket block

or a blanket allow. Additionally, there's also a myriad of data leakage channels that they don't inspect. So think of your encrypted files, your binary channels, browser extensions, third-party app APIs, and so on. And even if they're able to detect this data, oftentimes they don't have access to things like DOM changes, tab context, or user interaction to figure out exactly how the data was lost and prevent similar attacks in the future. So I think the key takeaway from these two slides is that neither endpoint nor proxy DLP has direct visibility into what employees are doing in the browser. So to understand why there's a bit of a mismatch between existing DLP solutions

and the DLP protection that we need, we need to look at how the way we work has changed in the past two decades. Thanks to the proliferation of the cloud, 80% of enterprise data is now stored in the cloud. And in the past decade, another thing that has happened is we've seen a lot of browser technologies like WASM and WebRTC being developed, and this is probably largely thanks to Chromium being open source. The result of this is that the web app experience has improved significantly. A lot of web apps now match the experience of native apps, and as a result more and more enterprise workflows are now moving to SaaS apps. In

other words, the bulk of enterprise data today is actually being stored, created, shared, and accessed through the browser without ever even involving the endpoint. So, quick question. Is there anyone in the crowd who's a student or a recent graduate? I know there are some friends from Stanford here. I was speaking to a lot of my younger cousins who recently graduated, and most of them don't even use a single local app for their work, right? They spend the majority of their working hours on apps like Google Docs, Figma, and Salesforce, which are all SaaS apps, and even traditionally native apps like Photoshop or the Office 365 suite now have a SaaS version as well. And similarly we

rely really heavily on services like Google Drive and OneDrive to store and share files. So the number of files that we are downloading nowadays is becoming much smaller. Despite this, most of the DLP solutions that we saw earlier are still really focused on the endpoint and the proxy layer, and they don't take into account data leakage in the browser. So if there's such an obvious behavioral trend, why do you think incumbents haven't developed stronger capabilities in the browser? The truth is it's really not easy. There are a lot of challenges that are unique to browser DLP. So for example, there are a lot of data access paths they

have to think about. It's not just the clipboard, not just file upload and download, which are regularly included in DLP, but also data access paths unique to the browser. These are things like SaaS app data integrations, how different apps are sending data to each other, binary channels, browser extensions, and so on. The other really unique thing about the browser is that it's probably the only app that's used for both personal and work use cases. If you think about, let's say, your Gmail or your Google Drive account, most of you probably have a work account and also a personal account. So it's really hard to implement certain policies where you have to think about balancing security

and privacy, especially as more laws like GDPR come up. In addition, when it comes to your typical device, a lot of companies just have managed devices where they have full control over whatever apps they want to control. But when it comes to the browser, pretty much everyone with an email or credit card can sign up to a SaaS app in five minutes. The last point isn't really something that's unique to the browser, but it is an issue with a lot of DLP providers, so I just wanted to highlight it, and this is the point around data complexity. Right? There are a lot of data types that you have to take into account

too. So when it comes to files, we already saw the limitations of SASE/SSE when it comes to different file types, sizes, and executables. When it comes to memory buffers, not only do you have to inspect the system clipboard, you now have to take a look at how it interacts with the browser clipboard as well. And again, the definition of sensitive data varies widely between companies, jurisdictions, industries, and so on. And of course there are encryption technologies that you have to take into account as well. All right. So now on to the most interesting part. As I mentioned, today we'll be covering five data splicing techniques. But before this, I think it's really important for

us to understand what we are really exploiting. So I'll invite Vivek back up to share that. So what we are assuming for this talk is you have endpoint DLP which is connecting via browser APIs, and as she mentioned, we're going to be using Chrome Enterprise because that has the fullest, richest set of APIs at this point in time. Then we are also going to layer that with SASE/SSE secure web gateways, because most enterprises tend to use those as a way to actually look at files and make sure DLP is in good order. Now, what are the architectural vulnerabilities? Let's actually start with SASE/SSE secure web

gateways. The way a DLP policy triggers, if we just pick files as the example, is that a secure web gateway is looking at file headers in network traffic, and the moment it detects that a file is being uploaded, it immediately caches that file in the cloud and applies whatever DLP policies you have put in place to that file. Now, interestingly, if in some way an attacker can make sure that the solution can't even detect a file upload, nothing ever triggers. Now, this has been happening in two different ways. One is attackers and the other is UI frameworks. Interestingly, most modern UI frameworks, because you want to be super responsive, end up using binary protocols

like WebSockets, gRPC, WebRTC, and whatnot. And these are literally just binary channels, which means there's no real way of knowing, unless you are the application author, whether a file is being sent, whether this is transcoded audio data, or what's really going on in between. This is really where more and more applications are automatically starting to bypass DLP anyway. The second part, of course, is attackers are taking a look at this, and they've been abusing this a lot as well. So I'll give you examples. How many of you use WhatsApp Web or Telegram? If you upload a file via WhatsApp Web or Telegram, that actually just uses WebSockets. So you should be able to

send a credit card file or a PII file right through the SASE/SSE proxy. That is one piece, which we'll talk about a little later. But coming back to the first piece, Chrome Enterprise, our perspective is that this is still very early. If you are the browser and a file is being uploaded, ideally you want to make sure that you are able to monitor every step of that file getting uploaded and processed, because if you don't, an attacker can inject himself somewhere in between and make changes. This is exactly what data splicing is when it comes to endpoint browser-API-based bypasses, and then later we'll talk about secure web gateways.
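The gateway-side weakness described here can be sketched in a few lines of Python. This is a deliberately simplified model, assuming the gateway decides whether to scan by looking for a `multipart/form-data` Content-Type on HTTP traffic; the `Message` type and the transport labels are hypothetical, and real products use additional heuristics.

```python
from dataclasses import dataclass

# Hypothetical view of one unit of traffic as seen at the proxy.
@dataclass
class Message:
    transport: str      # "http" or "websocket" (labels assumed for this sketch)
    content_type: str   # Content-Type header value; empty for raw frames
    payload: bytes

def triggers_dlp_scan(msg: Message) -> bool:
    """Return True only when the traffic looks like a conventional file upload."""
    if msg.transport != "http":
        # Binary channel: nothing in the frame marks the bytes as a file,
        # so the scan never starts and the payload is never examined.
        return False
    return msg.content_type.startswith("multipart/form-data")
```

In this model the same bytes are scanned when sent as a form upload and pass untouched when sent as a WebSocket frame, because the decision is made on the envelope rather than on the content.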

All right, thanks Vivek. So the first technique that we're going to cover is what Vivek mentioned earlier on these communication channels. How many of you have heard of WebRTC or Firebase Cloud Messaging? Anyone? All right. So these binary channels are used for a lot of communication protocols. For example, WebRTC is used for P2P communication, so think about your audio and video conferencing apps. Firebase Cloud Messaging, as stated in its name, is used for a lot of notifications for web apps, and WebTransport for low-latency data exchanges in a lot of your games. So this is a bit of a catch-all attack

because, as Vivek briefly alluded to, these binary channels are completely uninspected by both SASE/SSE DLP solutions and endpoint DLP solutions. So the attacker can pretty much smuggle any data they want through these channels without any inspection. The reason this has happened is that a lot of these communication protocols were only invented in the early to mid 2010s. That was way after a lot of the DLP solutions had been built, and truth be told, the adoption took some time. So it was only in the past 5 to 10 years that you started to see them being widely used. So just some quick evidence on how SASE and SSE don't cover this. A lot of

the SASE and SSE solutions actually state publicly on their websites that this is something they don't cover. So for a lot of the complex protocols, as you can see on Zscaler's website, what they have said here is, you know, we can't inspect QUIC and the best practice is to just block it. Similarly, there's an example from Palo Alto Networks for WebRTC. And of course, for every security practitioner, the safest way to be safe is just to block everything, but as you know, this is not realistic, especially as most modern tools like video conferencing apps rely on a lot of these protocols to work properly. So first, let's take a look

at how SASE and SSE solutions work. We've created a sample application here, which we'll also make open source later on, where you can upload files in different ways. Right now we're just doing a normal file upload without the SASE/SSE solution on. The file contains some credit card details, and given we have no policies on, it was allowed. But now we'll go ahead and turn on the SASE/SSE DLP solution, and we'll see that it's going to detect that there's a credit card number and the file upload will be blocked.
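The allow-then-block behavior in this demo can be approximated with a short Python sketch. The candidate pattern and the Luhn checksum step are assumptions added for illustration (the talk only says the policy detects a credit card number), and `decide_upload` is a hypothetical name.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by a
# space or a dash. Illustrative only, not a vendor's actual pattern.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def decide_upload(text: str, policy_enabled: bool) -> str:
    """Return "allow" or "block" for an upload, mirroring the demo's two runs."""
    if not policy_enabled:
        return "allow"
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return "block"
    return "allow"
```

With the policy off, any content is allowed; with it on, text containing a well-known test number such as 4111 1111 1111 1111 is blocked, provided the scanner sees the text in its original form.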

All right. So how does it work with data smuggling via alternate communication channels? As Vivek mentioned earlier, the way that SASE and SSE figure out that a file upload is being triggered is they look at HTTP headers and other heuristics to figure out, okay, there's a file being uploaded. Let's go ahead and cache it, send it to our cloud, and compare it against a bunch of DLP policies. If one of them hits, then we'll go ahead and block the upload, and if not we'll allow it. So here is an example where we've used WebSockets. Instead of doing the normal file upload, we'll go ahead and upload it through a WebSocket. And as you

can see, it completely bypassed the SASE/SSE solution and the attacker can view it on their server. Now, the next technique is called data sharding. Before we go into what it is, we should quickly take a look at how regular expressions work. A lot of enterprise DLP solutions, as we mentioned earlier, rely on regex to figure out PII like social security numbers, phone numbers, and credit card numbers. So this is actually a regex for social security numbers. It basically says that, okay, you have a sequence of nine digits that comes in groups of three, followed by two, followed by four, and none of these groups should be all zeros, and for the first group

you want to make sure it's not triple six or any number between 900 and 999. As long as all of these requirements are met, it's a social security number and we should go ahead and block the upload of this file. So what we're doing in data sharding is basically, because we know that there's a specific sequence the regex is looking for, why don't we go ahead and break the file into small shards? The key thing here is you want to break it into shards so small that each one is much smaller than the regex detection sequence, such that the individual shards will be able to bypass

the SASE/SSE solution because they're not long enough to trigger any regex rules, and then once they bypass SASE/SSE, the attacker can go ahead and reconstruct the file on the server side. So here's a quick demo of how it looks. This is the same file as earlier. We'll go ahead and shard the data into smaller sizes, and this application will allow you to decide on the size. Again, the key thing is that as long as the size is smaller than the regex you're trying to evade, it should be able to bypass the policy. Here we've done 100 bytes just for the sake of example. And you'll

see that the individual shards are able to bypass SASE and SSE solutions. Once everything is collected by the attacker, they can go ahead and reconstruct it, and you'll be able to see the file in its original state. All right. The third technique that we'd like to share is called data ciphering. I think we briefly covered this in the beginning, but most SASE/SSE solutions cannot inspect files encrypted on the client side because they don't have the decryption key, especially if the files are encrypted on the endpoint or in the browser. Now, when a SASE/SSE solution sees an encrypted file, two things can happen. First, they can blanket block and say that any files that are

encrypted we won't allow to be uploaded or downloaded. The second option, which is the most likely option, is that they'll allow the upload without any inspection. So if we go back to the application from earlier, we'll go ahead and upload an encrypted file. Well, actually, this file is going to be encrypted on the client side by JavaScript. The key thing is you can use any encryption technology that you want. We've used a pretty simple one here, but I think this just shows you that no SASE/SSE solution can figure out and decrypt these files on the fly. So once the content is encrypted, it can bypass SASE and SSE

solutions and the attacker can then go ahead and decrypt the file and gain access to

it. Okay, so the next technique that I'm going to share is actually one of my favorite ones, just because it's so elegant and so simple. This technique is particularly effective against regex-based policies, and the reason is, if you remember, regex has to match a very specific sequence. So the moment you encode the data and turn it into a different format, it completely breaks the detection, because the encoded message, in this case a social security number, will no longer meet the regex requirements needed for detection. So here again is a demo where we're going to add a piece of JavaScript to any file that you're trying to upload, which

will go ahead and encode it using Base64. The reason why we picked Base64 here is that we just wanted to show you that we're using a very simple encoding method that everybody knows about. So this really shows that SASE/SSE solutions don't have the ability to detect encoded data, much less decode it. So what happens here is, once the encoded file is uploaded, similar to the encryption one, the attacker can go ahead and decode the file and get access to everything that's inside. All right. So now we've given enough pain to SASE and SSE solutions, so let's take a look at endpoint DLP. So

just quickly, we'll show how Chrome Enterprise DLP works. As mentioned earlier, we're using Chrome Enterprise just because they have the full suite of API access. So here a policy is being created for credit card numbers, and we'll go ahead and apply it to different data leakage vectors, including copy-paste, file upload, download, and printing, and we'll go ahead and bypass them one by one later on. Here you can of course write your own regexes, but Chrome also comes with a set of built-in regular expressions for things like credit card numbers, social security numbers, and so on. But it's the same technique. They're also

using regular

expressions. So we'll go ahead and apply that. After that, it's just adding the severity level or the likelihood level and the action that you want to happen. In this case we'll just block it. All right. So now let's take a look at how we can bypass each of these actions using data transcoding. First let's take a look at file upload. It's important to understand what exactly happens when you upload a file. When you trigger a file upload, let's say by clicking an upload button in any SaaS app, what happens is the endpoint DLP solution will go ahead and inspect this file

using the file upload API, and only if it does not trigger any DLP policy will the file get uploaded to the browser, and then it gets sent to the service provider. So in a data transcoding attack, the attacker or the insider threat will already encode the file on their own device before uploading it. After it passes the DLP checks, a piece of JavaScript will go ahead and decode the file before it's sent to the service provider. The reason why we've done this, instead of just sending it directly to the attacker's server, is to show you that this attack works on pretty much any SaaS app that you want to exfiltrate data to. So if we just take a quick look

here, this is the same policy that we implemented earlier. So any file that contains two or more credit card numbers should ideally be blocked by this DLP policy. Just to quickly show you, without data transcoding this will be blocked by Chrome Enterprise DLP. Okay. So now what we're going to do is encode the file before we upload it. Again, we're using a very simple method, Base64. Once the file is encoded, we'll go ahead and hit upload, and because it's now in an encoded form, there's no credit card number being detected. And we're going to add a piece of JavaScript to intercept the file upload after the

checks have been done by endpoint DLP and Base64-decode it, such that when you upload the file and open it up in Google Drive, it'll actually look exactly the same. So there you go. For the attacker, it looks completely seamless. It doesn't seem like any blocking or any encoding or decoding has happened. All right. So now we'll take a look at file download. This is a very similar concept to file upload, except it's reversed. When a file download is triggered, the file is actually fetched from the service provider and then it gets inspected by the DLP solution using the file download API before it gets

downloaded to the endpoint. The key principle remains the same here. You want to encode the file before the inspection happens, and you want to decode it before the end user or the attacker sees it. We'll take a look at how this works. I believe in this case we're using Zoho as an example, but again, you can use any enterprise or consumer SaaS app to do the download. So typically the DLP solution will go ahead and block the download of any file that contains credit card numbers from any app. But what we're going to do is add a piece of JavaScript that will go ahead and fetch

the file, and then we're going to encode it with Base64 and then trigger the download. Again, at this point, because it's no longer in the format of credit card numbers, it won't be detected as suspicious by the DLP solution and the download will be successful. So I'll just quickly skip through it. Right. So this is the downloaded file. The attacker, on their own device, can go ahead and decode the

file. All right. So what about copy-paste? Copy-paste is a little bit more unique because there are two operations that you can apply the DLP policies to. A policy is usually applied on the copy function or on the paste function, and as the attacker, you don't really know which one it's applied to. So just to give you a brief preview of how this works: we'll go ahead and copy some random text and paste it into a pastebin. You'll see that this is successful, simply because it doesn't contain any credit card numbers. But the moment we go ahead and copy the text from earlier, given that it contains multiple credit card numbers, it'll be blocked by the

endpoint DLP policy. So how are we going to evade this? As I mentioned earlier, the attacker doesn't know exactly which policy is being applied. So just to be safe, we're going to add two pieces of JavaScript. The first is going to be on the paste side. Here we're going to assume that anything that's on the clipboard is already Base64-encoded, and we want to go ahead and decode it and add it to the text area. This is to evade any sort of paste policy. Now, on the copy side, we want to also make sure that any data that is added

to the clipboard is already in encoded form. So the JavaScript that we write here is going to read the selected data, encode it in Base64, and print it on the console, such that any data that is copied onto the clipboard is already in encoded form. This way we can evade both the copy policy and the paste policy. All right. Another question that we often get is, hey, what about printing? It's a very common way that insider threats like to smuggle data out. So this is a technique called data insertion. Again, going back to how regular expressions work, you want to make sure that the format of the data type

that you're trying to inspect is in a specific format. So what you can do is insert a piece of text or a character in between each character in the file, such that when the DLP solution reads it, it's going to break the regular expression for credit card numbers. And just to make the experience even better, we can add this text in a very small size and also turn it white, such that when you actually print the file it'll look exactly the same, but really what it does to the endpoint DLP solution is it completely breaks any regular expression rules. And even if you don't know

whether there's a credit card number, you can just add it in between any characters that you see on the page. So we'll go ahead and view the file, and as you can see, it looks exactly the same to the end user. All right, so those are the five techniques that we wanted to show you. I think we're reaching the end of the talk, but we'd also like to share with you an open source toolkit that we've created for any of you that want to test this back with your enterprise security stack. So any pentesters or red teamers here? Okay, just a few. So if you

are interested, you can go ahead and go to our GitHub. We're going to release this, I think, within an hour after this talk. So there's going to be two things over there. The first one is the application that we showed earlier for the SASE/SSE bypass, so all of that you can test out. And the second one is all the attacks that we've shown for endpoint DLP bypass. We've actually created a browser extension for it, and the reason why we've used a browser extension is because it can inject JavaScript into the browser. But of course, if you want, you can also use the developer console just like we did. And one thing that I'd like

to mention is I'd love to take credit for building up the code for this repo, but to be honest, most of this was done using the free versions of ChatGPT and Claude. So this just really shows you how trivial it is to break enterprise DLP solutions. All right. So before we end, I'd like to invite Vivek up for some key takeaways. The whole talk almost felt like a bad commercial for enterprise DLP. So, you know, what is fundamentally broken here? How many of you love writing regular expressions or copy-pasting them into your DLP solutions, right? So I think the whole DLP industry hasn't innovated much, probably in the last 15 to 17 years. Much of

it is still run by regular expressions, which is amazingly shocking in this day and age, for you to just go ahead trying to search for patterns and strings and whatnot. So that's the very first part, which we feel is very easy to break, and what we've tried to show here is very trivial techniques, right? I mean, much of it is Base64 encoding. It's been around since probably the early web. It is so easy to detect that something is Base64 encoded, and most DLP solutions don't even have the ability today to actually figure that part out. So I think the first takeaway is enterprise DLP, at least in our humble view, is broken. There

needs to be a big disruption happening. In today's world of LLMs, where you can actually pass text and get inferences, I'm hoping that's where DLP is going to move, where it starts to become more inferential rather than pattern searching. The second part, of course, is the browser is the only unmanaged application at this point in time on managed enterprise devices. For the most part, very few enterprises manage the browser, very few enterprises lock it down, while what we forget is the browser is a full compute machine, right? You can run code, do whatever you want, transform data, I mean, encode it, encrypt it on the client side. It's fairly easy to go ahead and

create bypasses for both cloud as well as endpoint DLP with browser APIs. The second thing we also feel at this point is most browser vendors have not opened up really powerful APIs for endpoint DLP to go ahead and access what's happening. Because if you think about it, if all you're looking at is that a file upload is happening via the Chrome browser... I mean, all your work gets done via the browser. Every single app you use for the enterprise today is a SaaS app. And this is really where it's expected that all files probably go through the browser. How many of you still use USB keys? Probably barely anyone, right? Network file

systems, sharing of files within the enterprise, right? I mean, it almost makes me sound dated now. Most of the young folks don't even know what that is. And this is really where, if that's the gateway device, that is, the gateway application, it makes sense for all browser vendors to go about working on it. Now what we find, of course, is that Chrome and Edge are a little bit ahead. You can try out similar examples with Purview; nothing changes. When it comes to Firefox, Safari, all of them, they lag behind like anything. Much of it you can't even do at this point in time when it comes to any form of in-browser monitoring. Now the second part of

course is the whole SASE/SSE. That was a great promise, and the promise was you don't have to have anything on the endpoint. You can send us all your bad traffic. We are going to take a look at it, clean it up for you, and send it to the destination. When things come back, we're going to do the same thing. You don't have to worry about deploying anything on-prem. But what's really happened is, when these technologies came to be, which was probably around 10 to 12 years back, it was easy to look at network traffic and recreate what was happening in the application layer, right? Many of you may remember the times when you had to hit the refresh

button on Gmail and Yahoo Mail. But now applications are super complex. UI frameworks are complex. Everything is an API. Real-time binary stream communication is more the norm than the exception. And this is really where, without taking into account all of those nuances of how browser tech has changed over the years, at this point in time it's impossible to look at network traffic and recreate what's happening at the application layer. Beating SASE/SSE SWGs, which we had a very detailed discussion on a while back at DEF CON last year, is in my humble view almost hilarious, and to see most enterprises depend on that as the primary way to block insider

threats, data exfiltration and whatnot. So I think in conclusion what we wanted to say is, at this point in time the browser is the weakest link. It's fairly easy for insiders and attackers to exfiltrate data and do things via it. Browser vendors have to step up their game. First, allow a lot more fine-grained controls, so that you can do many of the things that you could happily do at the endpoint by just interfacing with the APIs that Windows, Mac, Linux, whatever, gives you; you probably need an equivalent capability in the browser. Second, SASE/SSE as an industry needs to probably expand its perimeter into the browser and kind of in a way

redefine what they're supposed to be doing, because without roping the perimeter of SASE/SSE through the browser, there's absolutely no way that you can go about protecting that data coming out of the enterprise. Okay, anyone here not yet convinced about DLP in the browser? You need this book. Here you go. You have a question? Okay, we'll go through the questions, just to make sure questions are going in the right order. I'll probably have him give it... you can just put the question in. Okay, but I'm going to ask a question for the book. Okay. All right, everyone. Just as a reminder, we do our questions via Slido. So I'm handing the iPad over and

Vivek will be able to check out the questions that have been submitted. We've only gotten one so far. So, just a reminder, you just go to sli.do and enter the code BSidesSF2025, and we're in theater 6 again. And why don't you... So, okay, sounds good. I'll just read out the question. So, will this work if all the traffic goes through enterprise browsers, right, which is browsers where you can control things? That's the whole point of the talk: if you're able to control what's happening in the browsers, whether it's enterprise browsers or extensions which go ahead and provide browser security, you are able to monitor every single thing happening. And that's

exactly what you're trying to say: if the browser is an application platform, then you need something browser-native to go about protecting it. Okay. I don't have any more questions here. Someone was asking a question here. Any other questions? If you just shout it out, I'll try to ask it again on the mic for everybody here. Go ahead. So I think if you're not... Yeah, I got the question. Okay, I'll repeat the question. So his question was, if you're not using enterprise browsers or any form of browser security, how do you go about solving this problem, right? Simple answer: no, you can't. Similar to, if you don't have an EDR or

an antivirus, how do you go about at least getting a chance at figuring out whether malware or ransomware was downloaded? And that's the big blind spot at this point in time: given that you can transform and change data as you want within the browser, if you don't have anything there, or you don't fully control it, your defenses are going to get bypassed. Okay, gentleman there, a question, or you want to add something?

Yeah. Why?

Exactly. Exactly. And just to add on to your point, I think we are running out of time. To add on to your point, we've seen attackers even go ahead on the endpoint and call the browser, but in a headless way, and then use that to exfiltrate data. So Mandiant and a couple of these guys put out some attack research around how you could actually use headless browsers and still get the beacon across and all of that, even if SASE/SSE was running. And there's a lot of interesting recent research around all of this, because if all your traffic is going through SASE/SSE and you've probably

exfiltrated, sorry, infiltrated the endpoint and there's ransomware, malware, something running, then how do you really now exfiltrate all of that data, right? And the best way seems to be, once again, via the browser. Now, you can't package in a new browser, because that would be a new untrusted application running and would probably light up every EDR out there. And this is really where you could just go ahead and run the existing browser in headless mode. The other variation that we've actually seen is the installation of browser extensions, both sideloading them as well as somehow having the user install something. I mean, the most common attack we see now is a new version of ChatGPT comes out and at

that point in time attackers will put out on social media, saying install this extension and you'll get free access, and those extensions start to exfiltrate data and whatnot. Okay, running out of time. Thank you very much. Oh, you want to do two more? Two more minutes. All right, let's just read it out from Slido. Is there any difference for mobile browsers on managed mobile devices? Okay, it's the second question down there. Is there any difference for mobile browsers on managed mobile? Not really. Now the only downside, of course, is that on any mobile platform, iOS or Android, there's very little you can actually do at the operating system level, right,

because you've been locked out. People have tried to use proxies. The closest one we've seen is actually deploying an application with accessibility mode turned on, and that's a little superpower with which you can monitor a lot of what is happening on the device, and we've seen some attempts happen there. Have you looked at Cyberhaven, that does not rely on regex? Yeah, by the way, the whole Cyberhaven attack which ended up happening, we were the first to disclose. We've looked at a couple of these. I think in the interest of time, we'll try to post a more detailed response. Last but not least, amazingly, this was Audrey's very first research talk at any

conference. Personally, I thought she was amazing. So, thank you. Thank you, Vivek and Audrey.
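
A minimal sketch of two of the techniques described in the talk, Base64 encoding and character insertion, run against a stand-in credit card regex. This is not code from Angry Magpie or from any DLP product: the pattern `CARD_PATTERN`, the function names, and the zero-width-space filler are all assumptions made for illustration (the demo used small white text as the filler), and the sketch targets Node.js via `Buffer` rather than the browser clipboard and console used on stage.

```javascript
// Hypothetical DLP-style rule: 16 digits, optionally grouped in fours
// by a space or a dash. Real products use their own (unknown) patterns.
const CARD_PATTERN = /\b(?:\d{4}[ -]?){3}\d{4}\b/;

// Stand-in for a regex-based inspection step (an assumption for illustration).
function looksLikeCardData(text) {
  return CARD_PATTERN.test(text);
}

// Encoding technique: Base64-encode the text before it is copied or downloaded.
function encodeBase64(text) {
  return Buffer.from(text, "utf8").toString("base64");
}

// Receiving side: decode the Base64 text back to the original.
function decodeBase64(encoded) {
  return Buffer.from(encoded, "base64").toString("utf8");
}

// Insertion technique: place a filler character between every character.
// The zero-width space default is an illustrative choice; the filler must
// not occur in the original text, or removeFiller will not round-trip.
function insertFiller(text, filler = "\u200b") {
  return Array.from(text).join(filler);
}

// Receiving side: strip the filler to recover the original text.
function removeFiller(text, filler = "\u200b") {
  return text.split(filler).join("");
}

// 4111 1111 1111 1111 is a widely used test card number, not real data.
const sample = "Card on file: 4111 1111 1111 1111";

console.log(looksLikeCardData(sample));                 // true
console.log(looksLikeCardData(encodeBase64(sample)));   // false
console.log(looksLikeCardData(insertFiller(sample)));   // false
console.log(decodeBase64(encodeBase64(sample)) === sample); // true
console.log(removeFiller(insertFiller(sample)) === sample); // true
```

Both transformations are trivially reversible on the receiving side, which is the point made in the talk: a rule that only matches the literal pattern sees nothing once the data's surface format changes. A defensive counterpart would be to decode Base64-looking spans and strip invisible characters before applying the same pattern.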
