
Hello everyone, I'm going to go ahead and get started; I think it's about time. Thank you for coming to my talk, where I'm going to nerd out about critical SaaS logs and what you can do with them: basically looking at detection methods, different ways you can approach writing detections for SaaS apps, and how you can focus on hunting.

A little about me: I've done a lot of different security work, especially on smaller teams, but I found that my favorite thing to do is detection engineering and hunting, which is what spawned this talk. These days I mostly do security research, looking at how we can get insights from large amounts of data and find actual threats.

For the folks familiar with the MITRE ATT&CK matrix, there's a newer SaaS matrix that came out to help guide your detection development, so I'm going to cover that quickly, then some of the challenges with logging in these systems, and then we'll walk through some attacks and the detections you can think about for them. MITRE released something called the SaaS matrix. It's very new, probably less than a year old, and it has general guidance.
I think that, in general, it's very vague, so it's not something I'd ideally use on its own, but I think it's a good overview for people who are getting into this. In comparison, Push Security came out with a much more specific version of their own, where you can actually dive into explanations, things they've seen, and how to approach them when you're writing detections. I highly recommend checking it out; it's something I actually use. It's still generalized, so you have to do a threat model for every application you're looking at, but it gives you good things to start with.

With SaaS logs, you generally have some of these categories available, and it's very rare that you have all of them. You almost always have user activity and admin activity, and then sometimes authentication, depending on the system. The two I've found to be really critical are API activity and integration activity. Some examples of companies that include those are GitHub, Okta, and so on, but it really takes a team at the vendor that's invested in making those logs available: while companies might have them internally, they often don't expose them to customers, or they put them behind a very high pay tier. So when you're thinking about which application to write detections on or hunt in, you really have to think about what logs are actually available to you.

And I have complaints about logging for SaaS apps; these are screenshots of some examples. GitHub used to not link your email, or any ID other than your GitHub actor name, in organizational logs, so it was really difficult to figure out who was doing things. Some logs don't include IP addresses or user agents, and some have no good way to collect logs via API. All of these are challenges you run into and have to consider when you're figuring out how to get the data in to do this analysis.

If you want to know more about specific logging issues you might run into when collecting this data, a shameless plug: I try to maintain this website, which is a copy of the SSO Wall of Shame if you're familiar with it, but specifically for audit logs. It tries to encourage vendors to do better, and it calls out vendors that put things behind very high pay tiers, or that have really difficult logs to use or are missing critical fields. The website also has guidance, so if a vendor does look at it and wants to listen, there are ways they can improve their logs.
Even if you're just looking at ingesting a data source, you can check the site to see whether you might run into issues, or pitfalls to avoid that maybe we ran into in the past.

I think there are two critical things you can do when you get really bad logs that can make them much more useful. First, in a lot of SIEMs you have the ability to use a reference or lookup table, where you save critical data and then reference it in your detections or queries. One thing I've done that I really liked: if you have an IP address in a log, then as it flows through your data pipeline, match on whether that IP address has been seen by your EDR provider in the last 24 hours. That gives you context on whether the device might be higher or lower risk, based on whether that IP has been associated with someone in your company recently, and it gives you a head start on the investigation and on evaluating risky behavior. In a similar vein, there's cross-data enrichment: taking information from one log source and injecting it into another log source where it might be relevant. You're making the log bigger, which, depending on your storage, could be difficult, but at the end of the day, if you're looking at information from a SaaS app and you also have some of the IdP details enriched in, it can really help you write better detections, get context, and pivot more easily when you're doing incident response.
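To make that concrete, here's a minimal sketch of that kind of enrichment step as a pipeline function. It isn't tied to any specific SIEM or EDR: `load_recent_edr_ips` stands in for however you export the EDR-seen IPs into a lookup table, and the event field names are just examples.

```python
from datetime import datetime, timezone

def load_recent_edr_ips() -> set[str]:
    """Stand-in: return the set of source IPs your EDR has reported in the
    last 24 hours (e.g., exported on a schedule into a SIEM lookup table)."""
    return set()  # replace with your lookup-table / EDR export query

def enrich_saas_event(event: dict, edr_ips: set[str], idp_users: dict[str, dict]) -> dict:
    """Tag a SaaS audit event with cross-source context before it lands in the SIEM."""
    ip = event.get("client_ip")
    # Was this IP seen on a corporate endpoint recently? Helps rate the event's risk.
    event["ip_seen_by_edr_24h"] = (ip in edr_ips) if ip else None
    # Cross-data enrichment: attach IdP details (department, MFA status, etc.)
    # keyed on the actor, so detections and IR pivots have the context in-line.
    event["idp_context"] = idp_users.get(event.get("actor_email", ""), {})
    event["enriched_at"] = datetime.now(timezone.utc).isoformat()
    return event
```

In practice the lookup table gets refreshed on a schedule and the enrichment runs wherever your pipeline already transforms events.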
So, I want to cover a couple of attacks. Usually this talk is closer to an hour, so I just picked out some key ones to go over; if you're interested, the slides I submitted are actually much longer. When I think about writing detections for SaaS, these are the areas I start with. They have a high probability of being trackable in logs, and I think they're higher value than just doing basic threat-intelligence IOCs: when you start threat modeling, understand what has been seen in the wild, what might be relevant to your specific industry or organization, what normal activity in your environment looks like, and what the critical assets are within that application. Do you have customer data stored in a data lake? Do you have critical GitHub repositories? Understanding those things lets you prioritize where to write detections and where to spend your energy.

If you're familiar with MITRE ATT&CK, these are the tactics I focus on, usually because there's better logging: you have authentication logs, you might have admin activity, and often there are, hopefully, a lot of logs about data exports, cloning repositories, and so on, so you can catch the attack at the end. So I'm going to go over two critical SaaS apps. We'll talk about GitHub for a bit; most companies use it, and it often holds critical IP and sometimes customer data.
Then we're also going to talk about Snowflake. A lot of people are adopting Snowflake as a way to get managed data storage and integrate with different solutions; adoption has gotten really big, and because there's so much critical data in it, attackers have started focusing on it.

For GitHub, the most common thing I've seen companies run into is the compromise of a personal access token. GitHub has two types of personal access tokens. The original one is the classic personal access token: when you created it, it had permission to everything in both your personal GitHub repositories and whatever company organization you were associated with, and there was no way to separate permissions. So any time that token was compromised, the attacker essentially had access to everything you could do. GitHub saw the issue with this, especially seeing customers being attacked, and focused on releasing what they call a fine-grained personal access token, where you can actually specify specific permissions. The problem is that people who generated personal access tokens without an expiration date, maybe four or five years ago, are still using classic tokens, while people generating them now often use fine-grained ones. So if you think about developers who have been at a company for a couple of years, you might have thousands of classic personal access tokens in your environment. Some companies have focused on rotating these, but it's very manual, and sometimes the answer is just "we'll take the risk." This is probably the most common attack I've seen, and the focus is simply to exfiltrate as much data as possible and then start ransoming it, trying to get a payment. Usually it's not a very sophisticated threat actor.
They're just focused on how much data they can get. Then they'll create their own backup, delete everything, and leave behind a markdown file, sometimes in a repository literally named something like "backup," or in their own personal GitHub repository, that just says: here's how to contact me, if you want your data back, reach out. From a detection perspective, you have a couple of different avenues. From a behavioral standpoint, you really want to catch it while it's actually happening, when those tokens are actually being used, and I'll get to that in a second. But if you've already seen the specific threat pattern, you can also do the more static detections: that a repository with that name was created, and what's in it.
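A rough sketch of that kind of static check, assuming org audit log events are already flowing through your pipeline as dicts. The repository names are purely illustrative, not a vetted indicator list, and the event shape should be checked against what your GitHub audit log actually emits.

```python
# Illustrative watch list based on the ransom-note pattern described above;
# swap in names from whatever threat reporting you're tracking.
SUSPICIOUS_REPO_NAMES = {"backup", "read-me-backup", "restore-data"}

def flag_ransom_note_repo(event: dict) -> bool:
    """Flag repository creations whose names match known extortion-note patterns.
    Assumes events shaped roughly like {'action': 'repo.create', 'repo': 'org/name'}."""
    if event.get("action") != "repo.create":
        return False
    repo_name = event.get("repo", "").rsplit("/", 1)[-1].lower()
    return repo_name in SUSPICIOUS_REPO_NAMES
```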
On the behavioral side, one easy way to start: when you call out to the GitHub API to download a repository, it gets logged as a repo download, as opposed to a clone. So you can create a quick detection that asks: is this a repository that belongs to my organization? Is it being accessed with a personal access token? Then aggregate by the actual actor and repository and set whatever threshold you want. Depending on the size of your organization and how many repositories you have, maybe that's 200 repositories downloaded in five minutes, or 1,000 in five minutes. That's a critical alert you can dig into.
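Here's a rough sketch of that threshold logic over collected audit log events. The action name and token-type values are placeholders; GitHub's audit log has specific strings for the archive-download action and the programmatic access type, so confirm them against your own org's logs before relying on this.

```python
from collections import defaultdict
from datetime import timedelta

# Placeholder strings -- confirm the exact audit log action and token-type values
# against your own GitHub org logs before using them in a rule.
DOWNLOAD_ACTION = "repo.download_zip_archive"
PAT_TOKEN_TYPES = {"personal access token", "fine-grained personal access token"}
WINDOW = timedelta(minutes=5)
THRESHOLD = 200  # distinct org repos per actor per window; size this to your org

def detect_mass_repo_download(events: list[dict]) -> list[dict]:
    """Alert when one actor downloads an unusual number of distinct org repositories
    with a personal access token inside a short window. Expects events sorted by
    'timestamp' (datetime) with 'actor', 'repo', 'action', and a token-type field."""
    alerts = []
    recent = defaultdict(list)  # actor -> [(timestamp, repo), ...] within the window
    for e in events:
        if e.get("action") != DOWNLOAD_ACTION:
            continue
        if e.get("programmatic_access_type", "").lower() not in PAT_TOKEN_TYPES:
            continue
        actor = e["actor"]
        recent[actor].append((e["timestamp"], e["repo"]))
        # Keep only entries inside the sliding window, then count distinct repos.
        recent[actor] = [(ts, r) for ts, r in recent[actor] if e["timestamp"] - ts <= WINDOW]
        distinct_repos = {r for _, r in recent[actor]}
        if len(distinct_repos) >= THRESHOLD:
            alerts.append({"actor": actor, "repos": len(distinct_repos), "at": e["timestamp"]})
    return alerts
```

A SIEM would express this as a grouped count over a time window; the Python is just to show the shape of the rule.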
Another one is the typical enumeration pattern: making API requests to figure out which repositories a token actually has access to, so the attacker can decide what to download. This one is a little noisier, so you have to be careful in your environment; you'll probably have to tune out certain tokens that people use in unusual ways. But in general, if someone is doing a high volume of enumeration with a personal access token or another token, it's definitely something to look into, and you can correlate it with the previous detection to understand the actual attack path.
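The enumeration version has the same shape, just counting list/read-style calls per token with an allowlist for the tokens you've already tuned out. A compact sketch with hypothetical field names; deciding which audit log actions count as "enumeration" is the part you'd tailor to your own logs.

```python
from collections import Counter

KNOWN_NOISY_TOKENS = {"ci-read-only"}   # hypothetical allowlist of tuned-out tokens
ENUM_ACTION_PREFIXES = ("org.list", "repo.list", "team.list")  # illustrative only
ENUM_THRESHOLD = 500  # enumeration-style events per token per window; tune to your org

def detect_token_enumeration(window_events: list[dict]) -> list[str]:
    """Return token IDs making an unusually high volume of enumeration-style calls
    within one already-bucketed time window."""
    counts = Counter(
        e["token_id"]
        for e in window_events
        if e.get("action", "").startswith(ENUM_ACTION_PREFIXES)
        and e.get("token_id") not in KNOWN_NOISY_TOKENS
    )
    return [token for token, n in counts.items() if n >= ENUM_THRESHOLD]
```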
I didn't go deep on GitHub threat actors because there are so many, and it's been targeted in so many different ways over the years. But I do want to call out the most famous one with Snowflake, which was UNC5537. If you remember, this came out in 2024, with the activity starting around April of that year. There are a couple of different reports on it: Snowflake released some information, which has since been taken down, and Mandiant came out with a really good report that I highly recommend reading. Instead of focusing on Snowflake itself, the actor focused on the customers who use Snowflake: finding a way to get into their environments, download a lot of data, and then essentially the same thing. They want to be paid, they want to scare the company, they want to get everything they're aiming for, and then they'll probably also sell the data. This threat actor had very specific behaviors that they used in every environment. Snowflake released a lot of IOCs, but those aren't super useful; probably the most useful was a set of very specific user agents I had never seen before, although now that Snowflake has published them, they're probably not usable anymore. But the reports did a really good job of breaking down the behavior seen in every environment.
So: they get stolen credentials, and they specifically target accounts that don't have MFA enabled. You would think that if you have a Snowflake instance you would require everyone to have MFA, but a lot of the affected organizations didn't; roughly 165 organizations were reported as potentially affected by this threat actor. The account might be a service account, or a user that was created in Snowflake manually a long time ago. Then, if you're familiar with Snowflake, they basically SELECT from all the tables they can: they want to see what tables you have, and then they select whichever ones they think might hold critical data, maybe HR data, maybe customer data, anything that user account has access to. Then they use something called a temporary stage. In Snowflake, a stage is the mechanism for loading data into or out of Snowflake: you create this staging location and point it at a place to bring information in or drop information out, like an S3 bucket. A lot of these companies found that a temporary stage had been created, and stages aren't often reviewed; it's kind of the wild west. You might have one stage that all the data engineers use to bring in data from specific internal infrastructure. The attacker would copy the data they found into that staging location and pull it out through the stage to their local machine, for example by dropping it to S3 and downloading it from there. And once they had it, they reached out to these companies and made it very clear that they had their data. When something like this happens, you can write detections and threat hunt for things that have already happened, but because this was a newer attacker, you also want to approach it from a detection perspective, so that if it happened in your environment, whether it was an insider or this financially motivated attacker, you'd be able to catch it.
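One way to start turning that staging behavior into something you can query: look in query history for a user who creates a stage and then runs a COPY INTO a stage shortly afterwards. Here's a rough sketch against SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY (which lags real time by a few hours); the text matching is deliberately crude and the one-hour join window is arbitrary, so treat it as a hunting starting point rather than a tuned rule.

```python
import snowflake.connector  # pip install snowflake-connector-python

# Users who CREATE a (temporary) STAGE and then COPY INTO a stage within an hour --
# the staging-then-exfil pattern described above -- over the last 7 days.
HUNT_SQL = """
SELECT c.user_name,
       c.start_time AS stage_created_at,
       e.start_time AS copy_into_stage_at,
       c.query_text AS create_stage_query,
       e.query_text AS copy_query
FROM snowflake.account_usage.query_history c
JOIN snowflake.account_usage.query_history e
  ON e.user_name = c.user_name
 AND e.start_time BETWEEN c.start_time AND DATEADD('hour', 1, c.start_time)
WHERE c.query_text ILIKE 'CREATE%STAGE%'
  AND e.query_text ILIKE 'COPY INTO @%'
  AND c.start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
"""

def run_stage_exfil_hunt(conn_params: dict) -> list[tuple]:
    """Run the staging/exfil hunt query and return matching rows."""
    conn = snowflake.connector.connect(**conn_params)
    try:
        cur = conn.cursor()
        cur.execute(HUNT_SQL)
        return cur.fetchall()
    finally:
        conn.close()
```

You can also paste the SQL straight into a Snowflake worksheet; the connector wrapper is only there if you want to pull the results into a pipeline.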
Looking at the time, I probably could have included more on this, but I've linked a really good blog post, shamelessly one that I wrote, which I think is a good start if you're trying to think about how to implement detections for these behaviors; it actually goes through how to run these queries manually in your Snowflake instance.
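For example, a hunt for successful logins that presented only a single authentication factor, which maps to the no-MFA accounts this actor targeted, looks roughly like this against SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY (column names are from the documented view, but verify them, and expect key-pair and service logins to need tuning out):

```python
# Successful Snowflake logins in the last 30 days that presented only one
# authentication factor. Reuse the connector pattern from the previous sketch,
# or paste the SQL straight into a Snowflake worksheet.
SINGLE_FACTOR_LOGIN_HUNT = """
SELECT event_timestamp,
       user_name,
       client_ip,
       reported_client_type,
       first_authentication_factor
FROM snowflake.account_usage.login_history
WHERE is_success = 'YES'
  AND second_authentication_factor IS NULL
  AND event_timestamp >= DATEADD('day', -30, CURRENT_TIMESTAMP())
ORDER BY event_timestamp DESC
"""
```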
Snowflake creates a lot of noise in its logs, and a lot of people won't bring them into their SIEM because they can query directly within Snowflake. Snowflake also has monitors and alerts you can set up and route out to your SIEM. So it's worth thinking about how to threat hunt within Snowflake if that's your only option; a lot of these queries are written that way, but they can be rewritten for whatever SIEM you use. This is the blog post. Mandiant has a really good report, and I believe Rezonate also came out with a good one. If you're thinking about how to implement detections for Snowflake, or how to threat model things that have happened or could happen given the logging visibility you have, the post goes through it pretty in-depth.

So that's the end of the talk. I didn't realize it would go so fast, but I appreciate you all listening. If you have any questions, feel free to come up to me. I'm sending out the deck afterwards; it has links to some really good pages with out-of-the-box detections you can start thinking about. The limitation with a lot of the out-of-the-box or public repositories is that they're very focused on Windows or Mac, endpoint-focused things, and you're only now starting to see SaaS, or even AWS, Azure, and so on, being introduced to these out-of-the-box pages.
But it is happening slowly, and if you write something super cool, you should definitely commit it so other people can use it. I'd also say: if you're a detection engineer, or you're interested in writing detections, there's a new community called detections.ai that I'd highly recommend checking out. It's more of a closed community, but members can contribute rules, and it's often used by people who are more cloud-infrastructure focused, so it can also be really helpful. But that's it. Thank you.