
So, for lack of log content: when I think of a log, it should have anything in it that can help us attribute activity to a user and understand the action taken. That could include critical fields like a source IP address, a user agent, the actual event that's occurring, and whether it was successful or failed. It should attribute the user successfully, so that when incident response or detection engineering is looking at it, they know exactly what user account could be compromised or taking malicious actions. And then: have external-facing documentation. Some companies do internal-facing documentation, like CrowdStrike, where you have to log in to
access it, but I'm a big proponent that it should be externally facing, so you can easily do threat research and understand what detection is actually possible based on the audit logs before you dive in. Then there's poor quality and inconsistent formatting. One of the examples I like to use: there's an unnamed password manager that previously had a Chrome extension, desktop software, and a web UI, and all the logs were different depending on how the user accessed the system. Think about it: someone exports passwords from the Chrome extension, but that log looks different than if someone exported them from the desktop software. And so when
you're doing incident response or detection engineering, you want to know if someone exfiltrated data, and that audit log isn't in a consistent format. So it's really important that across product versions and operating systems the logs you're seeing are the same, you're able to deduce what behaviors are occurring, and you have the same visibility. You don't want a system where, depending on the version or the update, you have no visibility into someone taking critical actions. There should also be a low rate of log-quality-related incidents; this one is on the vendor side. Logs should be
reliable and taken as a source of truth. There are occasions where a system goes down and you have to backfill logs, but I'm talking more about duplicate event IDs, or formatting issues that keep logs from being parsed correctly, anything that affects the quality of the logs in your SIEM. And there should be limited latency between when an action occurs and when the log event is available. For example, with any of these SaaS application actions, by the time you're alerted of it, the log has to be generated, you have
to pull it via the API, stream it through any enrichment, and land it in your SIEM; then it depends on how often your detections execute, and only then does the alert reach the incident response team. So if the third party has long latency between when the action happens and when the log is generated, that affects your mean time to respond; it affects how long an attacker is in your environment. It's really important that vendors prioritize limited latency. Then there are difficult log collection mechanisms. You should have the ability to stream logs to cloud storage or some provider; that seems really obvious, but not all SaaS applications do
it. Some of them only let you view certain logs via the web UI; some only have admin logs and no user logs. There's a large variety, and these SaaS applications have not prioritized our ability to get the logs, which should obviously change. Log collection should be possible, deduplicated, and straightforward: you want to understand the timeline of when events occurred, and if an event shows up twice, that can mess up your incident response timeline and your understanding of the incident. So logs, again, should be the source of truth. And good log formatting and data structure choice is critical. Anyone who's been
a security engineer on a detection and response team understands that the log ingestion part can sometimes take the majority of your job; the detection engineering part is the fun part. Taking time to parse logs is not fun, and when it's a difficult data structure it's even more painful. So really make sure the choice of formatting makes it easy to parse and easy for security engineers to get the information they need. The last thing I wanted to touch on was licensing and cost. If you've onboarded SaaS applications as a security engineer, sometimes you find that logs are not included, and I think that's a big oversight in
being able to have good security posture with that SaaS application. Logs should be priced as part of the core product, and the process to add them should not be cumbersome. If you want to know more, I highly recommend checking out the Audit Log Wall of Shame at auditlogs.tax, a website that my ex-colleague and I maintain. It goes over logging issues and pricing issues. For example, for one vendor: you can only export logs via the UI, in batches of 500 messages per export (obviously an issue); audit events aren't included for project settings, group settings, or deployment approval activity; and time zones differ based on where you view the audit logs, because they don't
consistently use UTC. All these issues make it really difficult for us to use the logs. So I highly recommend checking out that website; it's also open source, so if people want to contribute based on their own experience with logs, please feel free. [Audience: what was the URL?] Sorry, it's super tiny on the slide: auditlogs.tax. It's based off the SSO Wall of Shame, so we copied the same format and the same questions; I also recommend checking that out. So we've talked a lot about the restrictions SaaS applications put on us, but what can we do to make the data better? These are kind of the two
things that I go back to: having reference or lookup tables, or caching data in DynamoDB or something; and then data ingestion and cross-source enrichment. Imagine we had a table in our SIEM of all our EDR provider information, whether that's source IP address, operating system used, or assigned user, and then a table of lower-risk and higher-risk device activity. We can reference that in our detections, kind of creating risk-based detection without having to go through the whole process of building scoring for all of our assets. And then for cross-source enrichment, imagine that same IP address is enriched
into every other log source for that user, to understand whether they're accessing that application from an expected location. A good example: if you have EDR logs or even IDP logs, you can understand more context about that user, and you want to bubble up that context into every single log you have. The downside is that this makes your logs bigger, so there are pricing concerns, but it gives both the detection engineer and the incident response analyst context on the user as they're looking at the raw log. They're able to say: this user has these permissions, they're an HR analyst, they're coming from an IP address that matches the IP address
we saw in the last 24 hours from our EDR, so we can say it's a corporate device, and we can understand other aspects of the activity. This really helps with detection engineering. So let's go into some attacks and detections we can write. First, my perspective on approaching detections for SaaS: I like to do threat research on the active attacks occurring against these environments, and then I love diving into API activity; you get some really interesting stuff, especially if they have API activity and token usage logs, so you can see how tokens are used. [Audience: what threat research tools are you using? Google? Mandiant?]
It depends on the SaaS application: there are reports created by Mandiant or other organizations, and there are also a lot of blogs; a lot of people have exploited things, have seen things in the wild, or work at companies that have been hacked and have written briefs on it. So really anything that relates to that SaaS application's threats: usually I search for the SaaS application plus active threats, then go through a bunch of pages, reading links and the links that connect to those links. [Audience: Recorded Future?] Oh, we don't use that, and I've never had a company that wanted to pay for it.
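Going back to the lookup-table and enrichment idea from a moment ago, here's a minimal sketch of bubbling EDR context into a SaaS log event, assuming you've cached a device inventory keyed by source IP. All field names and values are illustrative, not any specific vendor's schema.

```python
from datetime import datetime, timedelta

# Hypothetical cached EDR inventory keyed by source IP (illustrative schema).
EDR_INVENTORY = {
    "203.0.113.10": {
        "hostname": "lt-hr-042",
        "os": "macOS 14.5",
        "assigned_user": "jdoe@example.com",
        "last_seen": datetime(2024, 6, 1, 11, 30),
    },
}

def enrich(event: dict, now: datetime, window: timedelta = timedelta(hours=24)) -> dict:
    """Bubble EDR device context into a SaaS log event when its source IP
    matches a managed device seen within the lookback window."""
    device = EDR_INVENTORY.get(event.get("src_ip"))
    enriched = dict(event)
    if device and now - device["last_seen"] <= window:
        enriched["device"] = {
            "hostname": device["hostname"],
            "os": device["os"],
            "assigned_user": device["assigned_user"],
            "corporate_managed": True,
        }
    else:
        # No managed device matched this source IP in the window.
        enriched["device"] = {"corporate_managed": False}
    return enriched
```

An analyst looking at the raw event now sees the assigned user and whether it came from a corporate-managed device, without pivoting to the EDR console.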
Okay. Again, API activity is fun, and so is user and service account behavior analysis. It's really interesting to see how users are using your environment and, if service accounts have been compromised, what their behavior usually is; if you think about a service account, it should take the same actions consistently, so deviations from that are definitely interesting to look at. There's token usage, which I'll dive into in the detections. And then, you know your environment best: what are your critical SaaS applications, what are the assets in that environment, what data does it have access to? It's really a practice of threat modeling your own environment and prioritizing the SaaS applications that have your
critical data. And then, from a MITRE ATT&CK perspective, these are kind of the four tactics I focus on when I'm first doing an MVP for detection writing. For initial access, you can look at token usage and login activity and get your basics down; then persistence; and then collection and exfiltration. Usually when an attacker is in a SaaS application, their goal is to exfiltrate as much data as possible and potentially impact your environment, so you'll be able to see the data they can exfiltrate and focus on the exfiltration behaviors they can take. So, some of the inputs to detection
engineering research: that audit log documentation I mentioned, which should be externally facing, taking a moment to look through all of the event types that are possible; a past history of log data to use for hunting, usually six to eight weeks; research using current security articles and content; and then threat intelligence indicators. Whether you have a service you pay for or an internal team that works on this, being able to incorporate any of that is really critical, especially if you know there's an active attacker targeting organizations similar to yours. So I'm going to focus on two cases
for attacks and detection engineering: GitHub and Snowflake. Both of these applications are pretty critical to organizations. With GitHub you have intellectual property, your deployments, a lot of secrets potentially in there, and tokens that can be used to access things. And then you have Snowflake, which holds financial data, personal data, and customer data, and has become a really large platform that a lot of companies use for data warehousing; I think this is only going to grow. So let's talk about GitHub log visibility, my favorite topic. GitHub actually has pretty good logs now; that hasn't always been the case. They
have generally available attribution of user email addresses for activities in the audit logs. What that means: before, if you were looking at a GitHub audit log, you would see someone's GitHub username, and GitHub usernames are often not people's real names, so it was really hard to attribute that activity back to a user in your organization. Now they've added user email addresses, so you can see the email address, usually a corporate one, that's committing something or taking the action. That's really helpful for understanding if there's someone outside your organization taking actions in your environment, like someone who accidentally tied their personal email, or
some external email, to your organization; you can look into that. They also allow the ability to include source IP addresses in logs. This isn't on by default, you have to toggle it on, which is a slight downside, but it's critical for user attribution and understanding where behaviors are coming from. They also provide granular detail on the type of token taking the action. There are four token types, and the logs give you the token type, sometimes (for the OAuth access token) a time frame, the hashed token, and the behaviors occurring with that token. And then
there are now GitHub API request logs, so you can see granular usage of those tokens taking actions against the API, which is super valuable. So let's talk about some GitHub threat actors. I think about it in two buckets. You have various groups that deliver malicious payloads via packages: they trick users into executing packages or downloading repositories, eventually leading to data exfiltration. And then the other bucket, and there are various groups that do this, one of the most notable being ShinyHunters, is credential theft and data exfiltration: they're focused on compromising user accounts through credential theft and then exfiltrating
data, stealing access keys, and ultimately extorting companies. For GitHub detection focus, there are four areas that stuck out to me when I was initially looking at it: abnormal usage of access tokens (there's prior proof that actors are exploiting these), use of stolen credentials and credential theft, repository cloning behavior (the exfiltration of intellectual property), and persistence through new user accounts. So, abnormal usage of access tokens. In April 2022, GitHub Security announced they had detected compromised OAuth access tokens, issued to Heroku and Travis CI integrations, being used to download data from a lot of organizations. So there's definitely activity in the wild that they have
admitted, involving the compromise of OAuth access tokens to exfiltrate data. Then in October 2023, in another compromise, organizations found that attackers accessed accounts using compromised personal access tokens, most likely grabbed from a developer's endpoint or development environment. So, an example detection: GitHub OAuth token actions taken from various ASNs and user agents. The idea is that we're looking for differing locations and differing devices based on the data we have. The threshold of two is just what I put in there; you can raise it depending on your environment. But you don't want to see people taking GitHub actions with an OAuth access token from multiple locations and devices in a short time period. This is also based off an approach to detecting Okta session hijacking; the only difference between the GitHub logs and the Okta logs is that the Okta logs surface the session ID, so you have a bit more fidelity there. But you can definitely dive into abnormal actions by OAuth access tokens and do anomaly detections, focusing on when usage deviates from how the token is typically used.
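A rough sketch of that detection logic, assuming token events are already parsed into dicts carrying a hashed token value, ASN, and user agent (the field names are illustrative, not GitHub's actual audit log schema):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def hijacked_token_candidates(events, window=timedelta(minutes=30), threshold=2):
    """Flag hashed token values seen with threshold+ distinct
    (ASN, user agent) pairs inside a sliding time window."""
    by_token = defaultdict(list)
    for e in events:
        by_token[e["token_hash"]].append(e)
    flagged = set()
    for token, evs in by_token.items():
        evs.sort(key=lambda e: e["ts"])
        for i, anchor in enumerate(evs):
            # Distinct (ASN, UA) pairs within `window` of this event.
            pairs = {(e["asn"], e["user_agent"])
                     for e in evs[i:] if e["ts"] - anchor["ts"] <= window}
            if len(pairs) >= threshold:
                flagged.add(token)
                break
    return flagged
```

With threshold 2 and a 30-minute window, a token used with a git user agent from one ASN and five minutes later with a python-requests user agent from a different ASN would be flagged.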
Next, personal access tokens. I couldn't include the whole detection in the screenshot, but the idea is that you're looking at git.clone events after a personal access token is created: depending on your environment, maybe a personal access token is created and then five repositories are cloned, or ten. And if you have the ability to add reference or lookup tables, you can get a list of new hires in your organization, classified by Okta data or whatever's in your IDP where user roles live, with a new-hire tag, and exclude them; because if you think about it, a new hire onboarding is going to clone a ton of repositories, and you don't want your team to get a detection fired every time that happens. But it is really interesting, because if you're going to exfiltrate data via a PAT, you're going to try to get as many repositories as you can before you're detected. Another one I wanted to look at was API request logs: GET requests to the secrets endpoints, minus the public-key one, to understand if someone is accessing secrets within your environment; if it's a compromised token, you'll also get the token usage information. And there are some additional detection ideas I've been thinking about and have written in the past, such as an SSH key created from a suspicious IP address.
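The PAT-plus-mass-cloning idea above, with the new-hire exclusion via a lookup table, might look roughly like this. The "pat.create" event name and the NEW_HIRES set are illustrative stand-ins (git.clone is a real GitHub audit log action; the PAT creation event name varies):

```python
from datetime import datetime, timedelta

# Illustrative lookup table; in practice this would come from IDP data.
NEW_HIRES = {"new.engineer@example.com"}

def pat_then_mass_clone(events, clone_threshold=10, window=timedelta(hours=24)):
    """Flag actors who create a personal access token and then clone
    clone_threshold+ repositories within the window, skipping new hires."""
    flagged = []
    for c in (e for e in events if e["action"] == "pat.create"):
        if c["actor"] in NEW_HIRES:
            continue  # onboarding engineers clone a lot; exclude them
        clones = [e for e in events
                  if e["action"] == "git.clone" and e["actor"] == c["actor"]
                  and timedelta(0) <= e["ts"] - c["ts"] <= window]
        if len(clones) >= clone_threshold:
            flagged.append(c["actor"])
    return flagged
```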
Why "suspicious"? Because a lot of people use enrichment, so they understand whether a log is coming from a residential IP, the corporate VPN, Tor, or a hosting provider. If you have this enrichment, you can focus on the behavior you're expecting, usually residential IPs and corporate VPNs, and tune around that. A private repository changed to public: super significant. A user downloading data as a zip file: GitHub allows you to just download a repository as a file, so that's another way access can be used to exfiltrate. GitHub also has kind of an unknown
user field. I haven't seen it used a lot, but it's available, and it gives the ability to surface a non-authed user email. Next, we're going to talk about Snowflake. Snowflake threat actors: there's one major one, kind of the only one we know about, and it's pretty new. They have stolen a significant number of records from Snowflake customer environments. They use infostealer malware to gain user account access, they log in, and they exfiltrate data; in some cases they use client applications to get access and exfiltrate data. There are a couple of methods, but usually it's just: user account with no MFA, login from a suspicious IP address, grab data.
And once they log into Snowflake via stolen credentials, what behaviors do they actually take? They show and select tables within your environment, the tables where you're storing data, potentially financial data. Then they create a temporary stage. For folks who aren't familiar with stages, they're a way to load and unload data in Snowflake; so they create a temporary stage in order to store the data they're grabbing from those tables. They use the COPY INTO command to copy data into the stage, moving it somewhere they can access it, and then they
GET the data from the stage to their local machine or an external cloud provider. Looking at some attack techniques: client applications, which I mentioned before as the way you can set up automation or third-party integrations with Snowflake; exfiltration behaviors; data transfer actions; and malicious IP addresses. I'm not going to touch on malicious IP addresses or data transfer actions, but I highly recommend looking at the Snowflake audit log documentation, plus the Mandiant and Snowflake reports on UNC5537; they have IP addresses you can use as indicators of compromise. Of course those change, but it's nice to have a detection with them when you have the intelligence. And then
for data transfer actions, there are a couple of different ways to do it; you can look at the audit documentation, they have pretty good logs. We're going to focus on exfiltration behaviors. Again, Snowflake announced the compromise of 200-plus customers that had data exfiltration and then extortion by UNC5537, and there aren't really any other threat actors being talked about right now. So we're in the position of not knowing when the next one's coming, but as Snowflake grows as an application and stores more and more critical data, it's only a matter of time. So we're going to look at Snowflake stage behavior: whether a stage is set to an external cloud location. Here I have the stage being created, and then the stage URL, which is the export location, whether it's an S3 bucket, GCS, or Azure; that covers the three major cloud environments, the ones supported by Snowflake. We want a detection when one of these stages is created (it's pretty infrequent in a lot of environments), and we want to understand where that data is being exported to.
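A sketch of that stage detection as it might run over query history rows pulled into a pipeline; the regex is a simplification of Snowflake's CREATE STAGE syntax, and the row dicts only mimic the USER_NAME/QUERY_TEXT columns of the QUERY_HISTORY view:

```python
import re

# Flags CREATE [OR REPLACE] [TEMPORARY] STAGE statements whose URL points
# at an external cloud location (S3, GCS, or Azure).
EXTERNAL_STAGE_RE = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?(?:TEMPORARY\s+)?STAGE\b.*?"
    r"URL\s*=\s*'(?:s3|gcs|azure)://[^']+'",
    re.IGNORECASE | re.DOTALL,
)

def external_stage_creations(query_history_rows):
    """Return (user, query) pairs where a stage was pointed at an
    external cloud location."""
    return [(row["USER_NAME"], row["QUERY_TEXT"])
            for row in query_history_rows
            if EXTERNAL_STAGE_RE.search(row["QUERY_TEXT"])]
```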
Another one is looking at users anomalously querying data. There's a table called QUERY_HISTORY: it captures every query executed in Snowflake and every command listed in that query. So we're looking at when a user deviates from their typical behavior of interacting with a single database: the database name, the user taking the action, and whether an anomalous number of databases appears, because a compromised user account can be used to query a bunch of tables at once and exfiltrate that data. And the last one ties directly to the UNC5537 attack: they used the COPY INTO command to get data into an external location; they would point at an external location and copy
the data into it directly. So this is a good one to maintain, and instead of a detection in the SIEM, it's actually a detection you can deploy directly in your Snowflake instance: it's a query you can run as a monitor in Snowflake and then get an alert or an email notification if a detection like this fires. [Audience: do you find detections built into the application like this more effective, as opposed to the SIEM? What's your go-to?] It depends on how the SIEM is parsing the logs. If the SIEM is parsing the query history and you're able to easily search it, I'm a big fan of keeping all
detections in one place. However, there are instances where you keep detections as code in a repository but they get deployed to different vendors: maybe you deploy some to your EDR and some to SaaS applications or other log sources that are streaming in. I think when there's an incident occurring and you need to quickly deploy detections, it can be quicker to do it in the SaaS application itself if it supports it; if not, I'd say keeping everything centralized is really nice. In response to the Snowflake attack, this is a detection I wrote to be deployed
directly in Snowflake. We wrote all of our detections in Snowflake first because it was faster, then made sure the data was clean and normalized into the SIEM, and then rewrote them in the SIEM. Some additional detection ideas for Snowflake: looking at new client applications. UNC5537 used two client applications: one is called DBeaver, which is actually a database tool used legitimately as well, and the other was a homegrown client application. So, understanding when these automations and drivers are set up in your environment; grants adding roles to users; and network policies being modified.
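The new-client-application idea can be sketched like this; the CLIENT_APPLICATION_ID field mirrors the column of that name in Snowflake's ACCOUNT_USAGE.SESSIONS view, and the baseline set is an illustrative allowlist you'd build from your own session history:

```python
from collections import Counter

# Illustrative baseline of client applications normally seen in this
# environment; derive it from a lookback over real session data.
BASELINE_CLIENTS = {"Snowflake Web App", "SnowSQL 1.2.24", "PythonConnector 3.0.4"}

def unfamiliar_client_apps(session_rows, baseline=BASELINE_CLIENTS):
    """Count sessions per client application and surface any application
    not present in the environment baseline."""
    counts = Counter(r["CLIENT_APPLICATION_ID"] for r in session_rows)
    return {app: n for app, n in counts.items() if app not in baseline}
```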
As I mentioned earlier, you want to understand what IP addresses are accessing your environment and what behavior you're expecting, and only admins should be allowed to modify network policies. If you want to read more about threat hunting in Snowflake, I wrote an article when the attacks first came out. It's Snowflake-specific: it's all queries in Snowflake's query language that you can deploy to your Snowflake instances to do threat hunting, or proactive detection writing, using their system. It walks through what each of the queries means and why it's important to your
environment. And that's it. I wanted to thank everyone for listening to me spiel about logs and detection. If you have questions, go ahead and ask them; you can always reach me on LinkedIn or afterwards. I will open source my detections on GitHub and also provide them through the link; they're collected in the slides. There are a lot of good links there on things like attacks, sources you can use for log collection, information on, again, auditlogs.tax, detection writing in general, and some out-of-the-box detections that are good to check out. And then there's a GitHub detection
specific BSides talk from Las Vegas this year by a friend of mine; I highly recommend it. He talks about setting up a detection-as-code pipeline and detections for GitHub, so it's really good. Thank you. [Applause] Does anyone have any questions? [Audience: earlier in your talk you rattled off a bunch of things that SaaS applications should be collecting in their logs; what were they?] I would say username or user email, source IP address, user agent, optionally geolocation information (you can enrich that yourself, but it's really nice when a company does it by default), and then the action they're taking, stated in a very clear way,
along with whether it was successful or failed. [Audience: I was going to ask basically the same question, but I'm not good enough at taking mental notes; what reference can I go to to find a list like that for engineers?] If you go to auditlogs.tax, it has information on basically the exact presentation I went through in the beginning about logging restrictions and quality; all of that's published there. And I also have a Medium blog about how to create audit logs for security professionals, written for the persona of a software engineer. Next question: so, in detection engineering, which do you think is more useful, understanding straight code or query
languages, since each SIEM has its own query language? I think understanding how to do data querying and having the ability to learn new systems is really important. Coding can also be important depending on what position you're in, if you're doing automation, scripting, or log parsing; it kind of depends. But query languages are really, really powerful. I think there's a misconception in security that if you don't have experience with a specific tool, you won't be able to adapt to a new one, and I think that's really harmful for people that use SIEMs, because they're really similar and there's good documentation. As long as you have
had experience with data before, you can do it. [Audience: a follow-up question: do you think that realistically we may see some standardization in query language, or are there too many vendors doing too many different things?] I wish. I think there's adoption of, maybe not standardizing the information going into the audit logs, but for formatting purposes, using a unified data model like OCSF or Elastic's ECS, which is super helpful for detection engineers. But I don't think we're going to get to the point where the logs themselves always look the same or the query language is always the same, unfortunately. Yep.
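To illustrate the unified-data-model point: a toy normalizer that maps a made-up vendor event into an ECS-flavoured shape. The vendor field names are invented, and the target fields only approximate Elastic Common Schema; a real mapping would follow the OCSF or ECS field reference exactly.

```python
def to_ecs_like(vendor_event: dict) -> dict:
    """Normalize a hypothetical vendor audit event into an ECS-flavoured
    nested structure so every log source queries the same way."""
    return {
        "event": {
            "action": vendor_event.get("eventName"),
            "outcome": "success" if vendor_event.get("status") == "OK" else "failure",
        },
        "source": {"ip": vendor_event.get("ipAddr")},
        "user": {"email": vendor_event.get("userEmail")},
        "user_agent": {"original": vendor_event.get("ua")},
    }
```

Once every source lands in one shape, a single detection over `source.ip` or `event.outcome` covers all of them.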
[Inaudible audience question] Yeah, I think this goes back a little bit to risk-based detection writing: you want to understand the risk of user behaviors and what's typical for that environment. I'd say the biggest challenge is having good data. A lot of times, if you have data piped into your SIEM that's enriched and formatted, then you're at the point where you can start considering that; but if you still have poorly formatted logs, or you're not enriching data and don't have information on the users and entities in your logs, it's going to be really hard. But I think bigger detection engineering teams, where you have like 10
or 12 detection engineers, are a lot more focused on that than smaller detection and response teams. [Moderator: since you're here, you'll get to hear this multiple times: speak louder than normal, repeat the question, or find a microphone and have somebody run around; not everybody can hear you.] Okay.
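As a closing reference, the attribution fields listed in the first Q&A answer can be captured as a simple completeness check; the field names here are illustrative, not any standard's:

```python
# Fields a log event needs for user attribution, per the Q&A answer:
# who, from where, with what client, doing what, and with what result.
REQUIRED_FIELDS = ("timestamp", "user_email", "src_ip", "user_agent", "action", "outcome")

def is_attributable(event: dict) -> bool:
    """True when an audit event carries every field needed to attribute
    the action to a user and understand what happened."""
    return all(event.get(f) for f in REQUIRED_FIELDS)
```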