
Hunting Supply Chain Threats Using Anomaly Detection

BSidesSF · 2023 · 50:02 · 345 views · Published 2023-05 · Watch on YouTube ↗
About this talk
Craig Chamberlain presents a detailed case study of a supply chain incident detected through anomaly detection applied to cloud API logs. The talk explores detection engineering approaches for cloud environments, demonstrates practical anomaly detection techniques (geographic anomalies, rare service-user combinations, novel action patterns), and reveals how these methods uncovered unexpected unauthorized third-party activity in a production environment.
Original YouTube description
Hunting Supply Chain Threats Using Anomaly Detection · Craig Chamberlain · Come see a detailed case study of a supply chain incident and how it was detected by applying anomaly detection to cloud API logs. https://bsidessf2023.sched.com/event/1HzuT/hunting-supply-chain-threats-using-anomaly-detection
Transcript [en]

Thanks for coming to my talk. My name is Craig Chamberlain, and this talk is called Hunting Cloud Supply Chain Threats Using Anomaly Detection. I have a long version of my biography, but the short version I can give you is this: twenty years ago I was picked up by a security product startup in the Boston area. We got to the MVP, and we had a purpose-built rule detection language, like many security products do, and the next question was: what should it actually detect? What should alert? What should it do? Nobody wanted to work on that, for various reasons, so I

thought it was interesting that we could do anything, but we didn't know what it should do. So I kind of said YOLO and volunteered to work on that, and then went on to work on several more large-scale threat hunting and detection products and projects over the course of my life, and I'm still doing it today. This is my second time at BSides San Francisco, or rather my second talk here; my talk last year was given by a colleague because I wasn't able to be here, unfortunately. This is actually, I think, my seventh BSides talk now, because I

was at BSides Rochester in March, and I've done a bunch of other conferences, fourteen or fifteen of them, and lots of publications; basically I've just been working on threat detection for most of my life. How I got here: six or seven years ago I was doing work for some of the largest cloud-using organizations in the Boston area, where I'm from, and got interested in threat hunting and detection there and how different it was. Along the way I kind of went on a walkabout in 2018, and I did a detection project in some open source tooling just for fun, and started doing work on

practical applications of machine learning for threat hunting and detection in 2019; that was the subject of my talk last year. So I'm continuing to work on applications of machine learning and anomaly detection for threat hunting in general, and cloud threat hunting and detection in particular, as well as the combination of new technologies like machine learning with conventional detection. The approach that I like to take, and I'm probably going to turn this into a blog post and try to flesh it out and articulate it more, the approach I think we need to take in this problem space, is to kind of define a new discipline,

and the reason is that the existing disciplines we have in detection engineering are not exactly what we need in the cloud world. Network security monitoring and endpoint detection and response are obviously foundational, essential, pivotal tools and technologies that we need, but there are large areas of cloud detection where they're just not that effective, because a lot of cloud API activity simply doesn't leave host or network evidence. The reason I don't call it data science, and I don't plan to become a traditional, conventional data scientist, is that when it comes to threat detection, not

everything is a data science problem. There are some classes of threats, even in this space, that we can detect with simple conventional searches, rules, and queries, and if we can detect a threat at such a low cost, that's preferable to spending enormous time and effort on a machine learning approach. In other cases, there are areas where we really need complex machine learning or anomaly detection tools in order to find things. But not everything is what we call a detection engineering problem in this world, and not everything is a data science problem, so what we really

need is to kind of merge the two. What I'm going to show you today is what I've been doing over the past year or so as I've continued working on anomaly detection for cloud threat hunting and cloud threat detection. I want to articulate how I'm doing that in some detail, and then show you some of the most interesting examples of what I've found so far. One of those examples is the reason I wrote this talk: the first time I turned on this particular detection stack and tool set that we're going

to see here today, the first time I turned it on on production data, I found something I wasn't expecting. I was expecting to find possibly some credentialed access, more of what I'd been seeing; I wasn't expecting what I found, which was something kind of new and interesting, and I'll show you what that looks like in great detail. As for the Ocean's Eleven theme in this talk: if you've seen my talks, you know a lot of them have movie themes. My mom was a huge movie buff when I was a kid, so I grew up as a child

going to the movies, and I guess that's what happened here. The reason for this particular theme is that it struck me that it wasn't that long ago that supply chain attacks were kind of beyond our imagination. If we go back more than a decade, into the 2007 to 2009 time period, I can remember sitting in threat modeling sessions at large financial institutions, with large, talented, very capable security teams, and when it came to third-party or supply chain scenarios, it was simply beyond our imagination at that time. Somebody in a meeting once made

a remark and said, isn't that something like an Ocean's Eleven scenario? At the time it was something we hadn't yet conceived of. Of course, a decade later, we've now seen it; in fact, just recently, we have possibly the largest, most interesting example unfolding right now with 3CX, where it appears there were actually two supply chain attacks in that intrusion set. The earliest example is probably Target in 2013, where apparently an HVAC vendor, a third party that had access to the network, was compromised; that was the nature of the supply chain there,

or the supply chain intrusion there. Of course, in 2020 we had SUNBURST, which was, prior to this year, probably the largest; it remains to be seen, it's an open question, whether SUNBURST or 3CX has the higher cost or impact in the long term. The RSA breach I include even though it wasn't exactly a supply chain attack, because as far as I know none of the software product there was actually poisoned; I think it was more strategic, in that the ultimate objective there was probably not the manufacturer of the two-factor tokens but one of their users.

So we've seen supply chain attacks now with a network management product and a Voice over IP product, and we'll probably see more of them over the course of the decade. But one area I think we haven't really thought about yet is the cloud arena. If you're a cloud user, or you work with large-scale cloud users, one of the things I've noticed over just the last few years is that there are an increasing number of third parties that have some, or sometimes a lot of, access to their customers' cloud accounts in order to deliver services: sometimes security services, sometimes just management and orchestration services. And so sometimes

they have access to prod accounts, and sometimes they have a lot of access. Cloud threat hunting and detection is hard enough as it is. There are more than four examples out in the public record, but there are four large public-record examples and case studies that were big enough that they actually went to prosecution, so we can go read the indictments and read about them in great detail. The thing that's interesting is that in the case studies I'm going to show you, I don't think the common thread has anything to do with the organizations who experienced these intrusions,

because some of these organizations have very large, very capable, very talented security teams, as you'll probably realize when you see the names. I would venture to guess that many of them were doing, not everything 100 percent right, but probably a lot of things right when it came to security, and they probably did have sophisticated detection tooling. Yet in these four cases there was large-scale exfiltration that simply went undetected until it was far along; it was not detected early enough. Part of the reason is that most of what we're looking at, most of the time, is credentialed access; not always, but most of the time we're

dealing with credentialed access, where credentials have been compromised and somebody is persisting via credentialed access and impersonating a legitimate user. That starts to look like insider threat hunting, which is something I had worked on earlier in my career, and it's one of the harder things to hunt. There's actually another example of this just recently, where an Air Force employee released some classified material into a Discord server, and now there's a large debate about how did this happen

and why did this happen and how was this detected. It's the second or third time we've seen this, and I think the answer is not that something wasn't done that should have been done; at least I doubt that's the case. I think it's more that it was somebody who is technically literate, very familiar with the systems they were exfiltrating data from, and possibly even familiar with what detection mechanisms exist and how they might evade detection. An insider like that, who has any degree of sophistication and knowledge of the terrain, is a very difficult thing to successfully hunt

and find. So when we have smart, sophisticated threat actors who have obtained credentialed access to cloud accounts, it's sometimes hard to distinguish them from ordinary users. This is the best one-sentence version of the problem statement that I've found so far; it actually came from a blog related to one of the incidents, and it's a better problem statement than anything I've come up with in the past few years. It's simply that the problem, at the end of the day, is that it's sometimes very hard to differentiate or disambiguate between

threat actor activity and legitimate user activity, because we're looking at credentialed access, and the difference between legitimate and suspicious activity is sometimes a matter of nuance. So, working backwards through the four big examples, four examples at least that were large enough that three of them have gone to prosecution, and the fourth remains to be seen: most recently, in 2023, we saw a case of credentialed access. This one was kind of interesting because, according to what was published, there was apparently a developer laptop that was running a service that was internet-facing, and somebody was able to get execution and persistence on that laptop, and then

from there was able to obtain credentials to access not only cloud data but some particularly sensitive cloud data, and to exfiltrate the data they were after. This is a hard one; this might be the hardest one of them all, because even if you had been scrutinizing activity at the time this happened, it might not have looked very different from that user's ordinary day-to-day work behavior. The exfiltration, of course, if you had been able to see the exfiltration, that's a possibility, but apart from that, not much. Then in 2021 we had a case where,

according to the indictment, we had an actual case of insider threat, the first big insider threat case that I'm aware of so far, at least in cloud environments. According to the indictment, a user with some privilege went rogue, started to exfiltrate data, and apparently tried to engage in some kind of extortion scheme, some kind of monetary demand in exchange for the data. Eventually, of course, law enforcement were able to identify them and get attribution, and it went to prosecution. One of the things I've noticed about these indictments, and the reason I put them in here, is that many of these

indictments have a kind of breadcrumb trail of anomaly detections, kinds of anomaly detections and potential anomaly detection recipes that we can use. In this case, the indictment mentions that the user logged out, came back in from a VPN in order to hide or obfuscate their identity, and then ran a command called GetCallerIdentity. GetCallerIdentity is basically like a whoami command, and it's very unremarkable to see a user or a program running that command, for the most part. So I'm

not surprised that, if people were looking at the logs on that day, there wasn't really a reason somebody would have found that unusual. There's one possibility in each of these examples, though. Something else they mentioned that might have been anomalous: the user changed retention policies on certain storage, probably on an S3 bucket or certain storage places where logs were kept, probably as a defensive action, thinking that if they changed the retention to one day, then with only 24 hours of log data, by the time anybody got around to investigating, there might simply not be any log data left to investigate.
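A retention-policy change like that is exactly the kind of thing a "new action for user" check can catch: flag a user context performing an action it has never performed in the baseline window. A minimal sketch, with illustrative CloudTrail-style field names (the PutBucketLifecycle action and user names here are examples, not details from the incident):

```python
# Sketch: surface events whose (user, action) pair never appeared in the
# baseline window. Field names mimic CloudTrail's userIdentity/eventName.

def new_actions_for_user(baseline_events, current_events):
    seen = {(e["user"], e["action"]) for e in baseline_events}
    return [e for e in current_events if (e["user"], e["action"]) not in seen]

baseline = [
    {"user": "logging-automation-role", "action": "PutBucketLifecycle"},
    {"user": "alice", "action": "ListBuckets"},
]
current = [
    {"user": "alice", "action": "ListBuckets"},         # seen before: ignore
    {"user": "alice", "action": "PutBucketLifecycle"},  # never seen: flag
]
alerts = new_actions_for_user(baseline, current)
```

A human identity shortening log retention, when only automation normally touches lifecycle policies, would surface here even though the action itself is perfectly legal.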

In 2019 we had a case that was a bit more complex, where, according to what we know, somebody was able to kind of bounce through a WAF, a web application firewall service, get execution on the virtual instance running the WAF, and then call the metadata service. The metadata service, if you're not familiar, is just an informational service that exists in most cloud environments, where a virtual machine or compute instance can call the metadata service over HTTP and ask questions about itself. It can also ask questions like: what am I authorized to do? And if I'm authorized to do this or

that transaction, or to work in those services, can I have credentials and authorization to work in them? And the metadata service says: yes you can, use these credentials, use these temporary access tokens. Long story short, someone was able to obtain access tokens this way and use them to start exfiltrating bulk data from the storage service. This was the example that originally got me thinking along these lines, because in the write-up they even note, and they probably weren't thinking about anomaly detection here, I don't know if they were.
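The credential theft step here follows the shape of the instance metadata exchange: an HTTP GET to the metadata endpoint returns a JSON document of temporary credentials. A sketch of parsing that document, assuming a payload shaped like the documented IMDS security-credentials response (all values fabricated; a real call would be made over HTTP from the instance itself):

```python
import json

# The path an instance would fetch (IMDSv1 style):
#   GET http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>
# The response body is JSON containing temporary STS credentials.

sample_body = json.dumps({  # fabricated example values
    "Code": "Success",
    "Type": "AWS-HMAC",
    "AccessKeyId": "ASIAEXAMPLEKEY",
    "SecretAccessKey": "example-secret",
    "Token": "example-session-token",
    "Expiration": "2023-04-01T00:00:00Z",
})

def extract_credentials(body: str) -> dict:
    """Pull out the fields a caller would reuse to sign cloud API requests."""
    doc = json.loads(body)
    return {k: doc[k] for k in ("AccessKeyId", "SecretAccessKey", "Token")}

creds = extract_credentials(sample_body)
```

Any process that can reach that endpoint from the instance, including one running inside a compromised WAF, gets the same answer, which is why the resulting API calls look credentialed and legitimate.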

But they actually commented on the fact that this WAF role, a role that was used by this virtual machine instance in order to deliver WAF-related services, doesn't ordinarily invoke the ListBuckets command. ListBuckets, if you're not familiar, is basically like an ls for the storage service. A bucket is not exactly like a directory, but it is a container where files and data blobs can live, so ListBuckets is basically: show me a listing of available storage units. And then 2016 was the

first one, and one of the more interesting ones, because in this case, according to the indictment and what we've read, they actually shared snapshots from the victim account to the attacker account. A snapshot, in cloud speak, is essentially a virtual disk image. It's normally attached to a virtual machine, but it can also exist in a detached state in order to be attached to some other virtual machine, and it will basically contain the file system that whatever virtual machine uses it expects to have, which may contain production data of any and all kinds,

and so they used this sharing feature to sort of forklift data to their account, and presumably from their account they exfiltrated it to somewhere else under their control. So let's look at these actions. In cloud speak, when you call an API method, it's in some cases called an action, but essentially you're calling an API method. So when it comes to the question of how we detect these things: many of the actions we're looking at are not really good candidates for simple alerting or detection. Why? Because they're supernumerary.
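One way to act on the snapshot-sharing pattern is to check share events against a known list of organization account IDs. The event below imitates the general shape of a CloudTrail ModifySnapshotAttribute record; the exact field layout and the account IDs are assumptions for illustration:

```python
# Flag snapshot-share events that grant createVolumePermission to an
# account outside the organization's known account list.

ORG_ACCOUNTS = {"111111111111", "222222222222"}  # hypothetical org accounts

def foreign_share_targets(event):
    perm = event.get("requestParameters", {}).get("createVolumePermission", {})
    added = perm.get("add", {}).get("items", [])
    return [i["userId"] for i in added if i.get("userId") not in ORG_ACCOUNTS]

event = {
    "eventName": "ModifySnapshotAttribute",
    "requestParameters": {
        "snapshotId": "snap-0123456789abcdef0",
        "createVolumePermission": {
            "add": {"items": [{"userId": "999999999999"}]}  # unknown account
        },
    },
}
suspicious = foreign_share_targets(event)
```

The allow-list approach works here because, unlike user contexts or source IPs, the set of accounts a snapshot should legitimately be shared with is usually small and knowable.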

GetCallerIdentity is sort of like whoami, and as you can imagine, you could have similar issues alerting on that as you might have alerting on whoami in a heavily automated environment where it's used extensively. In production, and I'm looking at CloudTrail data here, which is what I've been focused on, there may be hundreds of thousands of instances of that action in the logs at any given time. ListBuckets is usually in the tens of thousands, and SharedSnapshotVolumeCreated can range from a few thousand but can also

get up into the hundreds of thousands. Anybody know why that's something that has been on the increase over the last few years? The reason is that snapshots are increasingly being used by security products and services for so-called agentless security scanning. In order to avoid having to install and manage agents, security tools will sometimes scan the snapshots instead, and so snapshot access and snapshot operations have become much more supernumerary than they used to be, say, five to seven years ago. So then, when it comes to user context, this is just a simple scatter plot to

give you an idea of what it looks like, and this is what a lot of these look like: there's a small number of user contexts that call this action a huge number of times, but then there's a long, long tail of user contexts that call it occasionally, too many to think about any kind of simple alerting. Same thing with source IP addresses: there are a few source IPs that call that action a large number of times, but there's a long, long tail of source IPs that call it occasionally, with a long tail of normal outliers, too many to

handle; potentially too many to make that a good candidate for anomaly detection. ListBuckets is also a heavily used command. It normally occurs thousands of times a day, and it can occur millions of times a day in an environment that makes heavy use of S3 as its primary data layer. Typically, in any given CloudTrail log set, you'll see up to 50,000 unique user contexts calling that method, not in a day, but over the course of most of the retention

windows that I'm dealing with, which are between one and four weeks; so in a matter of weeks. So the problem is: how would you find, as in that second example, the moment when the one role associated with a WAF service that has no reason to call ListBuckets starts calling it? The data was there, but it might be one user context amongst tens of thousands. When it comes to detection, the snapshot sharing method actually has a tactic-technique combination in the ATT&CK matrix. There are several related to snapshot sharing and modification, and I think the one we're looking for in particular is the one at the bottom.
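A frequency ranking is one way to surface that needle: count (user context, action) combinations over the window and look at the rarest ones. A toy sketch over synthetic events:

```python
from collections import Counter

def rarest_combinations(events, n=3):
    """Return the n least frequent (user, action) pairs in the window."""
    counts = Counter((e["user"], e["action"]) for e in events)
    return sorted(counts.items(), key=lambda kv: kv[1])[:n]

events = (
    [{"user": "app-role", "action": "ListBuckets"}] * 5000
    + [{"user": "waf-role", "action": "AssumeRole"}] * 800
    + [{"user": "waf-role", "action": "ListBuckets"}]  # the needle
)
needle = rarest_combinations(events, n=1)
```

The WAF role's single ListBuckets call sorts straight to the top of the rarity list, even though ListBuckets itself is far too common to alert on directly.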

That one is called snapshot modification, and there's a tactic-technique called Transfer Data to Cloud Account, which essentially involves using snapshots to exfiltrate data. The guidance there says to periodically baseline snapshots to identify malicious modifications or additions, and I think that's good advice. The practical problem is how to implement that when we have tens of thousands or maybe hundreds of thousands of snapshot operations, with a long tail of normal outliers in most dimensions, whether it's user context or source IP or something else. Source IP address we can pretty much take off the table as far as looking for unusual source IPs, because, for one, in most cloud environments

there's an incredibly large and unusually distributed number of source IPs, too many to even start thinking about baselining or doing anomaly detection. There are normal outliers; there are new IPs that show up that you've never seen before, get used for a time, and go away, simply because you can't consider a source IP an entity: you can't assume that there is a user or a host or any kind of entity behind a source IP, because things come up and down and IPs are assigned and unassigned. And so that's just not

really fertile ground for conventional detection or for anomaly detection. So even though we're looking at volumes like these, there are cases, and there actually are projects, where people have written simple alert rules for actions like these. AssumeRole, for example, is almost the equivalent of a sudo: it's saying, I want to assume a role, I want to elevate or change my privilege from who I am now to this other role, because that other role will allow me to do work in another service or elevate privileges to do something I want to do. GetSecretValue is pretty much what it sounds like. AssumeRoleWithSAML is when

you have human users logging into the console; most of the time, if they're logged into the console, they're authenticating via SAML. There are fewer of those than plain old AssumeRole events, because AssumeRole events are used more by compute instances than by humans; for the most part they're heavily used by compute instances in order to assume the privileges they need to do work. So with these kinds of numbers, there are way too many of these to think about simple alerting. But even if we wanted to start creating complex logic to try to baseline these,

the amount of branching logic that we would need to define and enumerate normal versus abnormal use of some of these methods, based on users or sources or something else, is just too large to deal with. The good news is that the threat model we're dealing with, even though the possible manifestations are numerous, is relatively simple, at least for credentialed access, if we draw it in an attack tree like I've tried to do here. Most of the time we're really looking for either a compromised compute instance, where somebody has persistence on a compromised virtual machine or container and is doing AssumeRole

calls from there in order to elevate privileges and gain access to data, or somebody has compromised credentials and is authenticating using a key: they've obtained a key from the file system on a compute instance, possibly, or they have obtained a working set of credentials from somewhere else. In order to explain why I use some of these anomaly detection methods, I've organized them here under these two different branches of the tree. In the case of compromised credentials, what I've found is that geographic anomalies work pretty well, and that's not new: geographic

anomalies are good at detecting a lot of anomalous credentialed access patterns, but they're also really useful here because most of the time there isn't a lot of normal variance in source geography for cloud activity, especially automated cloud activity and transactions. Rare service for a user, and rare source geography for an action, are what they sound like. Rare service for a user means a user working in a service that they normally don't. The reason this works is that most users, most of the time, work in the same services over and over again, and they generate very dense amounts of transactions in the logs.
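That density argument suggests a simple "rare service for user" check: flag user contexts whose activity in some service is very sparse even though their overall activity is dense. The thresholds and field names below are illustrative, not tuned values from the talk:

```python
from collections import Counter

def rare_services(events, max_hits=3, min_total=100):
    """Flag (user, service) pairs that are sparse for an otherwise busy user."""
    per_pair = Counter((e["user"], e["service"]) for e in events)
    per_user = Counter(e["user"] for e in events)
    return [
        pair for pair, hits in per_pair.items()
        if hits <= max_hits and per_user[pair[0]] >= min_total
    ]

events = (
    [{"user": "svc-a", "service": "ec2.amazonaws.com"}] * 500
    + [{"user": "svc-a", "service": "s3.amazonaws.com"}] * 2  # sparse: flag
)
flagged = rare_services(events)
```

Requiring a minimum of overall activity keeps genuinely quiet identities, which generate sparse events everywhere, from flooding the results.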

It's relatively unusual for a user to start using a new service and to generate a very sparse number of transactions; usually user contexts generate very dense activity or none at all. A new action for a user: I'm using that instead of rare action for a user, and I'll show you why. Spikes in errors and events are useful too, and in the next chart I've got another diagram where I try to illustrate why each of these does better or worse at finding particular parts of the threat model tree. On the compromised instance side, there

are some differences. With a compromised instance, the odds are there's not going to be any kind of geographic anomaly, for the most part, but there often will be rare service activity, there often will be new transaction activity, and possibly new roles for a key. What that means is that in an AssumeRole event there was a new combination of key and role name. The reason is that, most of the time, we can assume that a key has an entity behind it, either a user or a compute instance.
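The "new role for key" idea reduces to tracking (access key, role name) pairs against a baseline window; a never-before-seen pair for a known key is worth a look. A minimal sketch with made-up identifiers:

```python
# Flag AssumeRole events where an access key assumes a role it has never
# assumed in the baseline window. Key and role names are hypothetical.

def new_key_role_pairs(baseline, current):
    seen = {(e["accessKeyId"], e["roleName"]) for e in baseline}
    return [e for e in current if (e["accessKeyId"], e["roleName"]) not in seen]

baseline = [{"accessKeyId": "ASIAAAA", "roleName": "app-reader"}] * 10
current = [
    {"accessKeyId": "ASIAAAA", "roleName": "app-reader"},  # normal
    {"accessKeyId": "ASIAAAA", "roleName": "admin"},       # new pair: flag
]
alerts = new_key_role_pairs(baseline, current)
```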

And whether it's a user or a compute instance, they tend to use the same roles and do the same work over and over again, so when you see a new combination of user context and key, or of key and role, that's often interesting: it can be an indicator of privilege elevation or lateral movement in a case of credentialed access. So this is kind of my attempt to diagram what each of these anomaly detection methods is good at finding. The horizontal axis is the duration in time: did the intrusion just

start five minutes ago, or has it been going on for a day, or a week, or longer than that? Is it a short or long duration? And the vertical axis is the density of the events: is the intrusion, or the threat actor, creating many events and doing many transactions, or is the density small? This is why I find I need the combination of different functions. The three main functions I'm using are things that are new, or combinations

that are new, meaning they have not manifested before; things that are statistically rare, meaning they are the least frequently occurring combinations in certain combination sets; and then the spike function, which looks for rates of events or errors that are unusually high. Say you take the five largest spikes in an error or an event by standard deviation, from the entire set: you'll often find that the top five to ten are outliers, well beyond any kind of normal peaks and troughs you see from day to day. A spike in errors like that can mean, well, it

doesn't necessarily mean that there is an intrusion. A spike in errors can also mean that something is just broken, or that something, possibly something third party, is trying to elevate privileges it doesn't have and is simply misconfigured. But the interesting thing about a spike in errors is that there's not really a benign explanation for it: either something is broken or there's something suspicious going on. In the case of something relatively new, something that just started recently, new functions, say a new action for a user or a new combination of key and role, will often be good at finding it.
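The spike function described above can be approximated by scoring each interval's error count in standard deviations from the window mean and taking the largest. A toy sketch with a synthetic enumeration burst (the counts and thresholds are illustrative):

```python
from statistics import mean, stdev

def top_spikes(hourly_counts, n=3):
    """Score each hour's count in standard deviations above the window mean,
    and return the n largest spikes as (z_score, hour_index) pairs."""
    mu, sigma = mean(hourly_counts), stdev(hourly_counts)
    scored = [((c - mu) / sigma, hour) for hour, c in enumerate(hourly_counts)]
    return sorted(scored, reverse=True)[:n]

# A quiet baseline of authorization errors, then a large burst at hour 20,
# the kind of pattern bulk discovery or enumeration tends to produce.
counts = [40, 35, 42, 38] * 5 + [5000]
spikes = top_spikes(counts, n=1)
```

Ranking by deviation rather than alerting on a fixed count is what keeps this usable across environments with very different baseline error rates.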

But if the intrusion is extant, if it's been going on for some time, maybe days or weeks, then most of the time a new function won't necessarily find it. In that case a rare function, looking at rare combinations, is better at finding it, assuming there are aspects of what they're doing that create sparse events or sparse combinations. Geographic anomaly detection is generally good, but it's not a panacea: there's a simple evasion for geographic anomaly detection, which is simply, don't be in a different geography than your victim; make sure

you're in the same geography the victim is in. In that case we have to start thinking about things like autonomous system network names, which is something I want to do next, because even if you're in the same geography as your victim, the odds are your AS name and number are probably different. And then spikes in errors and events: like I said, a spike in errors is a good way of finding when somebody lands on a compromised instance and starts doing large-scale enumeration in order to do privilege elevation or lateral movement, or just starts doing discovery. It's often the case that they'll create a huge spike in errors that's well beyond

uh it may have a could have a standard deviation and into the tens of thousands it's well beyond anything you've seen before um unless they land on an instance that has say god-like access this unrestricted admin access which case they may not generate any authorization errors but they may well still generate an unusually large spike in certain events especially Discovery events so in the examples um the practical application for this and examples is um in the first case um the the get caller identity transaction would was coming in from some kind of a VPN service that likely was in a source geography that was unusual possibly new not seen before but probably rare um and but even if not if it was say if

the geography was unremarkable, then there's still a case for a rare or new autonomous system network name.

In the case where the lifecycle retention policies were modified, this would be a job for something like rare source country for an action, or rare service for a user. The odds are that most human users, most of the time, don't go around setting lifecycle retention policies to one day, especially for logging; logging is usually set up and torn down by automation. So this might have been a rare service for a user, and it might have been a new action for a user. In this case, from what I read, some of this was coming in from a VPN and some of it via Tor, so there would have been a rare source geography for these actions. There would have been a rare service for the user as this user context started calling S3 for the first time, and if we had been running at the moment this happened, it would have been a new action for the user: we would have seen this user context call ListBuckets for the first time.

And then the snapshot activity: we don't actually know enough about this to know if there was a rare source geography here, but I suspect there likely was, and if not, there's likely a rare source network. But there would have been a rare service for the user and a new action for a user, because from what I've read about this, the user context in question was more of a business user, not an engineer, and not somebody you would ordinarily see sharing snapshots, so it would have been very anomalous.

The reason I'm not doing, say, rare action or method for a user is because there are too many normal outliers. I've tried this: I've tried running rare action (action meaning method, basically rare transaction for a user) on CloudTrail data, and it will find suspicious activity and threat activity, but it will also find normal outliers that are just not remarkable. So the signal-to-noise there is good, but it's not great. At the moment I'm actually doing anomaly detection in SQL, because my CloudTrail data is in a series of SQL tables, and it turns out there are math functions you can use in SQL. When we tried running rare action for a user on a medium-sized cluster with, say, 300-million-ish events, there were typically 200 to 250 results, and some of those are interesting enough to investigate, but many of them are not, so the signal-to-noise is just not good enough to think about turning it into an alert or into something that becomes a unit of work.

But if we run new action for user instead — what I mean by that is: in the set of transactions that have been called by a user, looking for instances in the last one to two days where there is a combination of user and action that has not manifested in the prior, say, 30 days, using a query. Using a query, I'm limited to going back to however much hot data exists in the table. In the future I plan to turn this into an Airflow pipeline so that it has some memory, so that it can reason about new combinations it has seen in the last day or two that have never been seen before for as long as the function has been running. Even though we may only have one to three days of hot data, it still remembers what it has seen, basically for all time: a simple learning function.

So the first time I turned this on, what I found was this, and it was not what I was expecting. This is the output of one of the queries, called rare source country for an action. It's essentially looking for the least frequently occurring combinations of event name (the action, basically the transaction name) and the source geography, because in this case I'm enriching these events with geographic data. That's something I don't think happens automatically in most cloud environments, but it's something you can do: there is geographic data available from providers like MaxMind that you can use to do this.
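As a rough sketch of what a query like that can look like — the schema, counts, and the "XZ" country code below are invented for illustration, and this uses an in-memory SQLite database via Python rather than the actual tables from the talk:

```python
import sqlite3

# Sketch of "rare source country for an action". The real data would be
# geo-enriched CloudTrail events (e.g. sourceIPAddress resolved through a
# MaxMind database); here we fake a small table of (action, country) pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_name TEXT, source_country TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("DescribeInstances", "US")] * 500      # dense, normal activity
    + [("AssumeRole", "US")] * 300
    + [("AssumeRole", "XZ")] * 2             # sparse: two calls from an odd country
    + [("DescribeInstances", "XZ")],         # sparse: one call
)

# Count each (action, country) combination and surface the sparsest ones.
rare = conn.execute(
    """
    SELECT event_name, source_country, COUNT(*) AS n
    FROM events
    GROUP BY event_name, source_country
    HAVING COUNT(*) <= 5    -- rarity threshold; tune to your event volume
    ORDER BY n ASC
    """
).fetchall()

for action, country, n in rare:
    print(action, country, n)
```

The rarity threshold is the main knob: too high and dense-but-normal combinations leak in, too low and slow-moving intrusions slip past.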

So does anyone notice anything right off the bat? In the first row there are two AssumeRole events from a source country, and then there's a DescribeInstances event. DescribeInstances is what it sounds like: if you call DescribeInstances with no arguments, it will basically return everything known about all of the compute instances running in the account; just tell me everything about the compute instances. It's very unusual to see AssumeRole happen exactly twice in any dimension, whether it's user, network, source IP, source country, or anything else, and the same goes for DescribeInstances. It turned out, and it didn't take long to find out, that this was a source country that did not have any business relationship, so this activity was inexplicable at first.

There were also a couple of StartInstances and StopInstances events, which were a little troubling. What those do: StartInstances will start a virtual machine, and StopInstances will stop a virtual machine. The reason it's troubling to see that combination is that sometimes you'll see it when a virtual machine has been modified, because if you're a threat actor and you don't have working credentials to persist into a cloud API, but you do have the ability to modify a virtual machine, sometimes people will choose to persist in the virtual machine instead, and then attempt to elevate privileges by means of whatever AssumeRole operations the virtual machine can do. But again, to see one instance of each of these two transactions from any dimension, whether it's user, country, source IP, or anything else, is very unusual; most of the time, most user contexts are doing very dense, very large numbers of start- and stop-instance operations, and it's very unusual to see anything or anyone do just one of each.

And then this was new action for a user: it turned out that the DescribeInstances transaction was also something this user context had never called before. Some of this has been grayed out, but what I found is that it was actually a third-party account being used to deliver services in the environment. So the presence of this third-party account was normal, but the transactions it was doing were not, and could not be reconciled. This was investigated, and in the end it was decided to completely remove this access.
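A minimal sketch of the "new action for a user" query described earlier, the kind of thing that surfaced this DescribeInstances call. The table, user names, and day numbering are all hypothetical; the real pipeline would run over CloudTrail tables with timestamps:

```python
import sqlite3

# Toy version of "new action for a user": flag (user, action) pairs seen in
# the last couple of days of hot data that never appeared in the prior
# 30-day baseline.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_name TEXT, event_name TEXT, day INTEGER)")
rows = (
    [("svc-deploy", "StartInstances", d) for d in range(1, 31)]      # routine automation
    + [("third-party-acct", "AssumeRole", d) for d in range(1, 31)]  # normal vendor activity
    + [
        ("svc-deploy", "StartInstances", 31),           # seen before: not flagged
        ("third-party-acct", "AssumeRole", 31),         # seen before: not flagged
        ("third-party-acct", "DescribeInstances", 32),  # never seen: flagged
    ]
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

new_actions = conn.execute(
    """
    SELECT DISTINCT recent.user_name, recent.event_name
    FROM events AS recent
    WHERE recent.day > 30                 -- the one-to-two days of "hot" data
      AND NOT EXISTS (                    -- no sighting in the 30-day baseline
          SELECT 1 FROM events AS prior
          WHERE prior.user_name = recent.user_name
            AND prior.event_name = recent.event_name
            AND prior.day <= 30)
    """
).fetchall()

print(new_actions)  # only the never-before-seen DescribeInstances pair
```

The Airflow version the talk mentions would persist the seen (user, action) set between runs, so the baseline survives even when only a few days of hot data exist.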

In terms of hunting for this, I tried to include some more helpful information for considering whether there may be unauthorized third-party activity going on in a cloud account. One of the good places to look is a spike in errors: if you have a spike in authorization errors related to this, it will definitely tell you the user context that's behind it, and if you sort the errors by user context, that's one way to quickly determine whether you have suspicious activity coming in from any kind of third-party delegation accounts.

Then there's the question, when you're looking at data like this, of what a third-party account is actually supposed to do. One thing you can do is go back and find the original CreatePolicy events from when the access was first instantiated. Usually, when the vendor is setting up their access, they'll do a CreatePolicy operation, and that CreatePolicy operation will usually tell you what they expect to do. So that's one good clue: if you find a third-party account running into errors, or trying to do something you can't reconcile, and it's not something specified in the original CreatePolicy, then chances are either something has changed on their side and they haven't thought to ask for increased permissions or access, or something strange is going on.

And in terms of "do I have any third-party access at all?": it's actually not that easy or straightforward to audit for this, but one of the things you can look for, depending on how far back your data goes, is an event called CreateOpenIDConnectProvider. This is just a sample request and response; if you have third-party access, or are about to have third-party access, you'll often find that this is one of the events associated with setting it up.

A couple of other examples. There are actually two things going on here. The first line item is Pacu, and that's actually authorized security audit activity: somebody is running Pacu, which is essentially a security testing framework and auditing tool that can also be used for pen testing. But the third line item is not normal, and what it's showing is that that user account authenticated from two different countries in the same time period, and one of the countries is one that simply doesn't make sense, because there isn't any kind of business relationship.

As far as expanding this, I'm thinking about new country for a user, and possibly new source IP for an account. The reasoning is that, again, if an intrusion has been going on for more than a short time, then sometimes it will be easier to find this and sometimes it won't. But if you're in a situation where these functions are running continuously, like in the scenario I described where the new functions have been running for a long time and are able to remember a long-term baseline, then we could potentially pick up a geographic anomaly, a credentialed access instance coming in from an anomalous country, right away, in the first few minutes, versus having to wait for a geographic anomaly to form.

New source IP for an account: the idea is that if you interrogate your cloud inventory data and keep track of all the IP addresses in use in your account, there are a lot of really useful things you can do with those. One is that if we see authenticated activity coming in from a cloud provider IP address that's not in use by one of your accounts or instances, it means it's from somebody else's account: either an account we don't know about, or something strange going on.

I plan on doing similar things with endpoint data. One thing I find useful: if you have network events from virtual machine endpoints, you can correlate those with the network inventory to look for north-south traffic, where virtual machine instances are doing heavy network activity outside of your AWS account. Now, it's not always abnormal for them to be going outside of your AWS account, because the odds are they're going out to the internet to do lots of work. But if they're going to, say, an AWS or Azure IP address that's not one of yours, and the activity is dense and maybe authenticated, or if it's something you can't reconcile on the endpoint, like a netcat instance or maybe SCP or something else that doesn't make sense, and it's doing heavy activity to another cloud account, then it's hard to think of a benign explanation for that. So that's a good detection.

This example is a new action for user, and it's the best thing it's done so far. Back in January there was some research published, not by me, but I credited it here, where essentially what they found is that there's this obscure method called GetFederationToken that was being used for persistence. It was interesting because you could use this method to persist, not everywhere, but at least in the console UI, even after the original user account was disabled. From what I understand, people were using this method to try to survive an eviction and maximize dwell time: even after defenders were attempting to evict them and had disabled the original account to kick them out, they could still persist.

So this was output back in late January, early February, and I like this example because back in December and January this was something we didn't even really know we needed to look for, which is often the case. The cloud threat actors are still in a period of rapid evolution, there's new research coming out, and there are new tactics and techniques appearing now and then, so sometimes there are things we didn't necessarily know we needed to detect. I like anomaly detection for that reason, because sometimes it's good at finding emerging threats even when we don't know what we're looking for. We know we need to be looking for what's next, but we don't know what it looks like, so we can't write a rule or a search for it, but sometimes we can still find it with anomaly detection.

So that is my talk. We have one minute left, I think, so I can take maybe one question, but I'm happy to hang out for questions. No questions?
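The spike-in-errors heuristic mentioned a couple of times in the talk can be sketched as a simple standard-deviation test over per-user error counts. The function name, sample counts, and threshold here are illustrative, not from the talk:

```python
import statistics

# Sketch of the spike-in-errors idea: flag a user context whose latest error
# count sits far above its own historical baseline. In practice `history`
# would be per-user authorization-error counts per hour/day from API logs.
def error_spike(history, latest, threshold=3.0):
    """True if `latest` is more than `threshold` population standard
    deviations above the mean of `history`."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return latest > mean  # flat baseline: any increase stands out
    return (latest - mean) / stdev > threshold

quiet_user = [2, 0, 1, 3, 1, 2]        # typical daily authorization errors
print(error_spike(quiet_user, 2))      # ordinary day -> False
print(error_spike(quiet_user, 40000))  # mass enumeration -> True
```

The same shape works for the discovery-event spike the talk describes for "god-like" instances that never generate authorization errors: swap the error counts for per-user discovery-event counts.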

[audience question]

Okay, yeah, so the question is: what about time of day? I think it will be useful. I think what I need to do there, though, is type the users, because there are a lot of programmatic users running automation and tooling and doing work in cloud environments, and the programmatic users don't have seasons: they're not nocturnal, they're not diurnal, they're just doing work all the time. But for the human users, if I can type the human users, I think looking for anomalous time patterns for them could be useful. And I am at time. Thanks. [Applause]
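The user-typing idea in that answer might be sketched like this; the naming-convention heuristic, user names, and working-hours window are all assumptions for illustration, not anything from the talk:

```python
# Hour-of-day anomalies only make sense for human users, since programmatic
# users work around the clock. So type the user first, then apply the test.
WORK_HOURS = range(7, 20)  # 07:00-19:59 for this hypothetical org

def is_programmatic(user_name: str) -> bool:
    # Placeholder typing rule; real typing might use IAM metadata, naming
    # conventions, or behavioral baselines rather than a name prefix.
    return user_name.startswith(("svc-", "automation-"))

def off_hours_anomaly(user_name: str, hour: int) -> bool:
    """Flag human-user activity outside working hours; ignore automation."""
    if is_programmatic(user_name):
        return False
    return hour not in WORK_HOURS

print(off_hours_anomaly("svc-backup", 3))  # automation at 3am -> False
print(off_hours_anomaly("alice", 3))       # human at 3am -> True
print(off_hours_anomaly("alice", 10))      # human mid-morning -> False
```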