
So my talk is named "Post-Compromise: Uncovering Clouds and Assessing Risks." Just really quick about me: I'm a principal at O3 Cyber, and I do multicloud security. Let's jump right into it and set the stage for the scenario that's going to be with us through the entire presentation. In this scenario, we're a multinational corporation, and we've been hit with malware — ransomware, in this case — and that's what drives us initially. So we have ransomware affecting most of the infrastructure, and we have limited access to our documentation and existing information, because most of the systems are down or being recovered.

But what about the cloud? The attack in this case only affected the on-premises infrastructure, not the cloud. So my first question is: what about the cloud environments? You're a multinational corporation with a large footprint — you definitely have some cloud in here. And the first question is: what clouds do we even have? Because, again, there's no documentation. There's a guy who knows where all the infrastructure is in the data center, but there's really no one with a full overview of the cloud in this organization. So the first part of our journey will be finding out what clouds are there. And we're equipped with what I'd call the equivalent of a Swiss Army knife for most of the people in this room: Python, PowerShell, a 5G router, and some duct tape.

So we know the organization has cloud, but we don't know who's using it. We have an idea of what it's being used for, but where do we start finding out who owns these clouds? Our CMDB is down. I guess the first step is to search through the SMTP logs. You look for senders such as *@amazon.com, *@azure.com, or cloud-noreply@google.com, which gives you an idea of which providers are in play. You can also search the certificate transparency logs: what certificates have been issued for *.yourdomain? And you can try to enumerate subdomains, to see whether any registered subdomains are being hosted in the cloud. When you do that subdomain enumeration, you'll get some IPs back — what's the IP of the asset? — and you can then download the published IP ranges of the different cloud providers and check whether an asset's IP matches a public cloud provider's range, if the service you're using hasn't already fingerprinted it for you.

What else can you do? Normally, you could read the documentation — in this case, you can't. I've also uncovered a bit of shadow IT, or shadow cloud, by making friends. Lunch is a really good opportunity. Find some people with print on their t-shirts — maybe a vendor t-shirt, or a Kubernetes t-shirt; those folks are even more likely to run cloud. Or look for people with stickers on their laptops — maybe there's a cloud sticker there too. Then you'll know they might know more about the cloud in the organization.

So in this scenario, we found some cloud — actually quite a significant amount of it. We found three AWS organizations, a Google Cloud organization, and two Entra ID tenants that both had Azure subscriptions in them. And of course we found Alibaba Cloud, but that was out of scope in this case, so we're not going to cover it in this session. This significant cloud footprint sets some rules for how you can go about things, because maybe you've developed a tool that works efficiently in one cloud, but now we're talking multicloud — you need approaches that will work and scale. Let's give this cloud estate that we found some remarks based on our initial look. It's multicloud — that's the obvious one. There's no hardening, meaning basically poor configuration of most resources. There's no logging beyond the defaults, and we're going to look into what that means. There's no governance — the GRC team didn't know much about the cloud environment. There are access keys and cleartext secrets floating around. Misconfigurations are prevalent, and of course most things have been click-deployed, or people have deployed stuff ad hoc using the CLI. So with this as the basis, where do we even start?
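Before we go on — the IP-range matching step from the discovery phase is easy to script. Here's a minimal sketch; the CIDR blocks are made up for illustration, and in practice you'd load the providers' published ranges (for AWS, https://ip-ranges.amazonaws.com/ip-ranges.json; GCP and Azure publish equivalent JSON files):

```python
import ipaddress

# Illustrative CIDRs only -- in practice, load the providers' published
# range files and build this dictionary from them.
SAMPLE_RANGES = {
    "AWS": ["52.95.0.0/16", "3.5.140.0/22"],
    "GCP": ["34.64.0.0/10"],
}

def match_cloud_provider(ip, ranges):
    """Return the name of the provider whose range contains `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for provider, cidrs in ranges.items():
        if any(addr in ipaddress.ip_network(c) for c in cidrs):
            return provider
    return None
```

Feed it the IPs that came back from subdomain enumeration and you get a rough per-provider inventory of your internet-facing assets for free.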
So we need some sort of process here, and you can follow a simple incident response process: you start with investigation — look at what's there. If you find any signs of compromise, you do some form of containment and eradication, and once you've performed those steps, you can proceed to eliminating the risks.

So the first thing we're going to do is, of course, the compromise assessment. During the compromise assessment, it's important to keep in mind how the public cloud works: you have the cloud control plane and the cloud data plane, and a compromise can occur in either of these domains. It can also occur in one domain and move to the other, because of how the domains interact. In this scenario, we're going to start with the control plane, and later we'll move into the data plane and see how we can investigate there as well.

When doing a control plane compromise assessment, to call it that, we first start with the logs, but we can also utilize the control plane to identify possible persistence. It could be identities that have had a key added to them; secrets found in cleartext configuration that are now being used; virtual machines that have been spun up or compromised through the control plane; resource configurations that are weak and being used to gain persistence; or things at the network level.

So the first thing we do is focus on getting all the logs out of the cloud control plane. For that, we wrote the cloud log collector, which interacts with the different cloud providers using their REST APIs. For Google Cloud we use the Cloud SDK — it handles the authentication and makes it really simple to extract the different logs. For Azure we use the SDK only for authentication, because that flow is a bit complex to write yourself and it works really nicely with the SDK — but I don't like the Azure SDK, so to do things fast and at scale, I use the REST API for everything else. For Entra ID we also use the REST API towards Microsoft Graph, with the same SDK handling the authentication. And for AWS we only use the SDK, which is named Boto3 — I don't know why.

Now, I said there was no logging enabled — but what does that mean in the context of cloud? Luckily, you still have some logs in the cloud even when no logging has been configured. In AWS you have CloudTrail, the API logs for AWS, with a default retention period of 90 days. It's not as straightforward as "just give me all those logs for the last 90 days," and I'll show you how the cloud log collector handles that soon. In GCP you actually get a 400-day retention period on the Admin Activity logs, which is really beneficial if you have a compromised GCP environment, and you also get 30 days of Data Access logs — something you don't get from any other cloud provider at all. For Azure you have the Azure Activity log, with a default of 90 days; this one is also a bit tricky, because you need to collect it from different places, which the cloud log collector handles. If you're on an Entra ID free tenant, you only have the audit and sign-in logs for the last seven days; with a license, you get 30 days by default. And if you're on a free or trial tenant, you'll get an error when trying to collect the logs from the API, because they don't want you collecting logs beyond the seven-day retention you haven't paid for — so they just block you from interacting with the API. I'm pretty sure there are plenty of people in this room who could bypass that by scraping the data from the browser instead.

So the whole intention behind the cloud log collector is really simple log extraction from multiple cloud environments, in weird circumstances like this one where you need logs from several clouds at once. Let me quickly show you how it works. It's open source — I think it's on the O3 Cyber GitHub, or my personal one; it's called the cloud log collector. You can see there are small, separate functions for each of the cloud providers, and we just call those functions. We can use the same time frame for all of the cloud providers, so we know we get logs within the same window, which is really nice — subject, of course, to each provider's retention: if I put in 400 days, only the Google Cloud logs would go that far back. We run it, and we see it gets the Google logs first, applying the time-frame filter; then it gets a token for Entra ID and pulls the logs for the different Azure subscriptions, looping through all of them; then it tries to get the Entra ID logs, which fails on this free tenant.

And here comes the tricky part. Imagine doing this manually for AWS, where you need to paginate and go through every single region of every single AWS account to get those default 90 days of logs. This is one of the reasons for writing the cloud log collector: CloudTrail events are logged in each respective region, and there can be events in regions you're not using at all, simply because you never disabled the region. You can see one region failing there, in Canada — that's simply because the region isn't enabled in my account. And now it's going to go through all of those regions for all of the accounts.
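That per-region, per-page loop is the fiddly bit. As a sketch of the pattern (not the collector's actual code), the NextToken pagination can be separated from the boto3 plumbing so the logic is easy to test on its own:

```python
import datetime

def drain_lookup_events(lookup_events, **kwargs):
    """Follow CloudTrail-style NextToken pagination until exhausted.
    `lookup_events` is any callable with boto3's lookup_events signature."""
    events = []
    while True:
        page = lookup_events(**kwargs)
        events.extend(page.get("Events", []))
        token = page.get("NextToken")
        if not token:
            return events
        kwargs["NextToken"] = token

def collect_cloudtrail_all_regions(days=90):
    """Sketch only: drain the 90-day event history of every region.
    Assumes boto3 is installed and AWS credentials are configured."""
    import boto3  # deferred so the pure helper above runs anywhere
    start = datetime.datetime.utcnow() - datetime.timedelta(days=days)
    per_region = {}
    for region in boto3.session.Session().get_available_regions("cloudtrail"):
        client = boto3.client("cloudtrail", region_name=region)
        try:
            per_region[region] = drain_lookup_events(
                client.lookup_events, StartTime=start)
        except Exception:
            continue  # e.g. the region is not enabled for this account
    return per_region
```

The broad except mirrors what the talk demos: disabled regions simply error out and get skipped rather than aborting the whole collection.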
Each cloud provider's logs get saved into a JSON file, shortly, in that provider's pure, native log format — there's no normalization or anything happening here, and I'll show you soon why I don't bother. And there we go: we now have the different JSON files. So, fairly simple: run the cloud log collector and get a bunch of JSON files with all the logs down on your workstation. Now what? We have the logs, but what do we do with them?

I have an architecture I really like for efficient log analysis, which I've used on incident response. I just spin up an Azure Data Explorer cluster — it could be Google BigQuery or something else; I just like using Azure Data Explorer — and I upload the JSON files there. I'll show you exactly why and how. This is my Azure portal; I've created a Data Explorer cluster for this purpose, and we're going to access it through the web panel. We could also do all of the upload through the API — put the files in a storage account and auto-ingest them. We already have the Azure and GCP logs in here, and now we're going to import the AWS logs as well, the CloudTrail logs. We create a new table, choose the file we got from the cloud log collector — 72 megabytes — and it processes it really fast. You can see it does some normalization for us; that's why I didn't bother doing it myself, and that's the really nice part. Now I have AWS, Azure, and GCP logs in a Data Explorer cluster that I can query. I could do this locally using jq as well, but with large data sets that won't really scale, whereas for the Data Explorer cluster I can choose whichever size I need. Let's get 10 random logs from the AWS table — and yay, it works.

Now we're going to look for something specific across all the logs: three IPs we've been handed by someone else on the incident response, known to be indicators of compromise. We query for those across GCP, AWS, and Azure, and it goes super fast no matter how much log data you have, because you can scale up the cluster. We can see there's a match on the caller IP in GCP, so we know the indicator is present there. We can then look at the AWS logs and see there's a match there too, because we've filtered for it, and we see the same IP being matched. And down in Azure, there's a match as well. If I change the IP now and run again, you'll see no results come back — so it actually works for sweeping this data for the indicators you're looking for. I could also look for other things, such as actions, for instance; those are unique per cloud provider, but I could normalize a set of actions or techniques that are known to be used by threat actors across multiple clouds.

The nice part about this setup is that you can have one person just collecting the logs and uploading them to the Data Explorer cluster — and it doesn't have to be through the console; put the files in some sort of storage blob and have them auto-ingest — while your team analyzes the data and hunts the different indicators. If you're working a large-scale incident, they can analyze continuously against the data set, you can keep uploading data as you're able to acquire more, and we could even bring Alibaba Cloud into the mix by ingesting it into a separate table.

So for the sake of this scenario, let's conclude that there were no signs of an active compromise affecting the cloud control plane or the cloud-based identities. What precautionary measures can we still take? We could go through all the access keys, revoke them, and review their usage — because, again, we only have a limited number of days of logs, so things could have happened before the window we have, especially since we only have seven days of sign-in logs. We could revoke all sessions and tokens to be sure, rotate all the secrets, and look for signs of persistence, such as external trusts having been installed. Now it's time to look at the data plane, and at how we can leverage the control plane to investigate it.
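An aside on the jq-style local option I mentioned: since each provider's JSON keeps its own field names (sourceIPAddress in CloudTrail, callerIpAddress in Azure Activity, requestMetadata.callerIp in GCP audit logs), a quick local sweep can simply walk every value rather than normalize fields. A rough sketch — the records and IOC IPs below are made-up examples:

```python
def find_ioc_hits(records, ioc_ips):
    """Return every log record containing one of the IOC IPs anywhere in
    its (arbitrarily nested) structure -- no field normalization needed."""
    def contains(node):
        if isinstance(node, dict):
            return any(contains(v) for v in node.values())
        if isinstance(node, list):
            return any(contains(v) for v in node)
        return isinstance(node, str) and node in ioc_ips

    return [r for r in records if contains(r)]

# One fabricated record per provider, in each provider's native shape:
logs = [
    {"eventName": "ConsoleLogin", "sourceIPAddress": "198.51.100.7"},
    {"operationName": "Microsoft.Compute/virtualMachines/write",
     "callerIpAddress": "203.0.113.50"},
    {"protoPayload": {"requestMetadata": {"callerIp": "192.0.2.1"}}},
]
hits = find_ioc_hits(logs, {"198.51.100.7", "192.0.2.1"})
```

It won't scale to the data sets where the cluster shines, but for a few hundred megabytes of collector output it's often all you need.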
Because in the cloud, you can't just walk up and pull a disk out of a server. I've already shown you how to get the logs with the cloud log collector. For disk images, you can use libcloudforensics, which is open-sourced by Google. And for memory, I like to use a portable binary from Comae that lets us take a volatile memory dump — I'll show you how to utilize it in a cloud context.

With libcloudforensics — I've made a few adjustments, so it runs across multiple clouds — what it does is use the different cloud providers' APIs to copy or share a disk, following the best practice for capturing evidence in that given cloud provider, and then shares it with your forensic environment. So you can do this at scale: get copies of all the disks across the different cloud providers, and have someone analyze them for persistence or malware.

How about memory? Say we have a virtual machine with a managed identity. This one happens to run in Azure, but it could just as well be running in GCP or AWS, and it would work more or less the same way. I quickly put together run_command.py, which is able to execute code on hosts from the cloud control plane in any given cloud provider. I just feed it my invoke-memorycollection.ps1 script, which runs on the virtual machine. The VM uses its managed identity to authenticate to the cloud provider, downloads the memory-dump binary over HTTPS — it could also be pre-staged in cloud storage, since the VM is already authenticated — and writes the memory dump to disk. With the memory written to disk, it again uses its managed identity to upload the memory file to a storage account.

Let me quickly show you how this works too. You can see there are some cloud-provider-specific arguments: we can run it against Azure, against AWS, or against GCP, and feed it the exact same script. There are just functions to run it on each of the different cloud providers, triggered as a run command, and since we're running it on a Windows instance, we can feed it whatever PowerShell we like. So I have a script I use for taking memory dumps, mostly in cloud environments. We download the Comae toolkit, which allows us to create the volatile memory dump, and for copying large files in Azure I like to use the azcopy binary, because it handles uploads way faster than the storage account's REST API directly — you could build something with the storage SDK instead, but I just grab azcopy. Based on that, we run the incident response memory collection function, and we trigger the whole thing using the run command. So let's trigger it: we specify the cloud provider — Azure — the script path, which is the script we just went through, the VM name, the resource group we're going to put things in, and the subscription. And — oops. Okay, there we go. We run it quickly for Azure and AWS, and there we can see a memory file uploaded, using only the cloud control plane to get the memory. So by using the control plane, we can interact with the data plane to get the different things we need: we could query for specific logs or anything else on the hosts, and if it were a Linux host, we'd just trigger something else, like a shell script. It gives you a wide array of opportunities, just by being able to leverage the cloud.
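For the curious: triggering script execution through the control plane is only a few lines with the Azure SDK. A hedged sketch — this is not run_command.py itself; the function and parameter names are mine, and it assumes azure-identity and azure-mgmt-compute are installed and your principal holds the Run Command permission on the VM:

```python
def trigger_run_command(subscription_id, resource_group, vm_name, script_lines):
    """Run a PowerShell script on an Azure VM via Run Command -- control
    plane only, no network access to the VM itself is needed."""
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)
    poller = client.virtual_machines.begin_run_command(
        resource_group,
        vm_name,
        {"command_id": "RunPowerShellScript", "script": script_lines},
    )
    return poller.result()  # blocks until the script finishes on the VM
```

The AWS analogue would go through SSM and the GCP one through the OS Config / startup-script machinery, which is why a thin per-provider wrapper like run_command.py is worth having.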
So we're done with our investigation, and now we're getting into the remediation part. The first thing, again, is to go back and look at how bad things actually are from a configuration perspective. There's open-source tooling you can run, like Prowler, which benchmarks your whole cloud environment no matter which cloud you're in. I think it's really good: it's compliance-driven — it tells you how CIS-compliant you are — but it also finds other neat stuff, like misconfigurations that can potentially be exploited. You should also make sure you've enabled all the logs, for later. There are many ways to do it, and many tricky parts to getting the logs out of the different cloud providers — you need to spend time studying how each provider works, which logs are relevant, and how to filter them so you don't get immense volumes.

Then I also like a more manual approach. I might run Prowler to get the high-level overview, but I also like to just dump all the resource configuration. If it's Azure, I'll dump all the Azure Resource Manager objects and look through everything — mostly Ctrl-F, looking for things I know are bad, trying to find secrets, and trying to map out the architecture. Based on all that metadata and cloud configuration, you can start to uncover attack paths. It requires a lot of patience in large cloud environments, but it's definitely doable, and you start seeing patterns after a while.

I think my conclusion is that you should become adept at working with the SDKs and the REST APIs for efficient incident response in cloud environments. And although the cloud had no signs of compromise in this scenario — the attack only affected the on-premises environment — it was only a matter of time, given the poor configuration we uncovered. And that's all I had for today. So thank you.
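(A small addendum to the manual review step above: the Ctrl-F pass over dumped resource configuration is easy to automate. A rough sketch — the secret patterns are illustrative, not exhaustive:)

```python
import re

# Illustrative patterns only -- extend with whatever "known bad" strings
# you care about (connection strings, PEM headers, provider key formats).
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),          # AWS access key ID shape
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)(password|secret|connectionstring)\s*[=:]"),
]

def scan_config(obj, path="$"):
    """Recursively walk a dumped resource-configuration object and yield
    (json_path, matched_text) for every suspicious-looking string."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield from scan_config(v, f"{path}.{k}")
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            yield from scan_config(v, f"{path}[{i}]")
    elif isinstance(obj, str):
        for pat in SECRET_PATTERNS:
            if pat.search(obj):
                yield (path, obj)
                break
```

Pointing this at a full Azure Resource Manager export (or the AWS/GCP equivalents) gives you a triage list of JSON paths to eyeball, instead of scrolling everything by hand.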