Streamlining Threat Hunting in Cloud Environments with Jupyter: Chi Phong Huynh and Kai Iyer

Name: Streamlining Threat Hunting in Cloud Environments with Jupyter: Chi Phong Huynh and Kai Iyer
Uploaded: 2025-10-27
Duration: 30 min 57 s
Description: BSides Edmonton September 23-24, 2024 Talk: Streamlining Threat Hunting in Cloud Environments with Jupyter Abstract: Threat hunting is an essential cybersecurity practice that involves proactively searching for cyber threats that evade existing security solutions. In this session we will explor

BSides Edmonton · 2024 · 30:57 · 3 views · Published 2025-10

Speakers

Chi Phong Huynh, Kai Iyer

Tags

Category: Technical

Topic: Cloud IAM

Style: Talk

Mentioned in this talk

Tools used

AWS CloudTrail, Azure Log Analytics, Jupyter Notebook, Microsoft Sentinel, Splunk

Service

VirusTotal

Frameworks

MITRE ATT&CK Framework

Vendors

Okta

About this talk

BSides Edmonton September 23-24, 2024 Talk: Streamlining Threat Hunting in Cloud Environments with Jupyter

Abstract: Threat hunting is an essential cybersecurity practice that involves proactively searching for cyber threats that evade existing security solutions. In this session we will explore the capabilities of Jupyter notebooks as a powerful tool to enhance threat hunting, especially across cloud platforms like Microsoft Azure and AWS. We will focus on identifying common attack Tactics, Techniques, and Procedures (TTPs) used by threat actors. We'll be introducing a Jupyter notebook containing detections mapped to the MITRE ATT&CK framework and threat hunting methodologies backed by unsupervised machine learning. We will look at huge datasets using visualizations to find anomalies. These anomalies will be converted into high-fidelity detections, along with some ideas to extend this hunt to IAM platforms like Okta. The flexibility of Jupyter notebooks allows for the incorporation of machine learning models and statistical techniques to predict and identify potential threats based on historical data. This predictive capability is invaluable for staying ahead of threats.

Speakers: Chi Phong Huynh, Kai Iyer

2024 Slides: https://drive.google.com/drive/u/0/folders/1ess6fUZNd9BbWK7pPBrh8UVE-7GXtMyG

Transcript [en]

Good afternoon, everybody. Thanks for making it today to our presentation on streamlining threat hunting in cloud environments with Jupyter. Let's get started. Here is the agenda for today: we'll start with a quick introduction about ourselves and a quick introduction to threat hunting, the limitations of traditional threat hunting, and why there is a need to adapt traditional threat hunting to a modern approach with automated tools and machine learning. We'll also deep dive into Azure logs, with some fundamentals about Azure logs and O365 logs, and then look into some fundamentals about AWS logs, CloudTrail logs specifically. Then we'll go into some Jupyter notebooks, where Phong covers the Azure part, threat hunting with Jupyter in an Azure environment, and I'll be talking

about threat hunting in the AWS environment. Then we'll look at some takeaways and references. So, I work as a security engineer at UI C, and I do a lot of technical blogs and open-source contributions on GitHub. I like to code a lot, and I'm also a data privacy advocate. In my free time I actually read a lot of manga and watch a lot of anime. That's a quick introduction about me. Phong? So, me, same thing: [unclear] engineering [unclear]. You can find me on LinkedIn. Usually I go biking and hiking. I mean, yeah, if you scan this QR code you should be able to find Phong. It's not a phishing scam, it's actually a legitimate

LinkedIn URL there. So, I'm just going to read out this terminology here: threat hunting is a set of techniques and tools organizations use to identify cyber threats. Threat hunting is basically a process where you look into a chunk of logs and try to identify threats hidden in there, and then you classify those threats as relevant or non-relevant to your organization. The threats which are relevant to industry A might not be relevant to industry B, so it's very important that you understand what a relevant threat is for your own industry. Traditional threat hunting relies heavily on the expertise of the analyst who's

actually doing the threat hunting. Modern-day threat hunting doesn't rely as heavily on the analyst's ability to do the hunting and focuses more on automated tools, machine learning, and crunching through a lot of data. If I have to classify threat hunting into types, I'd classify it as structured, unstructured, and ad hoc. Structured is all about looking for specific TTPs; when I say TTPs, I mean tactics, techniques, and procedures. I'm looking for specific things that are out there in the environment, specific threats. I know what I'm looking for and I'm hunting for those threats. Unstructured is when I'm looking for IOCs. Suppose there was an exploit

out there, a vulnerability out there, and I have got some IOCs from some open-source intelligence platform, VirusTotal or an FBI Flash feed, and I want to hunt for those IOCs in my environment: that's an unstructured threat hunt. Ad hoc is whenever you need it. Suppose there is a new vulnerability out there, Log4Shell or any other vulnerability, and you want to search whether any of those IOCs are present in your environment, or you just want to do a quick sweep. So ad hoc is whenever your clients ask for it or when the time calls for it. Structured and unstructured hunts are basically scheduled every day or every week, regardless of

how your MDR provider or SOC provider wants to perform them, so those run on a schedule, and ad hoc hunts run whenever needed. Limitations of traditional threat hunting: we have too much data these days for an analyst to sit down and crunch through terabytes of it. Every company is currently moving into cloud environments, and when every application goes into the cloud you're logging a lot of different types of log sets. It's too much data to even hunt through, and that's the first limitation: too much data. Then the scalability part: when I have 10 terabytes of data and I run a query with some wildcards in a SIEM solution

like Splunk, Sentinel, QRadar, or LogRhythm, scalability becomes a question, because I'm not able to perform large-scale threat hunting and I need it to give me near-real-time detection. So that's the scalability problem. The static nature of hunting is basically this: when I know what threat I'm looking for, I write a detection query or detection logic and hunt for that, but what if I don't know what it looks like? There's also the evolution of malware and threat actors: the code keeps changing, the malware keeps changing, so there is no way we can stick with the current detection rules and hunt for upcoming threats. That's why we

need something to identify unknown threats. If we don't know what it looks like, if there is a zero-day out there or a vulnerability out there which we don't know about, then how are we going to hunt for those? That's what the presentation's focus is going to be: how we detect threats we know, and how we detect threats we don't know. So, as Kai said, we have log source datasets [unclear] SIEM [unclear], but sometimes we don't have [unclear] much easier to modify the data. For example, [unclear] phishing email subjects: how do you do that? All you have

to do is compare strings, and then you find all of the phishing in there, and then machine learning [unclear]. Next, I'll talk about the sign-in logs, which we already have inside [unclear].

Next, Graph activity logs, which is a new thing from Microsoft. It logs every Graph call: when you go to Teams, in the background it calls Graph for users, groups, whatever, and when you go to the Azure portal and type something in the search box for users or groups, it does too. But it's going to be a lot of logs, and it's going to cost you a lot of money when you ingest that data [unclear], so you have to filter or summarize before you ingest. The last thing is Azure Activity: it only tracks [unclear] on Azure resources. Anything that happens inside the resource, Azure Activity does not log; you have to log it yourself from diagnostics, like [unclear] listing a secret [unclear],

who modified something. You have to go to the diagnostic settings yourself and turn it on. So now Kai is going to talk about AWS. I know it's a bit of juggling between Azure and AWS, but bear with us, guys, and let me know if you have any questions about AWS as we go through it. AWS CloudTrail is the specific type of log source I'm going to talk about. There are different types of log sources on AWS, there is GuardDuty and several others, but we only focus on CloudTrail here. CloudTrail records any activity that happens in an AWS account, regardless of whether that account is capable of

making an API call or a non-API call. We have all of that recorded inside AWS CloudTrail, and I'm going to classify AWS CloudTrail logs into three event categories: the first is management, the second is data, and the third is Insights. Management events are all about the management activities you perform on your AWS account, like configuring policies, registering devices, configuring rules for routing data, and setting up logging. The specific events that you see here, AttachRolePolicy, CreateDefaultVpc, CreateSubnet, CreateTrail, are examples of the events

that are available under management events. The next one we'll talk about is data events. Data events are all about operations performed on or in a resource. Here we'll be looking at API activity on S3 objects, Lambda function executions, and EBS, or Elastic Block Store, which has API calls on snapshots. All of these actions, like GetObject, PutObject, DeleteObject, and snapshot operations, are very important because that's what we are going to be using for our threat hunts. The next one is Insights events. Insights events tell you about any unusual or abnormal activity. So suppose,

let's take a classic example here: an account that logs no more than 20 Amazon S3 DeleteBucket API calls per minute starts to log an average of 100 calls per minute. That's an unusual spike. Whenever unusual activity of this type happens, AWS creates one log at the start of the activity and one log at the end of the activity, so two logs that mark the start and the end, and then we take a look at the activity to see what it is all about. So get on the ship, we're sending you to

Jupyter. [unclear] So, first things first, we go with, oh, this one. Yeah.
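As a rough illustration of the three CloudTrail event categories described above, the sketch below tallies records by category and pairs the start and end records of an Insights event. The sample records are invented for illustration, and the field names (`eventCategory`, `insightDetails.state`, `sharedEventID`) follow the CloudTrail record format as we understand it; they should be checked against your own logs before relying on them.

```python
from collections import Counter, defaultdict

# Invented sample records; real CloudTrail records carry many more fields.
RECORDS = [
    {"eventCategory": "Management", "eventName": "AttachRolePolicy"},
    {"eventCategory": "Management", "eventName": "CreateSubnet"},
    {"eventCategory": "Data", "eventName": "GetObject"},
    {"eventCategory": "Data", "eventName": "DeleteObject"},
    {"eventCategory": "Insight", "sharedEventID": "abc-123",
     "insightDetails": {"state": "Start", "eventName": "DeleteBucket"}},
    {"eventCategory": "Insight", "sharedEventID": "abc-123",
     "insightDetails": {"state": "End", "eventName": "DeleteBucket"}},
]


def count_by_category(records):
    """Count records per event category (Management, Data, Insight)."""
    return Counter(r.get("eventCategory", "Unknown") for r in records)


def pair_insights(records):
    """Group Insights records by shared ID, keyed by their Start/End state."""
    pairs = defaultdict(dict)
    for r in records:
        if r.get("eventCategory") == "Insight":
            state = r["insightDetails"]["state"]
            pairs[r["sharedEventID"]][state] = r
    return dict(pairs)


category_counts = count_by_category(RECORDS)
insight_pairs = pair_insights(RECORDS)
print(category_counts)
print({shared_id: sorted(states) for shared_id, states in insight_pairs.items()})
```

Pairing the two Insights records gives the time window of the unusual activity, which is the window to pull management and data events from when investigating.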

Yep. Is it too big or too small for you guys? So, first things first, we use NetworkX and then some simple [unclear] in Python. I got my sign-in logs from Entra ID from a CSV; they can come from Blob Storage or S3 or somewhere on-prem, depending on where you guys extract the logs to. As we can see after graphing, we have a user usually logging in from this 104 IP, which is Cloudflare, like 46 events, [unclear] all of them from the Cloudflare IP. But when we investigate the logs, we see this standalone IP from Europe, which is [unclear] 217. Now when we take a look at that IP address, we can see that MFA passed, and then all

of them show no risk, right? Because MFA passed. Usually we configure the policy so that when MFA passes, we let the user in and [unclear] the risk. But in this case, with man-in-the-middle phishing, it's a different story: the attacker already got the token for the sign-in. So if we go after that one, we graph the unique token identifier, which is unique for every session of every user. We take that unique token and go to the Graph API logs. As you can see here, there's one spike during this time, about 4 billion bytes, which is 4 gigabytes of Graph API logs within a 30-minute time frame. And then we

go, this is a log scale of that. You can see usually they just go to like 5 megabytes, or even 1 megabyte or a couple of bytes. So let's see what happened with that IP address: 4 gigabytes here from this IP address. We see the API calls to users; [unclear] is one megabyte for [unclear] in Entra, right? Basically, with one megabyte we can list most of the roles inside Entra ID, and then there's listing the groups: how many users are in the group, who belongs to which group, subgroup, or whatever group is inside that one. And the notable thing: I only listed about a thousand users and it's already 4 gigabytes. So if your organization has 10,000 users, it's

going to be really quick, 10 to 15 minutes of calls to pull out all the users. Then the attacker can use phishing emails against those users, or impersonate them. Usually in Entra you have the on-prem ID as well as the user ID, and when you call the help desk for a password reset, they verify you with your employee ID, right? And then, to maintain persistence, we have to register MFA. Here is the Python code: you can see we get the operation name from the audit log, which is register security info, or updating it. This event is basically from the audit log, but this event doesn't tell you

what device got registered or what changed. You have to look at the update user operation just before that security info update; it's going to show you, say, an iPhone. Here in my case [unclear] phone; sometimes it says the user's phone, or iPhone 13 or something. But update user doesn't give you the IP address of the actor, and these events have different correlation IDs inside the audit log. A correlation ID in Azure is shared by events in the same action, like in the same session. So when they log in, if they have enough permissions or they activated PIM access, they can disable diagnostic logs, which are, like, the logging of when you read the Blob Storage or read the

key. In my case, I disabled my Blob Storage logs from being sent to

Sentinel. We also have to disable the policy on the Blob Storage, the delete retention policy, so if I exfiltrate a lot and then delete the whole Blob Storage, no one can recover it. At this time, as well, there's Defender for Storage: for example, when I upload malware to the storage or try to exfiltrate bulk data, it's going to alert and stop my connection to the Blob Storage, and [unclear] disable

[unclear]. We can see here that when we disable Defender for Blob Storage, it also removes the role assignment, so the Defender scanner cannot access the Blob Storage. Then, in order to exfiltrate, we have to modify the network security, which is the NSG here; in my case it's allow any IP, or allow all inbound to that [unclear], so I can exfiltrate those things easily using AzCopy or another

technique. After that stage, if we are the attacker, we can just easily delete all the data: we can encrypt it all for ransom, delete snapshots, compute snapshots, delete the tables. But there's one thing: the immutability policy here. If you have an immutability policy set up for your Blob Storage, when you delete your container it's going to fail. So as an attacker you have to escalate your privileges to remove that policy in order to delete it. After I remove that policy, down here, I'm able to delete that blob container. As we can see here, the storage account [unclear] is protected by the immutability policy. And the last thing: when we have an

MSSP, we have Lighthouse, which has access to the customer's resource group or subscription. So as the attacker, after compromising the tenant, I can grant users from my tenant some Lighthouse access to your subscription to keep this persistence in the future. Here we can see: usually when we deploy Lighthouse we need the ARM template, so we can see all these events are within the same correlation ID, so within the same action we deploy the template and grant access. Another important thing is the [unclear] register action: in that action we can see the MSSP tenant ID, which is the attacker's tenant ID. And the other case is, if the victim is

the MSSP itself, the attacker can move laterally to the other clients that the MSSP has. After compromising the whole tenant, if we have a multi-cloud setup with SSO to AWS, the attacker can also use those accounts to sign in to AWS. So now Kai is going to tell you guys how we can exploit AWS.
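The first two steps of the Azure hunt above (spot a sign-in IP the user rarely uses, then total the Graph API response bytes tied to the token issued from that IP) can be sketched as follows. This is a minimal sketch under stated assumptions: the talk graphs the data with NetworkX, while this version uses only the standard library; the rows are invented; and the keys `user`, `ip`, `token_id`, and `response_bytes` are hypothetical stand-ins for whatever columns your sign-in and Graph activity exports actually use.

```python
from collections import Counter, defaultdict

# Invented rows standing in for sign-in and Graph activity log exports.
SIGN_INS = [
    {"user": "alice@example.com", "ip": "203.0.113.10", "token_id": "t1"},
    {"user": "alice@example.com", "ip": "203.0.113.10", "token_id": "t2"},
    {"user": "alice@example.com", "ip": "203.0.113.10", "token_id": "t3"},
    {"user": "alice@example.com", "ip": "198.51.100.9", "token_id": "t4"},
]
GRAPH_CALLS = [
    {"token_id": "t1", "response_bytes": 1_000_000},
    {"token_id": "t2", "response_bytes": 5_000_000},
    {"token_id": "t4", "response_bytes": 2_500_000_000},
    {"token_id": "t4", "response_bytes": 1_500_000_000},
]


def rare_sign_in_ips(sign_ins, max_count=1):
    """Return the (user, ip) pairs seen at most max_count times."""
    counts = Counter((s["user"], s["ip"]) for s in sign_ins)
    return {pair for pair, n in counts.items() if n <= max_count}


def bytes_per_token(graph_calls):
    """Sum response bytes for each token identifier."""
    totals = defaultdict(int)
    for call in graph_calls:
        totals[call["token_id"]] += call["response_bytes"]
    return dict(totals)


def heavy_tokens_from_rare_ips(sign_ins, graph_calls, byte_limit=1_000_000_000):
    """List (token, ip, bytes) for rare-IP sign-ins whose token pulled a lot of data."""
    rare = rare_sign_in_ips(sign_ins)
    totals = bytes_per_token(graph_calls)
    return sorted(
        (s["token_id"], s["ip"], totals.get(s["token_id"], 0))
        for s in sign_ins
        if (s["user"], s["ip"]) in rare and totals.get(s["token_id"], 0) > byte_limit
    )


findings = heavy_tokens_from_rare_ips(SIGN_INS, GRAPH_CALLS)
print(findings)
```

The 1 GB `byte_limit` is an arbitrary illustration value; in practice it would be tuned against the usual per-session volume, which the talk puts in the low megabytes.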

I know that was a lot of Jupyter code, but we have a lot more coming up as well, so can I just ask you guys to take a deep breath, because we have a lot of Jupyter coming up. So what we do here in AWS is basically the exact same thing: we import some libraries that we need to run our Python code. The first one we are going to talk about is AWS defense evasion by impairing security services. What we actually look for is specific event names. If you look at the event names here, you can see DeleteAlarms, DeleteTags, DeletePolicy, DeleteLogGroup, delete-something, so we can

clearly see that the attacker is trying to delete some policies, delete some logs, delete some user groups. Some deletion activity is going on, which is basically part of defense evasion. So we look for all of these event names inside our logs. What I did was take all those logs into a CSV file of AWS API calls, load that into a data frame, and search for these things against it. In a production environment, instead of loading it into a data frame, you can directly query your S3 bucket or your AWS resource. For the sake of the presentation it's all in a CSV file, and we query that

CSV file. When we query that CSV file for all these different event names, we see some events here, and these events are basically the alerts that are going to be generated by this detection and forwarded to your L1 or L2 for investigation. Moving on, we'll be looking at exfiltration via DataSync. That's basically the creation of a scheduled task, which can perform various activities; in this case what we do is create a scheduled task that exfiltrates data, from the event source datasync.amazonaws.com. So we are looking for an event name called CreateTask, and CreateTask is basically any scheduled task creation,
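The event-name hunts described above can be sketched as a small rule matcher over a CSV export. This is a minimal sketch, not the speakers' notebook: the talk loads the CSV into a data frame, while this version uses only the standard library; the sample rows and the column names (`eventTime`, `eventSource`, `eventName`, `userName`) are assumptions about how the export is laid out.

```python
import csv
import io

# Event names taken from the talk's defense-evasion hunt; extend as needed.
DEFENSE_EVASION_EVENTS = {"DeleteAlarms", "DeletePolicy", "DeleteLogGroup"}

# Invented sample export; a real hunt would read the CSV file or query S3.
SAMPLE_CSV = """eventTime,eventSource,eventName,userName
2024-09-01T10:00:00Z,s3.amazonaws.com,GetObject,alice
2024-09-01T10:05:00Z,logs.amazonaws.com,DeleteLogGroup,mallory
2024-09-01T10:06:00Z,monitoring.amazonaws.com,DeleteAlarms,mallory
2024-09-01T10:09:00Z,datasync.amazonaws.com,CreateTask,mallory
2024-09-01T10:12:00Z,iam.amazonaws.com,ListUsers,bob
"""


def load_events(text):
    """Parse CSV text into a list of dicts, one per event."""
    return list(csv.DictReader(io.StringIO(text)))


def match_rules(event):
    """Return the names of the hunt rules that this event triggers."""
    hits = []
    if event["eventName"] in DEFENSE_EVASION_EVENTS:
        hits.append("defense-evasion")
    if (event["eventSource"] == "datasync.amazonaws.com"
            and event["eventName"] == "CreateTask"):
        hits.append("datasync-exfiltration")
    return hits


def hunt(events):
    """Build one alert per (event, rule) match, in log order."""
    alerts = []
    for event in events:
        for rule in match_rules(event):
            alerts.append({
                "rule": rule,
                "eventName": event["eventName"],
                "userName": event["userName"],
                "eventTime": event["eventTime"],
            })
    return alerts


alerts = hunt(load_events(SAMPLE_CSV))
for alert in alerts:
    print(alert)
```

Each alert dict is the kind of record that would be forwarded to a SIEM for triage; exclusions for known-good accounts would be added as further conditions in `match_rules`.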

and that scheduled task creation is performed on the event source DataSync, and then we exfiltrate data using that. The exact same thing on Azure would be DCSync: you sync those two DCs and take the data, and then you can exfiltrate it and do anything with it. Moving ahead, we'll be talking about GuardDuty detector reconnaissance, [unclear] discovery. Once you have got a foothold in an AWS account, you want to see what kind of resources exist there, so ListDetectors: what kind of detectors are there in the given [unclear]. Here what we see is a lot of false positives, because a lot of genuine activity, admin or privileged

users, perform similar discovery activities. So what we do is exclude AWSServiceRoleForConfig, CloudGuard Connect, or any other user accounts that have the privilege to perform this activity for a business use case or a regular use case. That's why we use these two as a filter, and then we only look at any other user or group performing the ListDetectors activity. Moving ahead, we'll be looking for publicly exposed DB instances. Here we are looking only for ModifyDBInstance, and anything which is already publicly accessible and that we know about, we just exclude. So that's the filter for the

suppression, and we only focus on ModifyDBInstance. Moving ahead, we have changes to identity and access management policies. We'll be looking at CreatePolicy, UpdatePolicy, DeletePolicy: anything performed on these policies that doesn't align with the business use case, you're going to investigate further. The next one is non-MFA console login. The event name is ConsoleLogin, and you filter out anything which has a successful MFA or a failed MFA, because if it's a failed MFA you don't really want to look into those. So a successful login to the console without MFA is what we are going to be

focusing on here. Next is AWS CloudTrail manipulation. What we are looking for is DeleteFlowLogs, [unclear] delete activities, update activities, and StopLogging activities, which is impairing defenses by modifying or disabling different tools. Now let's go into the machine learning part; not too much machine learning, just the starting part. What we do is take the data and load it into a data frame, and what I did was basically extract two different things: the account ID and who it was invoked by. Account ID and invokedBy are inside userIdentity, so I just extract those two. And if I take a look at the error messages I have, printing just five unique

error messages in my dataset, I can see: user is not authorized to perform an activity, the specified bucket does not exist, public access is not found, [unclear] username and password, and other types of activity that we can find in our AWS account. The purpose of this machine learning detection, the use case, is this: if I see a major spike or a drop in the error codes in my AWS account, why am I seeing a spike in error activity, or why am I seeing a drop in error activity? A spike is basically when somebody is trying to do an activity which is not authorized, and that causes a

lot of errors. A drop is when those error events actually get deleted. Suppose you usually have 100 error logs a day, and that's been the usual thing for the last three months, and you suddenly see no error logs one day. What happened? Did the errors actually stop coming in, or did some activity delete those logs? Those kinds of things. So what we do is load those into a data frame and select our features very carefully. Here our features are source IP address and error code: I want to know which IP address is causing the most error codes, and whether it is an internal IP or an external IP. Then I just do some data

manipulation, which is basically cleaning up the data: anything that doesn't have data, or data that is not in a particular format, I clean up. Then I use something called an autoencoder for anomaly detection. Here I have an encoding layer of two, which is the number of neurons in my hidden layer; you can customize that layer for your own use case. I'm using Adam as the optimizer and mean squared error as my loss function, and I run it for 50 epochs, which is 50 iterations or 50 loops, with a batch size of 32, which means I take 32 items in one batch and run it for 50 epochs. And here, basically,

I'm calculating my threshold for anomalies. Usually you can use a threshold saying that anything above 95% of the total data points is an anomaly, but here what I do is take the mean squared error and use twice the standard deviation added to the mean. For roughly normally distributed data, this equation, twice the standard deviation plus the mean, covers roughly 95% of the data points, so anything outside it would be an anomaly. After I calculate this, I calculate the reconstruction error and plot everything. If I run it for 50 epochs, I get a graph something like this, which

means anything on the right side of this dotted red line is an anomaly, and you have to look into why it's an anomaly. Anything on the left is actually okay. Can you scroll? Yeah, thanks, a bit below. So these are the anomalies which have been spit out. I probably had 100,000 events, and 2,000 events are actually anomalous here. The account ID looks like it's [unclear], but what it shows is basically all these account IDs, IPs, and error codes; the unique error codes are all here. So you just have to look into the specific error codes which have

been caused by these IP addresses and see what the reason was. The event names are here, like [unclear], Decrypt, or update instance. So it specifically spits out any anomalous activity, a spike or a drop, using machine learning, so that you can actually look for things which are unknown. In this case I don't have to write a query saying, okay, look for any activity which is creating error codes; I'm looking at activity in general and figuring out whether there is a spike or a drop on an hourly basis or on an unusual basis. That's what we're looking at here. Now, going back to the PPT, let me

see. Yeah. [unclear] Just slide down. Yep, next. The takeaways from here: we saw the challenges with traditional threat hunting and why it is not the best approach to threat hunting today. How can we leverage Jupyter to do threat hunting? Basically, have a Jupyter notebook hosted in your AWS or Azure environment, query your data lake or object storage, grab all the logs, write your queries, generate alerts, and send them to your SIEM solution: Splunk, Sentinel, QRadar. How can we enhance the hunt with machine learning? Basically by using autoencoders, anomaly detection, clustering, or classification. Then, how can we automate the hunting methodology? The whole process

that we did, which was reading the dataset from a CSV file, running detection logic, and then generating an alert, can all be automated using a flow or a pipeline that can monitor your S3 bucket, run your detections, generate alerts, and send them to a SIEM solution. And then ease of adoption and integration: since Jupyter is very user-friendly, it can be integrated into an AWS environment, an Azure environment, and also Google Cloud Platform, so it's very easy to adopt and integrate into your existing infrastructure. If you guys have any questions about anything that we have discussed, please feel free to bring it

up here. The question is: what benefit do you get from using Jupyter compared to Kusto [unclear]? So, as I said, there are some functions, like data manipulation, that Kusto doesn't have. For example, email subjects: when an attacker sends you a million or 10,000 phishing emails with different subjects, how can you search for similar strings inside Kusto? For example, "document", random characters, and then some username, right? We have to pipe that data out to Python and use difflib to compare the strings, the string we set against the list of subjects. Then when the data returns, we

have a threshold for that: if you want to look at really similar strings, you can set a threshold like 0.7 or 0.8, and then you graph those out and they're going to be clusters of subjects, and then you go hunting from those. You can do those things inside Data Explorer if you have Python [unclear]. Any other questions? Yep, I

also [unclear]. Is it for me or Phong? Yeah, about NetworkX: what is that used for? When you graph things with NetworkX, you get a list of users and what actions they can perform inside your organization, right? You can graph: okay, I have access to the admin group, and Kai has access to the user group, and this user has access to certain resources inside Azure. If you have ever used AzureHound or BloodHound, it's kind of a similar thing: you graph everything you can do. So for red team simulation, you can graph the whole attack

path and follow that path. Thank you, guys.
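The subject-line comparison described in the Q&A can be sketched with the standard library's `difflib`. This is a minimal sketch, assuming a single known phishing subject is used as the reference; the subjects are invented, and the function names `similarity` and `similar_subjects` are hypothetical helpers, not part of the speakers' notebook. The 0.7 threshold is the value mentioned in the talk.

```python
from difflib import SequenceMatcher

# Invented subjects: a fixed phrase with random characters and a username,
# mixed with unrelated mail.
REFERENCE = "Invoice x7Qa pending for jsmith - action required"
SUBJECTS = [
    "Invoice k2Lp pending for adoe - action required",
    "Quarterly town hall agenda",
    "Invoice Zz91 pending for bwong - action required",
    "Lunch menu for Friday",
]


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_subjects(reference, subjects, threshold=0.7):
    """Return (subject, score) pairs whose score meets the threshold."""
    scored = [(s, similarity(reference, s)) for s in subjects]
    return [(s, score) for s, score in scored if score >= threshold]


matches = similar_subjects(REFERENCE, SUBJECTS)
for subject, score in matches:
    print(f"{score:.2f}  {subject}")
```

Raising the threshold toward 0.8 narrows the matches to near-identical templates; lowering it widens the net at the cost of more unrelated subjects.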

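The anomaly threshold from the machine learning part of the talk, the mean of the reconstruction error plus twice its standard deviation, can be sketched on its own. The autoencoder is omitted here: `ERRORS` is an invented list standing in for per-event reconstruction errors, and using the population standard deviation (`pstdev`) is an assumption, since the talk does not say which variant the notebook uses.

```python
from statistics import mean, pstdev

# Invented per-event reconstruction errors; index 9 is the outlier.
ERRORS = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010, 0.011, 0.012, 0.010, 0.250]


def anomaly_threshold(errors, k=2.0):
    """Threshold = mean + k * standard deviation of the errors."""
    return mean(errors) + k * pstdev(errors)


def flag_anomalies(errors, k=2.0):
    """Return the threshold and the indexes of errors above it."""
    threshold = anomaly_threshold(errors, k)
    flagged = [i for i, e in enumerate(errors) if e > threshold]
    return threshold, flagged


threshold, flagged = flag_anomalies(ERRORS)
print(f"threshold={threshold:.4f} flagged={flagged}")
```

A percentile cut-off, the alternative the talk mentions, would always flag a fixed share of events, whereas this rule flags nothing when all errors are close together.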
