
Santiago Abastante | Hiding Malware in Docker Images for AWS Hardcore Persistence and Defense Evasion

BSides Zagreb | 47:47 | 63 views | Published 2025-03
About this talk
Presentation: Let’s build an AWS backdoor that can evade all detection mechanisms existing so far. Are you up to the challenge? Our objective is to execute commands against an AWS tenant from a remote location without being detected by AWS mechanisms like GuardDuty, while minimizing our fingerprint in CloudTrail API call logs. To achieve this I am going to explore a technology stack that, despite its availability, is not widely used: running Docker containers within Lambda functions in a fully serverless approach. Speaker: Santiago Abastante is a former police officer from Argentina, now a Cloud Incident Responder and Security Engineer with over 10 years of IT experience. A digital nomad and international speaker, I’ve presented on Cloud Security and Incident Response at Ekoparty, FIRST, Virus Bulletin (three times), Hack.Lu, and various BSides events worldwide. I hold a Bachelor’s degree in Information Security and an MBA (Master in Business Administration). Recorded at BSidesZagreb (https://www.bsideszagreb.com/). #cybersecurity #bsides
Transcript [en]

[Music]


[Music] So, our first speaker comes from Argentina. This is Santiago Abastante. He is a former police officer from Argentina, now a cloud incident responder and security engineer, and he will talk about hiding malware in Docker images for AWS hardcore persistence and defense evasion. This is a pretty modern topic, and I believe we are all interested in it. So, Santi, let's do the talk. Great, thank you. Hi all, how are you doing? As he said, I'm Santi from Argentina. This is my first time in Croatia, so I'm really glad to be here. Thank you for having me. Also, I came here alone, so please talk to

me. I really like going to community conferences because I want to know how different countries do cybersecurity. I find that we don't all think about cybersecurity the same way, and I think sharing is the best thing we can find in cybersecurity communities. I will be here all day, so if you're around, please reach out; I would be more than glad to talk. The talk is called "Hiding Malware in Docker Images for Hardcore Persistence". The main idea: I'm an incident responder, and I see all day that on the offensive side of security we are all well trained, we know a lot, and we

really know what we are doing when we run a penetration test or a red team exercise. But most attackers are not like us, right? Maybe they are well trained in some technology like AWS or Azure, but when they compromise an environment they don't follow the happy path; they move along with whatever they know or whatever they can find. So the main idea of this talk was to try to understand how the internals of the AWS components work, so we can understand what attackers can do beyond what we learn in an AWS training course. Well, this is me. I like traveling, I like hiking, and I'm a

digital nomad, so that's why I'm here; I'm staying in Europe for three months. This idea came to me in Japan. I was at an incident response conference, talking with a friend of mine who works at Virus Bulletin, at VirusTotal. Have you heard about VirusTotal? He was telling me about some Brazilian threat actors who use Delphi to code malware. He said they use Delphi because nobody knows how to decompile it: it's a really old programming language, most modern researchers are used to seeing C and Go, and whenever they find a Delphi piece of malware they struggle. Also, the Brazilians

used to create big files to bypass antivirus, because, he told me, when antivirus engines find a really big file they tend to skip it. And I thought, [ __ ], how can the antivirus industry work like that? And then: what can we do from the cloud perspective, and what is the cloud industry doing, if the antivirus industry, which is about 20 years older, skips things like that? So I started to think about how virus and malware detection works in cloud environments: Docker components, Lambda environments, and AWS in general. So what I did was to

create an EICAR file. Have you heard about the EICAR file? For those who haven't, the EICAR file is a string of text designed for testing antivirus, so it is something every antivirus should detect. I published it to VirusTotal and found that almost all engines were able to detect it. Then I zipped the EICAR file and pushed that to VirusTotal, to understand what difference zipping alone makes. As you can see here, we have 61 detections. If I go to VirusTotal and choose this eicar.zip, it is the same file, but we already have

fewer detections than we had with the plain file, only by zipping it. So there are already some engines that are not detecting this. I found that super interesting, because if an antivirus does that, what happens with Docker? So I created a Dockerfile, a plain Dockerfile containing only the EICAR text, and pushed the image to a Docker registry. I was able to push it without a problem; I was able to upload it to Docker Hub. If you go to my Docker Hub link, you can download it. This means that every threat actor is able to push Docker images

with malware to Docker Hub, without any restriction at all. This is particularly dangerous because it is a trusted source. So I started to think about how to connect everything together, especially around trust: trusted sources, Docker images, and the lack of knowledge among incident responders about this whole set of tooling. What I found in real environments is that not every incident responder knows about cloud, and not all cloud engineers know how to do incident response, so there is a skill gap that attackers can take advantage of. I also tried to run some scans. The scans provided by Docker,

Docker registries, and that kind of tooling are more about vulnerability management: they are good at finding misconfigurations in the Dockerfile, but they are not looking for compromise or malicious code inside the image. The same happens with ECR. With all this brainstorming, I was thinking: if an attacker got persistence, or got access to my AWS account, what could they do? I started to think about how an attacker could use everything they know about AWS to bypass most of the defenses, and to find out whether AWS would allow me to do that as

an attacker, and whether it would notify me as a defender. Because the main problem with AWS, in my mind as an incident responder, is the shared responsibility model. The shared responsibility model says: this part of the technology stack is yours, and this part is AWS's. For example, if I'm using an EC2 instance, which is a server, as an engineer I cannot handle the internals of the server, the power grid, or the internet, but I can harden the operating system and I can work with the networking. I don't have packet inspection, but I would have NetFlow, for example. But

when I move to the SaaS part of the shared responsibility model, like a Lambda function, there is less that a defender can do and more on the AWS side of the model. And we cannot delegate responsibility: if I trust AWS to do some security practices, they are not doing them, and I'm compromised because of that, then despite the shared responsibility model I am still accountable for that compromise. So when working with SaaS environments, the issue is that I can do fewer things. Just for me to get an idea: how many of you work with AWS, cloud, and that

stuff? Okay, about half of you. Are you following me on the shared responsibility model? The idea with a Lambda function is that it lets me execute code in the cloud, so the only thing I can handle is the code that is going to run. I don't have access to the operating system, I don't have access to the network components, I don't have access to any of that. What I do have is a lot of security tooling from AWS for threat detection, logging, and monitoring, and that is what we need to bypass in order to execute code in these kinds of environments. The thing

with that is that those threat detection mechanisms are based on the logs that AWS has, and we can trick the cloud into seeing what we want it to see. We will go into detail later, but for example: everything in the cloud is API-based, so a Lambda function, a server, the creation of a user are all API calls, and they are going to get logged no matter what. But it's not the same if I work from Zagreb every day and create an IAM user from Zagreb as if I suddenly execute an API call from Argentina, because that's weird. But if

the API call comes from an AWS server, it is easier to get past the behavior analysis and the machine learning algorithms the detection engine uses: it is easier for the engine to treat it as a false positive and skip the detection. So we're going to trick the engine with that kind of thing. For those of you who don't know, the most important component in AWS is IAM, Identity and Access Management. That's the part of the cloud that allows me to execute API calls. It's my permissions, the things that I can do as a user. In order to

achieve what I'm going to do, I need sufficient permissions first. I can elevate privileges in different ways, like attaching policies or modifying policies, but the most important point is that to do what I'm going to do now, I need sufficient permissions. So the most important thing we can do as defenders is to harden that. It's not easy, because most of the time we start out doing things in a secure way, but time to market tends to drive companies to do things faster, and suddenly they soften the restrictions and suddenly they have

users with administrative permissions: for example infrastructure-as-code users, GitHub users, the things used to create infrastructure, the operations users used to deploy code. Those entities tend to have a lot of privileges, and we can exploit them to move along and execute what we are going to execute now. As I told you, this is how a Lambda function looks. For me, the most important feature for an attacker to understand is how we can deploy code. I'm going to show you the IDE in a moment, but we can create code using the IDE in the Lambda

function, and if I as a defender need to get that information, I'm going to see the malware and the malicious code. But I can also upload a Docker image to run the Lambda function, and then I'm not going to be able to see the code: I would need to get the Docker image and then take it apart to understand what's going on. Most of the time that is where I can hide and encapsulate my malware to do what I want to do, and also generate layers of complexity for the incident responder. Think about it: if you are an incident responder and you are getting this,

you already need to know how the networking works, how Docker works, and how to run Docker images for different architectures. For example, I can build a Docker image for a different architecture than the one your operating system normally runs: if I know you're using Windows, I can use ARM, and then you need to know how to run an ARM Docker image on your Windows machine to analyze it. Those small steps of complexity give us time. The other thing we need to understand is that there are Docker registries inside AWS, the AWS tool that replaces the official Docker registry. The thing with this is that it's another

trusted source that will bypass security detection. Most of these tools are associated with the normal software development life cycle. If I create a Docker registry, add an image to it, and then run a Lambda function from that registry, that is something normally associated with what a developer does in a production environment. It's not like I'm creating a server out of the blue; it tends to be part of the normal behavior of an account, and that's why we create an AWS account. So these API calls are not strictly bad on their own. It is when you connect everything together that they

get to be dangerous for the tenant. The last thing we need to complete the picture is the API Gateway. The API Gateway lets me expose a Lambda function to the internet: the Lambda function is presented to the internet using a URL provided by Amazon Web Services. So with this API Gateway I can expose an API endpoint that is going to be logged with an Amazon Web Services domain. If I'm working from inside a computer, from a workstation, and I execute API calls against this endpoint, the log on the workstation is going to show a trusted

domain. Most threat intelligence tools tend to skip trusted domains such as Microsoft or Azure, so I'm going to be able to execute API calls from a trusted source to a trusted destination. That hides my source IP and also hides the requests I'm executing. And again, this is also part of the development life cycle. This is the architecture diagram that I'm going to replicate. As you can see, we have the unauthenticated user, so I don't need permissions to execute the API calls against my backdoor: I execute API calls against the API Gateway, and this triggers my code

running in a Lambda function. I need to give the Lambda function permissions to execute what I'm going to execute. The Lambda function is going to be pulled from an ECR repo. So what I need is an authenticated user that is able to build this infrastructure, with Terraform or whatever. If I build it manually, I can execute far fewer API calls: everything we do in an automated way is going to generate noise, whereas with the AWS CLI I can execute only the specific API calls I need, and that lowers my fingerprint. I tested it built with Terraform, and it was not detected.
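From the defender's side, the architecture described above leaves one thing that is cheap to inventory: Lambda functions deployed from a container image instead of a zip package, since those are the ones whose code cannot be read in the console. The following is a minimal sketch, not something shown in the talk. It assumes the input is a list of function configuration dictionaries shaped like the `Functions` list returned by the AWS Lambda `ListFunctions` API, where `PackageType` is `Zip` or `Image`. The helper name `image_based_functions` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List


def image_based_functions(functions: Iterable[Dict]) -> List[str]:
    """Return the names of Lambda functions deployed from a container image.

    Each item is assumed to look like one entry of the `Functions` list from
    the Lambda ListFunctions API. A missing PackageType is treated as "Zip".
    """
    return [
        fn["FunctionName"]
        for fn in functions
        if fn.get("PackageType", "Zip") == "Image"
    ]


# Hypothetical sample data, for illustration only.
sample = [
    {"FunctionName": "guardduty-discord-integration", "PackageType": "Zip"},
    {"FunctionName": "malware-test", "PackageType": "Image"},
]
print(image_based_functions(sample))
```

In a real account the list would have to be collected once per region, including regions the team does not normally use, since a function created in an unused region would not show up otherwise.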

But at the same time, it depends on the detection mechanisms the defender has. We're going to see a little bit of how to identify them and how to dampen those defenses, to understand better what we can do and whether we can execute this. These are the MITRE TTPs we are going to execute: we are going to use a valid account to perform a serverless execution and implant an internal Docker image. I'm using a backdoor because it's the noisiest thing we can do. This should be detected, because I'm controlling an AWS tenant from outside the organization, and it's not. But you can use this to

execute whatever post-exploitation tool you need in a red team exercise. Are we okay? So, what do we need to evade? First of all, there are VPC flow logs. This is the NetFlow-style log that monitors everything happening at the network level of the AWS environment. But we don't care about this, because, as I told you, that is what the shared responsibility model is about: this part is AWS's responsibility, and there are no VPC flow logs for what we are going to do. So if the defender has VPC flow logs enabled, they are not going to get a trace of what we are doing here.

Then there are API Gateway logs. API Gateway logs record the queries against the API Gateway, but since we are the ones creating the API Gateway, we control whether this is enabled or not. We need to check that it is not enabled, and if it is, disable it, or at least use the same techniques I will show you for other types of logs to keep this from happening. This is the important one: CloudTrail and event history. CloudTrail is the log that records every API call. Everything we do, from creating a user to creating the API Gateway and the Lambda, everything is going to

be recorded in CloudTrail. There are two ways of getting CloudTrail logs. The most important one is event history. Event history is always enabled and cannot be disabled; it is there in every AWS account, and it records up to three months of logs. So if I execute something now, I cannot disable that, I cannot remove access to it, and it is going to be kept for three months. The alternative, a CloudTrail trail, is the most strategic way to implement this. The defender needs to enable the trail to get the logs; they are going to be shipped to an S3 bucket, and

they can keep whatever amount of logs they want with CloudTrail. The issue with CloudTrail is that it has about 15 minutes of delay between when an API call is executed and when it is recorded in the bucket. So I have those 15 minutes, plus whatever time is needed to ingest it into the SIEM, plus whatever is needed to trigger an alert: I start to build up a buffer of time in which to execute whatever I want. The CloudTrail integration is the most common one for SIEM integrations, so when we incident responders see logs in AWS it is mostly this second case, and we have those 15

minutes to do what we want. If I use those 15 minutes to stop the logs being generated in CloudTrail, I can implement the infrastructure and then re-enable the CloudTrail logs, and they are going to see only a small gap of time in which they were not getting logs. Also, when they ship that information to the SIEM there are different layers, because in AWS you have different tenants, which are different accounts, and each account has different regions. So I can disable one region, not all of them, and they will keep getting logs, so they are not going to get the

alert that they are not getting logs, because they keep getting logs from other sources. I can work in the region I disabled, and when I re-enable the logs they are not going to see the difference, especially if they are not monitoring for tampering with that kind of information. With event history I cannot do that. But if they are shipping the event history logs to the SIEM, those are sent using a CloudWatch rule, which I'm going to show you in a second, so I can work on the CloudWatch rule that is shipping the logs. I cannot work on the event

history itself, so no matter what I do, this is going to be recorded. The thing is that the event history API is really difficult to work with, because it has a really low rate limit and the export feature is awful, so for a defender, working with event history across multiple accounts and multiple regions is a mess. That's why everybody trusts CloudTrail, but both of them have their drawbacks and both have their strong points. And the most difficult thing we need to bypass is GuardDuty, the threat detection tool I told you about. Whatever I'm doing against

AWS is going to be recorded in GuardDuty, and if it's bad, it's going to trigger an alert. For example, I can show you my GuardDuty findings. This is GuardDuty; I don't know if you can see it well because of the light, but these are the findings that have been generated these days by what I was doing for testing. What I did was build an integration: I can execute a command to force an event, this generates a new finding, and whenever this finding is

generated... I need to change region. I'm generating a sample finding. As you can see here, this is a sample finding I have just generated, with severity high. Once it goes through the integration, we're going to keep an eye on this, because it should notify me on Discord. So I have the security detection enabled, with the full integration, to get a notification on Discord. The thing with GuardDuty is that the timing between the interactions and the API calls in AWS is always based on polling, so it can wait five minutes, 10 minutes, 15 minutes, depending on every part of the

integration. If I try to disable GuardDuty, that's something most SIEMs are going to detect. But I can work on the integration that ships GuardDuty events to the incident response team, and that's not something that is usually monitored. In the meantime, we are waiting for the event to arrive here. Last time I presented this, the event arrived after the talk, like two hours later, while we were having some coffee. The integration is easy. Basically, I have a CloudWatch rule that works like this: it has an event pattern. This is my CloudWatch rule, and the

CloudWatch rule, as you can see, says GuardDuty finding. Whenever the event bus sees this, it forwards it to a destination, and the target is this GuardDuty Discord integration Lambda. This is the IDE I told you about: this is how you see a Lambda function when it is not using a Docker image, and you can see the source code of what the integration does. What this Lambda function does is get the JSON event from the GuardDuty trigger, parse it, and publish it to Discord. The thing is that if I as an

attacker understand how the integration works (I have GuardDuty, I have the CloudWatch rule, I have the Lambda function, and I know the defender uses this integration to get notified; it can be Discord, PagerDuty, Opsgenie, whatever), I have different things I can do. For example, I could get the code to understand how they notify. In this case it is Discord, because I have this wording here that says Discord URL, and it gets the Discord URL from the environment variables. So I can check the environment variables in the configuration and see a Discord webhook. This is a common

integration that is normal to see. If I change one letter in the webhook, the notification is not going to arrive, and the Lambda is not going to fail. So I don't need to execute the API call that deactivates the GuardDuty detector. I can update the Lambda function, which is also an API call, but one that is common in the software development life cycle, and from a detection perspective that API call is not as dangerous as deactivating GuardDuty. So even though GuardDuty is going to detect me, they are not going to get the

notification. And besides that, GuardDuty is not going to detect me. Another thing I can do is change the event pattern to match whatever I want. If I change the event pattern, the API call is an update to the CloudWatch rule, and most SIEMs are not detecting that kind of change. If they are, they are going to get a lot of false positives, because these events are used to monitor every alarm in every part of the software development life cycle, so a rule on this that is not fine-tuned is going to trigger a lot of false positives.

Another thing that can detect me is CloudTrail itself. If I go to CloudTrail, I see that we have different things, like the trail that is shipping logs to S3. The logs are shipped to an S3 bucket, so if they are integrating this into the SIEM, they're going to see it in a formatted way. If they are not, and they only have this enabled, they're going to see this layout, which is awful for analysts: you have year, month, day, and a lot of JSON files inside. If you're an analyst and you get this, you're going to struggle a lot, because you need to download all the folders and

start parsing them, and that is a lot of data to analyze. The thing is that the S3 bucket is like any other storage, and it has a policy that determines who can ship logs to the bucket. For example, if I go to the bucket again, we see under Permissions that it has a bucket policy, and I made a change here in the principal. The principal is what says who can publish information to the bucket, and here the principal says GuardDuty when it should say CloudTrail. So that API call is to update

the bucket policy of this bucket. As soon as I make that API call, the S3 bucket is not going to accept logs anymore. That is a dangerous API call, but since the bucket is no longer accepting logs, the SIEM never gets that API call, and the alert is not going to trigger. And even if it triggers, when they try to debug the issue it is super hard for an analyst to know what to look for in the policy; understanding what is going on in a policy is one of the most difficult things for us. Also, if I replace the principal with a principal that is allowed,

like GuardDuty instead of CloudTrail, and you are in the middle of an incident looking for CloudTrail and you see GuardDuty, it looks almost the same, because it's also a security tool, so it's easy for the analyst to get confused. I have other ways to hide myself, but let's move to the last part of the demo. What I'm going to do is build this Dockerfile. This Dockerfile...

This Dockerfile lets me build a Lambda function with the AWS CLI in it. So it's not malicious code; it's something common. I'm installing the AWS CLI inside the image, and I'm running this code, which passes whatever I send in the POST request to the command part. And I'm going to run it in the architecture diagram I showed you: in a Lambda function, from ECR, behind the API Gateway. If I show you the API Gateway here, we can see it: this is the endpoint that I'm going to call. And as you can see here,

I'm executing things. The API is in Virginia; I'm using Virginia because it's the default region, and also because I have all my monitoring tools enabled in Virginia. If I go to the GuardDuty integration, you're going to see that everything is enabled for detection. We have the Lambda protection fully enabled. So whenever we execute API calls against the endpoint using Postman, for example, I can use

this. This is, for example, an STS get-caller-identity: this is the URL of the API Gateway, this is the endpoint of the API, and I'm passing the command, which is aws sts get-caller-identity. STS get-caller-identity is like a whoami for AWS. Here you can see the user ID, malware test; this is the ARN of the role being used by the Lambda function I'm executing API calls against. The thing is that this is an assumed role, and malware test is the name I gave to the session I'm executing. So I'm going to show logs

from the malware test session. But a malware test role itself doesn't exist; the real name of the role is malware test role. So whenever you are doing the investigation, everything is going to be logged as malware test, or whatever name you want to give it, and if you use a session name different from the role name, it's really difficult to trace it back to its origin. What other things can we do? For example, we can execute an ls on the S3 buckets. This lets me see every storage bucket in the AWS account. The thing with the S3 endpoint is that it is a regional endpoint, but

at the same time S3 buckets are global. If I execute this command, as you can see, here are all my S3 buckets. If you execute this against the default region, it is going to be logged in Virginia. But I can execute the same thing in Oregon, for example, instead of Virginia, and it is going to retrieve the same information, but the log is going to be recorded in a different region. Another misconfiguration defenders make is that they enable all the detection in the regions they are going to use and not in the regions they are not going to use. So this API call is going

to be logged in Ohio, and GuardDuty is enabled in Virginia. So I can do some enumeration and retrieve information through a different region, logged in a different location, working in a non-monitored location. Another thing we can do to cover up our infrastructure even more is chaining through different roles. I can use the command called assume-role to assume a different role in a different account. So I can use this Lambda, running in one AWS account of my compromised tenant, to chain to another account, and the only log that is going to be recorded in the AWS account

is the assume-role; everything else is going to be executed in a different region. As we can see here, we get credentials that we can use for the role: here we have the access key and the secret access key of a different account. It's really small here, but the account number starts with 59, and if we go back to the get-caller-identity, this account starts with 94. So I'm jumping from one account to the other using this Lambda execution. The most interesting part is that GuardDuty hasn't detected me. I hope the

last finding is still the sample that I generated, the high one I showed you. And if I go to CloudTrail, and this I think is the most important part for detection, and I go to event history, I see a lot of things. This is how event history looks. As you can see here, there are the malware test API calls that I was executing. Again, in order to investigate this you need to know that the malware test user is the dangerous one. If I go to IAM, again, there is no

malware test user. If you see an API call coming from a user and that user is not in the user list, that's weird, but you need to notice it first. And if I go to the roles, the role is malware test with something after it; there is no role called malware test, because that is the session name I set when assuming the role. The other thing is that if I filter by username malware test, we see all the calls. For example, if I go to the assume-role and we start to analyze what's going on, this is what a CloudTrail log looks like. I have this

IP: this is the IP that is recorded in the CloudTrail logs. But if I go and check what my IP is, this is the IP I have on my computer. So although I'm executing API calls from a Zagreb IP, the IP recorded in the CloudTrail log is different. And if I analyze the source [Music] IP, I see that its origin is an Amazon data services server. So although I'm executing the API calls from my computer, they are registered as coming from an Amazon server in Virginia. Another interesting thing I can show you here is what happens if I

change to Ohio and in Ohio I'm going to see much

different logs. It's not recorded yet: the API call that I executed against S3 in Ohio is not recorded yet. And here it arrived: the sample finding I generated at the beginning. So we did all of this, and only now did we get the sample finding. I had almost 10 minutes to do whatever I wanted before the GuardDuty sample finding arrived at the notification server. So we need to work with these gaps in space and time: the 15 minutes with CloudTrail, the 10 minutes with event history, the region trick for what is going to be recorded or not, and using

AWS services from the normal development life cycle to get around what GuardDuty considers bad. So if I can execute API calls from within the Lambda, within my malware, with something as noisy as a backdoor, imagine doing something more silent, like running an infostealer or getting persistence or whatever you want. The difficult thing with this kind of technique is this: if I have compromised an account with this technique and I'm assuming a role in another account, the assume-role is also going to come from an AWS IP. So the second

compromised account is also going to get this kind of stealthy behavior in where the API calls come from. The most common thing we do as incident responders is to check the source IPs, to try to understand whether something is coming from a weird location. With this, everything is going to come from an AWS

location. Well, I have some security guidelines to show you as the last thing. These are the important things we should do as defenders to stop this from happening. We need to block whatever we are not using: regions, AWS services. If you're not using Lambda, nobody should be able to launch a Lambda in your account. Least privilege in IAM is the most important thing, because if I get access like this, you're not going to be able to detect me, not because you are not doing your job but because AWS is not providing you with the tools to do it. Monitoring: try to enforce monitoring on those resources that you know are

part of your critical infrastructure. If you have a Lambda function that is used to notify you when something happens in GuardDuty, create detection rules for that Lambda too, and for that CloudWatch rule too. And yes, I think I'm out of time, so that would be it. I don't know if you have any questions or

something. It's cheaper, no? The advantage of a Lambda function is that it can be really cheap if you are doing something small. If you need to manage a big server just to do a PoC in development, that's super hard. But it can give you these big holes, right? Also, there are a lot of infostealers trying to get your credentials from wherever you are, so it's important to be safe with IAM. So if you think of any questions, you can catch up with Santi afterwards. Thank you for your presentation. No, thank you for listening.

[Applause]
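Two of the investigation points from the talk can be sketched in code: recovering the real role name behind a session name seen in CloudTrail, and flagging API calls that touch the alerting pipeline itself. This is a minimal sketch under assumptions, not the speaker's tooling. It assumes CloudTrail records already parsed from JSON, with the standard `eventName` and `userIdentity.arn` fields, and an assumed-role ARN of the form `arn:aws:sts::<account>:assumed-role/<role>/<session>`. The function names, the watch list, the account ID, and the sample record are hypothetical. The watch list is matched by prefix because some services append an API version to the event name.

```python
from typing import Dict, Optional, Tuple

# Hypothetical watch list: calls that can silence or redirect alerting.
WATCHED_PREFIXES = (
    "UpdateFunctionConfiguration",  # e.g. editing a notifier's environment
    "PutRule",                      # changing a CloudWatch/EventBridge rule
    "PutBucketPolicy",              # changing who may write to the log bucket
    "StopLogging",                  # pausing a CloudTrail trail
    "DeleteDetector",               # removing a GuardDuty detector
)


def split_assumed_role(arn: str) -> Optional[Tuple[str, str]]:
    """Return (role_name, session_name) for an assumed-role ARN, else None."""
    resource = arn.split(":", 5)[-1]  # "assumed-role/<role>/<session>"
    parts = resource.split("/")
    if len(parts) != 3 or parts[0] != "assumed-role":
        return None
    return parts[1], parts[2]


def triage(record: Dict) -> Dict:
    """Summarize one CloudTrail record for an analyst."""
    names = split_assumed_role(record.get("userIdentity", {}).get("arn", ""))
    role, session = names if names else (None, None)
    return {
        "role": role,        # the name to look up in IAM
        "session": session,  # the name shown as the "user" in event history
        "touches_alerting": record.get("eventName", "").startswith(WATCHED_PREFIXES),
    }


# Hypothetical record, for illustration only.
sample = {
    "eventName": "UpdateFunctionConfiguration20150331v2",
    "userIdentity": {
        "type": "AssumedRole",
        "arn": "arn:aws:sts::111122223333:assumed-role/malware-test-role/malware-test",
    },
}
print(triage(sample))
```

The design choice here is to report the role and session side by side rather than alert on a mismatch, since legitimate sessions often carry names that differ from the role; the value is in pointing the analyst at the role that actually exists in IAM.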