
Hello everybody, good morning. We are Ilya and Alex from Cycode, and in the next 45 minutes we will present our research on GitHub Actions internals, including how we discovered and disclosed critical vulnerabilities in popular open source projects that were using those actions.

On the agenda: we will talk about what GitHub Actions is and why it's such a powerful build system, which kinds of misconfiguration it can have, we will understand the consequences by exploring its internals, and we will speak about possible mitigations.

My name is Ilya. I previously worked as a developer and an R&D team leader on the IPS product at Check Point. Later on I moved to Fireglass, a security startup that created the first web isolation solution, and currently I'm working as a back-end technology lead at Cycode.

And I'm Alex Ilgayev. I'm a senior security researcher at Cycode. Previously I was investigating malware at Check Point Research, reverse engineering some interesting pieces of malware, both crimeware and APTs, and at the moment I'm searching for vulnerabilities and researching mitigations for software supply chain security. That's it.

Okay, so you all know GitHub and its code storing capabilities. In 2018 they stepped up their game and decided to create a CI/CD platform called GitHub Actions, which allows developers to automate their development workflows. It became very popular quite fast, mainly due to its rich marketplace, which currently holds more than 2,000 public actions, and also because it provides free CI/CD for public repositories. According to their numbers, GitHub currently has more than 73 million developers and stores more than 200 million repositories.

So what are the possible usages of GitHub Actions? The main usage is CI/CD, as I mentioned, for example running tests on open pull requests, or static analysis. You can build your code into containers and upload them to a chosen registry such as Docker Hub or ECR. You can schedule tasks
that will scan for vulnerabilities in your code. You can use it to automatically label issues and pull requests, you can send issues to ticket handling systems, and much, much more.

So here's an example of a GitHub Action. As you can see, it is just a YAML file that contains when and what to run. You can see the "on" keyword, which means that this action should run upon every new push to the repository, and it contains a single job with a single step that just prints "hello world". To create this workflow you simply need to put this code inside .github/workflows and that's it: every next push will trigger that workflow.

So let's speak a little bit about how it works. The GitHub runner is an open source project that connects to the GitHub Actions service, fetches the jobs, and then executes them. It can run on a GitHub-hosted machine, which is the popular use case, and you can also run it in your self-hosted environment. The GitHub-hosted runners run as an ephemeral environment, which means they are created upon a workflow triggering and destroyed after it ends, and for each workflow a new temporary GitHub token is created for possible API interactions.

Before we talk about the GitHub token itself, a few words about access tokens in general. In order to access and modify GitHub assets, you need to provide an authentication token that details your permissions. As you can see here, when creating a token a developer can choose which permissions the token will have, which are basically a subset of the user's permissions. So I as a user can have access to many organizations and many repositories, and this token will basically provide those permissions when I use it. Another thing you can see is that these tokens can be created with or without expiration, which makes them a lot more powerful, meaning that these tokens have the privileges to do a lot of damage and may never expire at all.

So GitHub, when
they designed GitHub Actions, really wanted developers not to use those personal access tokens inside their workflows. To overcome this they created something called the GitHub token. The GitHub token is provided to every workflow that starts running. Its default permissions are read and write for most events, and the permissions cover only the repository in which the GitHub Action is currently running. The token is valid during the action execution period, or 24 hours at most. It is used as a default parameter in many actions, and this is the preferred method to invoke GitHub API functionality.

An important note is forked pull requests, which are basically used when contributors want to contribute to some open source project: they fork the repository and create a pull request with the suggested changes. If you think about it, if that specific repository has a GitHub workflow which, for example, runs the CI tests or static analysis, the contributor could basically use that GitHub token with write permissions to modify the content of the repository, by committing via the API or things like that. GitHub has different mitigations for forked pull requests, but the basic one is that in those scenarios the GitHub token receives at most read permissions, so these scenarios won't be possible.

Another core mechanism in GitHub is secrets. Any meaningful CI/CD workflow will need to use some secrets, for example AWS access tokens or passwords for registries, and GitHub gives us the option to store secrets. They are saved in an encrypted manner, and if the workflow wants to use them, GitHub decrypts them and adds them to the payload of the workflow. There are several options for where to create secrets: at the organization scope, at the repository scope, or even for a repository environment, which we will talk about a bit later.

And here's the first example of a vulnerable action. In the sample workflow you can see that the keyword
is "on issues", with type "opened". This means that this workflow will run every time I open an issue in a GitHub repository. You can see that it has a single step which runs a script, and an important note here is the curly braces used throughout the script, which allow developers to use dynamic parameters in their workflows. GitHub provides parameters about the triggering event, for example the issue title and the issue URL, and also the GitHub token that we discussed previously. This specific workflow basically checks whether the title contains the word "bug", and if so it performs an API call and adds a new label of type "bug" to that issue.

This looks innocent enough, but let's see how it can be exploited. On the right you can see an issue title that we provided for that workflow, and on the left you can see what happens when it is actually executed. The title is planted inside the curly braces that we saw before, and you can see here that the if statement is effectively bypassed: execution jumps over the if and then runs our code on the runner itself. This example just prints "Cycode" to the screen. The point is that this crafted issue title knows how the workflow looks: it knows how to start the if and how to finish the if in such a way that the syntax is valid and the workflow runs.

So is it a bug or a feature? According to GitHub's best practices documentation it is well known, and they state that when creating workflows you should always consider whether your code might execute untrusted input from attackers. Which is very nice and very friendly, but I'm not sure that all developers in the world start by reading the best practices documents before they start using the platform itself.

So we wanted to know how popular the usage of these patterns is. We used a tool called GitHub Search, which is currently in beta, but it's a very nice tool: you just add keywords to the search and it searches all public repositories on GitHub and returns the results. You can sign up and
try it out; it's really fast and really nice. You can see that we searched for github.event.issue in curly braces and also the keyword "run". As you see, we have two hits here in which we find workflows that indeed can be exploited in the way I just showed.

So is it widespread? We found many popular open source projects, such as Liquibase, which is a tool for handling database schema changes, Wire, which is an open communication platform, and many more. According to the downloads and usage of those open source projects, these vulnerabilities are potentially affecting millions of users.

Let's dive a bit into one of the cases, the Wire one. Here you can see a part of their workflow. It is triggered upon any issue comment, and an important note here is that the issue comment event fires when you add a comment to an issue and also when you add a comment to a pull request, so GitHub uses the same event for both of these scenarios. You can see several steps. The first one basically checks that the comment body contains the keywords "zenkins review". So if we add a pull request comment with the words "zenkins review", we pass this if and go to the next one, which just checks whether the comment is on a pull request or not. If it is, we continue to the next if, which checks whether the title starts with some keyword and ends with some keyword. If it doesn't, you can see the two echo commands, and the second one basically prints out the issue title for debug purposes. This is exactly what can be used to exploit this very popular workflow.

On the right you can see that after we disclosed this issue to Wire, they were very fast in patching the problem, and the fix was very simple: you simply need to use an environment variable. You see the "env" at the top, storing the issue title in that environment variable, and then you can just use that, and
it is already escaped, meaning the shell treats it as data, so the code will not run when you use it in this format.

So what are the consequences of a build compromise? You can expose secrets. As we mentioned, in order to create a meaningful CI/CD pipeline you are probably using secrets, so once we have code running on the runner, we can use it to expose the secrets to sensitive assets. We can also use the GitHub token, the one we discussed before, to commit to the repository. As I mentioned, by default you have read and write permissions to that repository, so we can inject code that uses the GitHub API with that token to commit code that is not really part of the pull request into that repository. In such a way an attacker can create critical supply chain incidents without the code being reviewed or approved. And a much smaller risk would be the malicious actor's ability to run botnets or crypto miners using the runner infrastructure.

At this point I will let Alex dive a little bit deeper into the vulnerabilities and the mitigations.

[Applause]

Yeah, so thank you very much, Ilya. Let's dive a bit deeper technically. Ilya explained what the consequences of such a build compromise could be, and we will soon explore how an attacker could actually reach these consequences from a technical perspective. For that we created this intentionally vulnerable workflow, which we'll explore through our example.

This workflow will be triggered whenever a new issue is created. It defines a new environment variable for demonstration purposes; soon we'll see why we're doing that. And it has three steps. The first one does a checkout: this is an external action, the checkout action, which basically does a git clone of the code into the runner environment. Very simple. And it has two additional run commands: the first one just prints the issue title and description, and the second one runs a curl to the GitHub API to update this
issue with a new label.

As Ilya showed previously, this echo is susceptible to an injection attack because the title and the body are not sanitized, so a malicious attacker could potentially run his code at this exact point. So what could he be fetching in this sample? On the one hand he could get this GitHub token and use it for his malicious purposes, or he could get this additional bot token, which comes later in the second run command. Let's see how he does that.

First, in order to ease the testing of the runner infrastructure, instead of creating workflows and testing each workflow when it runs, we created a lab environment in which we made a reverse shell from the runner environment to our personal computer. For that we used a popular tool called ngrok, which basically does TCP or HTTP tunneling, whether you're behind a firewall or not. It's a really cool tool. We installed the tool on our computer and ran "ngrok tcp 10000": tcp is the mode (it could also run in HTTP mode) and 10000 is the port on which we want to listen. After running it we received from the ngrok cloud this endpoint, which we'll use later in the exploitation. Then we just created a simple netcat listener on port 10000, and at the end we created this simple bash script which does the reverse shell; you can find the script easily on Google.

Combining it all together, we send this issue title. It looks quite complex, but we explained how it is put together. When we send this to the GitHub repository we get our reverse shell, and we have control from our computer over the runner infrastructure, so we can explore it and find any interesting stuff in there. We won't overload you with all the reconnaissance we did on that machine; you are welcome to check our full blog post for that. But we found some interesting pieces of
data which we'll use later, as we will show in the slides.

So let's go back to our previous example. The first, very simple thing an attacker could do with code execution capability is to print the environment variables with this simple command and look for interesting stuff in them. For example, we have this GitHub token defined as an environment variable, so the attacker can just print the variable, get it, and use it. Very simple. It also happens in real-world scenarios, not only in our sample.

A second thing the attacker could do is abuse the checkout action. As I said, this action just does a git clone of the code, but it also receives a default parameter which we are not seeing here: the GitHub token is sent as a default parameter to the external checkout action. This GitHub token is also used as the authorization token for the git clone, and whenever the clone is done with a token this way, that token is also saved in the .git/config file. Because we are running as an attacker after that checkout was made, we can access this .git/config file, find the authorization line in that file, pipe it through base64 decoding, and we get the GitHub token which was sent to that action and used to clone the code. So as an attacker we have another method to fetch this sensitive token. This was the second scenario.

The third scenario is a bit more complex. During reconnaissance of the runner environment we noticed that each run command (we have two of these here), before it is executed, is also saved on the file system as a shell file: the runner saves it and then executes it. Why is it interesting? Because in our case, where as an attacker we have code execution at this point, we haven't received the second command yet, so we have only this single run command. You can see here, as we're
printing the directory, the runner temp directory which stores this shell file, we have a single shell file. It contains the same content as this command, but instead of the curly-bracket placeholders we have the real values, which were inserted as the action was triggered. So if we somehow get the second run command, it will also contain this secret, the bot token, placed as a real value, which we as an attacker want to grab. If we get a foothold on this run command, we also get this bot token, this secret.

But as I explained, the attacker has code execution at this point, so how can we fetch the next one, which hasn't been executed yet? We have many methods to do that. A simple method that I thought of was just putting some persistent script on the runner. What does that mean? In our case it was a simple Python script that monitors this directory, and whenever a new shell file is written to the directory, the file is immediately sent to some control server. As I'm simulating an attacker, I created the server, and whenever a new file appears there it will be sent to me. So what are the steps? Creating a server that records all requests, and creating a Python script that sends any modified shell script in that directory. I packaged it all into a Docker container to ease the deployment, and I run that container on the runner in detached mode, mapping the volume and indicating the URL to which the files should be sent. We'll soon see in the demo how it all works together.

So these were the three scenarios we showed for fetching secrets, but there are many, many more. These were really simple, and more sophisticated attackers could apply more sophisticated methods, which we won't include in these slides or in the article. Additional methods could be inspecting the memory layout of the processes inside the runner and trying to extract sensitive information from memory. It could be monitoring created
processes, since maybe the secrets were passed to those processes through environment variables, so we could fetch interesting information there. And there are many, many more methods for further research.

So let's start with the demos. For the first demo we'll show how we can exfiltrate secrets. We will do it in two phases: in the first phase we will just send ourselves the simple GitHub token, as we explained, through the environment variable, and in the second phase of the exfiltration we'll put a persistent script on the machine and wait for the second command, which will also be sent to our server.

So let's see. First we set up our control server, and we send the malicious issue to the repository. This issue contains several commands: first it will curl the GitHub token, which is the first phase, and then it will run the docker run. As you can see, we already got the first token, which was very simple through the environment variable, and we also got the second phase of the exfiltration. I don't know if you remember, but we have here the complete script that was the third step in the sample workflow: the complete bash script, including the token contained in that script. So we actually managed to get it.

For the second demo we'll show how we are able to commit malicious code into the repository without the knowledge of the maintainers of that repository. For that we prepared a really simple bash script that receives two parameters. The first one is the file that we want to commit, given as a URL from which we fetch that file, and the second parameter is the path in the repository directory where we want to commit the file to. It's a really simple script: it fetches the file and then runs several git commands, like adding the file and configuring the committer. We can put whatever we want here, so we can impersonate other committers. Then we commit and push the code. So on the
runner side we just we will fe