
Cloud Protection: The Fight of Machine Learning Against DDoS and MitC Attacks

BSides Prishtina · 2026 · 25:49 · 69 views · Published 2024-09
Category: Technical
Style: Talk
About this talk
This presentation examines how machine learning algorithms detect and mitigate DDoS and Man-in-the-Cloud attacks in cloud environments. The speaker demonstrates a case study using Decision Trees, Support Vector Machines, Naive Bayes, and K-Nearest Neighbors trained on network traffic characteristics extracted from simulated attacks on an AWS environment. Decision Trees emerges as the most effective algorithm, achieving 100% accuracy in identifying malicious packet patterns.
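
The model comparison summarized above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the speaker's code: the synthetic rows, the three feature columns, and the helper names `make_synthetic_packets` and `compare_models` are assumptions made for the example, and the label rule (packets per second above 1,000) is a simplification of the rule described in the talk.

```python
import random

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_packets(n_rows=2000, seed=0):
    """Build toy rows of [packets_per_second, packet_size, syn_flag] plus a label.

    Assumption: the label is 1 when packets_per_second exceeds 1,000.
    This is invented stand-in data, not the data set from the talk.
    """
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n_rows):
        pps = rng.uniform(1, 5000)
        size = rng.randint(40, 1500)
        syn = rng.randint(0, 1)
        features.append([pps, size, syn])
        labels.append(1 if pps > 1000 else 0)
    return features, labels


def compare_models(features, labels, seed=0):
    """Train the four classifiers and return ({name: accuracy}, best_name)."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.3, random_state=seed, stratify=labels
    )
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "svm": SVC(),
        "naive_bayes": GaussianNB(),
        "knn": KNeighborsClassifier(n_neighbors=5),
    }
    scores = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(x_test))
    # Keep the model with the highest held-out accuracy for later prediction.
    best_name = max(scores, key=scores.get)
    return scores, best_name


scores, best_name = compare_models(*make_synthetic_packets())
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.3f}")
print("best:", best_name)
```

Because the toy label here is a plain threshold on a single feature, a tree can recover it almost exactly, so a near-perfect score on this synthetic data says nothing about performance on real traffic.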
Original YouTube description
Cloud computing, hailed for its reliability and scalability, has emerged as a transformative force in the realm of technology. Its allure lies in the promise of accessible and efficient data management and processing, drawing countless organizations into its virtual embrace. However, behind the curtain of innovation, cloud computing is not impervious to the shadows lurking in the digital realm. In this presentation, we embark on a voyage through the ever-expanding landscape of cloud protection. We find ourselves in a precarious world where the cloud is constantly besieged by a myriad of cyber threats, among them the relentless Distributed Denial-of-Service (DDoS) attacks and the elusive Man-in-the-Cloud (MitC) computing attacks. The challenges are manifold. Cloud computing is deeply entwined with network connectivity, and the reliance on external services introduces a vulnerability to downtime and vendor lock-in. Furthermore, the cloud's potential to empower users is coupled with the daunting task of safeguarding it against malicious actors. It is in response to this vulnerability that we set our sights on fortifying the cloud's defenses. In a groundbreaking approach, we turn to the realm of machine learning, where algorithms like Decision Trees, Support Vector Machine (SVM), Naive Bayes, and K-Nearest Neighbors (KNN) emerge as our digital champions. Our quest is to harness the power of these algorithms to detect and respond to the cunning tactics of DDoS and MitC attackers. Our journey unfolds in a simulated cloud environment, where we recreate the battlefield of digital threats. Here, we subject our machine learning warriors to the chaos of DDoS and MitC attacks, gathering invaluable data that becomes the lifeblood of our investigation. This data forms the crucible in which our algorithms are trained and evaluated. The revelation lies in the effectiveness of these machine learning guardians. 
They prove adept at identifying and classifying malicious activities, drawing a clear line between the malevolent and the benign. Among them, the Decision Trees algorithm emerges as the brightest star, displaying the most promising potential to defend the cloud against a multitude of cyber threats. This presentation not only sheds light on the battlefront but also offers insights into the future of cloud security, where the cloud's defense is bolstered by the ever-advancing arsenal of machine learning.
Transcript [en]

Okay, hello everyone. First, I would like to thank you for taking the time to be part of this presentation today. I'm also very honored to have the chance to present this topic in front of you, and I think it will be a very interesting one. My topic is "Cloud Protection: The Fight of Machine Learning Against DDoS and MitC Attacks". But before starting, I would like to introduce myself for those who don't know me. So, I amarachi, and currently I'm a teaching assistant at the University of Prishtina, at the Faculty of Electrical and Computer Engineering, and at the same time I'm a

first-year PhD student at the same faculty, and I have also been a DevOps engineer for almost three years. That is also the reason why I have oriented my topic toward the cloud, in line with the experience I have had until now. Through this topic we will see not only the fight of machine learning against DDoS and MitC attacks, but also how machine learning enhances the detection of these types of attacks and others. I haven't given you an outline of what I'm going to talk about today, but we're going to see it along the way, so it may be more interesting for you.

I'm going to start with an introduction, then we're going to dive into the world of cloud computing and how machine learning, through its algorithms, improves the detection of these attacks. As we all know, cloud computing is one of the most popular technologies nowadays. If we compare some years ago with now, a lot of changes have occurred in IT companies. I have shown in these figures the changes from 2019 to 2025. You can see a comparison between traditional, or on-premise, and

cloud computing. If we look at total revenue, cloud computing, compared to the traditional one, shows very strong growth in revenue and also in usage. Now, why is this so? We know the advantages of cloud computing, including its scalability and its safety, and many other things that improve processes in the IT industry. But apart from its advantages, cloud computing also has disadvantages, and the one we're going to look at now is cloud security. Why cloud security in particular? Because machine learning has, until now, produced different solutions

not only for detecting threats or vulnerabilities in the cloud but also for other environments. I'm going to orient my topic towards cloud security, which was also the main purpose of this topic. The main attacks happening now in the cloud are DDoS and MitC, and that's the reason I have chosen these two: to see how the detection process works, so we can remediate them and avoid these problems in the future. Through some research that I've done, I have looked at some of the most recent vulnerabilities or attacks happening in the cloud. As you can see, there are security incidents during runtime, unauthorized access,

misconfigurations, major vulnerabilities that have not been remediated, and a failed audit. Now look at the unauthorized access part, which accounts for 33%. This is where MitC belongs, because MitC is an attack where a user can get at different resources in the cloud without needing to be authorized in that environment. On the other hand, there is DDoS, which is in the first group, security incidents during runtime, which accounts for 34%. The main purpose of DDoS is to overload the network traffic by sending a lot of packets, a lot of requests, until it blocks the website or the environment that is operating.
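
To make the idea of an overloading request rate concrete, here is a minimal sketch of counting packets per second for each source and flagging sources above a threshold. The 1,000 packets-per-second threshold is the one mentioned later in the talk; the input shape (timestamp, source IP), the function names, and the example addresses are assumptions for illustration, and the sequence-number check that the talk also mentions is left out because its details are not given.

```python
from collections import defaultdict

PPS_THRESHOLD = 1000  # packets-per-second threshold mentioned later in the talk


def packets_per_second(packets):
    """Count packets per (source IP, whole second) bucket.

    `packets` is an iterable of (timestamp_seconds, source_ip) tuples;
    this input shape is an assumption made for the sketch.
    """
    counts = defaultdict(int)
    for timestamp, source_ip in packets:
        counts[(source_ip, int(timestamp))] += 1
    return counts


def label_sources(packets, threshold=PPS_THRESHOLD):
    """Return {source_ip: 0 or 1}, with 1 if any one-second bucket for that
    source holds more packets than the threshold."""
    labels = {}
    for (source_ip, _second), count in packets_per_second(packets).items():
        flagged = 1 if count > threshold else 0
        labels[source_ip] = max(labels.get(source_ip, 0), flagged)
    return labels


# Toy traffic: one quiet client and one source sending 1,500 packets in a second.
quiet = [(10.0 + i * 0.1, "198.51.100.7") for i in range(50)]
flood = [(10.0 + i / 1500, "203.0.113.9") for i in range(1500)]
labels = label_sources(quiet + flood)
print(labels)
```

A fixed one-second bucket keeps the sketch short; a sliding window would be a natural alternative when bursts straddle a second boundary.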

We also have data loss, data corruption, and many others, but these are some of the more common ones in cloud computing. A lot of research has been done on what approach to take to detect these kinds of vulnerabilities, because they are growing day by day. So we can see that there is a big interest in using machine learning for attack detection. I have summarized here some of the papers that have used machine learning to detect different attacks in the cloud, but not the ones I'm going to talk about later. So the detection of cloud attacks is still an open topic. As I said, as

we saw earlier, cloud security is a very sensitive topic and it needs to be improved, and that is where machine learning finds its way in. In the research there are different machine learning algorithms that have been used through the years. One of the most popular is LSTM with automated training. There were also different approaches combining statistical methods and machine learning algorithms, achieving an accuracy of 99.26% and a detection rate of 100% for this kind of attack, and also a hybrid algorithm for DDoS and MitC, which is what I have been trying to do. Now, through the slides, we're going to see what approach I have provided

to detect these two vulnerabilities in the cloud using machine learning, and also the algorithms used to do that. What I based this approach on is the analysis of network packet characteristics, and here is a figure of how that is done. First the network traffic is analyzed, and then for each packet different fields are extracted. This is the phase where we collect the data that is going to be used for training, and also for prediction by machine learning using the data that we have generated. We're going to see more details in the next slides. So, I have briefly mentioned what DDoS and

MitC are, so here we are going to see with figures what they actually do in the cloud environment. The first picture is an example of how DDoS affects cloud environments. In the previous slide I said that its main purpose is to overload the traffic that is currently running in the cloud environment. In this figure we have a legitimate user, which could be us, trying to operate over the internet, and the other party is the victim, the second person that is trying to communicate with us. In the middle we have the internet, where we can suppose that we have a cloud environment operating

that part, and there we can see three machines that are controlled by the attacker. So here, in the middle of the communication between the two users, we can see attacks from three machines in this example. Maybe you saw in the first part of the conference today how DDoS actually works in real scenarios. Here, three different machines with different IP addresses are trying to attack, or overload, the traffic on this network between the two users, and this is how DDoS is done. Meanwhile, the other party is the victim, and the data that was sent from the legitimate user can be obtained because of the hackers that were in

the middle of the internet. On the other hand, we have the MitC attack, or unauthorized access. Suppose we have this cloud environment here and a person who tries to upload a file there with authorized access, as it should be. On the other side we have a hacker who tries to upload a file there, and also to download from there, without being authorized. In this case a cycle is created for the MitC attack. We can also see that when the hacker gets or uploads something in the cloud, the main user who has access may not know about it at all, and there could be many

consequences. After this, we can see which machine learning algorithms I have chosen to analyze in this case. There are four: Decision Trees, Support Vector Machine, Naive Bayes, and K-Nearest Neighbors. All of these machine learning algorithms belong to supervised learning. Why I have chosen four algorithms we will see on the later slides: after analyzing each of these algorithms, the one that has the highest accuracy will be used for prediction. Now, some of these are fast, some of them maybe

don't handle a big quantity of data well, some are easy and fast, some are easy to apply, but this also depends on the size of the data set and on what operation the prediction is going to be used for. Now to the case study that we're going to analyze. As we saw from the title, it's about a cloud environment, so the first step consists of creating a cloud environment. Here I have shown a simple architecture of the cloud environment that is going to be analyzed. It's an EC2 instance created in AWS, and it has full access to the

internet, and there is also a web application hosted on it. The analysis is more complicated than it seems, so we will see how it works. After that, I'm going to show you how I ran some simulation tests of DDoS and MitC in that environment. Then we get to the most important steps of this case study: packet analysis, data collection, model training, finding the best algorithm, and, last but not least, the main part, doing the prediction to see whether these attacks will happen in the future or not. So this is the configuration of the instance that I have created. As I said, it's accessible from outside, has

a web application hosted, and now we're going to see how it reacts. Here are two screenshots. The first screenshot shows how the DDoS attack is conducted in this case. There are several ways we can do the DDoS attack, including specifying the number of attacks that we want to use or the number of packets that we want to send towards the cloud environment, but as you can see I have made an endless loop. Through this code we do the DDoS attack, sending a lot of packets until we completely block the website that is hosted in AWS. Then, after sending all those requests, you can see that

after that, there are some GET requests being sent to that cloud environment, and once we see that the website is blocked, this process is finished and we can continue with the process of detection. On the other side, I have shown you a part of the code to simulate the MitC attack. We already saw what its main purpose is. We have the first method, uploading a file without authorization, the second one, downloading a file from the cloud environment, and also the synchronize method, which synchronizes the files so that the user cannot detect that there is any change in their cloud

environment. Here we have a screenshot after the DDoS attack is conducted, and we can see that the site can't be reached. After this part we continue by analyzing the parameters of the packets that we are interested in. As you can see, I have chosen some of the most important characteristics that I want to analyze for each packet: the duration, source IP, destination IP, protocol, SYN bit, acknowledgement bit, FIN bit, and also the packet size, packets per second, sequence number, and "is attack". The variable "is attack" can have the value zero or one. It only indicates whether

the packet going through that network connection is supposed to be an attack or not. This part of the code is the process by which each of those parameters is extracted. At the end of the code you can see how we decide whether to give "is attack" the value zero or one: we do this by analyzing the PPS, that is, packets per second, and if it is above 1,000, then the sequence number is analyzed, and based on that "is attack" gets the value zero or one. And this is an example of how a data

set looks after extracting all the data that we're interested in. This data set can have 100,000 rows or more, and this is only a part of it, to show how it can look. As you can see in the "is attack" column, some of the rows have the value zero and some have the value one. If we look at the PPS column, we can see that in the rows where the PPS, or packets per second, is 900, for example, or 577, "is attack" is zero. In that case, the process in which that packet was being sent over the network

wasn't considered to be a threat, and this is not an attack. If we look at the other values, the ones above 1,000, such as 1,200 or 4,000, we can see that the value of "is attack" is one, and thus we know that the packet was intended to cause an attack in that cloud environment. After generating this data set, we define the models for each algorithm that is going to be analyzed, the four that we showed earlier. Then each of the models created with those four algorithms is evaluated, and we extract the accuracy score. So in order to find

which algorithm is the most accurate in this case, we look at the accuracy score, and in this execution the decision tree was the most accurate algorithm, so it was the one used for the prediction. Here we have a screenshot where we find the model with the highest accuracy, which is labeled as the best model, and after that the result of the prediction is shown in the application that is hosted. In this case we get the message that a DDoS or MitC attack is likely to occur on the IP address in the near future, which tells us that it is vulnerable in the future as well. Now, as I said, Decision Trees

was the most accurate algorithm in this case, and it showed an accuracy of 100%. After getting the results from the simulation of these attacks and their detection, I also did some tests with the variables timestamp, page ID, number of requests, number of errors, duration, bytes and data, and "is attack". Here you can enter different values, such as a different number of requests, and then see through the model whether your cloud environment is vulnerable to those attacks or not. So it can also readily be used for a different project if you want. Here is the summary of the results that I have obtained through

this experimentation, if I can call it that. As you can see, the first part is when I conducted, or simulated, those attacks through the scripts that were shown before on the slides. You can see examples with the number of threads at 100, 1,000, 10,000, 52,500, and with different thresholds, that is, the number of PPS, or packets per second. For example, in the first case the condition is PPS greater than 1,000, which we already had as an example when the data was generated. The other cases are when the PPS is greater than 50, greater than 100, greater than 50, and also greater than 10,

and also the duration, how much time it took to run the simulation of the attack, how long the detection of those attacks took, and the prediction, which tells whether the cloud environment is vulnerable or not. We can see that for some of the cases we have "true" with almost 100%, but this is not the accuracy score of the algorithm that was used, which, as I said before, is 100%. Here we can see 0.975, for example, and this is the chance that the environment is vulnerable to attacks in the future. If you look now at the second

table: in this case I have used JMeter to simulate those attacks, and I have compared the results with those from the scripts. As you can see, there isn't much difference in the prediction scores, but there are some. For example, look at the "false" prediction in the first row of the two tables: in the first one we have "false" with 0.712, and now we have 0.191. This can also be related to the duration of the attack or the number of requests that were made. But in order to be more precise, I have

chosen the same number of threads, threshold, and duration when I did the test in JMeter, so the results can be more comparable to each other. Now I'm going to give you some takeaways, or conclusions, from this analysis. Through this study we can say that machine learning is a great method and plays a vital role in increasing the accuracy of detecting different attacks, not only in cloud environments but also in other environments, and it can also adapt to evolving attack techniques. At the end we will see what the possible improvements are. In this case we saw what DDoS attacks are, and also MitC attacks, and

how they can be carried out, how they can be simulated, and how they can be detected. But remember, there are also other approaches apart from machine learning. Machine learning has shown a very important role in detecting these kinds of attacks, so it's highly recommended to use it. Some of the possible improvements in this case are the four that I have highlighted here. For example, improved feature selection: you saw that I extracted data based on some features that I was interested in, so in the future we could also look at other features that may have a great impact

in that process. Or we could use hybrid approaches: as I said, we can combine two or three machine learning algorithms to improve the accuracy score and get better results. Then there are behavior analysis and threat intelligence. In my PhD studies I'm working in this direction, threat intelligence. With threat intelligence we can maybe achieve a more accurate score, and see how it differs from the machine learning techniques we use when collecting data by extracting it from packets. There are also other improvements, but these are some of the most important ones that can be used in the future. So, thank you very much for your

attention. If you have any questions or comments, you're

welcome. Thank you, rort. Do we have any questions?

Yes. Hi, my question is: can we use something similar to build a kind of early warning system, for example non birth connection from those IPs? I'm not hearing you very well; can you repeat the question? Can we use something similar, or the same model, as a kind of early warning system, for example, with the same logic? Yeah, as I said, this solution can be integrated into whatever you want to do. It's very easy to integrate and automate at the same time, so we can use it for different purposes. You can detect not only DDoS or MitC, but also other vulnerabilities or attacks that you want to consider in

your environment, so it can be easily integrated. Can you elaborate a bit more on the link with threat intelligence? How can we utilize threat intelligence in this case? Okay, the threat intelligence part. So, we saw in the previous slides how the data was generated. With threat intelligence we can have more data to analyze, and we can also use that data for different alarms, which helps us know in advance when an attack is being carried out in that environment. We can also use different data sets, analyzing maybe two or three environments at the same time. So that was the logic of using threat intelligence in the future. Thank you. Great, thank you. Any other questions?

Yeah, thank you. For these machine learning models to make reliable and accurate predictions, they obviously need a lot of data to train on. Smaller businesses or organizations might have a lot less network traffic going through their cloud environment, so a lot less data to work with. Could they benefit from these models, or is the data too little? Well, you're right, because the more data we have, the better the accuracy. It may not be very worthwhile for small companies, because if they don't have a large amount of data, maybe they won't get an accurate result, or

something, and it may not be as accurate as it could be for a larger company, for example. Great, thank you. Any other questions? Thank you, rort, that was a very great presentation. Thank you very much.