BSIDES CPT 2019 - How machine learning and AI can help reduce the cyber-attacks - Silent Dzikiti

Name: BSIDES CPT 2019 - How machine learning and AI can help reduce the cyber-attacks - Silent Dzikiti
Uploaded: 2019-12-11
Duration: 42 min

BSides Cape Town | 42:00 | 393 views | Published 2019-12

About this talk

Title: How the application of machine learning and AI can help reduce the cyber-security attacks.

Abstract: According to global cybersecurity company Kaspersky Lab, South Africans have once again been warned to be careful in cyberspace, with a 22% increase in malware attacks in the country in the first quarter of this year. It seems that every presentation from every security vendor begins with an introductory slide explaining how the number and complexity of attacks an organization faces have continued to grow exponentially. Of course, everyone from security operations center (SOC) analysts, who are drowning in alerts, to chief information security officers (CISOs), who are desperately trying to make sense of the trends in security, is acutely aware of the situation. The question is how do we, collectively, solve the problem of overwhelmed security teams? The answer in many cases now involves machine learning (ML) and artificial intelligence (AI).

Instead of looking at ML tasks and trying to apply them to cybersecurity, let's look at the common cybersecurity tasks and machine learning opportunities. There are three dimensions (Why, What, and How).

The first dimension is a goal, or a task (e.g., detect threats, predict attacks, etc.):
• prediction;
• prevention;
• detection;
• response;
• monitoring.

The second dimension is a technical layer and an answer to the "What" question (e.g., at which level to monitor issues):
• network (network traffic analysis and intrusion detection);
• endpoint (anti-malware);
• application (WAF or database firewalls);
• user (UBA);
• process (anti-fraud).

The third dimension is a question of "How" (e.g., how to check security of a particular area):
• in transit in real time;
• at rest;
• historically;
• etc.

There is no doubt that AI and machine learning enabled technologies are already a critical part of many security teams' applications, and I will show how they are being applied.

Speaker: Silent Dzikiti
Twitter: @SilentDzikiti
Speaker Bio: My name is Silent Dzikiti. I am a Data Scientist and I am studying Computer Science. I am a Zimbabwean. I stay in Muizenberg, Cape Town, South Africa. I have researched and applied the knowledge I will share at your highly esteemed conference.

Transcript [en]

So, my name is Silent Dzikiti. "Silent" is right, that's easy to get. I'm a data scientist, and I'll be joining Museum in February. So let's get into this. First of all, I would like to ask something: how many of you think AI, machine learning, or deep learning can reduce cyber attacks? Very few. And for the others, no? Okay, cool. So these are some top use cases of AI cybersecurity solutions that are being implemented at the moment: spam filter applications, network intrusion detection and prevention, fraud detection, credit scoring and next-best offers, botnet detection, secure user authentication, cybersecurity ratings, and hacking incident forecasting. Right. Let me start by

explaining what AI is. Basically, it is programming systems to perform tasks which usually require human intelligence. And just to surprise you, AI started back in 1956, with a guy called John McCarthy. When he started, he was trying to start with a chess game: he trained a chess program to play chess better than him, and it really ended up smashing him; it actually played better than him. It's a science and technology based on disciplines such as computer science, engineering, and a bit of psychology as well. So what is our main goal here for artificial intelligence in a cyber

security or infosec conference? My background is in data science. I don't know much about cybersecurity, but I know a bit, and I'm sure that if we combine the two we can reduce a lot of the cybersecurity attacks that are happening. This is not only based on research; it has been implemented to detect and to prevent, and not only to detect and prevent: there are many other things it can be applied to as well. Machine learning: who is learning a bit of machine learning, or has implemented machine learning so far? Very few. It's suddenly a buzzword. Many companies are now using AI, and many security vendors are using AI and machine learning as well, and many are migrating to deep learning, which is

something I'm going to talk about. So what is machine learning? It's actually training an algorithm to solve tasks by pattern recognition, instead of specifically programming it how to do the task. [inaudible] Machine learning is more like training a kid: you tell it many times "this is this, this is this, and this is this", and then it ends up learning from the training. And what is deep learning? Deep learning is training algorithms that use deep neural networks with multiple layers. So these are the four types of machine learning algorithms that you can use:

deep learning, supervised learning, unsupervised learning, and reinforcement learning. All right, so I spoke about this, and I'm going to speak more on machine learning. This guy called Samuel, in 1959, described machine learning as a field of study that gives computers the ability to learn without being explicitly programmed. Machine learning is algorithms that can automatically detect patterns in data, estimating a function that describes the relationship between the feature set and the target variable, and using the uncovered patterns to predict the data. Samuel gave this definition after he wrote a checkers-playing program, the game I was talking about, and the algorithm learned over time.
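As a small illustration of "estimating a function that describes the relationship between the feature set and the target variable", here is a minimal sketch in Python. It assumes the simplest possible case, one numeric feature and a straight-line relationship fitted by ordinary least squares. The function names and the numbers are hypothetical and are not from the talk.

```python
from statistics import mean


def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the learned function to a new feature value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: failed logins per hour (feature)
# and alerts raised in that hour (target).
failed_logins = [1, 2, 3, 4, 5]
alerts = [3, 5, 7, 9, 11]

model = fit_line(failed_logins, alerts)
estimate = predict(model, 6)  # prediction for an unseen feature value
```

Nothing in the code states the rule linking the two columns; the relationship is recovered from the examples, which is the sense of "learning without being explicitly programmed" in Samuel's definition.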

Sorry about that. Samuel gave the definition after he wrote a checkers-playing program, and the algorithm learned over time what it considered bad board positions and good board positions, and eventually the machine became better at playing checkers than Samuel. That was back in 1959, so for those who think this is a new thing, it is not a new thing. So what is the machine learning process? I'm a big believer that, when it comes to machine learning, you need to have the domain knowledge. That's the first thing you need. If you're in cybersecurity, or you're a hacker, you really need to know your stuff before you get into

anything. So the first thing that you need to do is to understand and identify the project objectives. Not only that: understand the terminology and everything in that industry. Then there is data understanding as well: collecting and reviewing data. Data preparation: selecting and cleansing data. Modeling, which is a big part of the machine learning process: manipulating data and drawing conclusions. Evaluation: evaluating the model and the conclusions. And deployment. This is quite a significant part; it can be a bit dangerous and good at the same time, the reason being that most models that we are running now are running on runtime data. And a lot of people, I get this a lot, and

this is quite a big thing now: many people don't understand the difference between deep learning and machine learning. If you look at the picture here, you can see that there is our input, and above is machine learning. With machine learning, above, we are going to do feature engineering, which includes feature encoding and all this other stuff, and you get to figure out the features; then there is a classifier with a shallow structure, and you get your output. But deep learning has come, and it's outsmarting ML. Don't get me wrong, though: deep learning is part of machine learning. It combines supervised methods and unsupervised

methods. As you can see, this is feature learning and the classifier, end-to-end learning: both are combined when it comes to deep learning, and you get your output out of that. So this is all machine learning. With machine learning, I think most of you who have applied it know mostly about supervised learning and unsupervised learning: classification, regression, and clustering. Okay, so what is unsupervised or supervised learning? With supervised learning, the expected result is provided, the algorithm is trained to produce correct results, and new data is classified according to the trained model. And when no results are expected, the algorithm is trained so that similar

data is grouped together; that is unsupervised learning. As you can see, in unsupervised learning it is combined, and with supervised learning it is just split out. There's also the famous semi-supervised learning, which is something I didn't mention because many folks are not using it. So how can we reduce cyberattacks? I think we all know that many of these attacks are actually for financial benefit. Many of the guys are now using machine learning for anomaly detection. In the context of network and host security, anomaly detection refers to identifying unexpected intruders or breaches. And whether the nature of

the attack is data exfiltration, extortion through ransomware, adware, or advanced persistent threats, it is clear that time is not on the defender's side. The 2019 Breach Investigations Report stated that on average it takes ten days, only ten days, for a system breach to be detected, but after an attacker gains entry the damage is usually done in a few days or less, which is quite crazy. So I'm going to get into the real stuff now and further explain the DDoS attack, the distributed denial-of-service attack. As you can see, it increased by 200% in the first quarter of 2019 compared to the same period last year. What is the goal of DDoS? The goal is to

disrupt or delay the services of a server or network. Attackers want to rain on everyone's parade by making services and systems unresponsive or unavailable to end users, which is quite sad. They do this by exhausting those components' resources, which can be bandwidth, disk space, or memory, but their main aim is to prevent the systems from operating as intended. We can see from this as well that it is growing quite a lot. You can see it from [inaudible]; this report came out a few days back, and it's crazy how these attacks, the DDoS attacks as well, are growing. Okay, so what are the methods being used currently? Intrusion detection

systems use thresholds, heuristics, and simple statistical profiles to detect intrusions. But what are the limitations? When threshold-based anomaly detection logic is implemented, some questions are very quickly raised. How do we set the threshold? It's a good question. Could some users require a higher threshold than others? Could there be times when users legitimately need to access the database more often? How frequently do we need to update the threshold? Could an attacker [inaudible] exfiltrate data? Okay, so I think there is a solution for that, and it's called time series. With time series we've got seasonal variations, we've got trend variations, we've got cyclic

variations, and random variations. Seasonal variation is something that repeats over a specific period, such as a day, a week, a month, or a year. For example, if it's a website, you know what happens around this time. We recently had Black Friday, and many of the sites just had problems and problems and problems, and I think those guys are not applying time series, because if you configure your website according to what happened last year around that same time, that very day and that very hour, you can implement time series as well, and it will do wonders. Trend variations are variations that

move up and down in reasonably predictable patterns. If you check how it is being implemented, obviously the first thing is to clean the data, which is a big part of any machine learning model; the cleaning of data is quite crazy. Then time series visualization: you want to check what's happening with the dataset that you have. Then remove [inaudible], train and evaluate an optimized machine learning model, and deployment. So what does the time series algorithm do? It learns from prior data and makes a prediction about the future. Substantial deviations between the forecast and the observations are

considered anomalies. This class of anomaly detection algorithm uses past data to predict current data, and measures how different the currently observed data is from the prediction. In the forecast, the following factors are considered, such as the trend. If you look at my graph, at the black part, let's put it in a 12-month gap: if that was the 12 months, and in the next year, over the whole year, it is quite above that, and we've got more traffic, for example if it's a website, then with machine learning you can train it, and it doesn't recognize that

it is abnormal traffic; it recognizes it as normal traffic. But the scary thing is, what if it's a DDoS? I'll get to talk about that as well. So we can observe a clear periodic pattern in this series. To perform anomaly detection using forecasting, we compare the observed data points with a rolling prediction made periodically. Let me give an example. [Music] Okay, if the observed values fall within the confidence bands, no anomaly alerts are raised; otherwise it is an anomaly. So the blue line there is actually the 12 months analyzed, 12 months back.
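The comparison of observed values against a forecast with confidence bands can be sketched roughly as follows in Python. This is a minimal sketch under stated assumptions, not the model shown on the slides: the forecast is a plain seasonal average, and the band is three standard deviations of the historical residuals. The function names, the weekly period, and the traffic numbers are hypothetical.

```python
from statistics import mean, stdev


def seasonal_forecast(history, period):
    """Forecast one period ahead: average the values seen at each position of past periods."""
    n_periods = len(history) // period
    return [
        mean(history[p * period + i] for p in range(n_periods))
        for i in range(period)
    ]


def residual_std(history, period):
    """Spread of the historical values around the seasonal forecast."""
    forecast = seasonal_forecast(history, period)
    residuals = [value - forecast[i % period] for i, value in enumerate(history)]
    return stdev(residuals)


def find_anomalies(history, observed, period, width=3.0):
    """Return indexes of observed points that fall outside the confidence band."""
    forecast = seasonal_forecast(history, period)
    band = width * residual_std(history, period)
    return [
        i for i, value in enumerate(observed)
        if abs(value - forecast[i % period]) > band
    ]


# Hypothetical daily request counts (in thousands) for three past weeks,
# Monday to Sunday, with lower traffic on weekends.
week1 = [100, 110, 105, 108, 112, 60, 55]
week2 = [102, 108, 107, 110, 115, 62, 58]
week3 = [98, 112, 103, 106, 110, 58, 52]
history = week1 + week2 + week3

# A new week in which Thursday (index 3) shows a sudden spike.
observed = [101, 109, 104, 400, 111, 61, 54]
alerts = find_anomalies(history, observed, period=7)
```

A spike like the one on index 3 raises an alert, but the sketch cannot say whether it is a DDoS or a legitimate surge, which is the concern raised in the talk.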

It was already modeled: we already modeled and trained on our data, and the orange line now gives the forecast for the coming year. So what are the advantages of time series? A time series pattern recognition system that learns the seasonal patterns in the data is able to correctly identify the anomalies. Seasonality is the tendency of data to show regular patterns due to natural cycles of user activity. For example, some sites get to see higher traffic on weekdays compared to weekends, whereas other sites see the opposite trend. Some seasonal trends play out over longer periods: online shopping websites expect a spike in traffic

every year during the peak shopping seasons, so it's quite an advantage to use time series, on websites specifically. So what are the disadvantages, or the limitations? If the training data contains anomalies that you cannot easily filter out, the model will fit to both inliers and outliers, which will make it difficult to detect future outliers. If the time series is highly erratic and does not follow an observable trend, or if the amplitude of fluctuations varies widely, forecasting is not likely to perform well. I don't know, do you agree? Okay, so let's move on to phishing. This has been a big thing, and

I'm sure most of you are older than me. For those who started using Gmail early, I don't know if you've ever received an email from some guy just stating that you've got 1 billion rand, that is, 1 million US dollars, in Canada that is waiting to be picked up. Back in the day you could get it in your inbox, but I'm sure if one of you, or many of you, uses Gmail now, you get to see that it is now classified as spam, and sometimes you don't even get the email. The reason is that they are using machine learning behind it; they are using machine learning algorithms

behind that, and they can easily classify that this is inbox, this is social, or this is promotions, that kind of thing. You can even implement it as well on your own, with your emails. So the fake email appears to the recipient to be from someone they trust, such as a boss or a bank. And the bad thing that is happening now is that these guys can even use your domain, your website domain, which is quite crazy, and they can actually use your brand identity as well and pretend to be maybe your manager, your boss, or anyone else. So phishers are getting quite crazy, but

there is a reduction; I mean, it is now one out of 99 [inaudible] that is actually an email attack, so it is being reduced. There's actually a stat that I was looking at that's quite crazy, and you get to see how curious we are as human beings. They sent out emails after a training that told people they must never open [inaudible]. I'm sure many of us can figure these out, or sometimes we try to figure them out, but out of curiosity about 40 percent actually opened the email and downloaded the package. So how can we reduce that, and how can we prevent that?

So classification is the solution. A lot of classification models can be used, and many of us have heard of them: logistic regression, decision trees, support vector machines, and neural networks as well. Classification of network traffic can be done effectively by analyzing past events, consisting of historical logs of binary files, login attempts, emails received, or inbound and outbound requests, learning patterns from these events, and hence creating classification models that can classify future events as malicious or legitimate. So this is linear regression and logistic regression. These are the two most famous machine learning algorithms, which come under supervised techniques as well. Linear regression is used for solving regression problems, whereas

logistic regression is used for solving classification problems. We can see the difference here: linear regression is used to predict a continuous dependent variable using a given set of independent variables, whereas logistic regression is used to predict a categorical dependent variable using a given set of independent variables. Linear regression is used to solve regression problems, and logistic regression is used to solve classification problems. Okay, this is a decision tree. What is a decision tree? It is a tree structure in which the leaves represent classifications, and there are branches. There are advantages and limitations with this as well. I don't actually recommend using decision trees, mostly

because of the deep learning that came out; actually, it came out a few years back. But there are some advantages to them: you get high classification accuracy, and they are actually simple to implement. The limitations are that decision trees perform a greedy search for the best split at each node, and they are not fit for continuous variables. The other big issue is overfitting: overfitting is one of the biggest practical difficulties for decision tree models, and this problem gets solved by setting constraints on the model parameters and by pruning.
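The greedy split search and the effect of a depth constraint on overfitting can be sketched in plain Python as below. This is a toy implementation written for illustration, not a library API and not the speaker's code. The two features (number of links and number of urgent words in an email), the labels, and the deliberately mislabeled row are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Greedy search: try every feature and threshold, keep the lowest weighted Gini."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build(rows, labels, max_depth=None, depth=0):
    """Grow a tree; max_depth is the constraint that limits overfitting."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f, t,
        build([r for r, _ in left], [y for _, y in left], max_depth, depth + 1),
        build([r for r, _ in right], [y for _, y in right], max_depth, depth + 1),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[2]) + count_leaves(tree[3])


def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)


# Hypothetical emails: (number of links, number of urgent words).
rows = [(0, 0), (1, 0), (1, 1), (2, 1), (5, 3), (6, 2), (7, 4), (8, 3), (2, 0)]
labels = ["ham", "ham", "ham", "ham", "phish", "phish", "phish", "phish",
          "phish"]  # the last row is a mislabeled, noisy example

deep = build(rows, labels)                 # no constraint: memorizes the noise
shallow = build(rows, labels, max_depth=1) # constrained: one split only
```

The unconstrained tree reaches perfect accuracy on the training rows only by carving out a leaf for the single noisy example, while the depth-limited tree ignores it. Limiting depth is one of the parameter constraints mentioned above; pruning an already grown tree is a related but separate technique that this sketch does not implement.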

Actually, we can use deep learning, which is something I will talk about as well. So whatever the cybersecurity checkpoint in the security system, it should be able to decide the following. For every file sent through the network, does it contain malware? These are the questions that you can ask yourselves, because machine learning and AI are all about data, and data is the big thing for us. For you to do any ML or deep learning or AI stuff, data is the engine of it, and it all begins by asking yourself questions: where do we get the data from, what are

the problems, and how do we then get to use it in the near future? So for every login attempt: has someone's password been compromised, and how was it compromised? Those are the questions that you can ask yourself, [inaudible] and all that. For every request to your servers: is it a denial-of-service attack, or is it a man-in-the-middle attack, or whatever it is? All these tasks are classifications of events as malicious or legitimate. Which brings us to this, and it's quite sad: malware attacks in South Africa increased by 22% in the first quarter of 2019 compared to the first quarter of 2018. So malware is

just a blanket term for worms, Trojans, and other harmful computer programs that hackers use to wreak destruction and gain access to sensitive information. And 58% of malware attacks are targeted at small businesses. Given the fact that the backbone of our economy in South Africa is small businesses, just think of the bad things that could happen to our small businesses if these attacks increase. But there is always a solution to this. So these are the approaches to malware detection. Pre-execution phase data is anything you can tell about a file without executing it. This may include executable file format descriptions, code descriptions, binary

data statistics, text strings, and information extracted via code emulation and other similar data. Post-execution phase data conveys information about behavior or events caused by process activity in a system.
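Two of the pre-execution features mentioned above, binary data statistics and text strings, can be extracted without running the file. The following Python sketch shows one way to do that. It assumes byte-level Shannon entropy as the "statistic" and runs of printable ASCII as the "strings"; the function names, the minimum string length, and the sample bytes are hypothetical.

```python
import math
import re
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII characters at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode("ascii") + rb",}")
    return [match.decode("ascii") for match in pattern.findall(data)]


def static_features(data: bytes) -> dict:
    """Collect simple features that can be read without executing the file."""
    strings = printable_strings(data)
    return {
        "size": len(data),
        "entropy": byte_entropy(data),
        "strings": strings,
        "has_url": any(s.startswith(("http://", "https://")) for s in strings),
    }


# Hypothetical file content, made up for this example.
sample = (
    b"MZ\x90\x00\x03\x00"
    + b"http://example.test/payload\x00"
    + b"cmd.exe /c whoami\x00"
    + b"\x00" * 8
)
features = static_features(sample)
```

A feature dictionary like this could then be fed to a classifier; very high entropy, for instance, is often associated with packed or encrypted content, though on its own it is not proof of malice.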

Right, so with malware, these are the classification methods being used currently. Intrusion prevention systems: having the ability to intercept the direct line of communication between the source and destination, they can automatically act on detected anomalies. Encryption: securing the data itself by encrypting it. And honeypots, aimed at learning about attack methodologies and gathering forensic information for performing analysis of the attacker's actions. So for malware detection there are two categories: benign and malicious files. With those two on top, you train on your data and you get a predictive model. Malware recognition models decide if an object is a threat

based on the data they have collected before. This data can be collected when you have been attacked or before the attack, so we are just using the data that we have at the moment. Machine learning boosts malware detection using various kinds of data on host, network, and cloud-based anti-malware components. So this is actually incoming stream clustering. You can use clustering, which can be a support vector machine or any clustering method, many machine learning methods. So we get to see the incoming stream of unknown, not yet classified objects; that's when our data is coming in.

Then we classify our data, or sorry, we cluster our data, and you get to see point one, point two, and point three, and then there's the human annotation part of it. But with clustering it all depends on the methodology, I mean the algorithm, that you use. I think the support vector machine is one of the best algorithms that you can use, but there are some limitations to that as well. So which machine learning algorithm can you use? These are the questions that you need to ask before you implement any machine learning.
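Grouping an incoming stream of unlabeled objects into clusters, which a human analyst can then annotate, can be sketched as follows in Python. The sketch uses k-means, a common clustering algorithm chosen here for illustration rather than one named in the talk, with a deterministic farthest-point initialization. The two numeric features per object and all values are hypothetical.

```python
import math


def init_centroids(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = init_centroids(points, k)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable, stop early
            break
        centroids = updated
    return centroids, clusters


# Hypothetical unlabeled objects described by two numeric features each.
group_a = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1)]
group_b = [(5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
group_c = [(9.0, 1.0), (9.2, 0.9), (8.8, 1.2)]
objects = group_a + group_b + group_c

centroids, clusters = kmeans(objects, k=3)

# A new, not yet classified object is assigned to the closest existing cluster.
new_object_cluster = nearest((5.0, 5.1), centroids)
```

The clusters carry no meaning until someone labels them, which corresponds to the human annotation step mentioned above.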

So these are the questions that you need to ask the company, or ask first, before you implement any ML. What is the size of your training dataset? Because some of the algorithms work better with small datasets and some of the algorithms work with big datasets. For example, for dimension reduction there's an algorithm called t-SNE, and there's an algorithm called SVMs. If we apply t-SNE on bigger datasets for dimension reduction, it works well, but for the others it doesn't. The other thing is: are you predicting a sample's category or a quantitative value? Do you have labeled data, or do you not

have labeled data? If yes, how much labeled data do you have? Do you know the number of result categories? How much time and resources do you have to train the model? How much time and resources do you have to make predictions? So these are some of the questions that you should ask yourself when faced with machine learning algorithm selection. And what are the advantages of using AI or ML? Artificial intelligence and machine learning can handle the large volumes of activity that take place across a company's network, and the massive numbers of emails, files, and websites accessed by employees, in a small fraction of the

time needed by humans. Machine learning or AI can learn over time and can identify malicious attacks based on the behavior of applications and the behavior of the network as a whole. ML can identify unknown threats; hundreds of millions of malicious attacks are launched every year. And what are the potential limitations of applying AI? Cyber threats are constantly evolving, and attackers are creative as well and have virtually unlimited resources. As new threats emerge, security solutions that use AI have to be trained in order to keep up, so you need to constantly retrain on your datasets, or retrain your models, as well. And the other challenge

that we're having is that the bad guys are now using AI as well, which is quite crazy, and they are now even training on their own datasets as well. But AI systems are not yet advanced enough to be a hundred percent accurate. The other thing with AI is the false positives. With an AI system you can never, well, sometimes you get a hundred percent accuracy; there is actually research that was done recently by F-Secure where they were using CAPTCHA, and it was actually a hundred percent, but usually you can never get to that point. The biggest challenge that we have in the AI sector is the false positive, and that is one of the

limitations. So what are the key takeaway points? Combine data science and cybersecurity. The other thing is that domain knowledge is key. Understand your datasets: the data mining part of it, where you harvest data from, and get to understand that. Do we get to use our login log files? What kind of information do we need to put in our datasets? Understand the right machine learning and how to apply it to cybersecurity. The false positive is a big challenge, and that's the biggest challenge in applying AI, deep learning, or machine learning, and it needs to be actually looked at. And have

the right dataset; that is the fuel of machine learning. And yes, thank you.

Okay, so with overfitting, we are talking about the outliers that I mentioned. Okay, let me go... Okay, so she asked what overfitting is; let me check the slide first. Okay, so overfitting is actually going below, or, let me give you an example with outliers. I don't know how familiar you are with statistics, but okay. So it's just an abnormality, or anything that goes beyond the norm. Oh, so it's false positives. A false positive is something that is acting as if it is the correct thing but is the bad thing.
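To make the error types concrete, here is a minimal Python sketch that counts them for a detector. Note that in the usual terminology a false positive is a benign event that gets flagged as malicious, while a malicious event that passes as normal, which is the situation described above, is usually called a false negative. The sketch reports both. The label names and the sample predictions are hypothetical.

```python
def confusion_counts(actual, predicted, positive="malicious"):
    """Count true/false positives and negatives for a binary detector."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # attack correctly flagged
        elif p == positive:
            fp += 1   # benign event flagged: false alarm
        elif a == positive:
            fn += 1   # attack that looked normal: missed
        else:
            tn += 1   # benign event correctly ignored
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Summary rates; each falls back to 0.0 when its denominator is empty."""
    return {
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical ground truth and detector output for ten events.
actual = ["benign"] * 8 + ["malicious"] * 2
predicted = ["benign"] * 6 + ["malicious"] * 2 + ["malicious", "benign"]

counts = confusion_counts(actual, predicted)
summary = rates(*counts)
```

With rare attacks and many benign events, even a modest false positive rate can produce more false alarms than true detections, which is why it weighs so heavily on security teams.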

So for example, we're thinking about when attackers attack: they attack in a way where they act like it's normal traffic, whereas it's bad traffic, for example a DDoS attack. Any questions?

Okay, so you mean on runtime data? That can be implemented, but there is a challenge with that one, and there are some limitations also. But it all depends on where, and what kind of data, you are talking about. Can you elaborate on that?

[Music] Okay, there are many ways you can do that, but it all depends. For example, I talked about it on the slide with the time series. Yeah. So you can use a time series, but like I said, there are limitations as well, because with applying AI or ML, you know, it's not a hundred percent accurate, and especially with runtime data it's quite [inaudible]. But, for example, with Amazon, actually there's a good company, I can't remember the name, but what they are doing is, for any abnormal traffic, they just shut down one server and pop

up the other one, using ML as well. But for that specifically it can be a challenge; that's the problem with time series, unless you combine some other algorithms with it. Another question?

Okay, there are many. I think of the DDoS attack, with the servers: mitigating or detecting abnormal traffic. There's a site, I can't really remember which site, but they actually implemented it. It was actually just checking, actually it was [inaudible], and they were just looking at the traffic and the variations as well, the seasonal variations. And the problem there was actually the nation. For different countries we get different religions and other things, and they didn't know that in that country there was actually a holiday. Just for

example, in America, pumpkins, because of Halloween, in a certain period. With this site it was actually another country, and when the traffic went up they detected it as an anomaly, and it went down and picked up another server, which is something we were just talking about. But on that same site, I think after three hours or so, I can't really remember, they picked up that the guys were trying to get into the system as well, but they were applying...
