
Unearthing Digital Fossils

BSides Lancashire · 22:14 · 22 views · Published 2024-04

Thank you. It's interesting that my talk comes after the GCHQ guys, because I'm going to talk about APT operations. So let's find out what I'm going to talk about. Oops. Okay, yes. Because I'm going to cover so many disciplines and so many terms, I've made it like a story, told in different chapters to help you not lose track. In chapter one I'm going to talk about APT operations, that is, advanced persistent threats, which is related to GCHQ of course. Then I'm going to talk about machine learning in brief, and then about the dinosaur analogy. Any dinosaur fans here? That's good. After that I'm

going to talk about the Raptor project, and after Raptor I'm going to explain what XAI is and talk about my current project. I'm a researcher and the work is ongoing, so I don't know what the last chapter is yet. Before I start, imagine you're invited to a game, okay, [unclear] game, and this game asks you to draw every morning. In the morning you'd like to have a warm [unclear]. There is a vending machine that offers you 10 different mugs; it has coffee and tea. However, there is a warning sign, and it says that one of them contains poison [unclear] immediately, and another one is [unclear]. So my question is: which

number do you pick?

And here, sir? Sir, if you'd like to know the answer, then you have to wait.

So I'm happy they mentioned APT operations; you've heard about the cyber attacks. So let's talk about APTs. What is an APT all about? They are advanced, meaning this is not a teenager [unclear] trying to prove that they can do something. They are usually backed by governments and nation states. They are persistent, like the Russians. Before I mention the name of any country, I should say I'm not a political person and I'm not pointing at any country; this is research. So, like the Russians, they are persistent. What does that mean? If you close one door, they find another. You can see different types, and you can see that of course different countries are involved: [unclear] Singapore, Indonesia,

Pakistan, and the ones we hear about more often: North Korea, Iran, China and Russia. Recently, because of the conflict between Ukraine and Russia, attacks happen a lot more, and even between Palestine and Israel. How they are named depends on the vendor, such as CrowdStrike, Mandiant and FireEye: either they are named after numbers, or the name comes from an animal, like Charming Kitten from Iran, or Fancy Bear, which you might have heard of, from Russia. So there are different naming systems. As you can see, there are six categories, depending on the [unclear]. You might be thinking, which one should I follow? [unclear] Which one people follow depends: if they

love animals they follow [unclear], or they follow FireEye, which is much easier to remember. That was my challenging question in 2016. I was like, why not one naming scheme? Would it make my life easier? So I'm going to close chapter one, APTs, and talk about machine learning. Machine learning is broadly categorized into supervised and unsupervised. For supervised classification, you've heard about cat and dog, a very familiar, classic one. Regression is like going to a party: depending on how many [unclear] you take, your behavior might be unpredictable. That is called regression. Unsupervised learning is different, which is mainly, um... If you look at supervised, the majority of

work uses supervised learning. Unsupervised learning is like clustering: we group together, for example, similar clothes or similar [unclear]. We have two other categories in unsupervised learning, which help the machine learning model when you are using features. Not all features are helpful, time is against us, and we want to increase accuracy as well, so which features are more important? That's why dimensionality reduction comes into the picture, and that's why association between the [unclear] comes into the picture. Let's go to the next chapter, a more interesting chapter [unclear]. If you're [unclear] you might ask

how to even pronounce it. Let's say, in dinosaur [unclear], how can they find out that this piece of bone belongs to a Troodon or a T-rex? They find a piece of fossil, and one of the strategies is to compare it to different [unclear]. So if this is a piece of this [unclear], it most probably is, for example, a Troodon or a T-rex. In 2016 I came across an interesting article from [unclear] which said that advanced malware looks like a fossil, because we are digging inside the machine and trying to find out which family it belongs to. That's why Raptor was born. Raptor is based on my [unclear], which I did with [unclear] University. During the pandemic, uh, I was working for a small company

and we were thinking about what we could do while working virtually. So I decided to write a technical proposal and a bid. We had one failure, and the second got funding from [unclear]. It was a six-month project, and it was about using AI to find out whether we can cluster APTs. I was thinking, how can I create a [unclear]? Because if I have a [unclear], then I can say, okay, this is similar to APT29, this is similar to APT33. You might ask, what's the point of it? I would say that at a higher level, as a political issue, the concern is which APT comes

from which country, and who is behind it: this is from China, this is from Iran, this is from Pakistan; it might be a different country. So I tried to create the [unclear]. How was I supposed to create it? I used opcodes. Why opcodes? Because the majority of APTs use the [Music] [unclear], and it looks like, uh, a writing style: perhaps people from China use [unclear] differently from Iranian people, and, fascinatingly enough, [unclear]. So what happened? I created different [unclear] from the opcodes using pattern discovery, like sequential patterns. I don't want to take you through too many technical details, but it is sequential pattern mining: find the similarity and dissimilarity of sequences like push,

pop and mov between the different families. You might use [unclear], any of those. When I created this presentation I was thinking that I would like people to know how the majority of these, uh, tools usually work, so let me take you through it. They definitely need features. These might be dynamic features, like abnormal network [unclear], or static features, like opcodes from a dead malware sample. Then they create a data set and train. Sorry, if you listened to the [unclear], they train and test the model and then feed it into the desired model, which here, for example, is clustering. And what will

happen if zero-day malware or a zero-day vulnerability comes into the picture? If it matches any similar pattern, they raise the alarm; otherwise it passes through the system, and the majority of APTs can pass through the system. What do you think these are? Do you think they are dinosaur eggs? No. Let's find out what they are. This is APT1 from China, APT28 from Russia, [unclear] Iran, China, Russia and China. These are networks of the opcodes of 50 samples from each family. If you look at these networks, they are quite identical. What's the point of it? To evaluate our project we used the Kaggle data set. If you're not familiar with it, the Kaggle data set is related to Microsoft and carries nine different

families of malware, including backdoors, trojans and worms. This figure you can see is one of the hidden patterns, and you can see the families are nicely separated, so it is a proof of concept of what we created. I will take you through the [unclear] one more time. So what happened: we gathered data from five APT families from Russia, China [unclear], 50 samples of each, generated the opcodes, created the hidden patterns, established links between them, the similarity and dissimilarity relationships, created the rules and sent them to the machine [unclear]. So you might say, okay, you did your job; this project might help me to understand, yes, I can separate these APT families. But imagine you use a machine

learning technique to diagnose your health problem. Would you trust it? What type of question comes to your mind? Perhaps you would ask how this machine or model reached this decision. Which one do you prefer? Say, um, you apply for a loan and you're rejected, and you want to know why; you apply for a job and the machine rejects you, and you would like to know why. That is the concept of the black box, which is AI, and the glass box, which is XAI. So the black box is AI where we don't know what happens inside. Even machine learning

engineers, after some point, don't know why the system produced a given result. So what are we looking for? Is it accuracy and performance, how many false positives and false negatives we receive, or would we rather understand the result of the model? Which one is more important? Why not both? That's why XAI comes into the picture. It's not a new concept. Maybe you've heard about it: it's quite new in cyber security, but it is not new in the medical field. As you can see, with some machine learning models, the more accurate they get, the more complex they are, so that you cannot understand them, and vice versa. So there

so when [unclear] you can understand why the machine reacts that way, the accuracy [unclear]. Okay, so the solution is explainable AI. But what is the explainable part? Is it just understanding how the result was produced? Is it a way we can review or revisit the performance? There are lots of approaches; there is not only one way to explain a model, there is a variety of them, as you can see. It depends on [unclear]: you can trust the system, you can explain the system, and so on. So that's why Troodon was born. Troodon is another project I wrote last year, and we got funding from two universities, one in the UK, Sheffield Hallam, and another

one in [unclear]. [unclear] So Troodon follows Raptor: we are still working with APTs, but we want not only to detect the family of an APT but also to find out why the machine decided that this is 20 percent APT1, 50 percent APT29 and so on, so we can improve and enhance IPS systems. Why Troodon? Troodon is a type of [unclear]. First of all, it is a Greek name. It means, um, what was that? It means very sharp [unclear], razor sharp. And Troodon is one of the dinosaurs that is a mix of the

bird and that kind of [unclear] thing. It has tiny legs, but it is considered one of the dinosaurs with a big brain and big eyes. So I was thinking: I need the eyes, so that no unknown sample gets across, and I need the brain, because we want to think about why the model decided this is malware and this is benign. The rest is quite complicated and I don't want to bore you, but this is roughly what is going to happen in this research. In phase one we are going to have a mixture of, uh, behavioral and static malware features, again using an AI model and then a model,

and then we apply explainable AI. So we gather different APTs and try to monitor them. This time we are going to use a sandbox, because Raptor just used dead samples, the corpse of the malware. From the sandbox we create features, and then hopefully we can make sure that no unknown sample gets past the model. The work is divided into three phases. I'm currently in phase one, uh, because I need more APT samples; the families we have are not enough. Hopefully next year we move on to, uh, phases two and three.
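The goal described in this part of the talk, a per-family score together with the reason for it, can be pictured with the toy sketch below. It is only an illustration under stated assumptions, not the Troodon method: the family names, the opcode lists and the `explain` helper are hypothetical, and cosine similarity over opcode counts stands in for whatever model the project actually uses. Because the score is a plain sum of per-opcode products, each opcode's share of the score can be read off directly.

```python
import math
from collections import Counter


def normalise(counts):
    """Scale an opcode count vector to unit length."""
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {k: v / norm for k, v in counts.items()} if norm else {}


def explain(sample_opcodes, family_profiles):
    """Score a sample against each family and keep per-opcode contributions.

    The score is cosine similarity, which is a sum of per-opcode products,
    so each product is that opcode's contribution to the family's score.
    """
    sample = normalise(Counter(sample_opcodes))
    report = {}
    for family, profile_opcodes in family_profiles.items():
        profile = normalise(Counter(profile_opcodes))
        contributions = {
            op: weight * profile[op]
            for op, weight in sample.items()
            if op in profile
        }
        report[family] = (sum(contributions.values()), contributions)
    total = sum(score for score, _ in report.values())
    # Convert raw scores into shares that add up to 100 percent.
    return {
        family: {
            "share": 100 * score / total if total else 0.0,
            "top_opcodes": sorted(contribs, key=contribs.get, reverse=True)[:2],
        }
        for family, (score, contribs) in report.items()
    }


# Hypothetical family profiles built from made-up opcode traces.
profiles = {
    "family_x": ["push", "push", "call", "mov", "pop"],
    "family_y": ["xor", "xor", "add", "jmp", "mov"],
}
result = explain(["push", "call", "push", "mov", "xor"], profiles)
for family, info in result.items():
    print(f"{family}: {info['share']:.0f}% driven by {info['top_opcodes']}")
```

In this toy run the sample leans towards `family_x` mainly because of its `push` count, which is the kind of statement an analyst could check and challenge. Real explainability tooling for more complex models works differently; this only shows the shape of the output.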

And we are definitely going to engage with industry professionals to give us feedback. Why? Because explainable AI definitely needs feedback from the people who do the work. If you talk to the people working on the defensive side and [unclear], they would say that they are receiving a lot of false positives, which is not nice to receive, you know. So that's why we're trying to figure out, um, how we can reduce them and how we can revisit the

model. Okay, thanks for your patience. There is no correct or wrong answer to this question. Why? I was thinking about what type of question I could ask you, because I would like to [unclear], not only [unclear] some dinosaurs and some models. Why? There is a difference between an AI model and a human: we make rational decisions, and after we make a decision we can explain it. Even a toddler who chooses a banana over an apple can explain it to you, can't they? We are born in a way that any sort of decision we make, we can explain. That's why we need explainable AI, and

I definitely wouldn't say it is the future, because it's currently used in different applications, but I would say it might be quite helpful to people. I wouldn't say that it will, uh, replace people, no, but it would help people to, uh, diagnose the problem and [unclear] any unknown malware. Who am I? I am a cyber security and AI researcher. They call me [unclear], of course, because I love dinosaurs. I'm an artist; this is one of my works. If you are interested in working on this research, or you are just a curious person and would like to know about Raptor, there's a [unclear] I can share with you. If you'd like to get involved with Troodon,

I'm happy to help. But before that, if you have any questions, I'm happy to answer. Thank you very much. Thank you.
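The Raptor steps described earlier in the talk (generate opcodes from each sample, look for sequential patterns, measure similarity and dissimilarity, then cluster) can be sketched roughly as follows. This is a minimal sketch under assumptions, not the project's implementation: opcode n-grams stand in for the sequential patterns, Jaccard overlap stands in for the similarity measure, simple threshold-based grouping stands in for the clustering model, and the sample names and opcode traces are made up.

```python
from collections import Counter
from itertools import combinations


def opcode_ngrams(opcodes, n=3):
    """Count every run of n consecutive opcodes in one sample."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def jaccard(a, b):
    """Similarity of two samples: shared n-grams over all n-grams seen."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


def cluster(samples, threshold=0.5, n=3):
    """Group samples whose n-gram similarity reaches the threshold.

    Single-link grouping via union-find: two samples share a cluster if a
    chain of sufficiently similar pairs connects them.
    """
    names = list(samples)
    grams = {name: opcode_ngrams(samples[name], n) for name in names}
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if jaccard(grams[a], grams[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Made-up opcode traces, for illustration only (not real malware).
toy_samples = {
    "sample_a": ["push", "mov", "call", "pop", "push", "mov", "call", "ret"],
    "sample_b": ["push", "mov", "call", "pop", "push", "mov", "call", "pop"],
    "sample_c": ["xor", "add", "jmp", "xor", "add", "jmp", "cmp", "jnz"],
    "sample_d": ["xor", "add", "jmp", "xor", "add", "jmp", "cmp", "jz"],
}

clusters = cluster(toy_samples, threshold=0.5)
print(clusters)
```

With these toy traces, the two push/mov/call samples end up in one group and the two xor/add/jmp samples in another. A new sample that matches no existing group would form its own cluster, which loosely mirrors the "unknown sample" case raised in the talk.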