
Cognitive Defenders: How AI Transforms Cyber Security

BSides Bristol | 32:54 | 72 views | Published 2024-01

Hello, thank you so much for having me, and thank you for sticking with it at the end of the day. I know a 5:00 p.m. lecture isn't necessarily on everyone's radar, especially on a Saturday, but thank you very much for being here. It's been really great today to see some of the other speakers and to understand some of the topics and the knowledge sharing that's been happening. So hi, nice to meet you. Just a little bit about myself: I am a University of Bristol graduate. I graduated back in 2017, and since then I have been working in information security. I started off as

an ethical hacker and pentester, then moved from that red team aspect through to blue, and ultimately ended up in more of a GRC function. I currently sit as head of security services at Opencast Software, which is a software development firm based in the north-east of England, up in Newcastle, but we have hubs all over the place. In my free time, aside from playing video games, I like to do data science, which is very, very cool, I know. What I'm going to talk to you about today is artificial intelligence, and I'm sure if anyone's playing AI talk bingo you'll be

pleased to know I'm steering away from a lot of the key buzzwords and cringey topics around this subject. However, to begin with I'm going to mention Skynet, so you can mark that one off your bingo card there. In order to understand how we apply artificial intelligence to cyber security, we first have to define what is and what isn't AI. Now I'm going to do a little bit of audience participation here: raise your hand if yes, it's AI. I'll just run through some examples. So, is Uber AI? No hands raised, okay, interesting. What about something like Google Translate? Is

that artificial intelligence? Raise hands for yes. Interesting. What about something like blood glucose monitors? Would you consider that to be artificial intelligence? Yeah, interesting, okay. So the reality is that AI consists of seven key topics: the algorithm in place can learn, it can adapt, it can problem solve, it has a function of reasoning and perception, it's adaptable to change, and it's fundamentally data driven. So all of these previous examples actually count as artificial intelligence. In fact, if we consider what artificial intelligence actually is, it's just intelligence in a non-organic structure. So I just want to walk through

some of the key terminology that we have here. Artificial intelligence itself is the broadest term: anything that can learn and apply that learning is technically called artificial intelligence. Where we go a layer deeper is how we optimize and allow those algorithms to make mistakes and learn from those errors, and those are the fundamental principles of machine learning. There are lots of different ways that machine learning branches out into deep learning patterns, and a subset of that is neural networks, which are made up of really interesting units or functions

called perceptrons, which basically perceive, perform transformations and functions, and solve problems on a unit basis, and then coalesce together to make this really abstract neural network structure. The whole reason they're called neural networks is that they mimic a lot of the behavior of the human brain. So now that we understand a little bit about what artificial intelligence actually is, we can begin to understand what kinds of problems it can solve. To do that, we first need to know what AI is really good at, and there are no surprises here: things like data analysis and pattern

recognition; repetitive tasks, doing the tasks that we don't want to do or don't have the capacity to do. Natural language processing is a key hallmark of artificial intelligence. I asked earlier, is Google Translate AI? It is: it's something that's learning and perceiving based on the data inputs, and it's able to transform them into an output. Image and speech recognition: linking on from Dr Emma earlier talking about deaf awareness, closed captions are a form of artificial intelligence. And ultimately one of the key hallmarks of AI is that it's good at predictive analytics and automation. But there's also a lot of

things that AI is not very good at at all, and these make a bit of sense when you think about them. We are only as good as the data sets that we provide our AI, and quite often these lack relevant context; a broader environmental understanding is not often included in these data sets. Creativity and innovation is something that AI also struggles with. It's quite interesting, because, I don't know how many people in the room here are consultants or software developers, but a lot of the time when we have client problems the response is 'I want to add machine learning in, I want to add AI in.' But actually AI isn't there to

facilitate innovation; it's there as a way to enable it, by simplifying a lot of the processes that we previously wouldn't really be that interested in performing, so a lot of those big mathematics bits. Adaptation to unforeseen circumstances and error handling is something that artificial intelligence is really bad at; we come back to that point that we are only as good as the data sets we have available. And ultimately, understanding intent and human motivation is something that AI really, really struggles with. If you think about the field of cyber security, this actually poses a really significant problem, if you consider how much threat

detection is based upon understanding the intent of a malicious actor, which then leads to how we define and investigate our indicators of compromise. So what does this mean for cyber? The reference model here is provided by a Gartner Insight paper, and I think it's really interesting because it illustrates where artificial intelligence, machine learning and neural networks fit within the cyber security mesh model. It's not necessarily technology agnostic: you can see the products sit on the right-hand side, and there are inputs and outputs from the analytics and machine learning. It's also fed by

threat intelligence; we're only as good as our data sets are. Really what this demonstrates is how this security intelligence layer can add to and enhance an organization's architectural structure. So I want to look a little bit at the rise of AI in cyber security and how we got there. For those who are not aware of the timeline, AI has actually been around since the 50s. It was first defined by Alan Turing, and the first artificial intelligence model actually came out in 1952. It was quite significant in terms of revolutionizing some of the key principles behind artificial intelligence. We move through then to the

80s and 90s, where we're looking at intrusion detection and expert systems, and then into machine learning and anomaly detection, which we'll look at in a couple of slides. Into the 2000s we really start to see a ramping up and a real interest in artificial intelligence, so looking at behavioral analytics. If we think about this in the context of an organization, things like expected user behavioral access control, UBAC, are a form of this behavioral analytics: do we expect this person in finance to be accessing this data from this part of the world at this specific time? These are all things that feed into that behavioral analytics that AI is really, really good at detecting.
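A minimal sketch of the behavioral analytics question above (do we expect this person to be accessing this data from this part of the world at this time?). It is not from the talk: the event data, the function names `build_baseline` and `is_unexpected`, and the one-hour tolerance are all hypothetical, and a plain lookup of previously seen countries and hours stands in for whatever model a real product would learn.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical access history: (user, country code, ISO timestamp).
HISTORY = [
    ("alice", "GB", "2024-01-08T09:15:00"),
    ("alice", "GB", "2024-01-09T10:40:00"),
    ("alice", "GB", "2024-01-10T14:05:00"),
    ("bob", "GB", "2024-01-08T22:30:00"),
    ("bob", "IE", "2024-01-09T23:10:00"),
]


def build_baseline(events):
    """Record the countries and hours of day previously seen for each user."""
    baseline = defaultdict(lambda: {"countries": set(), "hours": set()})
    for user, country, timestamp in events:
        hour = datetime.fromisoformat(timestamp).hour
        baseline[user]["countries"].add(country)
        baseline[user]["hours"].add(hour)
    return baseline


def is_unexpected(baseline, user, country, timestamp, hour_tolerance=1):
    """Flag an access from a new country or far from any usual hour."""
    profile = baseline.get(user)
    if profile is None:
        return True  # no history at all for this user

    hour = datetime.fromisoformat(timestamp).hour

    def hours_apart(a, b):
        # Distance on a 24-hour clock, so 23:00 and 00:00 are 1 hour apart.
        diff = abs(a - b)
        return min(diff, 24 - diff)

    near_usual_hour = any(
        hours_apart(hour, seen) <= hour_tolerance for seen in profile["hours"]
    )
    return country not in profile["countries"] or not near_usual_hour


baseline = build_baseline(HISTORY)
print(is_unexpected(baseline, "alice", "GB", "2024-01-15T10:30:00"))
print(is_unexpected(baseline, "alice", "BR", "2024-01-15T03:00:00"))
```

The first call matches what the hypothetical user normally does, while the second comes from a new country at an unusual hour. A real system would score such events rather than return a hard yes or no, and a human would still review the alert.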

And then ultimately, through the last 10 years or so, we're looking at the rise of endpoint security, the threat intelligence pieces and cloud security, and now we're moving into the automated response bit. This next slide here is the Gartner hype curve, and I really like this hype curve. This one is based around central government, so it will vary depending on the market. It's about technology adoption and how we go through these levels, almost like the seven stages of grief with technology, where we look at those inflated expectations that come off the back of someone finding something new

and really exciting. As you can see on the far side of this one, stuff like CASB we're all quite familiar with and quite comfortable with; we're now getting to a point where we're able to apply this technology successfully, and it's entering that plateau of productivity. Where gen AI sits is almost straddling this innovation trigger and the peak of inflated expectations, because no one really knows at this stage how it can revolutionize working, especially in the field of cyber security. So I want to talk a little bit now about the applications of artificial intelligence and actually go into a little bit of the maths behind

it. Don't worry, hopefully it isn't too maths heavy. Thinking back to that first slide and what AI is really good at: it's really good at establishing patterns and at identifying and recognizing anomalies in data. So one of the examples that I'm going to bring up here is fraud detection with isolation forest. This particular algorithm was developed in 2008, and it identifies where there are anomalies within data sets, which can then be broken down based on this falling tree model to identify your outliers. Why this is particularly important is if

you think about the context of a big financial services organization, maybe a bank: they might be handling thousands and thousands of transactions a second. Using something like isolation forest, which breaks down expected patterns of behavior, you can begin to pull out those fraudulent activities. We can then be notified and understand how a user or a set of transactions should behave versus how they're actually behaving. Understanding these anomalies is really important within machine learning. What you get once you apply this algorithm is this really nice plot, and it becomes very obvious in these large data sets where you suddenly have

outliers. The benefits of using something like isolation forest are that it's efficient and can handle large data sets, which makes it perfect for that example of fraud detection in financial institutions. It doesn't require any real pre-training: it's not a model that requires you to do a lot of composition analysis of your data sets before you actually insert them. It can also be performed in both unsupervised and supervised learning models. When we say supervised learning models, we're allowing the algorithm to learn, but we are adjusting the data sets as it goes through its learning process, so there's a level of human intervention: we're directing it towards

the conclusions we want it to make, so that our model fits its purpose accurately. It's also able to detect global and also local anomalies, so understanding the bigger picture is something it's particularly good at. I would highly recommend, if anyone is interested in data science and in applying this in their own workspace, looking at the PyData talks. These are a little bit older, so 2018, but they give really great examples of how to actually utilize isolation forest in the detection of fraud. The next bit that I wanted to

cover is botnet detection. For people who are unfamiliar, botnets tend to compromise very large, pervasive network systems; things like IoT devices tend to be very susceptible to them. How they communicate is they tend to send all of their information out to a single command and control center. Identifying malicious network traffic as a whole is something that's been around for a little while now, and there are two main techniques: signature-based detection and anomaly detection. Signature-based detection has some caveats and some cons, I'd say mostly around this: if we're constantly looking at a library of known

signatures, we have to maintain that library of known signatures. So it's very heavily reliant on up-to-date information, which can mean that the alerting may not be current for the actual purpose that you're using it for. The other approach is to use anomaly detection, which we've seen previously when we looked at isolation forest and identifying those outliers. Because of the rise of AI, we have almost the next level of detection techniques, which is anomaly-driven IDS. One of the key issues around this as well is that when we have signature-based detection, we have

usually a threshold of alerting that needs to be reached in order for that to actually trigger. This can change over time, and knowing what your network baseline looks like is incredibly important for this. Therefore you have to do a level of statistical analytics, looking at interquartile ranges and medians, to really understand how your network is behaving ahead of identifying that anomalous traffic. So in this case with botnets, we tend to find that when we're using these statistical techniques to identify this anomalous traffic, we have to understand first of all what the baseline is, and then what does not

normal look like. Now, one of the things that I mentioned earlier is that AI is really bad at handling errors and really bad at handling outliers, so anomaly detection can sometimes be quite difficult. What you tend to get is what's known as overfitting, which is shown by figure six on the left-hand side, right-hand side for you guys. This is where the statistical model takes the outliers into account far more significantly than they actually represent in the data set. You can see from degree 1 all the way over to degree 15, from a linear fit through to a true fit through

to an overfitting example. There's a huge emphasis on reducing this level of error, because our understanding of what is actually anomalous and what forms part of a normal data set is incredibly important. Not every AI is actually sensitive to this, but you do tend to need to do a lot of exploratory data analytics in order to understand how overfitting will affect your anomaly detection. How this works in practice is we identify that botnets tend to call out to C2 infrastructure, so your command and control. Where you would fit your AI in this case is in that security

intelligence layer that sits above. What this security intelligence layer will be doing is applying some methodology for identification of the botnet traffic, either by anomaly detection, which I've talked about at length, or by community detection. This isn't the only way to detect this traffic, and looking at the traffic itself can sometimes be quite difficult. Instead, what we might choose to look at is the interval, determining the frequency of the pulses, the time to live, things like that. So different models can be applied within the security layer to best fit your organizational purposes. There's

been a lot of research on this recently, and it's really demonstrated the utility of using artificial intelligence to identify network traffic signatures that previously we would be blind to. The third one I want to look at is biometrics. We're all quite familiar with biometrics, but biometrics also represent a really great use of artificial intelligence. When we look at things like facial recognition, we're looking for data sets of the face, or in fingerprint analysis we're looking at the grooves and ridges that you see on your fingerprint. What is often a problem with AI is that you will tend to look for the

things that are most common. If we're trying to validate someone's identity, we can assume a large population may have a nose, therefore a nose isn't necessarily an indication of identity. Something like a nose or eyes or eyebrows basically count as what's known as eigenvectors, and if you put them all together you get an eigenface. Eigenfaces themselves have high degrees of dimensionality; dimensionality is the number of characteristics a data set can have. So what you can get is a lot of confusing data all stuck together. In order to actually drill down and identify what a single eigenface

actually looks like, we have to do a thing called principal component analysis, and what this is is reducing the dimensionality of the data set. So we're normalizing through one vector, we're doing the principle of least squares on another vector, and what we're trying to do is reduce the number of dimensions that our algorithm has to process in order to be able to create these images. What you end up getting is these quite cursed-looking eigenfaces. I like the fact that it says 'predicted: Bush' and then 'true: Bush'; I don't know why I find that quite funny. So some of the benefits of using

PCA for this eigenface recognition: you reduce the dimensionality. With what you can see on the far right-hand side, you wouldn't be able to tell who that is; using PCA we can drill down in those data sets and say this is likely to be, you know, Tony Blair in this context. There's also noise reduction and feature extraction: we're able to build an image, we're able to understand this is what this person looks like, does it align with our expectations, yes. As a result you get that increased accuracy, and in short that allows us to get more unique and reliable identification. And also it just

benefits the authorization piece, the authentication piece, that we're trying to enforce through this particular control. Now I want to look a little bit at AI and human collaboration. If we think back to the limitations of artificial intelligence and machine learning, we can actually draw out a lot of what we're good at, and that is very much around creativity and innovation, knowing what to apply where, adapting to unforeseen circumstances, using that supervised learning model to enhance our algorithms so they respond better to change, and ultimately understanding intent. So what does this mean for security professionals? If we look at what AI is great for, it's great for data

processing, but it doesn't understand context. So how does this impact threat detection, where understanding intent is important? I mentioned earlier looking at indicators of compromise. We know that certain advanced persistent threats will have certain tactics, techniques and procedures, TTPs, in place, so understanding the intent will help shape our models to understand and recognize those TTPs, and similarly with indicators of compromise. It's also really great for automation, something that we pulled up previously, but it also struggles to handle errors. We've seen that overfitting example, where our models are being skewed massively because of those outliers. That's got a really significant

impact if you're using your model to actually drive an automated response. So if my EDR detects anomalous behavior and completely shuts off a user from the network, what is the impact if that's actually legitimate? In short, that human intervention is always necessary, and this is how you can build out an understanding of the collaboration between AI and humans. So what's next? This is a bit of a cop-out, to be honest, and I was mostly thinking that because it was getting closer to 6 o'clock people would probably be switching off a little bit, but the sky is the limit. With the advances that we can

have in machine learning and developing algorithms for cyber security, there's a really great opportunity to optimize a lot of our working patterns. Where we can establish expected usual behavioral analysis, it can help us in a SOC to understand whether that user's behavior is natural or not. Really what it represents is a growing field and passion among people who are really interested in developing these systems. We looked back at the hype curve and we saw that trigger for innovation. The reason that AI straddles those two areas is that AI often solves problems we haven't even thought about, and quite often that's

one of the main driving forces behind the development and adoption of technology. I've included some resources; I did think of this a little bit like a university lecture, and I have provided a resource list and some citations for everything I've included. But thank you very much for your time, and I'll take any questions. [Applause] Hello. So with your fraud detection example, you said that you don't need to pre-process the data set. Would that not just show you deviations from the level of fraud that you have today? If you have 0.1% of your transactions being fraudulent, are you not going to normalize that in that data

set to an extent? So in that particular example, that's identifying anomalies in sort of perfect data, to an extent. The reason that isolation forest works quite well is that it begins to, if I go back to the slide it's probably best, it begins to pull out those areas where fraud already exists. So where you see those normal data points that drift off, that would be your 1%, which would maybe be considered your acceptable level of fraud, and what we're looking for is the anomaly that sits on the other side of that. There are lots of different ways; isolation forest isn't the only type

of algorithm that is used in this particular example, and there are lots of other ways you can do it where you do normalize the data. But as an example of how it could be applied, I chose this one for that reason. Any other questions? [Music] all ours fine just to regulate it world so

do. So I think it was really interesting that that committee was brought together. I remember sitting in on the announcement and listening in, and it's good to see, especially the UK, us, trying to lead the force in terms of getting that quality assurance around AI. One of the things that came out off the back of that committee meeting was adjusting education. I don't know if this is going to expand beyond the UK, but there are plans in education to make sure that people have an awareness of artificial

intelligence, but also maybe a bit of understanding of how it applies, and things like making sure that people are best suited for the jobs there are. So there are more, I suppose, socioeconomic things that sit around artificial intelligence. In terms of its place as a research tool, I think it's great at optimizing and bringing forward a lot of technologies. It's been used in the medical industry for identification of Alzheimer's, early-onset Alzheimer's, things like that; it's been used in agriculture for checking beehives. So I think there's definitely a use for it as a tool, but as we

mentioned, there are some limitations with that, and those limitations will always require some level of human intervention. I think if you can get agreement with those two things, well, yeah, we'll be all right, I think. Hopefully no Skynets.

you

closer do the progress made the last 18

months. It's always really hard to tell with research, because, if I move back to this one, this is a slide that just represents adoption of technology within the public sector, in central government, but I think this applies to a lot of technology bases and to a lot of the research that's out there: you go through this level of having these expectations around a piece of research and then maybe not understanding how to apply it, or actually how to define it, really. So a lot of the stuff that is coming out, it

looks very similar but also looks radically different. So I think what'll happen is there'll be a period, a plateau, of understanding what the purpose is, how we actually define that, how we are using it and what we are using it for, and that will really redefine where we are with those various bouts of research. But yeah, it's one of those things where with technology you have that exponential growth in the technology over time. There's a lot of talk as well about that

human-AI, I want to call it the event horizon, but it's where you get something that surpasses human intelligence and things like that. Some people say that's 10 years away, some people say it's 50, some people say it's never going to reach there. So it's really down to a 'time will tell' basis, to see how these different approaches and different pieces of research actually interact with each other. Any other questions? Hi, do you think

that

Absolutely. So one of the key limitations with artificial intelligence is that we are limited by our data sets, and so the more data sets you have, with greater degrees of accuracy and greater volumes of information being processed, yes, absolutely, you can make those decisions. An area where this is already being applied is healthcare, looking at how we monitor and understand the human body and what data points we can take from that. Things like the Alzheimer's research that I mentioned, and also cancer research, are already adopting a huge amount of artificial intelligence in order to enhance the diagnosis of

some of these diseases. So I do think it's definitely moving to a point where we're getting enough information for it to be accurate and to increase that level of, yeah, cognisance, I think that's the word for it. But once again it comes back to the intent: what do we want it to do, how do we want it to solve the problems, and how will the algorithms actually begin to take that. A lot of the neural network stuff, which uses perceptrons, works on the basis that you'll have one particular perceptron over here doing one particular function, looking at one data set, and

then they'll be communicating with another to see: what's the impact of my decision, what do I think, what do I predict, and how does that impact this one over here? So yes, in short, it's entirely likely, entirely possible. Cool, anyone else? Nice, cool. If you have any other questions or anything, please feel free to grab me, but yeah, thank you very much for having me, everyone.
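The perceptron units mentioned in that last answer can be sketched as below. This is a minimal illustration and not something shown in the talk: it trains a single perceptron with the classic perceptron learning rule on a logical AND data set, and the names `predict`, `train` and `AND_SAMPLES`, along with the learning rate and epoch count, are assumptions chosen for the example.

```python
def step(total):
    """Threshold activation: fire (1) when the weighted sum is non-negative."""
    return 1 if total >= 0 else 0


def predict(weights, bias, inputs):
    """One perceptron: weighted sum of the inputs plus a bias, then a threshold."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return step(total)


def train(samples, epochs=20, learning_rate=1):
    """Perceptron learning rule: nudge the weights whenever a prediction is wrong."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # -1, 0 or 1
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical training data: logical AND, which a single perceptron can learn
# because the two classes can be separated by a straight line.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(AND_SAMPLES)
for inputs, target in AND_SAMPLES:
    print(inputs, "->", predict(weights, bias, inputs), "(expected", target, ")")
```

A neural network chains many such units, feeding the outputs of one layer into the next, which is the "communicating with another" behavior described above. A single unit like this can only learn linearly separable patterns.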