AI in CyberSecurity: How to be a 10x Engineer

BSides KC · 2023 · 23:52 · 98 views · Published 2023-10
About this talk
Sam Wallace explores how large language models and the LangChain framework can automate security workflows and address the cybersecurity skills shortage. Using real examples—including alert triage and playbook automation with vector databases—the talk demonstrates practical patterns for building AI-augmented security tooling, from proof-of-concept to production considerations.
Original YouTube description
This abstract explores the exciting intersection of AI and cybersecurity, showing how it can transform engineers into superheroes in the digital world. We delve into the awesome ways AI can level up your cybersecurity game, making you a 10x engineer. This talk will be largely focused on an open source tool called LangChain, which simplifies the process of creating LLM applications. I will go over avenues for using LLMs and walk through examples of how everyday users can also use AI as a force multiplier.
Transcript [en]

Thank you. Like you said, you didn't have much choice but to be here; there is no other track. My name is Sam, and this talk is AI in cybersecurity and how to be a 10x engineer. Before I get started: this talk is not sponsored. I'm going to be mentioning some company names and some vendors, so really, just don't sue me. These are all my personal opinions and they don't reflect my employer. Here's the agenda. We're going to have an intro, and we're going to talk about AI and how AI is

moving incredibly fast. We're going to go over an elementary LLM primer: what is a large language model? We're going to go over an open source tool called LangChain and how that works, along with my thoughts on LangChain and how they've changed over time. And then at the very end, and this totally wasn't planned, we're going to build on the whole SOC thing and do some automation using LangChain. So, who I am: I'm Sam Wallace. I have certs; they're all

expired; I don't pay the annual thing. I've been in the field about 12 years now. I'm a proud veteran, and I really just enjoy software. I'm sort of a CVE dabbler, but I really enjoy software. I have a website I post on here and there, nothing too crazy. So why should I think about AI? I would put this all in the camp of security automation: what can I get, and what's the advantage of this? Really, if you're paying

attention, there's a high demand for security professionals. I'm sure you've seen lots of companies out there saying, "Where are all these people? We're 30% short. Where are these cybersecurity professionals?" And they really just don't exist. There is that demand, and hey, that's good for us; it keeps our salaries high and all that. So, again, why AI? The thought is there's a huge opportunity right now for people to address that cybersecurity shortage. The businesses have risk, and

if they're 30% down, that 30% has got to come from somewhere, right? That's where the automation comes in, and that's why leveraging AI and LLMs can really fill that gap. To address this: I get this question a lot. People say, "Where do I get started with AI and LLMs? What's the ChatGPT thing? What are these models?" There's really a lot to think about. I think if I had to wipe my brain and start over, I would not think about cybersecurity at all. I

would just be a really good software developer. I would learn the basics; I'd really learn Python, which is a strong data science language. I would get a strong software foundation, and then maybe start learning cybersecurity. That's definitely the direction I would go. Keeping up with AI is definitely a challenge. We're in the cybersecurity space, so we're used to that change, right? It's fast-paced, keeping up is hard, and there's a lot of maintenance there. And I

would argue AI, to an extent, is moving even faster than cybersecurity. My thoughts have changed over time on all of this, and the code I wrote six months ago just doesn't work, because they've refactored the library so many times. The documentation I was looking at six months ago is no longer relevant. Everything has changed so much. Some other things have gone on since I submitted for this talk: OWASP released their Top 10, which generally targets applications, and they released one for large language model applications. I'll have a slide on that, but really that could be

its own talk. I just want to mention that I didn't even know about that when I submitted for this talk. And lots of large companies like Adobe, AWS, Azure, and Google are pumping tons of engineering hours into this; they're putting a lot of money into this. So it's all very real, and it's really not going away. And again, here's the Top 10. I'm really not going to cover this; I just want you to know it exists. There's security time and professionals going into this, looking at: we're building these large language model apps, so what

are the new attacks that are very specific, outside of traditional ones like cross-site scripting and SQL injection? What else do I have to worry about when I build these apps? This is their version; I think they're on version 1.2 or so now, but it's finally past version one, so this is where they landed on the Top 10. And then, more exciting things in AI: it's moving fast. It can now see, it can now hear, and it can now speak. I know what you're thinking, and I'm thinking the same exact thing: this is how you get Skynet, right? This is

definitely it. We're almost there; we're about to arrive at Skynet. I thought that was a fun little thing that they're working on. So yeah, it's moving fast, and people feel like, "Am I behind? Is it too late for me to start learning about LLMs and start building apps?" I would say no. I would say you're exactly where you need to be. Going to talks like this and learning is exactly where you need to be. So don't feel overwhelmed. Everyone's overwhelmed trying to keep up

with just cybersecurity alongside AI, so you're not feeling anything abnormal here. And I would argue that joining later, as a late adopter, may have an advantage over an early adopter like myself, because you don't have all this legacy knowledge, all this code to rewrite, and all these pre-existing mindsets. You come in fresh. So I would argue you're not behind at all. It's a great time to jump in and a great time to start learning about this stuff. And this bear agrees with you. So, a brief intro; this is like a 101. Explaining exactly what an LLM is isn't really the focus of the talk,

but basically, think of it this way: there's input and there's output. You ask questions and you get back answers, right? That's fundamentally what's going on. There are thousands and thousands of engineering hours and a lot of money behind it; there are companies with large data pipelines where they process documents. There's really a lot going on with that. You'll hear the word "parameters" here and there. Generally, more parameters is better, but with diminishing returns. So if I say a 14 billion parameter model, generally that would be better than a 7 billion one, but it's not twice as good. So there's some of

that verbiage, but really there's text generation, text translation, summarization, and then your general Q&A use cases. And a lot of companies out there are all trying to roll their own: "We're going to build our own large language model. We're definitely going to do this." The problem with this thinking is, first, it's very expensive to do that; it's not cheap at all. And then the real problem is on line 10 here: I'm using the OpenAI library, I'm asking a question in under 10 lines of code, and I'm getting back answers. And

it's going to be incredibly hard to have a better model than what they're producing with 3.5 Turbo and the new Davinci ones. I'm not saying you couldn't beat it, but it'd be so expensive to do it, and it's incredibly easy to use their models. I've had a lot of success doing that. You also have all the same risks that they face as well, with hallucination and incorrect data, and you have to solve all the challenges

yourself. And with LLMs you also have limitations: they're really not going to know about real-time context. In the security world, IP data, IP intel, lasts like a day, right? It lasts even less than that; it goes quick. Maybe you've got an alert, and that context needs to be from right now. So these LLM limitations definitely need to be addressed to even consider using them in any sort of security capacity, given how quickly things change. So I wanted to cover, in the

LLM world, the solution that gets around this: using a vector database. Essentially, on the left side you'll see a document, and then you'll see it converted into embeddings; it just converts it into numbers. Really, it's nothing more than your prompt including that new context whenever you answer that question. I've got Chroma up there, but there are a lot of different vector databases out there. Postgres now has its own extension, pgvector, that lets you do it, and there are SaaS tools like Pinecone, which are pretty popular. But I would say Chroma is definitely pretty good for

quick proofs of concept. So let's get into LangChain. LangChain is an open source library that is intended to build LLM apps very quickly. LangChain is not a large language model; it's a tool to build AI apps, not an AI itself. Think of it as the shovel. LangChain is not selling you an LLM (I mean, it's open source anyway); they're more so selling you the shovel, right, so you can build your own tool. Like I said, it's open source. You can find it on GitHub. You

know, it's got a ton of activity. You see in the top right all the stars, and it's got a ton of commits. When I submitted for this talk it was probably around 1,800 commits, and now we're well over 4,000. There's an incredible amount of activity with LangChain, so I definitely recommend it. It's on GitHub and totally free to use. When you think about LangChain, you really should think about it as an orchestrator in the middle. You've got your problem you need to solve, and LangChain is that glue in the middle that connects all the different nodes. So in the top

there we've got our vector store, which is what we talked about before: we take the documents, convert them into embeddings, and need to store them somewhere. It's just a database. And then on the left side you've got your large language model; think 3.5 Turbo or whatever model you're using. LangChain would be that middleman: let's query the database, get that context just in time, and then add that to our prompt, which then goes to our LLM. So it's that glue in the middle that connects all these concepts, and that's really why it has a lot of

popularity in the community: they make it easy, they build a lot of those integrations for you, and it's all in Python too, which makes it even easier to use. Here I wanted to demonstrate what's really going on, and all the layers of abstraction. So you're happy, you're using LangChain, but really LangChain is just using the OpenAI library, right? They build that code for you. And that's really just using the Python Requests library, which is really just accessing the ChatGPT API, which is really backed by Python, which

is just an abstraction on top of C. You see what's going on here: whenever I'm using LangChain, there's a mountain of work that's already happened in layers and layers of abstraction. You'll see a tweet by Lex here, and it makes you think about it all a little bit differently and appreciate the work. It confirms in my mind that it really is turtles all the way down. And like I said, my thoughts on LangChain since I've

started using it have changed over time. I would say it's an excellent library to try out and use to get familiar with all the different concepts of building an app using LLMs, but it feels a little leaky in places. What I mean by that: say you're playing a video game, say the new Baldur's Gate 3, and you're having a really good time, and then all of a sudden you get a little popup that says your graphics card is out of memory. That feels really bad, and now

I'm aware I'm not playing a game; I'm actually on a computer, and there are errors. That's how LangChain feels at times. There's a lot of controversy here: is LangChain worth using or not? I say definitely check it out. As for a production use case, I'm still on the fence, but you can see all the activity; it's going to get there. So here's the core of the idea:

could we use a large language model to effectively address one of our problems? In this specific problem, we want to build a system that can look at SOC alerts. Most of us at one point in our career have been a SOC analyst, or we've looked at alerts. We've got EDR tools; that's all great, right? The expensive part about all that is actually reviewing those alerts and then closing them out. So let's build a system that's fast, let's build a system that requires no humans, let's build a system that has that alert context just in time, and let's also make decisions based on that

data. This is a very, very simplified SOC: in the top left, let's say an EDR alert comes in. It's late at night; maybe it takes a half hour for someone to jump on and take a look at it, and, since they're waking up, it takes them another half hour to say this is a true positive and we should really kick off our incident response playbook and go through that process. But here's my thinking: what if we could build a system that automatically

looks at that alarm and does that analysis for me, and I have enough confidence in that system for it to automatically close that alarm? It's a different concept, a little bit different from security automation, because we're actually using the LLM to derive results from that alarm data. So we're going to go step by step through how you might want to build this app out. On the left side, think of this as whatever EDR tool you have, whatever alert you've got; it doesn't matter whether it's an endpoint or whatever. Let's just say this is an alert that happens, and

we would expect an analyst to respond to this in a certain amount of time. But in this instance we're going to put our AI system in front of that to handle these alerts, and we'll use LangChain to proxy that. Like I said, the vector database is where we're taking the documents. Think of the documents as your incident response playbooks, your tool user documentation, your IT standards, and any sort of wiki pages. We take those documents, convert them into embeddings, and allow our system to be aware of them. So in the same way that an analyst would be aware of those

documents, this system would be as well. But we want to build a system that works all the time, and works through changes, so we want to make sure that our documents update as well. Whenever someone updates a doc, or maybe the tool has a new version with new features, we want to make sure our system is aware of that. So we're going to build a process here to refresh our documents, and if it is a new document, it's going to refresh those embeddings as well. On the right side, like I said, LangChain has an ecosystem of

support. For a lot of the tools you use, think your VirusTotals, your GreyNoises, your UEBA solutions, LangChain probably has an adapter or an easy way to interact with those systems. Say your alert comes in and it has an IP address. Wouldn't it be great if my system could automatically make that request, come back, and then use that response as context to make a decision? Here we can do that. We can connect to our UEBA, our GreyNoise, and we have all that additional context that we provide in the prompt to make decisions. And then, of course,

our identity people get really excited about this: it'd be really awesome, if the alert's about a user, to pull in all the Active Directory stuff, all the metadata. Is this a VIP user? We should add that as context as well; maybe that'll improve the accuracy of our results. I won't go too deep into this, but LangChain has a concept called agents. Think of it as a series of prompts where it's asking: would it be helpful if I looked into the vector database? Would it be helpful if I queried Virus

Total? Depending on that, it would make further decisions. It's kind of magical to an extent, but we'll leave that where it is. So once we get all this context (we have our alert, maybe it used some documents from the vector database, maybe it made a few API requests), it puts that into the prompt, and then we actually query the large language model and say, "What should we do with this?" This is going to be more enterprise, right? So we want to make sure we're sending logs off. We want to make sure that if this

thing actually closes an alarm, we have something like a Slack alert, so we're going to do a Slack alert. On the left side, this is to demonstrate that we're going to do continuous analysis to see if this system is effective and operating the way that we expect. To prepare for this talk I wrote some code, so you can steal all my code if you want. What I really want to focus in on is line 119: you see a really ugly PowerShell command. Really hard to read, right? If you've done OSCP or any sort

of pen testing, you kind of know what's going on here, but if you're a junior analyst, this is really hard to read and really hard to wrap your head around. However, large language models are really excellent at breaking this down and making it easy to understand. Here you can see we're actually building our prompt using our context. We're saying, "You're an AI program who only speaks in JSON." The JSON part there is a bit magical: when we say you only speak in JSON, it means we're only going to get back JSON, and JSON objects are easily parsable using

Python, and you have access to the key-value pairs. On the bottom here we're giving an example response of what our expected output would be. We're asking the model: would you consider this malicious? We're going to ask it: what's your confidence that you're correct? And then on line 81 we're actually going to ask the model: should we close this alarm? Is this a false positive, and would you recommend closing the alarm? And here's me just running it locally. No pretty UI here, just running it right in my prompt. The main thing to point out here is that the explanation is correct. It was able to

identify it was a reverse shell, which, depending on the analyst's level, they may not have been able to do. The confidence score is 90, so it's very confident, and the close-alarm Boolean here is set to false, so this one would not actually close the alarm. This is exactly what we want from the model. It's doing a lot of the analyst work for me, and I don't necessarily have to do that myself if I trust the system. So really the question here is: should I trust AI? Is this something that I could even do? Will this work at my company? My

thought is, any good security professional is going to say, "I literally trust nothing," and I think that's where you should be as well. What I would say is: why not get metrics on how effective this would be, get metrics on how effective your current solution is, and then compare them? Is this better? If it is, then I would argue it's definitely worth exploring and pushing that development work. It can certainly be a force multiplier at your business. I have found it to be really effective in some scenarios and not so

in others; it really depends on the use case you're pushing. And then my closing thoughts on AI and LLMs and all that: cybersecurity and AI are both moving very fast, and I would say you're not behind. You're exactly where you need to be. If you're trying to get into the cybersecurity space or learn about AI, it's really not about how fast you're running; it's more about the direction you're going. So please don't feel overwhelmed. I know there's really a lot going on in this space, and I find myself struggling to keep up at

times as well, so it's totally normal. But this little dog says you got this, so I feel better about that. Here are some links: I have linked the OWASP Top 10, I've got a little website I write stuff on occasionally, and the code is available on

GitHub
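
Editor's sketch (not the speaker's code): the alert-triage flow described in the talk can be illustrated roughly as below. It builds a JSON-only prompt from an alert plus just-in-time context, parses the model's reply, and only auto-closes when the model recommends it. Every name here (`PROMPT_TEMPLATE`, `Verdict`, `build_prompt`, `parse_verdict`, `should_auto_close`, `triage`, the JSON field names, and the confidence threshold) is a hypothetical chosen for illustration, and the model call is passed in as a plain function so the sketch runs without any API key or LangChain install.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt; the wording in the talk's own code may differ.
PROMPT_TEMPLATE = """You are an AI program that only speaks in JSON.
Analyze the security alert below using the supplied context.

Context:
{context}

Alert:
{alert}

Respond with one JSON object shaped like this example:
{{"malicious": true, "confidence": 90, "close_alarm": false, "explanation": "..."}}
"""


@dataclass
class Verdict:
    malicious: bool
    confidence: int  # 0-100, as reported by the model
    close_alarm: bool
    explanation: str


def build_prompt(alert: str, context: list[str]) -> str:
    """Fold just-in-time context (playbook snippets, IP intel) into the prompt."""
    joined = "\n".join(f"- {item}" for item in context) or "- (none)"
    return PROMPT_TEMPLATE.format(context=joined, alert=alert)


def _require(data: dict, key: str, kind: type):
    """Return data[key], rejecting values of the wrong type (e.g. the string "false")."""
    value = data[key]
    if not isinstance(value, kind):
        raise ValueError(f"field {key!r} has unexpected type")
    return value


def parse_verdict(raw: str) -> Verdict:
    """Parse the model's JSON reply; raises ValueError, KeyError, or TypeError if malformed."""
    data = json.loads(raw)
    return Verdict(
        malicious=_require(data, "malicious", bool),
        confidence=_require(data, "confidence", int),
        close_alarm=_require(data, "close_alarm", bool),
        explanation=_require(data, "explanation", str),
    )


def should_auto_close(verdict: Verdict, min_confidence: int = 90) -> bool:
    """Close only if the model recommends it, sees no malice, and is confident enough."""
    return (
        verdict.close_alarm
        and not verdict.malicious
        and verdict.confidence >= min_confidence
    )


def triage(alert: str, context: list[str],
           ask_model: Callable[[str], str]) -> tuple[Verdict, bool]:
    """Run one alert through the model; on a malformed reply, leave it for a human."""
    try:
        verdict = parse_verdict(ask_model(build_prompt(alert, context)))
    except (ValueError, KeyError, TypeError):
        verdict = Verdict(True, 0, False, "unparseable model response")
    return verdict, should_auto_close(verdict)


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call, returning a reply like the one shown in the demo."""
    return json.dumps({
        "malicious": True,
        "confidence": 90,
        "close_alarm": False,
        "explanation": "Encoded PowerShell that opens a reverse shell.",
    })


verdict, closed = triage(
    "powershell -nop -w hidden -enc <base64 omitted>",
    ["IR playbook: escalate any encoded PowerShell on servers"],
    fake_model,
)
print(verdict.explanation, "| auto-close:", closed)
```

Swapping `fake_model` for a real client call is the only change needed to try this against a hosted model. Keeping the close decision in `should_auto_close`, outside the prompt, makes it easy to log every verdict and compare the system's accuracy against analysts before trusting it to close anything.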