
GT - Overcoming Barriers in Security DSLs with BabbelPhish

BSides Las Vegas · 21:33 · 45 views · Published 2023-10 · Watch on YouTube ↗
About this talk
Ground Truth track, 12:00 Wednesday

Overcoming Barriers in Security DSLs with BabbelPhish: Empowering Detection Engineers using Large Language Models

The rise of detection-as-code platforms has revolutionized threat detection, analysis, and mitigation by leveraging domain-specific languages (DSLs) to streamline security management. However, learning these DSLs can be challenging for new detection engineers. In this talk, we introduce BabbelPhish, an innovative approach utilizing large language models to bridge the gap between natural language queries and security DSLs. We demonstrate its application to MQL, Sublime Security’s free DSL for email security, and its potential extension to other DSLs. BabbelPhish enables users to harness the full potential of detection-as-code platforms with familiar natural language expressions, facilitating seamless transitions from triage to querying and coding.

We will discuss BabbelPhish’s architecture, training process, and optimization techniques for translation accuracy and MQL query validity. Through live demonstrations and user interviews, we will showcase its real-world applications and implementation options, such as a VS Code plugin. Join us as we explore how large language models can integrate natural language capabilities with the precision of security DSLs, streamlining security management and threat hunting, and making detection-as-code platforms accessible to a wider range of security professionals.

Speaker: Bobby Filar
Transcript [en]

Good afternoon, everybody, and welcome to BSides Las Vegas. Next we have Bobby Filar, who is going to be giving a talk on accelerating adoption of domain-specific languages with large language models. A few things before we begin this presentation. We'd like to thank our sponsors: our Diamond sponsor is Adobe, and our Gold sponsors are Prisma Cloud, SEM Group, and BlueCat. It's because of their support that we're able to put on events like this, so we really appreciate it. If you have cell phones, please put them on silent and avoid using them for the duration of the talk, unless you're taking pictures. At the end, if we have time, we'll walk around with a mic for questions; if we run over, you can always pull the speaker aside afterward. Without further ado, Mr. Bobby Filar.

Thanks a lot, appreciate it. Welcome, and thanks everybody for coming out. My name is Bobby Filar, and I'm the head of data science at Sublime Security, an email security startup. This is my talk on BabbelPhish, which is about accelerating adoption of domain-specific languages with large language models. That's a lot of alliteration very early on, so let's move right into why I'm actually here.

I work for a company that has a query language, and I see a lot of shirts in here representing companies that have query languages. They are pervasive, and that's a great thing. They make life a lot easier in a lot of ways: they open up your platform to more customizability, your engineers commit things much faster, and everything is much more tailored. Overall I think it's a great option and a great experience. But because they are becoming so pervasive, security workers often have to learn five or six of them, depending on their full suite of tools, and that can be a little daunting, particularly for people new to the detection engineering space. The initial onboarding period can be difficult when you really just want to pick up a tool, start using it, and learn on the fly. That's what this open-source BabbelPhish endeavor is about: reducing that barrier to entry and improving the adoption rate and onboarding experience as much as possible.

Before we dig into how, I want to drive the point home with an anecdote. It's one I run across myself: even though I'm a data scientist, I still have to write rules for my company, just to make sure my models work. On the left you have your hypothetical phishing email, forwarded to your abuse mailbox. You, as a security worker, see it and think: there's some Adobe branding, there's language trying to get me to click, there's a request for financial information. How do you go from that step to what you see on the right-hand side, which is a Sublime Security rule? It's not overly complicated, but there's a lot of nomenclature, syntax, and verbiage that is different, certainly different from Splunk or Sumo Logic or Semgrep, where you have a completely different domain you're attempting to apply. How can we make the jump from the left-hand side of the screen to the right-hand side more seamless?

One thing we can do, which is often lost on the data science community when we insist AI will solve everything and replace people (it doesn't have to be like that), is to take advantage of the things humans do really well. One of those things is applying a kind of impromptu translation process when they look at an alert, a piece of malware, or a phishing email.
They're running through a mental model built on domain expertise that they've crafted over an extended period of time. That mental model can surface a couple of different ways, but one we see a lot is a mental checklist, where each question they ask, and its corresponding answer, is actually detection logic if you think about it. Is this actually from Adobe? Is the language suspicious? Where do the URLs go? Were there any auth failures in the headers? Each of these is really just a snippet of logic that, when pieced together, forms a pretty effective detection recipe. What we want to do is key in there and use large language models to capture that process: let people ask the question in natural language and learn the query language faster over time. To me that is one of the bigger impacts large language models can make in the security space in the near term. Long term there's tons of potential across a variety of domains, but short term there's a unique opportunity to increase the usability of your product. Those first couple of touch points with a new security platform should be as simple and intuitive as possible, so you can get in there and start contributing. There's increasing speed and efficiency, not only in onboarding but in contributing rules and detection logic within a platform. There's reducing the likelihood of frustration or coding errors, which are commonplace in query languages; as somebody who is guilty of that all the time, having a large language model trained on real-world working examples helps increase the likelihood that the code produced is correct. And finally, one I'm personally a fan of, and which works great at hacker summer camp: improved collaboration and communication. Anybody in your security organization can talk about a threat in natural language, and it would be great for them to be able to contribute to security hygiene via these platforms in a more natural way.

With that background, what did we set out to do when designing BabbelPhish, this idea of a large language model dedicated to natural-language-to-code translation? To use an LLM you need a dataset. Sublime Security is growing, but we're not Python; we're not even Splunk. There are no troves of natural language and code snippets readily available on Stack Overflow or Twitter or forums. So we had to get pretty creative about where we pulled this initial dataset from, and I'm hoping that by sharing this, those of you in the audience who work for similar companies can take it as an opportunity to do the same. We used our documentation, which gave us a really good grounding in the syntax and the way we, as engineers, describe the language we're providing people. The schema, the way we break down an email and expose it via the query language, has a lot of natural language descriptions in it that we could start to leverage. Likewise, open-source rule repos and our community Slack channel are really rich data sources, not only for real-world snippets that are effective, but for the way a diverse set of detection engineers describe their work. All of that led us to a pretty decent-sized dataset. We still had some more complex compound queries that required annotation, so we pulled in a group of detection engineers, internal and external to our company, to provide a natural language description for each, which was really cool. Certain people were very verbose and methodical in the way they asked for PDF attachments, and other people felt they should just be able to say "is PDF attachment" and have it spit out the logic.
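The pairing step described here, matching natural language descriptions to query snippets, ultimately has to be serialized into whatever format the fine-tuning API expects. A minimal sketch under assumptions: the `"\n\n###\n\n"` separator and `" END"` stop token follow OpenAI's legacy fine-tuning conventions, not necessarily what BabbelPhish used, and the MQL strings are illustrative placeholders rather than verified Sublime MQL:

```python
import json

# Hypothetical (NL description, DSL snippet) pairs; the MQL-flavored
# strings are illustrative, not verified Sublime MQL syntax.
PAIRS = [
    ("any attachment is a PDF",
     'any(attachments, .file_extension == "pdf")'),
    ("sender display name contains adobe",
     'strings.icontains(sender.display_name, "adobe")'),
]

def to_finetune_jsonl(pairs):
    """Serialize NL->DSL pairs into prompt/completion JSONL records.

    The prompt separator and trailing stop token are assumptions based
    on OpenAI's legacy fine-tuning guidance.
    """
    lines = []
    for nl, dsl in pairs:
        record = {
            "prompt": nl + "\n\n###\n\n",
            "completion": " " + dsl + " END",
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

print(to_finetune_jsonl(PAIRS))
```

One record per line makes it trivial to upload the file for fine-tuning or to push it to Hugging Face as a dataset.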
That difference comes from experience and from expectations of the product, and these are all things I think a large language model can potentially help with. When it was all said and done, we ended up with a decent-sized dataset: not large, but certainly enough to fine-tune a model with, about 3,000 examples. We uploaded it to Hugging Face so people could pull it down and start playing with it immediately. I came from a couple of different data science groups that released EMBER, a malware classification dataset that was very important to the open-source ML security space, and I hope that as we continue to grow this out, it can be another such dataset to help further research.

Once we had our dataset, it was time to think about an ideal large language model architecture. It's a 20-minute talk, so I'm not going to get into the guts of a transformer; there are books and videos for that. I'm coming to you as somebody who wanted to provide an open-source model the fastest and cheapest way possible, so this was my set of requirements. Knowing that, a pre-trained model was by far the most important thing. Think of the models from OpenAI, Anthropic, and a few others: they cost hundreds of thousands of dollars to build and are trained on tons of data across a variety of tasks, code translation being one of the big ones. That knowledge base, plus the API access, Python libraries, and large support network where you can go ask questions and get help, all fed into the final requirement: it needed to be inexpensive, because we were going to give this thing away for free anyway. What we ended up settling on was a fine-tuned GPT-3 model. If you're familiar with GPT-3, there are a few variants. There's Davinci, which was used for ChatGPT; it's very good, but it's slower and way more expensive. Then there's Ada at the opposite end, which is very inexpensive and very fast, but not very good at code translation tasks. We found our perfect fit with Curie: it's mid-tier, it excels at code translation tasks, and it's very fast, so when you think about integrating it into something like a VS Code plugin, the translation step is quick. It's also very inexpensive to run inferences against, which mattered because we were giving this away; we wanted the best of both worlds. As I said earlier, API access and cost were critical from a resource standpoint. You can see on the right-hand side that training these very sophisticated models is now about 30 lines of code, and it cost around $1.50 at most to train, which is an insanely good price. It took maybe 90 minutes, and when it was all said and done, we had a model we really liked, with good API access and infrastructure surrounding it, and we were ready to move forward.

Once we had a model, we wanted to benchmark it quickly, to give the community, and frankly ourselves, some idea of how effective it was in terms of code translation accuracy. We used pass@k, an old information retrieval metric that is still very useful for things like translation. All it says is that you have k attempts, given a prompt, to get the right answer. We used k = 3, because three is about the maximum you'd want anyway if you're going to present candidate responses to an end user to evaluate. We did relatively well: about 98% of the time it had the correct response within three attempts, and after a single attempt it was at around 93 to 94%. We also integrated a quick check against our MQL evaluation engine to guarantee that the output was syntactically coherent and correct, to avoid a lot of user frustration.
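The pass@k numbers quoted here can be computed empirically in a few lines. A minimal sketch, assuming we have an ordered list of per-attempt correctness judgments for each prompt (the function name and structure are mine, not from the BabbelPhish repo):

```python
def pass_at_k(results, k):
    """Empirical pass@k: the fraction of prompts for which any of the
    first k sampled completions was judged correct.

    results: list of lists of booleans, one inner list per prompt,
             in sampling order.
    """
    if not results:
        return 0.0
    hits = sum(1 for attempts in results if any(attempts[:k]))
    return hits / len(results)

# Toy example: 4 prompts, 3 attempts each.
outcomes = [
    [True, False, False],   # solved on the first try
    [False, True, False],   # solved on the second try
    [False, False, True],   # solved on the third try
    [False, False, False],  # never solved
]
print(pass_at_k(outcomes, 1))  # 0.25
print(pass_at_k(outcomes, 3))  # 0.75
```

With this framing, the talk's numbers say roughly 93 to 94% of inner lists had `True` in position one, and 98% had a `True` somewhere in the first three.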
All of these scripts are available in the GitHub repo posted below as well. From an implementation standpoint, we wanted to get this into the hands of users as quickly as possible, in an environment that encouraged them to use it, so we figured a VS Code plugin would be ideal. The idea is that when users create a rule, they get a code-completion component much like GitHub Copilot or IntelliSense. I am not a TypeScript developer by any stretch, but I was surprised how easy it was to get this off the ground: event listeners to capture the user's comment, and a Flask backend, which a real developer on my team then corrected for me and made all TypeScript. We push the prompt down, do a bit of pre-processing to prep it for OpenAI, OpenAI comes back with the response, we validate it, and then we push it to the screen using the textEditor.edit function in TypeScript. In the end it worked really well, and we've made it available: you can pull it down from the marketplace today, and the actual source code is available too. If you want to rip and replace your own backend, or take out our model and put in yours, you'll get that same Copilot-style experience right off the shelf.

I was having issues with demo life, so I recorded the demo and I'll talk through what it does. It's a demo where you type in English and it translates to the query language, which is not the most crazy demo you can show right now. But the idea, thinking back to that Adobe example, is that as you're thinking through those questions, you can literally type them out and get the appropriate query language back, and having that validated by the interpreter on the backend is, I think, a nice way to learn. I use MQL on a daily basis and I still run into situations where I wonder how to move through a for loop, or how to do some particular part, and it would be really nice to be able to ask that question the way I'm thinking about it, as opposed to scouring through the docs and doing all that context switching. As you can see, you can get moderately sophisticated with what it's doing, including some compound queries, and there's plenty of room to grow and evolve this capability, which I'll touch on in a minute. But this is the tool in action, and like I said, you can pull it down today and start playing around with it. Any feedback is always welcome, too; you can hit me up on Twitter/X or LinkedIn to talk about next steps and how to improve the process.

The last thing we really want to do is get this into the hands of not only customers but also our large open-source community.
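The plugin flow just described (capture the comment, pre-process it into a prompt, call the model, validate, write back) can be sketched independently of the TypeScript frontend. Here is a minimal Python sketch of the backend's pre- and post-processing steps; the separator and stop token are assumptions carried over from OpenAI's legacy fine-tuning conventions, and the comment-stripping logic is mine, not the actual BabbelPhish implementation:

```python
PROMPT_SEPARATOR = "\n\n###\n\n"  # assumed to match the fine-tune format
STOP_TOKEN = " END"               # assumed stop sequence

def preprocess(user_comment: str) -> str:
    """Turn a captured editor comment into a model prompt."""
    # Strip leading comment markers ("//", "#") and whitespace,
    # then append the separator the fine-tuned model expects.
    text = user_comment.strip().lstrip("/#").strip()
    return text + PROMPT_SEPARATOR

def postprocess(raw_completion: str) -> str:
    """Trim the stop token and whitespace from a raw model completion,
    ready to be validated and written back into the editor."""
    text = raw_completion
    if text.endswith(STOP_TOKEN):
        text = text[: -len(STOP_TOKEN)]
    return text.strip()

prompt = preprocess("// any attachment is a pdf")
completion = postprocess(' any(attachments, .file_extension == "pdf") END')
print(prompt)
print(completion)
```

In the real plugin, the cleaned completion would also be run through the MQL evaluation engine before `textEditor.edit` inserts it, so syntactically broken output never reaches the user.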
We want to get it into their hands and start to understand whether this is a value add or a hindrance. We have a fairly robust set of user interviews lined up for the fall, where we'll do a head-to-head between IntelliSense, the built-in VS Code autocomplete, and BabbelPhish, to try to understand task completion rates, the ability to avoid context switches, and what happens when BabbelPhish produces the wrong code: how does a user cope with that, and what does the re-prompting process look like? Being able to understand and attempt to quantify that will be instrumental in helping us evolve the tooling.

As for the path forward: what I talked about today is primarily for those initial touch points with a platform. Not everybody wants to use natural language, and I don't think it should be force-fed, but I do think it's a valuable way to get exposure to a new query language and become familiar with it. What we're really trying to do is move toward more context-aware code completion, using things like vector stores to capture common snippets people use day-to-day. "First-time sender" is a good example: it's consistently used, and it's a fairly large query snippet, so being able to recognize it matters. We also want fill-in-the-middle prompting, which is what GitHub Copilot uses: they look at where your cursor is, they look at everything that came before and after, and then they populate the appropriate completion. Getting those two things in, as well as continuing to increase the size and scope of the dataset for the open-source community, is going to be pretty powerful, I think. So that's the talk, a nice tight 18 minutes. If there are any questions, I'm more than happy to answer them.

[Audience] You had semantic parsing on the last slide. Can you get into that more?

Yeah. The idea there is to more tightly integrate our actual interpreter, and this could be done for whatever query language. Right now we rely on the GPT tokenizer, which is okay, but our own internal interpreter would do a much better job of tokenizing at the level we need, capturing those semantic relationships and, in theory, making the prompting that much stronger. That's a good question.

[Audience] Any plans to support a test-driven-design-style strategy? I don't know how to get to this kind of answer on, say, a test dataset, but I'd love to get there on the big dataset.

Yeah, supporting a test-driven development approach is something that would make a ton of sense for us, and I could see it being extremely useful for other vendors you see in the hall as well. We've also thought a lot about going from snippet back to description as a way to better understand what's going on. We have a lot of examples where a user checks in a rule that is heavy on regex, and you think: that's great, I'm sure it's useful, but I have no idea what it does. Being able to feed that to a model and have it break down what the rule is attempting to do, to determine whether or not to allow it into the community rule repo and to help out with testing, is certainly a natural extension. Any more questions? All right, one more.

[Audience] My other question is, you're using GPT-3. Just curious why that over 3.5 or 4. Is it easier to fine-tune?

Yeah, I found it to be super straightforward to fine-tune. Progressing to those other models makes a lot of sense; I personally just like the reproducibility. It would be very easy to pull down that dataset from Hugging Face and, in a couple of lines of code, get it to where I got it; anybody in the audience could do the same thing. As GPT-5 comes out and 3.5 and 4 take the place of 3 in the API architecture, that's ultimately where we'll want to go, because it's just bigger context windows and so on, which would be wonderful.

All right. I promised I'd give a shout-out to my seven-year-old daughter, who heard this was being streamed and now thinks I'm a YouTube star. So, Ka, I'll see you in a couple of days.
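The fill-in-the-middle idea mentioned near the end, conditioning on the text both before and after the cursor, amounts to assembling a prompt from a prefix and a suffix with sentinel tokens. A minimal sketch; the sentinel strings here are illustrative placeholders (different model providers use different special tokens), not a documented API:

```python
def build_fim_prompt(document: str, cursor: int,
                     pre="<PRE>", suf="<SUF>", mid="<MID>") -> str:
    """Assemble a fill-in-the-middle prompt around a cursor position.

    The model is asked to generate the text that belongs at <MID>,
    conditioned on everything before the cursor (after <PRE>) and
    everything after it (after <SUF>). The sentinel tokens are
    placeholders, not any specific model's vocabulary.
    """
    prefix, suffix = document[:cursor], document[cursor:]
    return f"{pre}{prefix}{suf}{suffix}{mid}"

# Hypothetical rule-in-progress; the cursor sits at the end of the
# first line, where the user wants a condition filled in.
rule = "type.inbound and \nand sender.email.domain.root_domain != 'adobe.com'"
print(build_fim_prompt(rule, rule.index("\n")))
```

Combined with a vector store of common snippets like "first-time sender", this would let the completion backend propose context-appropriate MQL rather than translating isolated English sentences.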