Who Makes the Rules?

Name: Who Makes the Rules?
Uploaded: 2024-09-16
Duration: 16 min 42 s
BSides SLC · 2024 · 16:42 · 33 views · Published 2024-09

Speakers

Meghna Vikram

Tags

Category: Research, Technical

Topic: AI Security, Vulnerability Research, Web AppSec

Research: Methodology, Technical Deep-dives

Style: Talk

Mentioned in this talk

Tools used

Semgrep

Platforms

GitHub

Service

ChatGPT

About this talk

A 16-year-old researcher presents findings on using prompt engineering and ChatGPT to automatically generate Semgrep-based security rules for detecting web vulnerabilities. By systematically applying zero- to three-shot prompting techniques across five common vulnerability types, the project demonstrates that AI-assisted rule generation can not only match but often exceed existing human-curated rulesets, particularly when combined with domain expertise.
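The zero- to three-shot prompts mentioned here could be sketched roughly as follows. This is a minimal illustration, not the speaker's code: the zero-shot wording follows the prompt quoted in the transcript, the shot counts follow the talk's own usage (one-shot adds vulnerable code, two-shot adds one fixed example, three-shot adds two), and the names `PromptInputs` and `build_prompt` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Wording follows the zero-shot prompt quoted in the talk.
BASE_PROMPT = "Generate a Semgrep-based rule that detects {vulnerability} in {language}."


@dataclass
class PromptInputs:
    """Hypothetical container for everything a prompt may need."""
    vulnerability: str                      # e.g. "SQL injection"
    language: str                           # e.g. "JavaScript"
    vulnerable_code: str = ""               # required from one-shot upward
    fixed_examples: List[str] = field(default_factory=list)  # fixed code samples


def build_prompt(shots: int, inputs: PromptInputs) -> str:
    """Assemble a prompt for the given shot count (0 to 3), as the talk uses the term."""
    if shots not in (0, 1, 2, 3):
        raise ValueError("shots must be 0, 1, 2, or 3")

    parts = [BASE_PROMPT.format(vulnerability=inputs.vulnerability,
                                language=inputs.language)]

    # One-shot and above: include the vulnerable code to base the rule on.
    if shots >= 1:
        if not inputs.vulnerable_code:
            raise ValueError("one-shot and above need vulnerable code")
        parts.append("Vulnerable code:\n" + inputs.vulnerable_code)

    # Two-shot adds one fixed example, three-shot adds two.
    needed = max(0, shots - 1)
    if len(inputs.fixed_examples) < needed:
        raise ValueError(f"{shots}-shot prompt needs {needed} fixed example(s)")
    for number, example in enumerate(inputs.fixed_examples[:needed], start=1):
        parts.append(f"Fixed example {number}:\n{example}")

    return "\n\n".join(parts)
```

Keeping the templates in one function means every trial for a given shot count sends the same wording, which is the point of a standardized prompt dictionary.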

Transcript [en]

So hi everyone, I'm going to go ahead and get started. Today's talk is going to be a little different from what you guys have probably been watching. Surprise, surprise: I'm 16 years old, I'm from Orlando, Florida, and I flew all the way up here for my talk today. Your talk earlier was really inspiring, so hopefully this goes as well as yours did. Just to get started, today's talk is going to be me presenting some research that I've been doing for the past 9 to 10 months. It's something that I've been researching, and I've also created a prototype, so hopefully I can learn a

lot from you guys as well as you guys learning from this. So, a little bit of background on me: as I said, I'm 16 years old, I'm from Orlando, and I'm in 11th grade at a high school called Hagerty High School. My interest in cybersecurity started when I was 11 years old. It started when I was watching my dad and I started wondering, what is this guy doing on his laptop all day? There's no way that he's that busy. And then I really started to learn about cybersecurity, which is exactly what my dad does, and it was just so interesting to me how there are so many different things I'd never really taken into account when it

comes to the digital world, and that's where my interest started. So at the age of 12 I did my first science fair project, and it was super cool: it was comparing Alexa versus Google Home, because I thought that was so cool for AI. That was my first science fair project and it took me absolutely nowhere; I got third place. But I did actually receive some recognition from the United States Navy, because the one thing that was really driving me forward was my passion, and that was super motivational for me. So I moved on to continue doing science fair up until now, when I'm currently in 11th grade, and I think one of the biggest

things that I've learned along this journey is that you should never give up, because the first year that I even made it to States in person was this year, and I've been doing science fair for almost six years now. I'm also going to be going to the international fair, and that just went straight from doing absolutely nothing to going to the highest honor that science fair holds, so that's a big lesson that I've learned. Thank you. So, to get started on some background for my project: hacking is something that I'm sure everyone here knows about. These are the three big ones that I feel like everyone's probably heard

about. The first one was in May 2019 at First American Financial Corp. Due to just a simple web vulnerability, private information that was not password protected was leaked, and 885 million files were exposed, including mortgages, bank account numbers, wire transfer receipts, just things that are honestly really bad information to have in the wrong hands. Similarly, in January of 2021, Microsoft's Exchange email servers were hacked, and similarly in December 2023 the New York Real Estate Wealth Network was hacked, and 1.5 billion records of their database and 1.16 terabytes of data were exposed to an unknown source. So to this day we do not know where that

data went and who was really affected by it, and that's just the big scare that comes with hacking. This kind of introduces my project. I stumbled upon this tool called Semgrep. Semgrep is a tool created by a company also called Semgrep. It's a platform that has a large database of predefined coding rules, and these rules are in the form of a YAML file, and these YAML files are there for every single type of web vulnerability that exists, for every single language out there, down to C++ and Ruby, and for every single vulnerability, so you can imagine how large this database is. And Semgrep, it's

a tool where you can put in your code, and what Semgrep does is analyze your code as its input, and then it matches supposed vulnerabilities that can be found in the code that's been inputted, and it gives you an output, which is a rule that you, the developer, should follow to make your code better. This is an absolutely amazing program and it could help developers in so many different ways, but I was like, what if we can make this even better? What if we can really build on what Semgrep already has here to create a novel work in progress? So basically, just to kind of

give a little bit more background: I'm sure everyone here has heard of ChatGPT, and ChatGPT is just a very simple input-output function. You give it something and you get something. But the thing is, when you want to get something, you want to get the right answer and you want to get it fast. You don't want to be sitting there saying "no, again; no, again; no, again," because that's what I do and it's really irritating. With ChatGPT, the more important part is actually the input: you want to give it an optimized input in order to get your optimized output, and that's where prompt engineering comes in, and

one part of prompt engineering is called shot prompting, which is what I really decided to research a lot. Shot prompting is basically when you give ChatGPT an input, you want to make sure that you're giving it enough examples. So a zero-shot prompt is just saying, hey, do this: no context, no nothing, just bam, bam. One-shot is like, okay, hey, do this, here's a little bit of context. Two-shot is, hey, do this, here's some context and here's an example. Three-shot is two examples, four-shot is three examples, and you just keep going and going, and you want to give it as much context as possible to receive your optimized

results. So now, going into my actual project: I researched five of the most common web vulnerabilities because, as I mentioned earlier with the hacking, vulnerabilities are super, super common and they are the reason why hacking happens. So these are the five that I decided to research. The first one is broken authentication, and that is basically just compromised passwords, compromised user information, things like that. Denial of service is the stuff you see in the movies, where all of a sudden your computer starts flashing and things start popping up and you can't really do anything and stuff's going on on the inside. There's cross-site scripting, which is basically just when the cybersecurity attacker is

injecting malicious scripts and stuff into your code. Eval injection and SQL injection are also similar: an eval injection is putting something into your code inside the eval function, and SQL injection is similar, with SQL code. So these are the five vulnerabilities I decided to research, and I started looking more into ChatGPT itself, and I wanted to see how I can use shot prompting and these different vulnerabilities to actually come up with something that builds on what Semgrep has already created.

So this is the ChatGPT playground, and as you can see, on the right side it's ChatGPT by itself: down there you have what you input, and then it's giving you your output in the thread. But on the side you have your customizable GPT, and this is something I didn't even know existed. Basically you can name your bot (I named it Science Fair, I know, it's original), and then in the instructions you can put in what you want ChatGPT to focus on. You can see in the screenshot here I put in Semgrep rules, and this was stupid: I hand-copied in everything that I thought I could need, but I could have just copied in the repository link, and I found that out like three

months in. But basically you get to customize your GPT, and then you get to choose what model, and for my model I used GPT-4. You just get to know more about what ChatGPT is actually capable of when you're feeding it info, because machine learning is the whole thing about this: you don't want to just utilize what it already has, you want it to learn more and you want it to become your best friend. So after messing around with ChatGPT, I came up with this thing called a prompt dictionary. As I mentioned earlier, you really want to have that optimized input to get your optimized output, so I kind

of created a standardized little dictionary for my zero- to three-shot prompts, where I was going to put in the same prompts over and over to get ChatGPT learning certain keywords, for it to get better and better results as my trials went forth. So my zero-shot prompt was "generate a Semgrep-based rule that detects [blank] in this language." One-shot was, okay, here's some code, so now it actually has some code to base itself off of, and then it will detect a vulnerability and give me the rest. Then two-shot and three-shot were similar, in the sense that two-shot was "here's some code," and then I also

gave it an example of fixed, proper code, so now I have something to compare: here's the bad code, here's the good code, here's the rule in that instructions tab, let me see what I can do. And three-shot was the exact same thing, but with two examples. So moving on, with the prompt dictionary and everything in consideration, I came up with my little project workflow and how I was actually going to carry out this experiment. The first big step is I had to find this vulnerable code. How was I going to find so much bad code to run an experiment off of? Well, that's where this WebGoat repository

that I found online comes in. WebGoat is just this huge GitHub repository of a bunch of different pieces of vulnerable code, and what I had to do is fetch it, go through it, and input that into my prompts in those little red example places. Then I had to go through my prompt dictionary: okay, here's my zero-shot experimentation, one-shot, two-shot, three-shot. Then I had to make it detect the vulnerability with those prompts and record the vulnerability. But then, when it's actually giving me my output, what am I getting? So obviously, with

Semgrep I was going to be matched with a rule, but the novelty here is that ChatGPT is doing what Semgrep was doing, generating that novel Semgrep-based rule. But the cool thing is it was actually doing way more than what I asked it to, because ChatGPT knows that, for someone who's using it, it wants to give you what you asked for and more, because it wants to be as useful as possible. So not only was I getting a YAML file with the rule that I should follow as a developer, I was also getting an English summary, because when you look at the YAML file,

I didn't understand what it meant. I think one really cool thing was that the English paragraph it would give was something that I didn't ask for, and it would end up explaining it to me, as a baby developer almost, because you need to really understand what the rule is saying. And then later on, as I went forth with my trials, it actually ended up fixing the code for me. So yeah, it gave me the rule for me to learn, yeah, it explained the rule, but then it also ended up debugging and fixing my code for me as I went forth, and I never, ever asked for it, and that's the beauty of the machine learning. So then once I get

all my output, I had to filter through it, because obviously as much as we can trust AI, we also can't. So you have to really filter it by syntax and by what you're actually looking for, to make sure that you're getting the best results. And then, of course, collecting my data: when I got my GPT rules, my GPT output, I also had to compare it to what already exists with Semgrep, because Semgrep is a company with so many people that have way more experience and knowledge about this subject. AI, I'm sure, has that too, but Semgrep has that human background to it. So I wanted to compare the Semgrep

rule to the GPT rule to really see if GPT was doing anything novel. So I came up with this little scoring matrix. On the side you can see the one, two, three, four, five. One and two were basically where, yeah, okay, GPT was able to do what I wanted it to do, but Semgrep was just inherently better. Three was where Semgrep's rule and the GPT rule were almost identical. For four, GPT was able to elevate the risk of what I was asking it to do: for example, Semgrep would say, okay, here's this vulnerability, here's a rule for it, and GPT would find an extra rule, or it would

find a little bit more importance in that rule that Semgrep wasn't giving me. And then five is it did both: it elevated the importance and elevated the number of vulnerabilities that it was able to find. So this is a graph to summarize what I found; this is an average of all the trials that I did. With the five vulnerabilities that I explained earlier, GPT with three-shot prompting, and somewhat with two-shot (and the three-shot honestly made sense, it was the most defined prompt), was able to come up with almost the four and five range every single time

except for denial of service, and it was able to come up with more to feed me and more to actually work with in comparison to Semgrep. So yes, Semgrep was able to come up with novel rules that are hand-filtered by humans, but ChatGPT was able to do so much more and give me so much more, so the two coinciding and working together was giving me the most beautiful output that I could ask for as a developer. And so now, moving on, I wanted to elevate my results here. Okay, three-shot prompting is getting me nice results, but I wanted to delve a little bit deeper, so I talked to

one of my mentors and he introduced me to CWEs. A CWE is just a classification of lots of different types of vulnerabilities, and so what we did is we wrote a short Python script to make a list of all the CWEs out there. For example, you can see the little defining numbers at the top. The very first one, CWE-787, that's some vulnerability, but there are 39,380 different instances of this vulnerability. So Semgrep was almost like a dictionary, in the sense of, okay, here's this rule, here's how it corresponds to this vulnerability, but in reality there are almost 39,000 different instances of the same vulnerability in a different way. So what's really cool is

that I had to take into account that not only am I trying to generate a novel rule, this rule has to take into account how many different ways this vulnerability could be found, because the way I code isn't the way that you code, and that goes for every single developer out there. So with these CWEs I understood a little bit more, and I decided to write a shell script to connect all these things that I've learned together. It'll be playing in the background; this is how my script works. Step one is just that I put in a code.js file, and it did a syntax

check, just to make sure that syntactically everything is correct. Then it utilized a Python script to iterate through those CWE lists, and then it calls the GPT function using an API key, and ChatGPT is now comparing against the CWEs that were already found by step two and running it through security rules. When GPT finds the applicable CWE, it then has to AI-generate this rule. So the rule is now generated, it's given to me in a YAML file, and I get that little paragraph. Then it goes through and filters again, and step five is honestly the coolest part, where it

actually goes through line by line and shows me what went wrong, and then fixes and debugs my code. So this summarizes it here. Step five is going to load in a second, if it does... there we go. Okay, so it filters through, and you can see that it went through my code, and at the bottom you can see the explanation and notes. So not only is it going through and fixing my code line by line and explaining why, it's actually doing the dirty work that I would have to do after. So not only is it teaching me, but it's also doing

stuff for me, and as developers we're all lazy, so this is honestly a really nice blessing in disguise. Just to discuss the implications of this: it's a very, very baby prototype. I'm only 16; this is what I'm able to do within my scope, with the help of my dad and two professors. But the implications of this are huge, because with Semgrep, yeah, it takes a lot of effort to create the rules that it does create, but it takes so long, and AI is there and AI is doing stuff for us. If we can actually compare Semgrep's work to something generated by generative AI, it can

actually help so much, because you can have so many more rules, you can have more instances of these rules, and you can apply these rules in so many different ways. And the implications of this are huge because hacking is something that happens everywhere. Sure, these big-scale companies are getting hacked, Microsoft's getting hacked, First American's getting hacked, but what about the people in, like, third-world countries who have been working for 15 years to maybe get $25,000 to their name, and then that money is wiped instantly and there's nothing that they can do, because their small banks aren't able to protect their data the same way that these large Fortune 500 companies are

able to? So the implications of this project are huge, because you can not only stop big-scale hacks that affect countries from happening, but you can also help the little people, the people that aren't really thought of as much. Just to finish off, I want to acknowledge my two mentors that really helped me, and my dad, who's right there. So, Dr. Plante is a professor at Stetson University, from where I'm from; Dr eoth, who's also from Stetson University; and my dad, who's also from Stetson University. Just all the Zoom calls that happened at the most random times to help me

learn what I've learned and help me get to where I've got to. And just their excitement has really motivated me to come as far as I have, because I'm 16, I'm a girl, and I'm in a very male-dominated industry, and it's just something that really helps motivate me to keep going. So just a big thank you to them, and yeah, that's pretty much it. Thank you. [Applause] Any questions?
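The 1-to-5 scoring matrix and the per-prompt averages described in the talk could be sketched as below. This is only an illustration under stated assumptions: the rubric wording paraphrases the transcript (which does not distinguish scores 1 and 2), the names `RUBRIC` and `average_scores` are hypothetical, and no trial data from the project is reproduced.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Rubric paraphrased from the talk; scores 1 and 2 are not distinguished there.
RUBRIC = {
    1: "GPT produced a rule, but the Semgrep rule was better",
    2: "GPT produced a rule, but the Semgrep rule was better",
    3: "The GPT rule and the Semgrep rule were almost identical",
    4: "GPT found an extra rule or gave the rule more importance",
    5: "GPT did both: more importance and more vulnerabilities found",
}

# One trial: (vulnerability name, shot count, rubric score).
Trial = Tuple[str, int, int]


def average_scores(trials: Iterable[Trial]) -> Dict[Tuple[str, int], float]:
    """Average the rubric scores for each (vulnerability, shot count) pair."""
    grouped = defaultdict(list)
    for vulnerability, shots, score in trials:
        if score not in RUBRIC:
            raise ValueError(f"score must be between 1 and 5, got {score}")
        grouped[(vulnerability, shots)].append(score)
    return {key: sum(scores) / len(scores) for key, scores in grouped.items()}
```

Calling `average_scores` on a list of recorded trials yields one mean per vulnerability and shot count, which is the shape of data a summary graph like the one shown in the talk would plot.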
