
Who Makes the Rules?

BSides Knoxville · 31:50 · 285 views · Published 2024-07 · Watch on YouTube ↗
About this talk
Meghna Vikram presents a research project combining generative AI with static analysis tools to improve vulnerability detection in code. She built a compiler that integrates GPT with Semgrep to generate more nuanced and actionable security rules, automatically teaching developers how to fix identified vulnerabilities in JavaScript code.
Original YouTube description:
I aimed to test how an ML algorithm can mitigate vulnerabilities in otherwise valid code by generating rules that are valid and non-redundant. I created a compiler that scans given code for vulnerabilities by calling a GPT API, producing output that developers can actively use. #programming #gpt #genai #semgrep
Transcript [en]

Hi, my name is Meghna. I'm 17, from Orlando, and my talk today is going to be a little different: it's basically a presentation of the research project I've been conducting for about the past year. To introduce myself a bit: I started my cybersecurity journey around sixth grade. My dad works in IT, and I was watching him click away on his computer all day, wondering what this guy was doing and why he couldn't play with me. I would sit with him and wonder what he was doing, and eventually he sat down

and explained to me this wonderful new concept of artificial intelligence when I was only 10 years old. That led me to start my very first science fair project back in 2019. It was called "Artificially Intelligent or Intelligently Artificial," and all I did was talk to Siri and Alexa, call it a science fair project, and say it was AI. It was horrible; I didn't win anything. It was a pretty bad project, but it started my journey. In 2021, about two years later, I actually won first place with another project called "Identical Identity," in which I used biometric authentication and AI for

the first time. That was right as we were coming out of COVID, when everyone was wearing masks but touchless authentication was becoming huge. If we were using Face ID for everything while our faces were covered, how was that supposed to work? For the first time I was able to win first place, and I was also recognized by the US Navy later on for that project. This year I continued my research and really went into cybersecurity, which is what I'll be presenting today. I won first place at my regional science fair, qualified for states, won at states as well, and placed in the top three

in the entire state. I then went to the Lockheed Martin science challenge and won first place there as well, and just last week I came back from the International Science and Engineering Fair held in LA, where, among 62 countries, I was one of the top 50 in the world for software. I have also presented at BSides SLC; this is my second BSides, and this is the project I'll present today. Here's a little picture of me in LA: those are my three best friends, that's the LA Convention Center, and that's me presenting to a bunch of middle schoolers. But to actually introduce the topic: in cybersecurity, I'm sure everyone's

heard of security vulnerabilities. They're the root cause of data breaches, hacking, and things like that; these vulnerabilities are the core reason people lose money and lose their identities, and in general the core reason problems arise. I wanted to see if I could partner generative artificial intelligence with existing tools to mitigate these vulnerabilities more efficiently than what already exists. Currently there's a company called Semgrep, and they have a static analysis tool: it goes through your code and gives you a YAML rule file drawn from a large repository

that they maintain. In this repository they have a YAML rule that can tell a developer how to improve their code for every known vulnerability, with an ID, for every language out there. It's a wonderful resource, a wonderful database, but it's not being utilized the way it should be, and I wanted to see how I could make Semgrep better. Why I chose Semgrep: I did a bit of research on the other static analysis tools out there — there's Semgrep, there's Pylint, there's Flake8, Bandit, and many more — and I think Pylint and Flake8

are linting tools, so they're not exactly looking into fundamental issues; they're looking more at surface-level issues. Overall I chose Semgrep as the static analysis tool to analyze in this study, and asked how I could make it better using AI. The goal of my research, as I said, was to partner external static analysis tools — especially Semgrep — with generative artificial intelligence to see how I could improve them. The five vulnerabilities I chose for this project specifically targeted JavaScript: denial of service, eval injection, cross-site scripting, broken authentication, and SQL injection. I'm sure at least one of those names

rings a bell. I did a bit of research, and there's an OWASP Top 10 list of vulnerabilities in JavaScript that hurt a lot of developers daily; that's why I chose these five. What I did with them: I wanted to use ChatGPT, because ChatGPT is a household name at this point. Everyone knows GPT and what it does, whether you're cheating on homework or trying to get something fruitful done. And when you're using GPT, or any LLM in general, you have to get an optimized output. How do you get that? By optimizing your input.
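For readers who haven't seen one, a Semgrep rule is a small YAML document. This is a minimal, illustrative example (not taken from the talk — the rule id and message are invented) that flags JavaScript eval injection; the `pattern`/`pattern-not` keys and `$ARG` metavariable follow Semgrep's documented rule syntax:

```yaml
rules:
  - id: js-eval-injection
    languages: [javascript]
    severity: ERROR
    message: >
      Dynamic input reaches eval(), which allows arbitrary code
      execution. Avoid eval, or sanitize the input first.
    patterns:
      - pattern: eval($ARG)
      - pattern-not: eval("...")
```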

Prompt engineering is a big part of that optimization, and shot prompting specifically is what I used for this project. To explain shot prompting: a zero-shot prompt says "hey, do this." A one-shot prompt says "hey, do this — here's an example." Two-shot is two examples, three-shot is three examples, and so on. What I wanted to do was use GPT, paired with shot prompting, to structure things out and make GPT create a YAML rule mirroring what Semgrep does, but

instead of doing it in the static way Semgrep does, doing it dynamically. What Semgrep basically does is give you access to that giant repository, and all you have to do is say: okay, let me look through my code; lines 10 to 20 look kind of wrong; maybe I'm susceptible to a DoS attack; let's go to Semgrep, go to DoS under JavaScript, and implement that rule. But if I'm a baby developer — like me; I don't know as much as anyone in this room — I'm not going to be able to recognize that DoS attack in the first place. So Semgrep is kind of

that resource that isn't being utilized the right way, and it's static. So here's what I did using GPT. This is the OpenAI Playground, where you can customize GPT to a larger extent. On one side you can see what GPT looks like for most everyone — you have your output, you have your input — but on the other side you can actually customize what you want GPT to do. Under the instructions tab, I fed it what a Semgrep rule looks like, I fed it how to analyze a Semgrep rule, I fed it the link to the repository — I fed it as much

information as I could. In a way, I guess you could say I was training it — but not really; I was more so teaching it. With all the information I provided, it was actually generating rules for me, mirroring Semgrep, and what's cool is that the rules had a lot more nuance than Semgrep's. Whenever Semgrep gave me output — I'll show this later in the slides — it would have one vulnerability per output; it would never really go into the nuance of any issue. That's why I'm sure at least half of

you could raise your hand and say you've never heard of Semgrep before: it doesn't go into the nuance. With shot prompting, the three-shot prompt was giving me the most nuanced YAML rules, and when I go on to compare, GPT was presenting such a more nuanced rule that I could honestly say it was one-upping Semgrep. With this software I was able to write a shell script that connects all of this, plus an array of further steps that I'll go into.
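The zero-/one-/few-shot idea described above can be sketched in a few lines of Python. The task wording, the example pairs, and the function name here are all invented for illustration; the point is simply that the shot count is the number of worked examples appended to the prompt:

```python
def build_shot_prompt(task, examples):
    """Return a chat-style message list: a system task plus N worked examples."""
    messages = [{"role": "system", "content": task}]
    for code_snippet, yaml_rule in examples:
        messages.append({"role": "user", "content": code_snippet})
        messages.append({"role": "assistant", "content": yaml_rule})
    return messages

task = ("Given JavaScript code, emit a Semgrep-style YAML rule for any "
        "vulnerability you find.")

zero_shot = build_shot_prompt(task, [])           # "hey, do this"
three_shot = build_shot_prompt(task, [            # three worked examples
    ("eval(userInput)",                 "rules: ..."),
    ("db.query('... WHERE id=' + id)",  "rules: ..."),
    ("res.send(req.query.q)",           "rules: ..."),
])

print(len(zero_shot), len(three_shot))  # 1 7
```

The resulting message list is what would be handed to a chat-completion API; more shots give the model more structure to imitate.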

I was also able to purchase a domain, which I'm calling Krypto ML — this is the product I'll be working on over the summer. My first step was working with GPT, but then I wrote a small shell script (this is just the base code for it). It's a six-step process. The first step just evaluates whatever input I give it for syntax errors; the input is always a piece of vulnerable code. For the entire experiment I went through a repository called WebGoat, which has a lot of bad code in it, and it

also has comments explaining what's wrong with it, so I knew what to look for when running it through Semgrep. For my software, the first step was just a syntax error check, and the second step goes through a giant list of CWEs. CWEs are a large classification of different pieces of vulnerable code. What you see here is just a snippet of that list, which I compiled with a lot of help from my mentors. For example, CWE-787 — which I believe

is a SQL injection — has 39,380 instances, all unique and different ways of having a SQL injection in your code. What Semgrep fails to do is this: it gives a very generalized, dictionary-definition rule. If I think I have a DoS issue and I need the DoS rule, I'm given the one rule Semgrep has, and it's expected to fix the DoS issue in any scenario. But the way I make a mistake isn't the way you make a mistake, and it's not the way anyone else makes a mistake — code is different for every single developer. So that Semgrep rule might pick up on my issue, but

if you make the same mistake in a totally different way, it might not pick up on it at all, because the rule is structured a certain way. What the CWEs provide my software is context — 39,000-plus instances of context. Step two of my shell script iterates through every single CWE on that list (which I gathered with a simple Python script), going through every instance of found vulnerabilities: okay, CWE-343 was found; now let me go through every single one of its instances.
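A toy version of that step-two matching loop might look like the following. The CWE numbers, indicator strings, and function name are all invented for illustration — the real project iterates a much larger compiled list rather than a three-entry dictionary:

```python
# Map each CWE to simple textual indicators that suggest it may apply.
CWE_INDICATORS = {
    "CWE-89": ["query(", "+ req."],   # SQL injection via string concatenation
    "CWE-95": ["eval("],              # eval injection
    "CWE-79": ["innerHTML"],          # cross-site scripting sink
}

def applicable_cwes(code):
    """Return every CWE with at least one indicator present in the code."""
    return sorted(cwe for cwe, indicators in CWE_INDICATORS.items()
                  if any(token in code for token in indicators))

snippet = "db.query('SELECT * FROM users WHERE id=' + req.params.id);"
matches = applicable_cwes(snippet)
print(matches)  # ['CWE-89']
```

The matched CWE list is what gives the later GPT call its context: instead of one generic rule per vulnerability class, the model sees which specific classifications were triggered.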

Maybe it hit instance number 200 — but instance 200 has this other vulnerability too, so let me go check that as well. It was a complicated, long process of iterating through these examples, but when you compare the output, it really pays off. Step three calls GPT for the first time: it evaluates the rules that applied and creates a YAML file, and this is where the cool part comes in. Step four just does a little validation through Semgrep to make sure the YAML file is syntactically valid. But step five was something I never actually asked for, and it's probably the coolest part of my project. In the prompt that

I was feeding GPT at any point in time, all I was saying was: here's some code, detect any vulnerabilities, and here's some fixed code for reference. I never asked it to recode my project; I never asked it to debug my project. But that's the beauty of machine learning — it kind of knew what I was looking for. Within maybe my fourth or fifth trial, it started giving me a line-by-line filtration of which specific lines were wrong, and as I continued with my trials it started adding a little paragraph explaining why those lines were wrong: this user made this mistake, and this is wrong because

of this; here's why you shouldn't do this; here's how to fix it. By the later trials it was giving me that paragraph plus the line filtration of what was wrong, and then it was debugging my code for me, explaining what it did to debug it, and fixing the code. As a developer, this is a dream: you run the program, you go to Chick-fil-A, you come home, and your stuff's good. I think that's pretty cool, and I wasn't expecting it at all — I never asked it to rectify the code. That's what steps five and six are doing: it's

just giving me a display and actually rectifying the code for me, while step four is a simple validation. The YAML file is still pertinent to the output. I'm sure some people will skip over the rule and go straight to the end, but when you look at the YAML file, it's giving you a learning experience: you made this mistake, you filter through this YAML file, and now you kind of know how to fix your mistake, because a YAML file is so easy to filter through. There are other formats out there — XML files, JSON files — and they're fine.
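As a rough sketch, the six-step flow described above could be orchestrated like this. Everything here — the function name, the stubbed checks, the step labels — is invented to illustrate the shape of the pipeline, not the talk's actual shell script; in the real tool, steps three and four would call the GPT API and the Semgrep CLI rather than appending log lines:

```python
def run_pipeline(source_code):
    """Run the six illustrative steps in order; return a log of what ran."""
    log = []

    # Step 1: syntax check (stubbed; the real tool parses the JS input).
    log.append("1: syntax check")

    # Step 2: match the code against the compiled CWE list (stubbed).
    cwes = ["CWE-89"] if "query(" in source_code else []
    log.append(f"2: applicable CWEs {cwes}")

    # Step 3: ask the LLM for a YAML rule (stubbed; would call the GPT API).
    log.append("3: generate YAML rule via LLM")

    # Step 4: validate the generated rule (stubbed; would invoke Semgrep).
    log.append("4: validate rule with Semgrep")

    # Steps 5-6: line-by-line explanation, then the rectified code.
    log.append("5: explain flagged lines")
    log.append("6: emit rectified code")
    return log

steps = run_pipeline("db.query('SELECT * FROM users WHERE id=' + id);")
print(len(steps))  # 6
```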

But those formats don't dumb things down the way a YAML file does, and that's why I loved that it was able to create valid YAML files. Now, when you go through the output, here's the comparison. On this side is what Semgrep was able to do from the beginning: I fed it a piece of code and it gave me a severity of "warning" for a specific vulnerability — here, unsafe concatenation. It's just a warning, so as a developer I'm going to say: oh okay, it's a warning, I'll come back to it later. A

warning doesn't really make you want to change something the way an error does — which is exactly what GPT did. For the exact same issue, unsafe concatenation, it gave the severity as an error, and it also found three new vulnerabilities, because it was going through those CWEs and finding so much more context. This is just step two, where it filters through rules and finds vulnerabilities you need to keep an eye on, but as you can see there are four vulnerabilities on the GPT side versus only one on the Semgrep side. This relates to what I was talking about earlier: this is a

snippet of an output from GPT using Krypto. Five CWEs were identified, and it gives you a short sentence on exactly what was wrong — for example, CWE-943, improper neutralization of special elements used in an SQL command (SQL injection). But then CWE-942, the third one down the list — the CWE right before it — is so similar: it's also a SQL injection, also an improper neutralization of special elements in an SQL command, but, as you can see up there, it's RDBMS-specific. These don't all make sense to me. The issue is that 943 and

942 are so close to each other that if you ran this through Semgrep, it would just say "SQL injection" and not give you any more context — but running it through CWEs gives you so much more. Even as a developer, half of these CWEs don't make sense to me — you just saw that — but they don't need to, because my software is fixing things and teaching me. Down here you can see the paragraph I was describing, where it's actually teaching me and telling me what I did wrong and why: the detected

SQL statement is tainted by a req object; this could lead to an SQL injection if the variable is user-controlled and not properly sanitized; to prevent SQL injections, it is recommended to use parameterized queries or prepared statements. Then it gives me the line this paragraph is talking about — line 17, where I should have had a parameterized query — and it even baby-feeds you a bit and points out where the ending semicolon goes, which I thought was really cute. You can see how much more nuanced the output is than anything from Semgrep, Pylint, Flake8, or any other tool out there.
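The parameterized-query fix recommended here is easy to demonstrate concretely. The project targets JavaScript, but the principle is identical in any language; this is a self-contained Python/sqlite3 sketch with an invented table and input:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

user_input = "1 OR 1=1"   # attacker-controlled value

# Vulnerable pattern: concatenation lets the input rewrite the query itself,
# so the condition becomes "id = 1 OR 1=1" and every row leaks.
tainted_query = "SELECT name FROM users WHERE id = " + user_input
leaked = conn.execute(tainted_query).fetchall()
print(leaked)  # [('alice',)]

# Safe pattern: the placeholder keeps the input as data, never as SQL.
rows = conn.execute(
    "SELECT name FROM users WHERE id = ?", (user_input,)
).fetchall()
print(rows)    # [] -- "1 OR 1=1" is treated as a literal value
```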

Machine learning and artificial intelligence know so much more than you do. As the previous talk discussed, no LLM is always accurate — sometimes it's going to give you BS; that's AI, and it's learning with us. But if you keep AI in your own hands — for example, what I did with the prompt engineering — you can really get it to do what you want. When I went to the OpenAI Playground, I fed it all that context: I fed it a whole repository, I fed it how to read a YAML file and what a YAML file is, I fed it how to read a

Semgrep rule. If you teach it things, it keeps them in account. It's not like talking to a kindergartner, where you have to say "red is red" ten times before they understand; it understands the first time, implements things at a much faster rate, and gets you the results you want. And yeah, okay, it took me ten trials — but after those ten trials it was consistently getting better and consistently finding more and more CWEs. This is from around trial six or seven; in later trials there are screenshots where the CWE lists take up a whole page. It gets more and more nuanced as you

continue. I also have a short video of the Krypto compiler running. This is step one happening, the syntax error check — no syntax errors were found. Then step two runs: it goes through code.js, the input file, looks for applicable CWEs, and then calls GPT; it found three CWEs for this particular input and is addressing them. Then step

three: there's the ChatGPT response, with the little paragraph explaining the rule. Step four happens; here's the line-by-line filtration, with the paragraph on top explaining what and

why. Then this is step five, with the newly filtered, rectified code, followed by "evaluation complete" and a little explanation in notes at the end. What the software really provides is not only a YAML file, which was the initial intent; it's also a learning experience. It's doing a lot of your job for you, and even if you don't use the rectified code GPT wrote, that's fine — use it as an example, because maybe that's not the style you code in. It's there, and it's giving you information you can use in any given instance, which I think is the important takeaway. As for a

couple of limitations: this project really only focused on JavaScript. A big part of structuring the project was understanding what I could and couldn't do at the age of 17, and I was really comfortable with JavaScript, so I stuck with vulnerabilities that pertain to JavaScript. I wasn't looking into things that pertain to Ruby or C++ — I don't know those languages. Another thing: when you look into Java or C++, to compile languages like that you have to go learn assembly languages, and with what I knew and what my

mentors could help me with in the six months I had to initialize this project, I didn't have time to learn what an assembly language was, or to compile Java or C++. So in the future — over the summer and next year, because I still have one more year of high school left — I'd like to stray away from just JavaScript and make this more comprehensive software. JavaScript is used in web applications, sure, but if you look at languages with a lot more

reach, like C++, this software could help developers at a much larger scale. The second thing, which a judge at ISEF actually told me, is that Krypto isn't really finding anything new: it's taking the resources I've dumped into it and giving me dynamic, comprehensive output. But what about vulnerabilities that come up that aren't already identified — what if it's a new vulnerability? As developers, when we make mistakes, hackers are going to try to one-up that mistake and hack into it, and it becomes a cat-and-mouse chase, because hackers are

always getting better and developers are always one step behind: the hacker finds something new, and the developer has to play defense. Krypto isn't going through and finding these novel rules; all it's doing is taking a lot of context and giving you a dynamic rule. So in the future I'd definitely like to expand this beyond pulling from a repository and from knowledge Semgrep already has, and really make it something that filters and fundamentally goes through code line by line, looking for things that could be new vulnerabilities — so that instead of playing defense, we could

for once play offense with generative artificial intelligence. Some ending notes — I know I didn't go 40 minutes, but I guess questions can take that up. Some implications: Krypto's main purpose in this entire project was to add to what Semgrep has. In no way did I want to demean Semgrep; this was about enhancing what Semgrep does at a much larger scale. Instead of sitting there half asleep on three cups of coffee, manually going through your code line by line before you compile or commit, you can just run it through and, as I said earlier, go

get Chick-fil-A and come back: your code is debugged and ready for you to filter through one last time before you commit it. It gives you much less labor time, it cuts down your input into the project a lot, and it does the parts that aren't the fun part for you. A little bit of special thanks: my first thanks goes to someone in the room with me, in the back — that's my dad; he was a big help with this entire project. And these are my two mentors, both from Orlando, from Stetson University: Dr. Daniel

Plante and Dr. Joshua Eckroth. They both really helped me through this project — they introduced me to Semgrep and to all the different software I came across during this project. Thank you, and I guess I'll eat up the rest of my time with questions.

Okay, thank you, Meghna — that was fantastic. I was just curious: ChatGPT has this notion of a context window, right, so you can really only ask it questions with a certain amount of data. Did you find that to be a limitation of any kind versus Semgrep — did you have to bust up the code base into smaller chunks, or was that not an issue in your research? It's definitely a drawback in a sense, because with Semgrep I didn't have to feed it that data — but at some point someone was feeding the data, so at the end of the day the

process is the same: Semgrep was hand-created; someone hand-compiled all that data for me. Since ChatGPT was working from scratch, I had to feed it the data, but at the end of the day it's not really a roadblock, because you can't expect machine learning to do something without actually giving it the teaching to learn. So I don't think it was a drawback so much as a time issue. Okay, cool. If you have questions, here's what I want to try to do: we need to get the questions into the microphone so they can be recorded. If you can come up

to the front and use the microphone to ask your question, that will save me trying to summarize it. If you don't want to come up front, just shout it out and I will attempt to summarize for the recording. Anybody have any questions for Meghna? Yes, please — now we've got to wait for everybody to walk through rows of

people. So first off: incredible, amazing, thank you — glad you were able to do this. Just thinking of it from a counter perspective: as you're working with Semgrep and these tools to develop basically a tool that will improve code, what you're doing is teaching, training, and learning within these models — here are the types of vulnerabilities we have, here's what you might do to fix them. Have you considered the offensive perspective — what are you going to do when adversaries start investigating your work and using it against you? As in: this is how this model recommends we make this change to resolve this

vulnerability, and this is how I can get around that. It just becomes, like you said, cat and mouse — so how are you building that into your solution to prevent it from happening? I've gotten this question before, and I keep saying I don't know. But I think one big thing is that it's a big ethics question: there are always software tools out there trying to do something good, and there's always going to be someone trying to work around them. I think that's just something we're going to have to deal with — you could use this software in bad ways too,

like if hackers purchased it and tried to say: okay, if I run this code, here's what it finds; let me target those things before they're rectified. That's something that, with time and machine learning, maybe I can address — implement something like what you said, an adversarial view: have it give me another snippet saying, if a hacker were to look at this, here's what could possibly happen. There are tools out there — I know Bandit, one of the static analysis tools I mentioned earlier, has a couple of features that look at what things look like from an outside

perspective. So maybe if I could teach it research like that, it could give a developer a kind of warning about what could possibly happen if you do it this way, and here's another way you could work around it. Implementing machine learning at a higher level could help with that, but at this point I'm really not sure. Yeah, thank you. Okay — Rosie Hall, did you have a question? Okay. Yes, please. A 17-year-old — very impressive, thank you. What made you choose Semgrep as opposed to, say, Snyk or Aikido or Veracode? So I

looked up four, and those were the ones I compared: Semgrep, Pylint, Flake8, and Bandit. I think the biggest thing is that I was really in love with the way Semgrep wrote their YAML files — I never knew what a YAML file was before I did my research — and the fact that they're very easy to filter through was something I appreciated, so I liked the simplicity. Also, Pylint and Flake8 and a lot of these others are linting tools, so they're not really going

deep into what's really wrong with whatever input you're giving them. I really liked that Semgrep was efficient and easy, and the graph I had earlier showed that Semgrep's running time is so much lower than all of them. My biggest thing was chasing efficiency — this was still a science fair project to me at that point, and I wanted it to be fast and reliable. I liked that Semgrep had a static repository I could look into, very easy-to-filter YAML files, and very low runtime. And you prefer the

SAST approach as opposed to dynamic? Yes. Any reason why? I guess just chasing efficiency — that's my biggest point. Any other questions?

Yes — thanks very much for sharing your research with us. One of the things I'm always paranoid about with anything LLM is hallucinations. Did you hit any hallucinations, like new CVEs that don't actually exist, in your research? Yeah — that was a big part of going through zero-shot, one-shot, two-shot. There were so many times, especially with my zero-shot, one-shot, and sometimes two-shot prompts, that I was getting a YAML file fighting something that wasn't even a real issue. There would be parts of the YAML — you know how YAML rules have the definitions and classifications areas — that were blank, but it was still

flagging something; the YAML file had no cohesiveness to it. That was a big part of why I kept increasing the shots until I got cohesive, actually real YAML rules. That's a big part of shot prompting and prompt engineering: when you're working with LLMs, you have to give them optimized input, because you can't blame an LLM for giving you bad results when you're not giving it enough to work with. The biggest thing with hallucinations is that in order to mitigate them, you have to give as much context as possible — you have to teach it in order for it to learn. Yeah — Adrian, do you have any questions

for our speaker? He says no. Okay, anybody else? Yes — fantastic, thank you. Let's pretend we're a year from now and you're grossly successful in making everything you want; you've got your business stood up. What does the delivery model look like — is this on developer machines, is this in CI/CD workflows? Developer machines. I definitely want to keep it that way. A big question I was asked during one of my competitions was: do you want this to be public, or almost private in a sense? Because, say you're compiling your code and it's going through GPT, but GPT keeps the code you've input in a

database — then what? If you didn't want your code out there for GPT to ever reference, how is that going to work? So I definitely want to keep this developer-side: make it a command-line tool, something that never really leaves your computer, never leaves your area. The only thing GPT is doing is getting called over an API, running its results, and feeding them back to you. So definitely a developer tool — most likely a command-line tool that can be implemented solely by

developers. Like everybody else said, very impressed. So, GPT has had some kind of public oopsies lately — what is your plan if they disappeared tomorrow? That's a good question. Well, there are other models out there. I think the reason I went with GPT is that it's the household brand name everyone knows, and it has the most context because everyone is using it so much. But there's Gemini, there's Claude, there are a lot of other models out there. Maybe GPT disappears tomorrow, but AI and LLMs are not disappearing — they're never going to disappear. In that case it would just be training a

new model and a new setup, because they're all doing the same thing in different ways. Honestly, if I used something like Claude, which is a more niche LLM, it might even provide better results — I never really went through other LLMs. So there would definitely be options to explore. I mean, I hope GPT doesn't disappear tomorrow, but if it did, there would be other options. Any other questions for our speaker? Okay, seeing none, let's thank Meghna for an excellent talk. [Applause]