
Okay, all right — check, check. How's everybody doing today? Nice. Who started the clap? Thank you.

Hope everybody's having a good day. I think I have to speak into the mic because they're recording it; normally I just like to yell at everyone, but I don't think that's going to work out today. I'm not a big fan of microphones. Anyway, I appreciate you all coming out, hope you're having a good day, and I appreciate BSides for having me and letting me tell the story of what I did when I wanted to figure out how to fine-tune Llama 3.1. That's what I'm going to talk about today.

Real quick about me: my name is Corey Wolf, and I'm the director of offensive security at a company called risk3sixty, a security and compliance consulting firm based in Atlanta. I am not from Atlanta; I'm from Scranton and now live in the Poconos. I'm a core team member of Red Team Village, so if you've ever submitted a talk or a workshop to Red Team Village, I've seen it — and I'd recommend you submit again this year if you're thinking about it or have done so in the past. Along with another gentleman named Wes, I handle the workshops and talks at Red Team Village. I'm also a part-time farmer. I live out in the country; I've got bees, I've got chickens, all kinds of random stuff going on, and I like to try and balance it all out.
Real quick, just about today and some things to know before we really get into it. I originally started working through machine learning in 2018. If you think about machine learning back then, people would get mad at you if you said "AI" — they'd be like, "it's not AI," you know, whatever. Back then you really had to get into the weeds of it: you had to understand the math behind it, things like gradient descent; you had to have basic calculus, which I hate, but I knew the basics enough to start to figure it out. I started in 2018 and, for various reasons, wasn't able to focus on it for quite some time — until ChatGPT came out, and I was so mad, because I hadn't kept up with it. I was like, "I know what they're doing; why didn't I keep up with this?" So this was basically that side project to help me get back up to speed with what was going on.

There are lots of differences since 2018. Back then, if you wanted to load data, for example, you had to literally load the dataset, parse it, clean it — all within your script — before you even did anything. You'd be 30 or 40 lines in just getting the data together and ready to even start training. Nowadays, with things like Hugging Face, you just call load_dataset and throw it a path to the file. There were lots of things like that that had changed since the last time I'd really looked at this, and for me, at least, that was the motivation behind trying to figure this out.
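(To make that concrete, here's a minimal sketch of the Hugging Face datasets convenience I'm talking about — the file name is just a placeholder:)

```python
from datasets import load_dataset

# One line replaces what used to be 30-40 lines of manual parsing and cleaning.
# "json" tells the library the file format; data_files points at a local path.
dataset = load_dataset("json", data_files="telegram_messages.jsonl", split="train")
print(dataset[0])  # inspect the first record
```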
I'd seen a lot of tutorials and videos and all kinds of stuff, and they made it seem super easy. It is easier — but if you ever watch these tutorials or videos, you'll see that they leave out a lot of the in-between stuff. Every time I'd try to catch up with what was going on, I'd get so angry and frustrated, because they'd say "just do this, then do this, then do this," and you're like, well, there are probably four hours of work in between those steps that you didn't even realize. So that's what this is. As I'm sure you've already read, the code in the slides is available on the GitHub right now, so if you want to pull it up and start looking at some of the actual code as I talk through it, you're welcome to do that. And, true to form for myself, I didn't finish the slides until about an hour ago, so they are not in the GitHub — but they will be by the end of the day.

I mentioned earlier that I'm the director of offensive security, and I think it's important to call out that the things I'm going to talk about today are more properly classified as cyber threat intelligence: we're gathering information from threat actors, we're analyzing what they're doing, we're trying to understand a little bit better what they're doing. That's threat intelligence. For me it's important to understand how this relates to offensive security, because, like I mentioned, that's what I do day to day, so I should always be trying to understand and consume threat intelligence. It matters because that's literally our job: our job is to emulate threat actors — to do what they're doing — so that we can test systems. So if, for example, I can find a way to take threat intelligence, throw it into an LLM somehow, and have it help me keep up with what's going on in an easier way, that's valuable to me.

And last: if we're going to test something, we should know how it works. Whether it's some sort of OpenAI wrapper or a custom-built LLM, those of us in offensive security — specifically when it comes to AI red teaming — need to understand what's going on. Prompt injection is a big part of it, don't get me wrong, but there are a lot of other moving parts in AI systems, and we need to understand the whole picture in order to be really good at what we're actually doing. That was important to me, and important to my team as well. The other thing I wanted to call out, as far as attacking LLMs goes, is that yes, these are relatively new systems — technologies, I'll say — but the systems that serve them are the same things that have been serving web applications and network applications for decades. So yeah, it's an LLM, but it's still Vue or React on the front end; it's still Node or Python on the back end, with some little tweaks. At the end of the day, that's why supply chain attacks are a big part of what we see day to day. Also, it's fun.
So, I mentioned I'm going to talk through some of the pitfalls I encountered. The first one: it's pretty expensive. For those of us that don't know, if you wanted to train your own model — start from scratch, rather — it's extremely expensive, and it takes a lot of data. OpenAI, Google, Anthropic, and all the other companies generating these types of models are basically just scraping the web, and that's a lot of data. The more data you have, the better your model will be — but just to gather that data, just to scrape it, you're talking millions of dollars. A lot of money. And not only that: once you have the data, you can't just throw it into a training run. You've got to clean it, you've got to normalize it; there are different formats, different prompt templates, all kinds of things that go into actually training an LLM. Just because you have the data doesn't mean you can actually use it — you have to take a lot of time to normalize it and get it where you want it to be. That's basically data science; that's what we're talking about when we see people talking about being a data scientist. And then of course you need to train it, which — I think we're all familiar with Nvidia being a 400-billion-quadrillion-dollar company nowadays — costs a lot of money.

This is why we use pre-trained models. Companies like Meta — I won't say OpenAI, because theirs are closed-source models — along with Mistral and Google (with BERT) put out open models and do all that heavy lifting, and when we get those models we can then do things like fine-tuning on them. Even if we wanted to create our own model, it'd be extremely expensive, so we use pre-trained models and tune them. You can do this a couple of ways, but even when you want to do it locally, it's still an investment. Just so you understand what I was working with: my basic setup had a 4070 and a decent amount of RAM — 64 GB, and I wish I'd had more at the time — which it will max out, because a lot of the time you're doing things like loading the whole model into memory, or doing a lot of computation. For example, as I'll get to a bit further down, I had to generate question-and-answer pairs for training data, and that's basically just a loop over lines making calls to another LLM — and that alone was like 18 gigs of RAM. So I was able to do this with the setup I have, and everything went well, but it's still an investment, because it's a decent machine.

At a high level, fine-tuning really comes down to three areas: you gather the data, you create the training data, and then you fine-tune.
In this case, I gathered the data from Telegram's API: I basically had a list of Telegram channels I wanted to grab messages from, used the API to do so, stored them, and then moved on in the process. But there are lots of different ways you can do this. You can grab public datasets from Hugging Face, for example, and if you're using the datasets library from Hugging Face, you can literally just put in a dataset name and the Python package will pull it down for you. By the way, if you're interested, there's a link on the GitHub to the dataset I gathered for this training, which is available on Hugging Face.

So it starts with gathering the data. Here's some sample data of the messages we were looking at in various channels — by the way, this is all breach data: usernames, passwords, that kind of stuff. These are just some of the messages, and they're exactly what you would think they would be. They're idiots.
So basically I had all these messages and needed to gather them using the Telegram API, and long story short, it's kind of weird — really weird — the way they do it. For those of you that aren't familiar: Telegram was built by Pavel Durov (of course I couldn't remember his name in front of people), the same guy who founded VK in Russia, which is basically their Facebook, if you didn't know that. He got kicked out of Russia — got in trouble because he wouldn't give Russian authorities data on VK's users, though eventually Russia got the data on the VK users anyway — left, became, I think, a citizen of France, and was recently arrested, if you follow the news, sometime in the last three months. He prides himself on being a hands-on-keyboard coder, so he and, I think, his brother basically built this whole API, and you can tell it wasn't done by a professional. It's annoying. It's real annoying. I mean, I wouldn't say I'm an enterprise software engineer, but you can tell it's weird.

It took a little while just to understand. A lot of times when you go to API docs, it's all just there: you can look at all the endpoints, and it'll tell you what each endpoint accepts and what it's going to return. The Telegram API isn't like that, and it took a bit longer than I expected to really understand how it works. And then there are limited packages — at least in Python — for actually interacting with the API. Sometimes I'll default to just using requests and literally putting things together myself; I can't tell you why, but that just wouldn't work when I tried to call the Telegram API. It was a lot, and I've worked with lots of different APIs. Auth is based on your phone number — it's kind of OAuth-esque, but again, it's kind of weird. If you do any of this, I hope it goes without saying: use a burner number, do not use your own. Everything is based off your phone number; a lot of accounts elsewhere are based off your email, of course, but in Telegram it's a little different. Needless to say, it's kind of weird, but eventually I figured it out. Here's a code snippet of what it looked like, looping through some messages.
Basically, I had a list of Telegram channels, I looped through those channels, and I grabbed all the messages; this code block shows that portion. Just to reiterate the weirdness, one of my comments in there says "seems silly, but we call history.message — it returns everything." As another example of working with their API: if you say "give me all the messages," it'll give you all the messages, but some of them might be empty, so you have to do silly little things like that just to deal with the Telegram API. A lot of times stuff like this would take me an hour or two to figure out; this took me forever because of dumb stuff like that. I'd have to actually print out the whole thing, look, and investigate, and there's no documentation. This was probably one of the tougher parts of the whole process: actually getting the data. The other thing is that a lot of the tutorials you read or watch say "just use this dataset from Hugging Face," but that dataset isn't related to anything you actually want to do with the LLM. So it was really important to me that we figured out a way to get fresh data and relevant data.
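(The full gathering script is on the GitHub; as a rough sketch of the shape of that loop, here's what it might look like with the Telethon library — the session name, credentials, and channel names below are placeholders:)

```python
from telethon.sync import TelegramClient

api_id = 1234567                  # placeholder: credentials come from my.telegram.org
api_hash = "0123456789abcdef"
channels = ["breach_data_one", "some_other_channel"]  # placeholder channel list

# Auth is tied to a phone number -- use a burner. The session file caches the login.
with TelegramClient("burner_session", api_id, api_hash) as client:
    for channel in channels:
        # iter_messages walks the channel history for us
        for msg in client.iter_messages(channel, limit=5000):
            # seems silly, but it returns everything -- including empty
            # service/media-only messages we have to skip
            if msg.message:
                print(msg.message)
```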
So of course I continued to fight through it and figure it out. Some other callouts on what's going on in the code here. In the line I'm pointing out, I'm cleaning up the response: when it comes back, there's all kinds of weird stuff in it, and one easy way to clean it up is to convert it to ASCII and then decode it, which basically removes everything that isn't ASCII. Next, I'm adding context: these messages are going to be fed into training data, so I wanted to give each one a little bit of context when I started to create that training data, and I formatted each message the way you see there. And then — at this point I don't remember why I did Base64; I think I was super frustrated with some kind of formatting or encoding issue — I just Base64-encoded each message and dumped it into a file.
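(A minimal sketch of that clean-and-store step — the CONTEXT wording is a made-up stand-in for my actual formatting:)

```python
import base64

CONTEXT = "Message from threat actor Telegram channel {channel}: "  # placeholder wording

def clean_and_store(channel: str, text: str, out):
    # Encoding to ASCII with errors="ignore" drops everything non-ASCII,
    # which strips emoji and the other weird stuff in the responses.
    cleaned = text.encode("ascii", errors="ignore").decode()
    # Add a little context so the training data knows where this came from.
    contextualized = CONTEXT.format(channel=channel) + cleaned
    # Base64 it and dump one message per line (sidesteps newline/quoting issues).
    out.write(base64.b64encode(contextualized.encode()).decode() + "\n")

with open("messages.b64", "w") as out:
    clean_and_store("breach_data_one", "user:pass dump 🔥", out)
```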
I ran through all of those and ended up with about 18,000 messages from five different Telegram channels. Here's a quick output of my script running. Again, it took much longer than I expected, quite honestly, just because of the way Telegram is structured, their data types, all that kind of stuff. But I had the data — well, yeah, I had the data — and I needed to move on to creating training data: formulating it and getting it ready for the LLM. There are lots of different ways you can do this, but ultimately I used another LLM to create question-and-answer pairs, which I'll get into in a second. I had the data, and it was time to create the training data for fine-tuning.

I mentioned this earlier, but every single thing I saw — it's not going to take 15 minutes. That's the second lesson I learned here: it's not going to be as easy as I thought it would be. There are lots of these videos that are five, ten, fifteen minutes, and when you watch them it's like, "oh, that makes sense," and they link to a notebook somewhere, and you think, "I can just copy and paste this, I'm good." No. It's definitely not going to go that way; it's definitely going to take a lot longer. So I would suggest that you keep it simple.
I showed you my system setup before. The reason I went with WSL is that managing NVIDIA and CUDA drivers on Ubuntu — I don't know, maybe it's just me, but for me it's always a nightmare, and you need very specific CUDA versions depending on what you're doing. So I basically said, all right, it's ten times easier on Windows; I'm just going to use Windows. The good thing about that is that if you install the Windows driver, it passes through to WSL, so you don't even have to worry about it. I can still use Ubuntu — I still get to use Linux like I like to — but I don't have to worry about those drivers, and I can spend my time actually figuring out what this stupid Telegram API is doing. Also: use Conda. I don't know why — I'm more of a venv or pipenv kind of guy when it comes to Python — but AI engineers seem to love Conda. I fought it at first, and that was a terrible idea. So if you end up doing this and you've never used Conda before, it's pretty easy — it's a Python virtual environment — and I would highly recommend that you use it. And of course, be ready to read.
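(If you haven't used Conda before, the workflow is roughly this — the environment name, Python version, and package list are just examples:)

```
conda create -n finetune python=3.10
conda activate finetune
pip install unsloth datasets trl    # package set depends on what you're doing
```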
We talked about gathering the data; now we're working through getting it ready for training. I mentioned earlier that these models take different formats, and it depends on the model. Llama 3.1 uses the Alpaca format, which basically breaks down into a JSON object with an instruction, an input, and an output. There you can see a screenshot of some of the actual data I gathered, up on Hugging Face.
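(For reference, a single Alpaca-style record looks roughly like this — the contents here are made up:)

```json
{
  "instruction": "What service is being advertised in this channel message?",
  "input": "Message from threat actor Telegram channel breach_data_one: ...",
  "output": "The message advertises a mail-spoofing service sold via Telegram."
}
```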
But I had to take these raw messages and start putting them into training data, and one of the common ways to do that is to create question-and-answer pairs. If you just have an answer — just raw text — you can't simply feed that in and fine-tune the model; you need to come up with an instruction, and one way to do that is to generate Q&A pairs. But, like I showed before, there are 18,000 of these; there's no way I'd be able to answer them all manually. An option here is to use Ollama — has anybody used Ollama before? Right. Ollama is just an easy way to load a model locally and query it. It's pretty easy to set up and install: if you wanted to run Llama 3.1 locally, you literally just install Ollama and do "ollama pull llama3.1", and it'll download the model for you, get it ready for inference, and spin it up, and you can use that model locally. The other thing is that it has an API. So if you can find a model that you like — one that gets you the results you want — you can have Ollama run it and then use the API to generate those question-and-answer pairs.
There are lots of different file formats on the output side, but something I want to note is that Ollama requires GGUF — GPT-Generated Unified Format. Remember that for later, because it'll make your life a lot easier. It's basically a single-file format that makes it easy to move a model around. A lot of the time when you're working with these models, there are what they call shards: a 20-gig model might come as four or five 5-gig shards, a bunch of pieces. Most of the time you'll see a format called safetensors, and it's usually sharded like that — it's real annoying — plus there's a config.json and all this other stuff. GGUF is a single file, and Ollama works with those. Another thing to keep in mind is that Ollama uses llama.cpp under the hood — CPP as in C++; it's written in C++. I don't know quite how to explain it, but it basically performs various operations on models; for example, if you need to convert from one quantization method to another, llama.cpp can do that for you.

Anyway, what I needed to do was generate Q&A pairs, and I tried a number of different ways. I told ChatGPT, "here's my data, give me some Q&A pairs." It didn't like that very much; it wouldn't do it, you know, because of content.
You can also get uncensored models; a popular one is Dolphin Mixtral. Mistral AI has a mixture-of-experts model called Mixtral — a pretty good model — and what happened is that people created an uncensored version of it, and that's Dolphin Mixtral. I tried another one, Noromaid, and that was hot garbage; I don't know why it didn't work, but it was just spitting all kinds of nonsense out. I ended up using a model called neural-chat: I basically did "ollama pull neural-chat", served it, and scripted a way to loop through all my stuff and get Q&A pairs.

Here are some examples from when I was trying to get ChatGPT to generate these on a subset of the data. I gave it 200 messages and said, "generate some questions for these answers." It basically wouldn't do it, and — I'm not sure if you can really tell — it was spitting out random pieces of messages; that "spark" thing you see is a random fragment of a message. It was just emitting random data, and it just wasn't working out, so I moved on and started using neural-chat.
With neural-chat in Ollama, I scripted a way to loop through all those Base64-encoded messages I'd gathered and generate a question for each and every one of them, which of course ended up being about 18,000 questions. This was actually fairly quick — I was surprised at how quick it was — and the questions were pretty good. There's definitely room for improvement there, but they were good enough for my particular purposes. So at that point, that's where we were. Oh — and then I dumped them to a JSONL file.
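(The generation loop against Ollama's local HTTP API looked roughly like this — the prompt wording and file names are placeholders:)

```python
import base64, json, requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's local API endpoint

with open("messages.b64") as f, open("telegram_qa.jsonl", "w") as out:
    for line in f:
        answer = base64.b64decode(line).decode()
        resp = requests.post(OLLAMA_URL, json={
            "model": "neural-chat",
            "prompt": f"Write one question that the following text answers:\n\n{answer}",
            "stream": False,
        })
        question = resp.json()["response"]
        # Alpaca-style record, one JSON object per line (JSONL)
        out.write(json.dumps({"instruction": question, "input": "", "output": answer}) + "\n")
```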
For those of you that don't know — again, I can't tell you why, but AI engineers and data scientists love to use JSONL, which is just JSON Lines: one JSON object per line, so every new line is one JSON object. I don't know how that's meaningfully different from just parsing regular JSON, but there it is. So this is a JSONL file with all those questions and messages, a.k.a. answers.

We're at the point where we've gathered the data and created the training data, and now we can fine-tune our model. I mentioned earlier that in the past this was a very manual thing. Before there were things like Hugging Face or Unsloth — which we'll talk about in a second — you basically had to use something like PyTorch and NumPy to go through and do all your calculations yourself. It was a pain.
Of course, now the tooling has advanced. But the last big lesson I learned through this was that gathering the data is only half the battle. Training — I shouldn't say training; fine-tuning — a pre-existing model has a lot that goes into it, and there's a decent number of ways you can actually do it. Ultimately I decided on Unsloth, which uses various methods to cut down on the amount of RAM — the amount of general compute resources — you need in order to fine-tune a model. Unsloth is specific to fine-tuning models; that's what they do. It's written by a guy named Daniel — I can't remember his last name — and it's basically one guy, which creates headaches for you down the road, because it's one guy maintaining this big project. If something breaks in it, you're reliant on that guy to go fix it, and in the meantime you're waiting, or you're trying to figure something else out and you end up down another rabbit hole instead of just doing what you were trying to do. So that last big lesson for me: gathering the data and getting it ready for training was just half the battle; working through the actual fine-tuning was going to be a challenge.

There are a couple of different ways you can fine-tune models. One of them is called LoRA, low-rank adaptation — there's QLoRA, there's LoRA, there's a bunch of them — and it's one way to fine-tune models.
Now, you could theoretically take a model and just continue training on it. Theoretically you could do that, but you'd basically have to hit the entirety of the model — you'd be touching every single parameter in the LLM, and that's a huge model when you're talking about Llama 3.1 or any of the other big ones. What LoRA does instead is add an additional layer onto the model: when you do LoRA fine-tuning, you're really just focused on adding an additional layer on top of all the layers that came out of training, putting our data on top. When you do this — LoRA adaptation — the adapters are saved separately. So it's not like you fine-tune it, save it, and have a whole model at the end of the day; you just have that additional layer, the adapters, not the whole model. That's something to keep in mind: if you fine-tune this way, you're going to end up with adapters, and you'll need to do some conversion and some other work to attach your LoRA adapters onto whatever model it is you're fine-tuning.
When it comes to fine-tuning, you can think of it — at least I did — in three high-level areas: you do the fine-tuning; you get it into GGUF, because that makes it a lot easier to put into Ollama or LM Studio and actually use it; and then you import it into Ollama. And yes, I know that's a sheep on the slide, not a llama.

Sorry, I'm moving fast — there's just so much stuff — but yeah, I'm doing all right. If you have any questions, please just interrupt me. Like I mentioned, I decided on Unsloth. They do have a lot of really good examples, and — as I've mentioned probably a thousand times already — I thought I could just copy and paste and get on with my day. That did not happen. But for my purposes, and maybe for yours as well, all you really need to know is that it does some fancy stuff to allow quicker loading of these models and then actually fine-tunes them.
So if we want to actually go down the process of training — fine-tuning — the model, first we have to load it. Unsloth publishes optimized versions of various models, so what I'm doing there is calling the Unsloth version of Llama 3.1, which is optimized. That's where you need to start, since Llama 3.1 is what we're fine-tuning. The second part is where we actually load the model in and start to set some hyperparameters related to fine-tuning; that also goes through Unsloth — FastLanguageModel is pretty much an Unsloth-optimized loader for Llama 3.1. So the first step, of course, since we're going to fine-tune a model: we load it.

Next we need to load our dataset — all the Telegram messages. Like I mentioned before, in the past you basically had to load everything yourself and clean everything up on your own; now you can just use load_dataset from Hugging Face. Hugging Face has a bunch of Python libraries that make things a lot easier for you.
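(A minimal sketch of that load step, in the shape of Unsloth's examples — the exact model identifier and file names may differ, so check their current docs:)

```python
from unsloth import FastLanguageModel
from datasets import load_dataset

# Unsloth-optimized build of Llama 3.1 (4-bit quantized for low VRAM)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach the LoRA adapters -- this thin extra layer is all we actually train
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                   # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# The Q&A pairs generated earlier
dataset = load_dataset("json", data_files="telegram_qa.jsonl", split="train")
```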
So that's where you start: we load the model, then we load the data, and now we need to set up our trainer. Hugging Face has the SFTTrainer, the supervised fine-tuning trainer, and all the settings that go along with it are what's referred to as hyperparameters. These are basically all the variables when it comes to fine-tuning and training, and these are the primary ones I was working with to get the LLM to do what I wanted it to do — because if these parameters are off, if they're not where you want them to be, your model just doesn't do anything; the output isn't any good. Some of the big ones: epochs are how many times the trainer runs through your dataset — if you only have one epoch, it only goes through the dataset once — and that's an important one to keep note of. The other one is steps: when we're training with Unsloth, steps is how many actual optimization steps it takes through the data.
You can set these parameters differently; it depends on how much data you're ingesting and how much training happens. Honestly, just talking through all of these would be a talk on its own, so I'm not going to go down that road, but those are what are referred to as hyperparameters.

And then you train — in this case, fine-tune. That trainer.train() is Hugging Face, and when I first saw it I couldn't believe my eyes, because back when I was first starting to look at this kind of stuff there were all kinds of things you had to do: loop through everything yourself, bring in NumPy and do crazy calculus, calculate the loss yourself. I saw this and my mind was blown that you can do all of it in one line: you just call trainer.train(). That's what actually fine-tunes the model, using all those hyperparameters and looping through the data.

Think about how machine learning operates in general. If you were to create a new model, you'd start with your dataset, do some math on it, and basically loop through your data. You have a big dataset, and you say, "okay, here are the answers to 30% of our dataset — give me the answers for the rest," and you can measure when the model was right and when it was wrong. That's your loss. Then you keep looping — here's 20%, give me the other 80%, and so on — calculating the loss each time, because you want to optimize on that loss. That's what trainer.train() is doing: it loops through, says "here's X, give me the answer for X," calculates the loss, changes some of the parameters, and does it again — pretty much all on its own.
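(A sketch of the trainer setup, using the TRL SFTTrainer the way Unsloth's notebooks do — the exact argument names have shifted a bit across trl versions, so treat this as illustrative:)

```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",   # assumes the Alpaca fields are rendered into a "text" column
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=3,       # epochs: full passes over the dataset
        learning_rate=2e-4,       # step size for each weight update
        output_dir="outputs",
    ),
)

trainer.train()   # the one-liner that runs the whole fine-tuning loop
```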
This whole file is probably 80 lines or something; it's really easy — well, I shouldn't say easy. I shouldn't say that.

So, going back to my high-level three-part process for fine-tuning: train it, get a GGUF file, and then use it in Ollama so I can actually ask it questions. This — excuse me — is documentation from Unsloth, and as you can see, it looks really easy: just a one-liner that creates a GGUF file. But no. It doesn't work.
I cannot tell you how long I spent just trying to figure out how to get to a GGUF file, because everything I saw said, "yeah, just do this one-liner" — save_pretrained_gguf, right — and it wouldn't work. I mentioned earlier that Unsloth is developed by basically one guy, and what happens is that when you use Unsloth to fine-tune, it needs llama.cpp, which I mentioned earlier. It looks for llama.cpp in a folder inside your project folder — so if your project folder is tele-train, like in my example, it's looking for a llama.cpp folder inside my tele-train folder, which — why would I ever put it there? Regardless, if it's not there, it errors out. To work around that, I did what they said: you basically have to go back a couple of versions, do some GitHub archaeology to get an old commit, and build it, and then the files would be where they need to be. I did that. It still didn't work, because something was broken in Unsloth: it was literally looking for a specific file in llama.cpp that didn't exist anymore. This took forever to get where I needed it, because — like I said — I thought, "oh great, here's a one-liner that just dumps out the file I need." It did not go that way. I spent a lot of time Googling and that kind of thing, finally worked through it, got it done, and figured it out.

But that's the thing with a lot of these tools: AI and ML are moving so fast. Technology has always moved fast, but this is hyperspeed — the tooling changes by the hour, literally by the hour. There's some new update that breaks everything and you have to start over again, or it doesn't work with this other tool. The advancements in AI and ML in just the last year or two have been exponential; it's really insane, I can't even explain it. The point is that these two toolsets were moving so fast they couldn't keep up with each other — breaking changes here, breaking changes there — so when you go to fine-tune something, it turns into quite a headache. That's what happened to me. But I ultimately figured out how to do it.
I mentioned the GGUF format, and I've talked about llama.cpp: that file format was created by the developer of llama.cpp, which is why they work so well together, and Ollama uses llama.cpp under the hood — that's why I wanted this file format. One way to get there, instead of that Unsloth one-liner, is to create what's called a merged 16-bit file. Remember, with LoRA there are the adapters and there's the model, and they're separate — but you can merge them together at 16-bit precision and have basically what I need. That's what I did: I saved it in a merged format. There you can see it — this is Unsloth, with my comments on the option I actually ended up going with. I saved it as a 16-bit merged model file, and now we can use llama.cpp to convert it. llama.cpp has a whole bunch of Python utilities, and one of them converts HF (Hugging Face) format to GGUF: you give it the folder where your model files live as input, and it spits out a GGUF file.
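(The two steps, roughly as they worked for me — the save_method string and the converter script's name have both shifted across versions, so treat this as a sketch:)

```python
# Unsloth: merge the LoRA adapters into the base weights and save a single
# 16-bit checkpoint in Hugging Face format.
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")

# llama.cpp: convert the merged HF folder to a single GGUF file.
# Run from a llama.cpp checkout, e.g.:
#   python convert_hf_to_gguf.py ./merged_model --outfile telegram-intel.gguf
```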
At this point I was probably 60 days into a project I thought was going to take me a weekend, and I was very happy to get here, because for me this was the holy grail. I was like: just get this one file — if I can get this one file, I can actually test it, I can use it in different tools. This is what I was working toward. And no joke, it was something I thought I could do in a weekend, and two months later I finally had it figured out. I know it's just a screenshot and you weren't there, but for me this was a huge win right here. I can't explain to you how big of a win that was.

So anyway, we have the file format we want. When you want to run these kinds of files in Ollama on your own, you create what's called a Modelfile, which is specific to Ollama — basically just a config file for the model you want to load. You tell Ollama to create the model and pass it that Modelfile; Ollama creates it, and then you can actually run it. That last line is me actually running it. This is when I was able to start actually testing it out.
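(The Modelfile and commands look roughly like this — the model and file names here are mine:)

```
# Modelfile: points Ollama at the GGUF we just produced
FROM ./telegram-intel.gguf
```

```
ollama create telegram-intel -f Modelfile
ollama run telegram-intel
```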
So at this point I had it — I had something, at least, that I could test. But the results were not great. They were not great at all, and sometimes they gave all kinds of random stuff. For example, Star Chat — are you guys familiar with the Snowflake breach? The primary threat actors in that were in Star Chat. It's basically — I don't know if you're familiar with that whole crew — a bunch of 18-to-25-year-olds doing work, and Star Chat is one of those channels where they make fun of each other and post breaches and that kind of stuff. But look where it says "breach discussion": "breach discussion" isn't a channel. There's a channel called "breach data one," so that's not even close to what it is, and some of the stuff at the top — it should have given clearer answers.

So I did some tuning of those hyperparameters. Basically I gave it more epochs, so it got to go through the data more, and it started to get a little bit better. You can also change things like the learning rate, which controls how big a step the optimizer takes on each update. I messed around with some of those and got them where I wanted them to be. And — I know this is a big old block of text; I couldn't figure out a better way to show it to you — but this is a better response. I'm basically asking it: what's the main feature of the services offered by Gorilla Panel, Gorilla Mail Spoofer, and Gorilla Callbot, which are three services advertised in one of the Telegram channels. It got a little descriptive — a little too descriptive — but the information it gave me was actually correct. So to me it was like, all right, great: I just need to figure out how to tell it to shut up and give me what I asked for instead of going on like this — that whole block is its answer. I was able to get it to at least give me the right information. And about hallucinations — they call them hallucinations, but models will sometimes just make stuff up, and it did some of that here in these responses, because it has the underlying Llama 3.1 data plus the data I threw on top. But at least in this case, what it said was factual.
I went and looked and checked, and it was actually what it was supposed to be — it was just a little too long. And there are certain things you can change without retraining. Let me see — I'm probably going to make a mess of this — but where is it... Temperature, for example — this is from when I was generating those Q&A pairs and calling out to Ollama via the API — temperature is something you can change: it controls how much randomness goes into the output. Top-k — I had no idea what that was at the time; it limits sampling to the k most likely next tokens — is similar. These settings determine how responsive and how descriptive the model is. The point is that you can tweak some of these things without having to retrain the model: I could adjust them directly through the Ollama API, or through whatever other application you're using.
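(These are per-request inference options in Ollama's API — the values here are arbitrary examples:)

```python
import requests

resp = requests.post("http://localhost:11434/api/generate", json={
    "model": "telegram-intel",
    "prompt": "What is Gorilla Mail Spoofer?",
    "stream": False,
    "options": {
        "temperature": 0.3,   # lower = less random, more focused output
        "top_k": 40,          # sample only from the 40 most likely next tokens
    },
})
print(resp.json()["response"])
```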
So it had given me a response, it was factual, and to me that was a win. Since this screenshot I've done a little bit more, and it's gotten a little more precise. The idea, at least for me, is that if I can continually update this model, I can ask it generic questions like "what's going on in threat actor Telegram messages?" and it should be able to give me a summarized list of trending topics, that kind of stuff. Or if I have a question like "hey, what's Gorilla Mail Spoofer?", I can go in there and it'll tell me what it is, the user that makes it, where to find it, all of that. That's the end goal. I'm pretty close; it still needs some more tweaking and probably a little more data. The sample size I used to fine-tune this was 18,000 lines, which is relatively small — a lot of the datasets out there are more like 100,000 lines of various information.

So for me, next steps: refine the hyperparameters, like the things I described; clean up the data — I could probably figure out a way to make better question-and-answer pairs so the model is a little more focused; and accumulate more data. I think that's it. I feel like I just talked super fast, but there was a lot to talk through. By the end of the day we'll have these slides up there. I also want to say thank you to a gentleman named Matthew Nickerson, who used to be on my team. I used to work at a company called Layer 8, based in Malvern, where I was the practice manager of offensive security, and Matthew was part of my team there; he's been super helpful with all this threat intel stuff, so much appreciated. Also, I've got a whole bunch of Red Team Village swag, so if you want patches, stickers, whatever, please come up and grab some. And I appreciate you not all falling asleep. Thank you.

[Applause]