
YARA Rule Writing 101

BSides KC · 2021 · 37:09 · 1.1K views · Published 2021-11
Category: Technical
Style: Talk
About this talk
This session will discuss YARA rules and how to write them effectively for alerts within a cybersecurity environment. YARA rules can typically be added to any platform that analyzes binaries, so effective rule writing is crucial to protecting information systems. The three sections of a YARA rule will be covered, as well as the basics of how to test the rule itself. Regular expressions are a way for us to get the full capabilities of YARA rules, and we will discuss patterns that can help in detection. Open source tools and a live demonstration of the effectiveness YARA rules can have will show the audience how to implement robust YARA rules within an organization.

Aaron Riley (Cyber Threat Intelligence Analyst at Cofense Inc.): Aaron Riley is a Cyber Threat Intelligence Analyst with the Cofense Intelligence team. His responsibilities within this team include incident and malware research, which require extensive analytical and technical skills. Aaron is an avid proponent of cybersecurity. He has three degrees in information technology, including cybersecurity and information assurance, and holds multiple industry certificates. Aaron shows great insight within all domains of the cybersecurity realm and presents regularly on multiple security topics. He attends many local and national cybersecurity groups and enjoys connecting with other peers in the industry.
Transcript [en]

Without further ado... thank you very much. My name is...

All right, so my name is Aaron Riley. We're going to talk today about YARA writing. YARA is an amazing tool, but first I want to talk a little bit...

These are our sponsors. I want to thank them for everything that they do for us; without them we wouldn't be here, and without them I wouldn't be able to tell you guys about YARA writing. My name is Aaron Riley. I do cyber threat intelligence and analysis for Cofense. I have a bachelor's in cybersecurity and I have 12 certifications. There are a ton of Aarons in IT, I've noticed, so professionally over the last 10 years I've gone by Riley. So call me Riley, okay? And if you want to know more about me, you can find me on Twitter or Discord and stuff like that, but we're here to

talk about YARA, so that's what we're gonna do. So what is YARA? All right, well, it's pretty much a framework used to analyze binaries and other objects based on rules that are constructed in a certain way. Think of a master search tool and how you can search just about anything on a file system. It was developed by Victor Alvarez, who works for VirusTotal, and VirusTotal uses it exclusively on a lot of their binary data pulls and stuff like that. The acronym is pretty debatable: it stands for "Yet Another Ridiculous Acronym" or "Yet Another Recursive Acronym". Whichever one, it's meaningless. YARA isn't meaningless, all right? I mean the word is. The file extension that

you'll see when we do this is .yar or .yara, and we're going to use that. YARA rules have three main components, with an optional component on the end, and we're going to talk about that as well. Today we're going to go through just the basics of what YARA is, and then we're going to do a demo on it, on the...

We're going to do a demo on what YARA is, and hopefully a live demo goes especially well, because we all know how that works. So I know you all are like, "You just explained it, it's pretty easy, right?" No, that definition was kind of long-winded; it was pretty dense. So YARA is just a pattern-matching tool: it matches patterns that you set. If you can see specific patterns within your data, you can pull them out with YARA. And what kind of use cases? Well, you can do it with filtering of any data. You can do it within email analysis, which is where I use it in my day-to-day life. You can

do it within memory analysis, so if you have a sandbox that's looking at and detonating malware, then you can use the YARA rules against that memory and pull out certain things that you can identify malware with. That's how I use it. You can also use it to hunt: if you're going to be hunting, you can use it in your network, and you can use it with other YARA rules on top of each other to help create an actual path that leads you to your target. So, understanding this, we need to talk about the four parts: the three main parts and the optional part. The three main parts are the rule, which you see

here, this "rule", and then you need an identifier, which is the name of the rule; here I've named it BSidesKC_2021. The optional field is the meta field, and that's for your notes and your descriptions. What you're going to be matching on, the pattern you're wanting to match, is going to be the strings section, and that is mandatory. The condition section is the logic. Here we're literally saying, "Hey, with these strings that I'm trying to match, how do I want to match them?" All right, so you have the things you want to look for and how you want to look for them, and those are the... this is

the main, straight construction of our basic YARA rule. Here I say: if string one and string zero are there, fire on this rule. So if "BSides KC 2021" and "rocked" are in there, or "rocks" and "BSides KC 2021": two different ways I want it to fire. So we're going to talk about the three mandatory things and the optional thing that you need for your YARA rule. First we're going to talk about the rule's actual start. You need to start with the word "rule", and it needs to have an identifier, like BSidesKC_2021. It's in C programming syntax, so you kind of

have got to understand camel case and underscores and things like that, if you're using it a certain way. It can use any alphanumeric character and it can have underscores, but the rule name cannot start with a digit. There are keywords in YARA that are reserved for the YARA engine, so you cannot use them within your rule name, and we'll talk about those in a bit. And then the main thing that I really want you guys to not forget is your squigglies. I don't really know the word for it; I call them squigglies. You know the word for it? Because I like "squigglies",

and I find a lot that when people go to copy and paste, they'll forget the bottom one or they'll forget the top one, and it'll just wreck and you'll get an error and you don't know why. So don't forget your squiggly brackets. All right, and so the optional section is meta. It starts with the meta tag, and anything underneath it is seen as plain text; you can't start the variable with the dollar sign that you do in the other sections. In this one you can put as many characters as you want. It's for note-taking; it's for documentation. If your rule just says "general rule" and

then your documentation has all the reasons why you wrote this rule, then you're better off handing it off to somebody else, and they know what it's for. It's typically the first section of a rule. It doesn't have to be, but for flow of reading it's better that way, so that when you hand it off to somebody else they don't have to scroll through everything to get to the documentation section of it. Now, strings: like I said, these are going to be the patterns that you're going to be matching on, and you need to construct them in a certain way. Each variable starts with a dollar sign;

that's how it knows that it's looking for a variable. You can have string modifiers, which we'll talk about in a bit: nocase, wide, and ascii. nocase handles case sensitivity, so it's case-insensitive; wide looks for UTF-16 characters; and ascii just looks for ASCII characters. Your variables can be one of three things. It could be a regular literal string, like the text "hello", and it has to start with a quote and end with a quote. It could be a hex representation, which starts with a squiggly and ends with a squiggly. And then it can be a regex, and the regex is PCRE, so it's Perl-based, and so if you know that in

the regex, it's very easy, and I'll show you a regex tool for helping you along the way in a second. The condition section, like I said, is the logic. It's the brain. It's really where you need to understand fifth-grade math, to be honest, because it works in arithmetic order. You can have your Boolean expressions in there, and we'll use a few of them. There's an actual trick to the "not": it doesn't work. That's the trick. And then there are the bitwise operators: if you guys understand bitwise operations like XOR and all that kind of stuff, you can use them within your logical patterns. And then there are special variables that we'll show, but

that's a little more advanced; we'll show a special variable when we're hunting for malware within my data set. And syntax is critical. You don't want to have a variable "and" and "and" and "and", because it'll grab all of them together; that's just "all of them". If you want, like, "and", "or", this and that, you can group them into groups, and we'll talk about how that works. So syntax is absolutely critical in the condition statement. All right, anybody have any questions about any of those sections? Exactly. Okay, exactly. Any other questions? All right, because we're gonna move to a live demo. Let's see what happens. And before we do that, these are our

keywords. There's a lot of them. I'm not gonna go through all of them; I just want you to be aware that just random words like "all", "any", "not", "in", "of", those are keywords, okay? And then these are string modifiers. Like I said before, we're going to use nocase, and we're going to use wide and ascii.
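The rule anatomy described so far could be sketched roughly as follows. This is a minimal illustration rather than the speaker's actual slide: the string contents, the meta values, and the second rule's name (String_Types_Sketch) are assumptions chosen to show the four sections and the three string types.

```yara
// Hypothetical reconstruction of the slide's rule; string contents are assumed.
rule BSidesKC_2021
{
    meta:
        // Optional section: free-form notes, no "$" prefix on the names.
        description = "Sketch of the rule anatomy described in the talk"
        author = "example"

    strings:
        // Each string identifier starts with "$".
        $string0 = "BSides KC 2021" nocase
        $string1 = "rocks" nocase

    condition:
        // The logic: fire only when both strings are present.
        $string0 and $string1
}

// Illustrative only: one string of each type the talk lists.
rule String_Types_Sketch
{
    strings:
        $text  = "hello" nocase wide ascii   // literal text with modifiers
        $hex   = { 4D 5A }                   // hex string, in curly braces
        $regex = /htm(l)?/                   // regular expression, in slashes

    condition:
        any of them
}
```

Both rule identifiers follow the naming constraints mentioned above: letters, digits, and underscores only, and no leading digit.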

All right, so we're going to switch to...

Yes?

Those keywords... is using those keywords fully prohibited, or only within the name of the rule? Yeah. Yeah, everywhere else is fine; you can't have it in the name. Yes. Yeah. Okay, there are other places where you can't have it for other keywords, so they're called reserved words. So here I wanted to show you how a YARA rule is constructed. I have created a list of samples: I have a bunch of Agent Tesla, Remcos, Snake, NanoCore. All those samples are live. There's about 15 samples in there. If you wanted to know what they did and you wanted a really malicious USB, I can give it to you, but these are actual malware samples

that are seen in the wild, and I've personally analyzed each and every one of them. Within this string file, I wanted to show you that the string here is a lorem ipsum string file, okay? So it's just regular text, and I want to go ahead and put all this together for you. I want to write a rule that looks into that text file for this string, SecKC. All right, now how would you do that? If we were to do that, we'd write, you know, "rule", and then we just write SecKC, because that's our identifier. It's really hard to do this with the mic. You all hear me? Yeah? All right, so we're

going to do it like this. Don't forget your squigglies. You don't even have to fill it out. Strings, and then here's where we're going to put the variable. So if I put a variable, and I just want to look for SecKC, I'm just going to write it like this, okay? SecKC, and that's the variable. That's all you have to do: you name the variable and you've given it its contents. Great. So now we just go to the condition statement. All right, now what am I actually wanting to look at? If I can spell it correctly, I'm actually...

Any other way...
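The rule being typed at this point might look something like the sketch below. The exact identifier and string name used on screen are not recoverable from the recording, so SecKC and $seckc are assumptions.

```yara
// Minimal rule: no meta section, one text string, one condition.
rule SecKC
{
    strings:
        $seckc = "SecKC"   // name the variable and give it its contents

    condition:
        $seckc             // fire if the string appears anywhere in the input
}
```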

There it is. And as you can see, this is CyberChef. If you're not familiar with CyberChef, it's an absolutely great tool. You can get it online; it has all sorts of different things in it, and it even has a YARA engine, and that's why I use it. It's free: CyberChef, gchq.github.io/CyberChef. It's amazing. What I did was I took that test string here, as you can see, this text here, and I dropped it in as an input file right here. Then I loaded up my YARA rule and I said I'm looking for SecKC. The rule says, all right, SecKC; the rule has

matched one time, and it's matched on this wording here. Awesome. All right, that's great. Now you're thinking to yourself, "What's the difference between that and find and replace?" Well, find and replace...

So find and replace only happens on the text file. YARA rules can do it on a whole file system: you can literally point it at an entire file system and it will locate every file within it and go and find that SecKC. So we're going to actually move from this to the actual YARA engine, and we're going to go ahead and test it out. I've written a couple of rules that are pre-written in here, and this one I call Midwest Magicians. You can see, if I open it here, it's the exact same thing: testing YARA, calling out SecKC, everything like that. Okay, so we're going

to look for it in this text and import that. So we're going to write "yara", and then actually we'll just show you the help files with it. So YARA has the manual; it does all sorts of things, right? We're actually going to be looking for the print-strings option, and we're going to go recursive. You're going to see that a lot: I'm going to do the -s and the -r. Recursive means that I'm going to be looking at all subfolders and everything underneath, and then the lowercase -s means print the strings that you match on. So we're going to do "yara", just to initiate the engine. We're going to go to rules, because that's where we... Midwest Magicians, and then I want to go into the file that I want to scan, or the folder structure that I want to scan, and we're going to do -s -r. As you can see, Midwest Magicians: the rule has hit on this file, and it hit on this variable, which is SecKC. Easy enough, right? All right, so what if I changed that? I said, you know what, SecKC is actually nocase, and nocase means case-insensitive. Let's just make it where I... I don't even understand where this case-insensitive comes from. Will it hit if it's capital? Nope.
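A sketch of the pre-written rule and the modifier change discussed above, assuming the identifier Midwest_Magicians and the string SecKC. By default YARA matches text strings case-sensitively; adding nocase relaxes that.

```yara
rule Midwest_Magicians
{
    meta:
        description = "Testing YARA"

    strings:
        // With nocase this matches "SecKC", "seckc", "SECKC", and so on.
        // Without the modifier, only the exact casing "SecKC" would match.
        $seckc = "SecKC" nocase

    condition:
        $seckc
}
```

From the command line, a run of this shape would be yara -s -r followed by the rules file and the target folder, where -s prints the matching strings and -r recurses into subfolders; the file and folder names depend on your own layout.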

So it's easy to see how you can use it on a text file and things like that. So now we're gonna go ahead and move to a cred phish; we're gonna see what it's like using it on production systems. I have an HTML folder with 50 different samples of HTML cred phish that are verifiably malicious, and I can show you the kind of patterns that you see. With patterns within cred phish, you typically see it's an HTM or an HTML file, and it either has a POST or a GET, and that's typically to a PHP. So if I was going to do that, and I want to do that,

we're going to go ahead and open up this rule and see how we do that. Again, like I said, we need to have the rule's name. Here we're having the HTML for cred phish; it's pretty descriptive. The meta: we're testing YARA on HTML for cred phish. The strings: we're looking for PHP, so I put nocase on that, but here on the bottom one you can see... oh, actually, that's not very big.

All right, so as you can see, with the HTML variable I'm actually using a regex, and a regex starts with a forward slash and ends with a forward slash. What I'm doing here is I want to match on HTM and HTML files, so the "l" is actually in a parenthesis group with a question mark. What that means is it's either there or it doesn't have to be there. So I don't even have to write two different strings for HTM and HTML; it's just one regex for both. And the condition here says PHP and HTML, so I want to hit both when I go to the file. All right, so we're going to do "yara", rules, and then...

As you saw before, the samples folder has a whole bunch of folders and everything in it, so we're just going to do -s -r, like, I just want to know what's in there, everything, and that has over 600 samples in that folder. 600. Look at this. All right, so now it says we hit on this HTM. The HTM's name is Bradley MacMillan; I'm guessing it's Bradford, and it's just caps over here. On the macmillan.htm, it has a PHP, it has HTML within it, and some of the samples and stuff like that. That's live ammo, right?
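The cred phish rule just described might be written along these lines. The rule name and string identifiers are assumptions; the regex follows the spoken description of an "l" in a parenthesis group with a question mark.

```yara
rule HTML_Cred_Phish
{
    meta:
        description = "Testing YARA on HTML for cred phish"

    strings:
        $php  = "php" nocase
        // "htm" followed by an optional "l": matches both "htm" and "html".
        $html = /htm(l)?/

    condition:
        // Both patterns must be present for the rule to fire.
        $php and $html
}
```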

Okay, so you can see it actually hit on that, but a lot of legitimate things have PHP in them, and a lot of legitimate things are HTML. So what if we go ahead and take it a bit further? What if we start looking for things like HTTPS or a POST and things like that? So I went ahead and did a rule that does that. Let's look for HTML with a POST. All right, so here I have the POST, which is typically happening with the PHP, but then again on the bottom side (you put your regex strings on the bottom of your strings because of speed and processing) I put another regex string in there.

I want to look for HTTP or HTTPS; I want to know both. But as you can see, I've changed the condition. I've changed it from "this and that" to "all of them". I want all; there's no reason to write them all out. I can just do that and it works. So let's run that one.
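A sketch of the tightened rule with the "all of them" shorthand. The rule name and string identifiers are assumptions for illustration.

```yara
rule HTML_Cred_Phish_Post
{
    strings:
        $post = "post" nocase
        $php  = "php" nocase
        // Regex strings placed last, following the speaker's habit.
        $html = /htm(l)?/
        $http = /http(s)?/

    condition:
        // Shorthand for "$post and $php and $html and $http".
        all of them
}
```

Using "all of them" also keeps the condition correct when another string is added later, since every string in the section is included automatically.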

Yeah, that's crazy.

And again, we're seeing a lot of them that have the POST in them. We're seeing things like, here's an HTML, it's got a PHP, it's got a POST, and it should have an HTTP right there. So there's that one right there, so you can say, okay, that one might be a little more malicious; let's drill down a little bit more. And as you can see here, I meant to notify you guys, I switched: you can add multiple modifiers to each string. So I said I want it to be nocase and I want ASCII only. All right, if you change it to wide for UTF-16, you'll miss out on a lot. If you change it to

ascii, you'll miss out on a few, so you really have to construct your rules around the targeted scope of what you're looking at for it to work. So next: we all know that Microsoft is a big brand that's been spoofed a lot of times, and we all understand that most cred phish pages are stealing your credentials. So let's think about it for a second. If I'm looking for something that's stealing my credentials, and I know that it's going to be a POST, and typically I'm looking for the Microsoft brand, it's gonna have username and password, right? It probably has login, but it won't have all of them. So here is

a rule that comes from Microsoft cred phish. All right, and it's looking for the PHP, the POST, Microsoft, login, user, username, password, and the two regex strings from before. Well, I want to know Microsoft specifically, so in my logic here, as you can see, it says brand, for Microsoft, and "all of", so it's a grouping of these variables. So I say this and all of those, okay, or this and one of those three, so I don't have to fire on all. It's not like your fifth-grade mathematics, right? You're just writing it out so that it fires in a certain way. All right, and if we run it

we can see that we get hits: HTTPS, from an HTML, has a password and login, Microsoft, POST, and it's "DFA from Debra J". So you can probably imagine that this HTML was an attachment on an email that, when opened, looked like a Microsoft login page, because it has a POST, Microsoft, login, password. It doesn't need the username, because what they're trying to do is get deborah.chapman to log in, okay? You don't need the username; you need the password. So that's how you hunt within datasets of HTML using YARA rules for malicious cred phish and things like that. Now, if we were to start moving in a way where you're like, ah, I don't really deal

with cred phish or HTML and things like that, I deal more with malware. Well, I don't know if you know what Agent Tesla is. It's probably one of the most prevalent keyloggers out there today. It's a massive volume. It was originally designed by a guy out of Turkey. It's in four different versions. It communicates via email, it communicates via HTTPS, and it also communicates via Telegram. This malware is kind of a Swiss Army knife for what it can do, but it's sold on a lot of public forums, so there's a ton of people who don't know what they're doing sending it out, and we get all sorts of different versions. So that folder is absolutely full, and just so

that you know I'm not just kidding you: yeah, these are live, detonatable executables. And I'm not going to... Properties, just so you can see: Windows executable. So I get them down for my work, I look at them, I do all sorts of different YARAs and things like that on them. And so if you were a malware analyst, or you were even just trying to understand the threat landscape, you can start gathering all these different malware payloads, and you can start doing pattern analysis on them with YARA as well. And so what we're going to do is we're going

to actually move to looking for binaries. So I'm going to go back to CyberChef and I'm going to load in a binary, and then we're gonna actually look for a packer, a well-known commercialized packer. For all of you that don't know, a packer is used to obfuscate a payload. What threat actors, or even legitimate services, will do is take an executable and pack it, which means that it's compressed and a lot of the files inside are hidden away so that you can't do static analysis on it. When run, the packer then unpacks itself and loads the executable. So we're gonna look for a well-known pattern,

and to do that I'm gonna do UPX. Just like we were doing before, I'm gonna come in here and I'm gonna look for the UPX signature. UPX is the well-known packer that I'm talking about; I'm sorry I didn't state that a second earlier. It's commercialized like crazy. It also has a self-unpacking tool, which is one of the best things in my life. And it should fire on that, and... all right, there it is. Did you see the difference? All right, so it wouldn't fire because of the "U" and the "P" and then the lowercase "x". It wouldn't fire because of the capital "U", lowercase "p"... okay, it's "x". But if you put nocase on it,

it then does fire. UPX is known to put "UPX" in all of its strings; it's all over the place, so it's an easy find. Okay, so if we're looking for UPX-packed binaries, we're going to start looking into different YARA rules for it. That's one of the easier ones, and I have that rule already written up. All right, so here we're going to write our "yara" again: "yara", just to instantiate the engine, rules, and I just want all of my samples folder, and -s, and to be recursive. Show me the strings; let's be recursive. All right, as you can see, there's a ton of UPX and

stuff like that, but we're getting odd things. This is an HTML, and... why would somebody pack an HTML? That's not right. What you're seeing there is that the HTML probably has "UPX" within the code base, but it's probably some kind of encoding; it's a random string. So that's a false positive. All right, well, let's break it down. Let's get even more in depth. What would we need to look for if it was something more than just the UPX? Let's look for actual executables, not just plain text. So upx_bin here looks for an executable. We all know an executable has a header, and the executable header has certain things that we can look for. The MZ is

like the key to most executables. Whether it's farther off into the payload or right at the first segment, right at the first bit of the header, it's got to be in there somewhere, right? Well, I know that "MZ" is the string, and I also know that 4D 5A is the hex of that string, and I want to find one or the other. And so here we're actually showing you how hex is used. Like I said: squiggly, 4D space 5A (that's "MZ" in hexadecimal), close your squiggly. All right, now the condition says I want the UPX packer and header MZ or header hex. So I want the UPX packer, number one, and then the other two could be one or the other. And if we go to run this,

we can see that, yeah, we're getting MZ, 4D 5A, we're getting all the UPX and stuff like that on this one file. That's awesome. All right, that's got to get rid of our false positives with HTML, right? But we can check that. So we do that and we grep for HTML. There shouldn't be anything... That's crazy. That's nuts, right? The reason is that "MZ" can be another random string within the encoding of those HTMLs. All right, well, this is where it gets a little harder, because you know how I said "not" has a special trick to it, which means it doesn't work? If you were going to do, like, UPX packer and header hex and not

MZ, it wouldn't work; "not" doesn't work. And so if we were going to go ahead and do that, we need to start getting a little bit more into the use-case scenarios and the conditions. Now, there are special variables that the conditions can have, but I'm going to show you how "not" doesn't work, because I am a glutton for punishment. Here, the UPX packer, it's all the same thing; all I did was add the HTML string, because I don't want that. If it's got an HTML string in it, don't give that back to me as a result, right? Okay, so let's go ahead and run that rule: upx_bin_not_html. Thank you very much. And we're going

to run it through grep. If this works, you shouldn't see any results. Ah, we have results. Why? Because "not" doesn't work there in the conditions. In advanced YARA, you can have if/else conditions, you can have while statements and stuff in there, and that's where "not" works. It'll say "if this equals" or "not equals"; that's where "not" works. It doesn't work in the condition statement like this. So how do we find the MZ header at the front of the file and make sure it's an executable? Well, we can do a whole lot of string matching; we can say this, this, this, this and do a whole bunch of things like that. But YARA, in its awesome abilities, already

has a bunch of different, what I like to call special functions. They're called special variables, and we saw them earlier here when we were doing keywords. So int16 means an integer of 16 bits, int8 an integer of 8 bits, and int32 is 32 bits, like it says. But those are little-endian, which means they're read backwards. That doesn't help me when I'm trying to read it as a human. Like, I don't read backwards; I can't. It takes a long time. I barely know how to read as it is, people. Don't do this to me. So they created big-endian, and all big-endian does is flip it the other way.
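To make the byte-order point concrete: a Windows executable begins with the two bytes 4D 5A ("MZ"). YARA's documented integer-reading functions include uint16 (little-endian) and uint16be (big-endian), so the same two bytes compare against different constants depending on which one is used. The sketch below first recaps the string-based rule described earlier and then shows the offset-based check; both rule names and the string identifiers are assumptions.

```yara
// String-based version: "MZ" or its hex form may match anywhere in the file,
// which is why encoded HTML attachments could still produce hits.
rule UPX_Bin_Strings
{
    strings:
        $upx = "UPX" nocase
        $mz  = "MZ"
        $hex = { 4D 5A }

    condition:
        // Parentheses group the "one or the other" part of the logic.
        $upx and ($mz or $hex)
}

// Offset-based version: read the first two bytes of the file as an integer.
rule MZ_Header_Endianness_Sketch
{
    condition:
        // Little-endian read: least significant byte first, so 0x5A4D.
        uint16(0) == 0x5A4D and
        // Big-endian read: bytes in the order they appear, so 0x4D5A.
        uint16be(0) == 0x4D5A
}
```

Both comparisons in the second rule describe the same two bytes; only the reading order differs, which is why the big-endian form looks the way a person would read a hex dump.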

So there are special use-case functions that we can use to help us. I'm going to open up this rule here called UPX bin. All right, and this is a very sleek, very small rule that does everything that the rule before did, but better. So I'm looking for the UPX packer, but then in the condition I'm going to use a special function, a variable, that says uint16. So I'm going to say: on the integer of 16, big-endian (so read the way I read), at position zero. So when you open up that file, that header, the first position, the first bit, and it goes for 16 bits: that's what I'm looking for.

All right, I want it to be equal to the hexadecimal representation 0x4D5A. I want that to be the first thing you look for because, and I haven't said this before, YARA rules start at the condition when they're being read by the computer, and then they move to the strings. If the condition says this and then that, it goes, well, this has to be first, and if that's not there, it doesn't work; it moves on to the next rule. It's very fast in doing that. So I want to know all the executables first. Awesome. Well, then I want to know all those that are packed with the UPX packer. All right, and I don't even have to set that

bottom special variable within the strings section; it's already going to do that for me. So if we run this rule, just like I said...
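The compact rule described here could be sketched as below. The rule name is an assumption, and nocase on the UPX string is carried over from the earlier CyberChef demo rather than confirmed for this rule.

```yara
rule UPX_Bin
{
    strings:
        $upx = "UPX" nocase

    condition:
        // First require the MZ magic at offset 0, then the packer marker.
        // uint16be is a built-in function, so it needs no entry in strings.
        uint16be(0) == 0x4D5A and $upx
}
```

Because the header check reads a fixed offset, a file that merely contains the letters "MZ" somewhere in its body no longer satisfies the rule.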

There we go. We're starting to see different malware, and it says these are executables. These are actually malicious files, and they have the UPX tag line in them. And look, there's only six, seven... there's only seven here, and one of them is a double-up, because I used the malware twice; I used it in two different places. So we went from over 600 different samples to seven by using a YARA rule with two lines. That's pretty solid. You can use this anywhere on your network. You can use it on an endpoint for DFIR, you can use it for email analysis, which is where I use it, and you can use it on data lakes, just to say, hey,

what's the data point here? Just throw it out, okay? All you have to do is download YARA. It can run on Linux and Mac; I'm not sure about Windows, but CyberChef does it for you, and you can even download an installation of CyberChef to your local machine. All right, now if you have problems with regex and things like that: I didn't show you this, but there's this tool here called regex101. This is kind of a side note. I love this tool; it's amazing. It helps out a lot with regex. If I have an issue, I'll throw a bunch of code in the bottom here and I'll just start typing in here, and as you can

see, if I don't do it right, it'll give me help on the side. It'll even tell me if I'm doing PCRE or PCRE2. It's a pretty solid tool, I really like it, and it's used a lot in tandem with YARA. All right, so to be honest, that's my talk, and I think I'm a little early, so if you guys have any questions, please let me know, and go here to give me feedback.

Hit me up on Twitter at malwarepi, and I swear I'll give you whatever you want. Not my Social Security number, not my credit card. You can have my firstborn child, though; he's two. All right, but yeah, any other questions?

He is an amazing person. He's an amazing man. Florian Roth has a database of YARA rules that are free out there that you can get. He has really good ones that are paid for, but you have free ones out there that are awesome. YARA has documentation, and they have a whole set of YARA rules that are basic rules; they're everywhere. He helped develop the second iteration, I believe, of YARA, and he has been able to implement an automated system where, when VirusTotal gets in a new sample, his automation system creates a YARA rule for it and pumps it out like that. It's great. So I encourage you to follow him. There are all sorts of resources on YARA.

The documentation is amazing. If you have any questions about it, just look at the documentation. There's over 300 pages. I haven't read it all, because I'm not that weird, but it's out there, and it's great. Any other questions? Thank you all, you're great. [Applause]