
[Music]
Awesome. All right, so just a brief introduction about myself: my name is Peter, I am a student at Seneca College in my second year of the information and security degree, and I'll be giving my talk, called "Application Fingerprinting with Kitsune."

Let me just get a quick survey here: how many of you have heard of the OPM data breach? For those of you who don't know, it's rated one of the largest data breaches this year. There were actually two incidents, one in May and one in June, and over 22.1 million records were stolen. But it's not the only data breach that happened. There have been multiple data breaches this year, ranging from
incidents in the healthcare industry to the military, government, etc. So the question is: is this unique to this year only? Reports actually show that it's not. The cost of data breaches has been increasing year by year: there has been a 23% increase over the last two years, and the average cost of a data breach is 3.7 million dollars, which is big. Now, in 2014 alone there were 472 breaches and 140 million records stolen. This was taken from the Verizon report, which was published this year, and it spans multiple countries, so it's not just the USA alone; it's the USA, Europe, Canada, South
America, etc.

Furthermore, a lot of the breaches that happened were due to vulnerabilities, and one of the things the reports showed was that, for the vulnerabilities that were exploited, a patch had been available for over a year. For 99% of the exploited vulnerabilities, there was already a patch released. So there would be an exploit, then a CVE would be published, and a year after that CVE was published, they found that some systems were still exploitable through that vulnerability. So evidently vulnerability patching is a huge problem. The question is why: what makes it so difficult to identify what applications are running
on a system and fix those vulnerabilities if they show up? It really comes down to one main aspect: IT departments don't know what applications are running on their systems. Take Seneca College, for example. Every student has their own account where they can publish web applications which they develop, but a lot of them are not completely security focused, so when they publish a web application it may be vulnerable. Furthermore, when the student leaves, their account may not be completely deactivated, so the vulnerabilities can still be there. And it's hard for Seneca College, or these big IT companies, to scan everything and figure out what the vulnerable software
is, and there are a few main reasons for this. Infrastructure is becoming increasingly complex as we move from hardware systems to things like cloud computing, Docker containers, virtual machines, etc. So we don't only have to think about the vulnerabilities which are in the containers, but also the containers themselves: for instance, the recent VENOM vulnerability, where a piece of malware could break out of the virtual machine and affect the host. So we also have to take that into account and plan for that. Furthermore, there are a lot of cloud computing services where you can just take your application and deploy it, but it's
impossible for the cloud hosting providers to identify what applications you're running and prevent these vulnerabilities. Also, a lot of these applications use third-party libraries, such as Ruby on Rails or Django, or a website may be using WordPress or Joomla, all of which have vulnerabilities. So not only do you have to account for the application that's vulnerable, but also any libraries the application uses that could be vulnerable as well. And as you develop your career as a software developer, you often change companies, and you may not have a full picture of the software that's running on the system.

So how can we fix this
issue? How can we find out what vulnerable software is on the system and clearly identify what it is? There's a process for this: it's called application fingerprinting, which tries to identify what applications and what libraries are on a system.

So let's examine some of the current tools that are out there. The first is called p0f, which tries to find these applications based on network activity. But it's a purely offensive tool, so from a purely defensive standpoint you might want to prevent p0f from finding your application. The same goes for BlindElephant. It's a great tool, which tries to identify these applications based on the source
code which the application produces, the HTML source code. But again, it's an attacking tool, not a defensive tool, so as a defender you would want to prevent BlindElephant from identifying what versions you have. The third tool is called Plecost. It identifies web applications based on the readme file: for instance, it will try to find what version of WordPress you're running based on the contents of the readme file. Every new version of WordPress which is released has to have a readme with it, so it's a piece of identifying information from which you can
find out what the application is. But current best practices dictate that when you deploy an application you should not include the readme, so you shouldn't be able to access it, which makes the tool not accurate enough.

So I want to talk about a tool that I've been working on for the past half a year. It's called Kitsune, which is a Japanese mythical creature known for its intelligence, usually depicted as a fox. We wanted to create this application in a smart way and try to identify, as accurately as possible and from a purely defensive standpoint, what applications are on your system. And not only can it identify what
application you are running, but it can also identify any libraries which your application uses. In its current state, Kitsune matches against 627 unique versions across 11 different web applications, and we're constantly adding more as it grows.

So, a little bit about how it works. It's split up into a three-stage process. The first stage performs a file system scan: you can specify a path from where you want to start scanning, and it collects all the files into what we call an artifact. In stage two, we take the artifact, the collection of files, and we fingerprint it, so we ask
which application do these files match? Then in the third stage we put it into a probabilistic model. We can't say with 100% accuracy what application it is or what version it is, but we can say there is an 80% probability that it is WordPress, or a 70% probability that it's Joomla. And we can control the threshold: we can say only show us results which are 90% or more, or however much you want.

It's a completely open source application. It's written in Ruby and it uses a SQLite database, which you can change to any SQL-like database that you want. And when we were designing
the application, we wanted it to be as configurable as possible, so you're not limited to our database; you can create your own database to use. If you have an application that you use internally, you can create the fingerprints and the checksums and store them in the database. You can also import or download a database from someone else and use that on your system. We use a hashing algorithm called xxHash, which is a non-cryptographic hash function built purely for speed. We decided that we didn't really need a cryptographic hash function; we just needed some sort of fingerprint, and xxHash is really
really fast, so it greatly improved our performance.

In the current state, these are some of the command-line options that you have. It requires two options: first, a path where you want to start scanning from, and second, the database file that you want to use. You can also specify what format you want to output the results in. The default is just plain text output, but you can say I want it in JSON format, or in YAML format, or CSV. You can also create your own custom formats if you want to. And the reason we did
this is that we didn't want you to be limited to one output; we want you to see the results in the way that you would use best. As I said earlier, you can specify the threshold, so you can filter for results which are greater than this or less than this, and you can also filter for a certain web application or a certain version.

So now I'd like to give you a quick demo. Hopefully it
works. Can everyone see that? Better? All right, so I'll just do a quick example.
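
As a rough illustration of the three-stage process described above (scan, fingerprint, score), here is a minimal Python sketch. It is not Kitsune's code: Kitsune is written in Ruby and uses xxHash, while this sketch substitutes `zlib.crc32` from the Python standard library as a stand-in non-cryptographic checksum. The talk does not describe the probabilistic model, so the score here is simply the fraction of a known version's checksums found in the scan, which is an assumption. The function names `scan`, `fingerprint`, and `score` and the `exampleapp` data are hypothetical.

```python
import os
import tempfile
import zlib


def scan(root):
    """Stage 1: walk the tree and collect relative path -> raw bytes (the "artifact")."""
    artifact = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:
                artifact[os.path.relpath(full, root)] = fh.read()
    return artifact


def fingerprint(artifact):
    """Stage 2: reduce every file to a checksum of its raw bytes.

    crc32 is a stand-in here; the talk says Kitsune uses xxHash.
    """
    return {zlib.crc32(data) for data in artifact.values()}


def score(checksums, known):
    """Stage 3 (assumed model): share of each known version's checksums that were found."""
    return {
        version: len(checksums & sums) / len(sums)
        for version, sums in known.items()
        if sums
    }


def demo():
    """Scan a throwaway directory against a made-up fingerprint set."""
    files = {"index.php": b"<?php echo 1;", "style.css": b"body{}"}
    known = {
        ("exampleapp", "1.0"): {zlib.crc32(b"<?php echo 1;"), zlib.crc32(b"body{}")},
        ("exampleapp", "2.0"): {zlib.crc32(b"<?php echo 1;"), zlib.crc32(b"changed")},
    }
    with tempfile.TemporaryDirectory() as root:
        for name, body in files.items():
            with open(os.path.join(root, name), "wb") as fh:
                fh.write(body)
        return score(fingerprint(scan(root)), known)


print(demo())
```

Because only file contents are hashed, renaming the application's directory (as in the demo that follows) would not change the result under this scheme.
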
So I'm specifying a directory where I want to start the scan from. I have an application in my Downloads folder, which I'll talk about in a bit, and the database which I use is just a SQLite database which has the 11 different applications in it. It's taking a little while to scan; there are a bunch of files in there, but it shouldn't take too long. You can all see that, right? So as we can see, the application... I had to rename it, because I was actually testing it on an application which is in
deployment right now, and I didn't want to reveal what the application actually was yet. But I can tell you it's built upon WordPress, and you can see that it shows three different versions of WordPress, around 3.9. But we can filter these results even more if we want to, and I'll make it a bit easier to read as well. And I forgot to filter it, sorry.
And I think, if I'm not mistaken, because none of the results that you saw here were 90% or more (they were around 89), we should get no results here. Yep. And then if we say anything which matches at 50% or more, we should get a lot more results.
So it didn't match anything else. In this case you can see that the web application was built upon WordPress 3.9, with an accuracy of about 80%.
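
The threshold filtering and output formats shown in the demo can be sketched along these lines in Python. This is an assumption-laden illustration, not Kitsune's command-line interface: the function names `filter_results` and `render`, the field names, and the sample scores are all made up, and the scores are not the demo's actual output. YAML is left out because it is not in the Python standard library.

```python
import csv
import io
import json

FIELDS = ["application", "version", "probability"]


def filter_results(results, threshold):
    """Keep only matches whose probability is at or above the threshold."""
    return [r for r in results if r["probability"] >= threshold]


def render(results, fmt="text"):
    """Render matches as plain text (default), JSON, or CSV."""
    if fmt == "json":
        return json.dumps(results, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(results)
        return buf.getvalue()
    if fmt == "text":
        return "\n".join(
            f"{r['application']} {r['version']}: {r['probability']:.0%}"
            for r in results
        )
    raise ValueError(f"unknown format: {fmt}")


# Made-up scores for illustration only.
sample = [
    {"application": "exampleapp", "version": "1.0", "probability": 0.89},
    {"application": "exampleapp", "version": "1.1", "probability": 0.87},
    {"application": "otherapp", "version": "2.3", "probability": 0.41},
]

print(render(filter_results(sample, 0.90)))             # nothing reaches 90%
print(render(filter_results(sample, 0.50)))             # two matches, as text
print(render(filter_results(sample, 0.50), fmt="csv"))  # same matches, as CSV
```

With a 90% threshold nothing in the sample survives, which mirrors the empty result in the demo; lowering the threshold to 50% lets the near misses through.
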
Right. So as I said, the application is completely open source, so feel free to contribute if you want. There's my information. Are there any questions? Yes?
Yes. So the question was whether we can use this fingerprinting mechanism to scan not applications but devices themselves. I don't think so currently, because it literally just takes a checksum of the actual binary data on the file system. I don't know how you would be able to actually identify the device that way. Right. Yes?
So currently it is web applications, but you can take any application which you have the source code to and add it to the database, and it would still be able to identify the applications. Yes?
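
Adding your own application to a fingerprint database, as described in the answer above, might look roughly like this Python sketch using the standard `sqlite3` module. Kitsune's real schema is not described in the talk, so the `fingerprints` table, its columns, and the helper names `open_db`, `add_application`, and `lookup` are hypothetical, and `zlib.crc32` again stands in for xxHash.

```python
import sqlite3
import zlib


def open_db(path=":memory:"):
    """Open (or create) a fingerprint database with an assumed schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fingerprints ("
        " application TEXT NOT NULL,"
        " version TEXT NOT NULL,"
        " path TEXT NOT NULL,"
        " checksum INTEGER NOT NULL,"
        " PRIMARY KEY (application, version, path))"
    )
    return conn


def add_application(conn, application, version, files):
    """Store one checksum per file for an application version.

    `files` maps a relative path to that file's raw bytes.
    """
    rows = [
        (application, version, path, zlib.crc32(data))
        for path, data in files.items()
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO fingerprints VALUES (?, ?, ?, ?)", rows
        )


def lookup(conn, checksum):
    """Return every (application, version, path) recorded for a checksum."""
    cur = conn.execute(
        "SELECT application, version, path FROM fingerprints"
        " WHERE checksum = ? ORDER BY application, version, path",
        (checksum,),
    )
    return cur.fetchall()


conn = open_db()
add_application(conn, "internal-tool", "1.2", {"app/main.rb": b"puts 1"})
print(lookup(conn, zlib.crc32(b"puts 1")))
```

Passing a file path instead of `":memory:"` would give a database file that could be shared with someone else, in the spirit of the import and download workflow mentioned in the talk.
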
Yes, currently you have to add it manually, but we can add that. Not yet, no. Yes?
I think if it were to be used in production, it would have to be rewritten first. It started out as just a project to work on, but you could use it in production with a bit of work. Yes?
So the question was: there are other tools which do application fingerprinting, so what makes this tool different from the others? First, from what I've seen, most of the open source web application fingerprinting tools that are out there are built from a purely offensive standpoint. They're not defending, they're attacking: trying to identify the web applications in some sort of black hat manner, either by inspecting network traffic or by trying to find files in a certain location on the web server and fingerprinting it that way. This tool is built purely from a defensive standpoint, so you are supposed to run it on a machine which you have
complete access to, and you would fingerprint the files. So it's not so much trying to identify a version for attacking, but trying to identify the version so you know what you can defend, or, if there is a vulnerability for that version, you know what to fix. Yes?

No, it doesn't. The question was whether the application supports any APIs, and no, it does not. It's written in Ruby. Yep, it's open source, so...
[Applause]