
Recog: Open Source Asset and Service Identification

BSides Toronto · 2014 · 21:17 · 269 views · Published 2014-12
Style: Talk
About this talk
Greg says: Project Sonar is a community effort to improve security through the active analysis of public networks. This includes running scans across public internet-facing systems, organizing the results, and sharing the data with the information security community. This talk will detail Project Recog, a derivative of Project Sonar that normalizes and open-sources service fingerprinting information to improve host OS and service identification in remote scanning. Project Recog builds on the thousands of hours invested in Sonar scanning to produce a lasting community resource for network service identification. Anyone interested in network reconnaissance and fingerprinting should appreciate this new data source.
Transcript

I work at Rapid7, and today I'm going to talk about two initiatives that we have. Yeah, you know, wait for it. Okay. So there are two initiatives I'm going to talk about. Hello? Yeah, are you guys ready for this? Okay. Two initiatives: Project Sonar and Project Recog. Project Sonar is an Internet scanning project, scanning all the things, and Project Recog is an open-source asset and service identification model based on the data collected by the Sonar project. A version of this talk was given at SecTor this year by two of my colleagues, Ross Barrett and Ryan Poppa, but today it's given by me. This hair is

a faded version of what was once a vibrant Rapid7 orange, but that was actually before I started working there, which was about a year ago. Before that I worked at the Citizen Lab at the University of Toronto, where they also know a few things about scanning the Internet, as a data wrangler, and I'm still a technical advisor there. They do a lot of research into Internet censorship and state-sponsored malware attacks, and I'm aware of at least one project there that has actually used the Sonar data. Thanks. All right, piped down, nicely done. All right. So Sonar, as I said, is an initiative Rapid7 launched a little over a year ago, in September 2013.

And so what is it? It's an Internet-wide scanning project. As I mentioned, it's the Labs team at Rapid7 that runs it, but it's also a community effort, which is a really important part of this. The idea is to identify and raise awareness about misconfigurations and other issues in the general security space, and to allow users to find potential security threats and other issues that could be used by attackers. It's been running continuously since we launched it just over a year ago. So really it's a community effort to improve security through the active analysis of public networks: we run scans across public internet-facing systems, organize the results, and share the data back to

the information security community. The Rapid7 Labs team is headed up by HD Moore, and together we advocate this Internet-wide analysis as a practical tool for security practitioners; the idea is to meaningfully improve your network security. The end goal is to enable the security community to start sharing data and working together, so we can more easily identify and tackle the problems confronting the industry. This isn't intended just for security researchers: it's a toolset that all security professionals can use to become their own researcher and scan all the things, or to contribute to a shared analysis. So, like all the security research that Rapid7 conducts

and publishes, Project Sonar is designed to help organizations and individuals understand how they're at risk and take the necessary steps to protect themselves. It's part of a commitment to make our customers, as well as the broader community, more secure. Collaboration is key here: we want the security community to collaborate more, start sharing data, and work together on the things we're all facing. Any of you can download the data, work with it, and reshare it with everyone else. That's how we want this to work, and it's the best way to advocate and teach people about what is out there. So it's all about sharing with the wider

community. So that's the why. What are we actually collecting? Among other things, we collect all the SSL certificates visible on public IPv4 HTTPS web servers, so port 443. This data can be used to detect changes such as the malicious replacement of certificates, or to reveal the revocation of a previously compromised certificate. It's complementary to the EFF SSL Observatory project, if you've heard of that. Other uses include detecting insecurely reused certificates, or revoked certificates that are still in active use. In addition, with the Sonar data one can see all the IP addresses or services that claim to represent a particular domain, so this in turn can be used for

asset identification and detection of malicious certificate usage. The certificate fields can also be used for software and hardware identification in specific situations, and this SSL work is being expanded to cover non-HTTP services as well, like STARTTLS-enabled SMTP or IMAP servers. As well as the SSL data, we collect the HTML content of the index page, just the root, of all IPv4 web servers, such as on port 80. It's similar to what search engines do, but we only grab the initial root page; we don't crawl the site any further. One potential use of this dataset is identifying compromised web servers and injected malicious

HTML snippets, such as iframes or JavaScript and ActionScript that lead to exploit kits. During the course of this research we found several instances of iframes pointing to exploit kits that try to infect client computers accessing the web pages. We also use this data to identify vulnerable embedded devices by fingerprinting the content and headers of the HTTP response. What else? DNS records: we also gather the reverse DNS records for all IPv4 addresses, which enables organizational asset discovery and can help identify misconfigurations and potential DNS hijacking attempts. Sonar also uses the domain names gathered from the processes above, along with certain TLD zone files, to conduct DNS ANY record

requests. This is also useful for asset discovery, as well as for identifying phishing portals and new malicious domains matching algorithmic patterns. Finally, last but not least, there are the UDP services scanned as part of this project. These include NetBIOS, DNS, NTP, IPMI, NAT-PMP, BACnet, SIP, SNMP, mDNS, and a bunch of others. We use this data to identify large-scale misconfigurations and vulnerabilities in the consumer, enterprise, and critical infrastructure systems out there. We've used this information to publish advisories and present them to vendors to help them improve their overall software security. For example, Rapid7 did an advisory on IPMI last year, there was an NTP one a few

months ago, and most recently, a month ago, there was an advisory about NAT-PMP. We keep adding things every day and sharing the information back out, and again, you can download this information and do whatever you need or want with it. So that's the why and the what. Where can you get this? The main site is scans.io, and there's also sonar.labs.rapid7.com; all the data is available there. Scans.io is a partnership with the University of Michigan where we share everything: the SSL certs, the reverse DNS, the root pages, all downloadable. There's a blog post by

HD Moore on the Rapid7 blog that details some of the data and talks about useful tools for working with and processing it. Besides scans.io, the Sonar page has the mandate for the project: what we do, how we do it, and how you can opt out of our probes. We're not really doing anything bad, but you can say no thanks if you want. It also covers other legal-type stuff like that, plus information about what we have and what we want to do with this project moving forward. One caveat I can mention is that since we're scanning all the things, the datasets

can be quite large, so HD's blog post is helpful here: tools like pbzip2 or pigz use multiple cores to speed up decompressing these compressed dataset files. All right, so that's Sonar: the basic raw data collection, and the foundation for Project Recog, which is what I'm going to talk about now. We took Sonar and moved forward with it. This is a bit from the Rapid7 perspective, because that's where I work now and the deck I've got was from the sponsorship track at SecTor. Most of you probably know about Metasploit, right? It's an open-source project, the Metasploit Framework, that has a

commercial product built on top of it, developed by Rapid7. On the other hand, with a different model, there's Nexpose, a commercial vulnerability management product that has a free version you can download, try, and use. For both of these tools, a common issue we've really struggled with, and I know many others in the community continue to struggle with it too, is the uniform identification of devices on the network. We see a lot of things when we scan, right? Networks are jammed with open ports, and their purpose isn't always known. We see an open port, but what service, protocol, or

actual device is running there? What's meant to happen here is to go from a vague understanding of what's happening, the situation where we've pinged something, there's an open port, maybe some kind of SYN-ACK handshake going on, maybe a banner. Then you get the "CSI enhance" step (this is actual sonar, not something from TV), which gives us an idea of the shape of something out there, a slightly better understanding. The next step up is a higher-quality side-scan sonar image of a sunken ship, something we can actually recognize, right? So there's more

actionable information. I think the basic idea is clear: going from the basic scan, with potentially many unexplained ports, protocols, and services, to an increasingly better understanding, until we can positively identify specific endpoints given enough context about them. Sonar provides a lot of data, and we want to encode our knowledge of that environment to leverage the data. The raw data in Sonar is terabytes of SSL certs, banners, and DNS records, updated weekly or even daily, and there's a lot of churn: it changes all the time, with different configurations, things moving around, people asking us not to scan them. But we

want to build on the raw data to get a real understanding. As of a month or two ago, the Recog data is backing all of Metasploit's and Nexpose's device fingerprinting. A major theme here is that it's harmonized: we want this to be useful across projects. I'll talk about the GitHub information later, but it's open source, so other people can contribute to it. Here's a quick example, a snippet of what one of the fingerprint files looks like. It's an XML file in the Recog repo, and this example in particular, hopefully you can see it, is a ProFTPD daemon running on a Linksys

wireless access point. Hopefully this chunk of XML isn't too off-putting; there's some XML and some regular expression stuff going on in there. There's the example tag, which helps people get a sense of what we're actually looking at, and it's also used for test cases: running the regexes across the examples serves as a sort of test as well. I don't know how familiar you all are with regexes, but at the top you can see the fingerprint pattern attribute, and then in quotes is the regular expression. It needs to begin with the string ProFTPD, and then in the parentheses it's

capturing the version information, numbers followed by non-whitespace, and then we've got Server, and then it has to say Linksys, and then we know it's this particular wireless access point. Then we have the params, which refer back to the capturing groups in the regex. Where it says pos equals zero, that just means we've positively identified it: we know it's ProFTPD and that it's a Linksys wireless access point, so we state that without using any of the information from the capturing groups. For the parts that vary, we have position equals one for the

version, and then the actual OS product and hostname are the other two positions, the remaining capturing groups. This is helpful if you have a device on your network that might show up just as Linux based on IP stack fingerprinting, or maybe just as unknown. We want to move past that and get more useful information. Great. I can't take credit for all this clip art. So what now? Let's go through a quick example. Again, this was originally done by someone other than me. I would have done my own home network Recog discovery project specifically for this talk, but I'm still evaluating the

best options for my future home theater PC solution, and I don't actually have that many interesting devices on my home network. This QNAP NAS here runs a RAID 1 file server, a big network stack, all sorts of stuff. This is a screenshot from a Nexpose scan showing all the services available on it and how it's been fingerprinted: Apple Bonjour, NFS, a built-in web server, and more. It's fingerprinted as Linux 2.4, which is probably not actually the case; that's kind of old for a device running all the different services this one is. So the IP stack

analysis is not always that reliable. In particular, look at port 49152 at the bottom: it's fingerprinted as HTTP but has a Linux 2.6 banner. That doesn't really make much sense; it's a bad result, neither precise nor accurate. So, a UPnP fail. Here's a more human-readable version of the Nexpose log file from that scan. HTTP isn't strictly wrong, since it's UPnP running over HTTP, but we can be a lot more accurate than this. We've logged the fact that it's Linux 2.6 and UPnP 1.0, so we have more information here; it's just not

matching against what it found. At the very least we should be able to call it UPnP instead of HTTP, right? And we could also do better on the OS fingerprint. Again, not precise. So here's another fingerprint example from Recog. The original intent was to find something to actually add to the project and walk through that, but it turns out it was already in there. The idea is that you could be scanning your own networks and assets, and if the fingerprinting isn't there, isn't reliable, or isn't showing you what it could, you can write one of these fingerprints and make it available. So yeah, I'm

not going to go through another walk-through of the fingerprint, since it's more or less the same idea as the last one: running a regex against the banner the scan pulls, with examples of what it's supposed to look like when you see it, and pulling out the relevant data in a more structured way. Here's the Nexpose scan again, but with the Recog fingerprints backing it. You can see there's a new OS fingerprint, Linux 2.6, which is more likely given what else the machine is running, and better fingerprinting of port 49152: it's got UPnP on there, and a version

of libupnp. This gives you better vulnerability identification, correlation with other devices or services on your network, and better knowledge and understanding of your network topology. And if something like this comes up on a scan, UPnP, that's bad news; you don't want that on the perimeter of your network. Before, it was just HTTP, whatever, but now you know to make sure it's off limits to the outside world. Again, you can contribute back to this; it's a GitHub project. If you've got devices that Metasploit or Nexpose doesn't recognize, or worse, recognizes but gets completely wrong, which is known to

happen, grab a banner. It's not that hard to create one of these fingerprint files and contribute it back, and then everyone in the community benefits. Although Nexpose and Metasploit use this, it's not directly tied to them; it's meant to be agnostic to whatever projects want to make use of it. And the data is there; you can take it and do what you want with it. The GitHub basics: just clone it or fork it. It depends on Ruby 1.9.3, which is not that new, so hopefully that's a reasonable requirement for most people. And the example

tags you can put right in the fingerprint are a nice way to get super easy unit tests for them as well. Finally, here are some of the people who have worked on this: HD Moore, Jon Hart, and other Rapid7 Labs people, plus the Nexpose and Metasploit teams, mostly Rapid7. Also thanks to Ross and Ryan for this deck that I didn't have to create myself. That's all I've got. It looks like we're at the 20-minute mark, so if there are any questions I'd be happy to take them, and if there aren't, I'd be happy to not take them as well. All right.
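The talk describes Recog fingerprints as XML files holding a regex `pattern`, `<example>` banners that double as tests, and params where pos 0 is a fixed value and higher positions pull from capture groups. As a rough illustration of those mechanics, here is a minimal Python sketch using only the standard library (`re`, `xml.etree.ElementTree`). Everything concrete in it is an assumption: the two fingerprints are modeled loosely on the ProFTPD/Linksys and UPnP-on-port-49152 examples from the talk, the banners (including the `WRT350N` model and `wap.example.lan` hostname), param names, `matches` keys, and helper names (`load_fingerprints`, `identify`, `failing_examples`) are hypothetical, and this is not Recog's own Ruby implementation or an entry copied from its repository.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fingerprint files modeled on the talk's two examples.
# Patterns, example banners, and param names are illustrative assumptions,
# not entries copied from the Recog repository.
FTP_BANNERS = r"""
<fingerprints matches="ftp.banner">
  <fingerprint pattern="^ProFTPD (\d+\.\d+\.\d+\S*) Server \(Linksys (\S+)\) \[(\S+)\]$">
    <description>ProFTPD on a Linksys wireless access point</description>
    <example>ProFTPD 1.3.0rc2 Server (Linksys WRT350N) [wap.example.lan]</example>
    <param pos="0" name="service.product" value="ProFTPD"/>
    <param pos="0" name="hw.vendor" value="Linksys"/>
    <param pos="1" name="service.version"/>
    <param pos="2" name="os.product"/>
    <param pos="3" name="host.name"/>
  </fingerprint>
</fingerprints>
"""

HTTP_SERVER_HEADERS = r"""
<fingerprints matches="http_header.server">
  <fingerprint pattern="^Linux/(2\.6[\d.]*), UPnP/1\.0, Portable SDK for UPnP devices/(\S+)$">
    <description>libupnp-based UPnP service on Linux 2.6</description>
    <example>Linux/2.6.33.2, UPnP/1.0, Portable SDK for UPnP devices/1.6.6</example>
    <param pos="0" name="service.protocol" value="UPnP"/>
    <param pos="0" name="service.product" value="libupnp"/>
    <param pos="1" name="os.version"/>
    <param pos="2" name="service.version"/>
  </fingerprint>
</fingerprints>
"""


def load_fingerprints(xml_text):
    """Parse one fingerprint file into (matches key, list of fingerprints)."""
    root = ET.fromstring(xml_text.strip())
    fingerprints = []
    for fp in root.findall("fingerprint"):
        fingerprints.append({
            "regex": re.compile(fp.get("pattern")),
            "description": fp.findtext("description", default=""),
            "examples": [ex.text for ex in fp.findall("example")],
            # (pos, name, value): pos 0 carries a fixed value from the file;
            # pos N > 0 means "use capture group N from the regex match".
            "params": [(int(p.get("pos")), p.get("name"), p.get("value"))
                       for p in fp.findall("param")],
        })
    return root.get("matches"), fingerprints


def identify(db, matches_key, banner):
    """Return structured facts from the first fingerprint that matches banner."""
    for fp in db.get(matches_key, []):
        m = fp["regex"].search(banner)
        if m is None:
            continue
        facts = {"description": fp["description"]}
        for pos, name, value in fp["params"]:
            facts[name] = value if pos == 0 else m.group(pos)
        return facts
    return None  # no fingerprint matched: the service stays "unknown"


def failing_examples(db):
    """Examples double as unit tests: each must match its own pattern."""
    return [ex for fps in db.values() for fp in fps
            for ex in fp["examples"] if not fp["regex"].search(ex)]


# Index fingerprint lists by the kind of data they match.
db = dict(load_fingerprints(x) for x in (FTP_BANNERS, HTTP_SERVER_HEADERS))

if __name__ == "__main__":
    print(failing_examples(db))  # []
    facts = identify(db, "http_header.server",
                     "Linux/2.6.33.2, UPnP/1.0, Portable SDK for UPnP devices/1.6.6")
    print(facts["service.protocol"], facts["os.version"], facts["service.version"])
    # UPnP 2.6.33.2 1.6.6
```

Keying each fingerprint file by what it matches (an FTP banner versus an HTTP `Server` header) keeps every regex narrow, and `failing_examples` is the talk's "examples as unit tests" idea in its simplest form: a fingerprint whose own example banner no longer matches is flagged before it can mislabel a real service.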