
Welcome, everybody. My talk is making OpenINTEL open up for the visually impaired. That's an open door. Oh my goodness, the darkness is bad on it. Okay, that's an open door. We're going to unpack what OpenINTEL is about and how it can help pentesters, bug bounty hunters, and so on. First, who am I? You've possibly noticed a misspelling of my name. It's not a misspelling; I'm Polish, and that's how you write my name. You can call me Shimon, but I don't expect you to get it right. One person at SensePost only found out seven years later that my name is pronounced Shimon; he thought the others were joking.
It's not. You can also say Simon. I'm a pentester on the SensePost team at Orange Cyberdefense, and I enjoy recon, apps, and sec ops. What inspired this talk? Rapid7. I'm not endorsing them. They had a thing called Open Data, which was based off of Project Sonar, and you could easily download that data five or six years ago. I should have listened to Willem; I didn't listen to Willem. Now you can see that they've got about 50 terabytes of data, but you can't access it: you need to submit a request. They've also changed their terms. If you're an individual researcher, don't bother submitting the request; they're not going to listen to you. If you are a company, or you come from a company, it's going to be a commercial thing. That's what I found out. So every year I sulk and look around, and this year I found OpenINTEL. I think I improved at Googling. They've got millions, billions, and trillions of data. I sound like the president right now. But we're not going to focus on all of that; we're only going to focus on the forward DNS data that's available. I took a screenshot, and at that time it was 2.6 terabytes of data that I downloaded. Like skinning a cat, there are many ways to download this data. I first started off naively with a
Python script. Then Michael Roger back there did a silly bash script. And then Leon one day asked me why I didn't just do wget --mirror, and I felt really bad at that point. Legit, it took me an hour to recover from that. Then a month ago I looked at it again and saw: hey, it's actually an S3-compatible bucket. We could just get an S3 client and sync. So that's what I did: pointed a client at the bucket and synced, and we didn't have to code up any download logic ourselves. It just synced the entire thing. Cool. So the data is about 3.3 terabytes, 67,000 files, covering each day of each month of each year from the active sources, which we'll touch on just now. They go and do these DNS measurements, store the results, and it's publicly accessible. Maybe after this talk they'll take it down, I don't know. And it's stored as Parquet. Did I get it right? No. Damn. Okay, Parquet, whatever, guys. It's column-oriented and has about 90 columns, with an asterisk caveat: they started in 2016 and presumably saw they needed to expand it and make changes, so in some of the files you'll see something like 95 columns. It differs depending on which year the file comes from.
But there's a way to get around that; it's not going to be an issue for you. The current sources: they base it off the top-1-million lists, like Cisco Umbrella, Tranco, and Cloudflare. Alexa is no longer in there because AWS stopped maintaining and publishing that data. Then Google CrUX was added this year, together with Majestic Million. Majestic Million matters in the grand scheme of things, but CrUX will get better recognition later in the talk. Cool. How do we work with Parquet, or however you pronounce it? You've got libraries for the popular languages, and you've also got clients that support it. The one we're going to look at is DuckDB. It's a pretty cool tool, and we're going to shine the spotlight on it; I hope you use it too. So, reading a Parquet file with DuckDB: you can just write a SQL query, give it the file name, and it will return everything to you. If you do that, don't worry, DuckDB will save you: the query is going to try to spit a lot of data at you, but DuckDB truncates the output. That's pretty nice. Unless you use the -list output mode; then it's not going to truncate.
Now, if you've got more than one file, which is what you'll end up with because there are 67,000 files, you can use globbing, and DuckDB will go and try to read everything for you. Now, I did tell you that the files differ in columns: some have 90 columns, some 91 or 92. When you glob, DuckDB is going to get confused: hey, there are more columns here, fewer columns there. If you add union_by_name=true, then it's happy; it stops complaining. Oh, sorry, I double-clicked. You can also read remote files, and this is pretty nice, because again, there's a lot of data. Maybe you just want to go after TXT records. So you can point it at the remote file and use some SQL with a WHERE clause saying the resource record is of TXT type, and so on, and store that result. Another nice thing with DuckDB is that you can add the -json flag: you're using a SQL query to read the file, but the results come back in JSON format, so you can pipe that into any JSON-compliant tool. Or, you know, if you're pretty bad at SQL but good at jq, then you could just pipe that
into jq and do your filtering over there. Cool. So, you know what the data is, and you know how to get that data. Now, what are we going to do with this? We're going to take a quick pause. Forward lookups: you use something like host, dig, or nslookup to try to resolve a hostname, and in the end you should get an IP address. I say should; it depends on how they've configured it. Like here, they did a CNAME which then resolves to an A record, and so on. We're just focusing on the fact that you get an IP address at the end. I just want to establish this concept of
a forward lookup. Cool. Then in DNS you've got PTR records, or pointer records, also colloquially called reverse lookups. You can run host against the IP address, and there may be a record configured for it. You'll see here it's not www.facebook.com but some edge-star-mini-shv-whatever name, so it doesn't correlate exactly, but the point is there's a PTR record that lets you do a reverse lookup. I'm going to be talking about something somewhat similar to this, but not exactly this. We're going to pause again: name servers. Can we do a name-server reverse lookup? That is, can we find domains that are configured with the same name servers? You can run host -t ns facebook.com and you'll get all the name servers for facebook.com. Okay. Can we do the backwards of that? Can we look up a.ns.facebook.com? Yes, with an asterisk. There are services out there that can help you do this: ViewDNS, WhoisXML API, SecurityTrails, Host.io. Those are the four I've listed because I use them quite often, but this isn't an exhaustive list; you can find more on the internet. How do they do this? They go and collect all this data into a database of their own, and then they look up in their database
which other domains have these name server records. Okay. And this is what it looks like. This is WhoisXML API: you get a JSON response, and you can see, cool, accountkit.com is part of facebook.com, and so on. I don't know what these domains are; I didn't look further. But there's a problem with it: eventually you'll run out of API credits. The demo tier is pretty limited in how much you can get, and we don't like that. So this is where OpenINTEL comes back. You can do the same lookup against OpenINTEL and maybe get around that: you have your own data, and you can query it as much as you want. So over here, when we did that lookup, we got 259 rows back. It only shows 20 because of the truncation, but we have 259, whereas WhoisXML API, I think, gave me the first 50 or 100 without logging in. Cool. So it's a nice way to use the data. Taking it further, and this is one of the better aspects: TXT records. Most internet-based services do not do reverse lookups on TXT records, so it's really hard to find, or maybe I'm just bad at Googling and couldn't find one yet. Why is that important? One of the good examples is verification: showing that you own a domain. When you sign up for an online account, the service asks you to prove ownership, and you do that by setting up a TXT record. And what people like to do, when they have more than one domain, is put that same TXT record on all of their domains. Okay, so let's do a reverse lookup on the TXT record for amazon.com. In this case we used the TS17 value, etc., and again we got 54 rows back. Right? So it's pretty cool.
We're able to find and expand our scope when we're enumerating a recon target. Another benefit is that if you're a researcher looking for new targets, you can monitor the TXT records and see which new services are emerging and what's becoming more popular, so you can focus and drill in on that, which is also pretty nice. Okay, change of direction. This data spans many years. If you have a good memory: Alexa started in 2016 and was retired in 2023, and Cisco started in 2019 and is still ongoing. So: historical data. We've got access to do that. Here's an example. I went and looked for all of www.gitlab.com's IP addresses over time, and here I got a whole list of them. I truncated it, but I also highlighted the day, month, and year each record was captured. Why is this important? Nowadays it's getting very popular for people to put Cloudflare or some other WAF in front. They want you to go through that, and you're pretty much stopped when you're scanning. So you want to find the origin IP address and see whether they've misconfigured things so you can hit it directly. And that's what we're doing over here: we're going to see whether this OpenINTEL data can help us do that.
So, it's time to find out. I found a repository on GitHub which queries all of the bug bounty platforms for targets. I ended up with 34,282 targets from those various sources, and I found that 1,765 of them were behind Cloudflare. I only checked Cloudflare; I didn't look at things like Akamai. I just wanted to see, cool, is this good or not? I found that 68 of them were directly accessible: I found the origin IP, connected to it, and did a similarity check, and 68 of them were a 100% match. In that count I do not include things like 99.5%, because I didn't know where to draw the line, but I wanted to show you the 100% ones. Okay. The image is supposed to show a bypass, but either my prompting is bad or LLMs are bad and don't know where to put the security guard; it should have been in that curve we're going around. Cool. If OpenINTEL doesn't work out, you can also use VirusTotal. I find that to be quite a nice site for historical IP addresses as well. I'm just going to throw it out here because it's pretty useful, but you do have the same limitations as I mentioned earlier, like a set amount of API credits. The one problem with OpenINTEL is that it's based off of the top 1 million from certain providers. Why is that a problem? Let's say South Africa: its internet user base is much smaller in comparison to the US or Asia, so our Takealot might not reach the top 1 million. What happened is that this year they added CrUX, the Chrome User Experience Report. The slide is a Little Snitch combo with the Chrome logo, because if you set Chrome up, it can snitch to Google and tell them: hey, this user visited this site, this was the response time, and so on. It's meant more as a performance thing, but they collect that data, and the data is freely accessible to anybody. There are caveats here. A site needs to pass a certain threshold to be listed, and your Chrome needs to be configured to snitch on you; you need to consent. The other caveat is that it only started this year, so if you're looking for historical data, it's not going to happen now. But visit that site in five years' time and you'll have five years of data collected. So you can use this, and it allows us to go after targets that are more regional. If you're here in South Africa, you could be more successful. And that's pretty much my quick lightning talk on OpenINTEL. Any questions?