
as he's getting miked up i will give you some clues about his identity he is a return speaker to the bsides tel aviv stage he is one of our most popular speakers and i expect this to be a fascinating talk and he's got a few hobbies in fact this morning he went for a run i'm talking not about a run command i'm talking about a run on the beach yes which is quite surprising with this weather and what else can i tell you about this mysterious speaker that i'm sure many of you already know i'm talking of course about gal bitensky so gal bitensky is going to be ready to come you can
come from here it's okay okay so please give a warm welcome to gal bitensky running all the way from the other side of the stage and gal is gonna talk to us about some really really cool stuff we're going to learn about leveraging passive fingerprinting for link scanners evasion what are link scanners salute kampai what are link scanners and how do you evade their detection all that and more by gal bitensky give it up to gal yay so yeah it's working so hello everyone good evening and welcome to my talk my goal here today is that everyone in this room will know what link scanners are and how we can use this fingerprinting concept in order to evade
them so first and foremost these are all of my social media accounts i'll greatly appreciate it if someone will send me pictures of this talk because my polish grandma is driving me nuts so bonus points to anyone assisting in this area okay a bit about myself yeah so there will be a lot of cheap powerpoint animations in this talk i am a powerpoint animation evangelist but i'm also really into all kinds of malware evasion techniques i did mostly this stuff for about four years but from there i kind of proceeded into doing more web related stuff at akamai mainly bots i'm currently sharing my home office with all of those feline beasts that's
chewbacca sitting there on my shoulders actually doing the research you are seeing today fun fact and also i really like bsides as karen said it was the first place that gave me a chance to speak here today here's like live footage from the like turkish hamam we had for the first bsides with inbar even next to me on stage you can't even imagine how we got like so much further okay so what is fingerprinting let's start with speaking about fingerprinting not even in the context of any like technical stuff but just like fingerprinting 101 we want to positively identify an object by collecting properties we can do it actively like asking questions like hey what's
your name or can you show me an id this will allow me to identify this object right but we can also do it in a passive way i can for example eavesdrop and overhear maybe those two girls speaking with each other and then overhear the name of one of them and by that i might be able to identify one of them but we are not here doing csi or even csi cyber oh god forbid we're speaking about technical stuff so let's proceed and see how this concept takes part in the world of web so i guess that most of us have already written at least like a couple of python lines which perform an http request fetching
resources from the web and you know that client implementations vary you know every one of us can write different code different languages different implementations different errors maybe it just varies and also on the infrastructure level you have different operating systems you have different network devices all of those might have all kinds of impact on the fingerprint that you have we'll see examples like much later in this talk do remember that some of those are by design for example user agent user agent allows the server to serve you with content which might be better suited to your device like if it's a mobile device of some kind so it might allow you to do this kind of
stuff but some of them are not those are side channels for example different browsers return different errors or they parse different stuff a bit differently they have different implementations for specific functionality not going exactly by the rfc sometimes it does happen it is nothing new it is being used it is being used in the product i worked on in the area of the anti-bot industry trying to detect bots by the fingerprint for example you know stuff like recaptcha and this area they use exactly those tactics we're going to hear about today and also in this like web fingerprinting technological kind of concept you also have passive and active fingerprinting passive meaning that you only
take the incoming traffic and sniff it and parse it and do whatever you want without actually altering the normal flow of you know network communication and active well you might ask the server or sorry the client can you execute this javascript for me or you might try and check which ports are open on the ip that is you know associated with the client okay and if you want to see a bit more about these tactics i really recommend this website am i unique it will show you a lot of those tactics you can scan this qr code there will be some qr codes throughout this talk so feel free to do so am i unique
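As an illustration of the passive approach described above, here is a minimal sketch that derives a fingerprint only from what a client already sent, without asking it anything. The function name, the choice of header order plus user agent as input, and the sample header values are all assumptions made for this example; they are not taken from the talk or from any specific product.

```python
import hashlib


def passive_header_fingerprint(headers):
    """Build a short fingerprint from headers as received.

    `headers` is a list of (name, value) tuples in arrival order.
    Header order is an implementation side channel: different clients
    emit the same headers in a different order.
    """
    names_in_order = [name.lower() for name, _value in headers]
    user_agent = next(
        (value for name, value in headers if name.lower() == "user-agent"), ""
    )
    material = ",".join(names_in_order) + "|" + user_agent
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


# Two hypothetical clients claiming the same user agent but ordering
# their headers differently end up with different fingerprints.
client_a = [("Host", "example.test"), ("User-Agent", "ExampleBrowser/1.0"), ("Accept", "*/*")]
client_b = [("User-Agent", "ExampleBrowser/1.0"), ("Accept", "*/*"), ("Host", "example.test")]
print(passive_header_fingerprint(client_a) == passive_header_fingerprint(client_b))  # False
```

Nothing here alters the request flow, which is what makes it passive; an active check would instead send something back to the client, such as javascript to execute.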
okay fingerprinting we know what's fingerprinting we have some idea about web fingerprinting but what are those link scanners or let's begin maybe with why why we need those link scanners there's an actual email i received from a not so legit domain with the link there below something something with a tld of xyz that's why we need link scanners because links can be dangerous so we need to check them before the user accesses those links hopefully and then alert and block and if no one actually clicked the link and went to this you know malicious website we win as simple as that no one actually accessed this phishing content or downloaded the malicious binary so good
for us but where can we find those and it is kind of surprising but a lot of different products have this kind of component which fetches data from a url not only link scanners but there are a couple of those for example virustotal there's a url scanning capability in virustotal there's also urlscan.io which is amazing and highly recommended it is from like in my opinion there are also email security products you have incoming emails incoming emails have urls sometimes like in my case g suite office 365 all have this kind of component scanning incoming emails for links and then trying to grab something which is behind this link sandbox solutions also surprisingly enough you can scan a file and submit it from a
url which means there's a client downloading this file into the machine somehow automagically like some people like saying even instant messaging platforms sometimes have a similar component for doing this kind of stuff if someone who you don't know sends you a message with a suspicious link it might get scanned but you know it is all very nice and fancy but what's actually taking place behind the scenes so there's content html content binary content and we want to fetch this content somehow how do you fetch it usually with an http request we're going to speak today here about http requests mostly and how do we do that well you guessed it the answer is always python something like this i am not familiar
with the actual implementation of some of the clients for checking the security of links but this implementation might be actual stuff which actually goes there behind the scenes that's the best code in their opinion they don't have security in their mind too much as we're about to see so we know what's fingerprinting we know what's web fingerprinting we know what link scanners are and why we have them but how can we leverage this fingerprinting stuff to evade link scanners because that's our goal here today before trying to fix things later in the talk so i had a what if moment link scanners are web clients just like bots just like browsers just like anything which fetches
content from the web and web clients are fingerprintable because i know that from my day-to-day job as i said implementation varies on many different levels so let's say i can identify those specific clients and i can tell oh this client that's not a victim that's not a phishing victim this guy or girl doesn't try to download my malicious file and execute it that's actually like i don't know virustotal trying to scan my website so this will allow me to identify this specific client and send back a very like different response graphically a bit of like a simplification server client request fingerprinting and then i make my decision whether or not i want to send a
benign or a malicious response as simple as that now for the fun part let's go layer by layer and try to understand where the vendors did something wrong see some very fundamental flaws which were across all the vendors i've checked and some nice examples we'll start from the like top from application level and go as deep as even link layer so application layer my favorite http everything is so easy nice and dandy and here comes the thing which annoyed me the most i've inspected 15 different vendors in my research and the thing which most of them did wrong and really wrong was obsolete user agent you're all running your machine you have a browser if you're living in the 21st century and
it's usually kind of up to date maybe you didn't update it i don't know a couple of months maybe half a year but what we've seen in our research is prehistoric chrome and when i'm speaking prehistoric i'm speaking about chrome 50 something which is over five years old or even incredible stuff like internet explorer 8 or even internet explorer 6 i think that today in the crowd we have people who are younger than internet explorer 6 and i'm kind of joking but i'm not it is like even internet explorer 8 is over a decade old just like to give you some reference and my all-time favorite virustotal i don't know how many of you ever checked it but when you scan a
website with virustotal there's a user agent but that is the user agent that you get yes the word virustotal is part of the user agent and yeah by the way that's i think one of my favorite animations in this presentation it is inexcusable to scan a website and check if it is malicious or not when you're saying to the website oh yeah i want to check if you have any malicious content no they're aware of that by the way at least for a couple of years
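The user agent problems described above (very old chrome, internet explorer 6 or 8, a product name inside the string) are easy to check mechanically. The following is a minimal sketch of such a check that a vendor could run against their own scanner's outgoing requests. The keyword list, the threshold of ten major versions, and the function name are assumptions for illustration; chrome 91 is used as "current" only because the talk mentions it as the current version at the time.

```python
import re

# Hypothetical keywords; only "virustotal" comes from the talk.
SCANNER_KEYWORDS = ("virustotal", "scanner", "crawler")
CURRENT_CHROME_MAJOR = 91        # current version mentioned in the talk
MAX_MAJOR_VERSIONS_BEHIND = 10   # arbitrary threshold for this sketch


def user_agent_is_suspicious(user_agent, current_chrome_major=CURRENT_CHROME_MAJOR):
    """Return True if the user agent looks unlike an up-to-date human browser."""
    ua = user_agent.lower()

    # A product name inside the user agent gives the scanner away immediately.
    if any(keyword in ua for keyword in SCANNER_KEYWORDS):
        return True

    # Internet Explorer 8 and older announce themselves as "MSIE <major>.".
    msie = re.search(r"msie (\d+)\.", ua)
    if msie and int(msie.group(1)) <= 8:
        return True

    # Chrome that lags many major versions behind the current release.
    chrome = re.search(r"chrome/(\d+)\.", ua)
    if chrome and current_chrome_major - int(chrome.group(1)) > MAX_MAJOR_VERSIONS_BEHIND:
        return True

    return False


old_chrome = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36")
print(user_agent_is_suspicious(old_chrome))  # True
```

In practice the "current" version would have to be refreshed automatically, which is the same maintenance burden the talk asks scanner vendors to take on.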
some other header anomalies now squid proxy is a proxy solution for almost any proxy related stuff you do everyone uses it but it i think by default adds a via header this header was added by a sandbox solution i've tested it was fixed by the way but not only that it included this via header which included i don't know squid proxy which might be suspicious but not necessarily associated with a security product it included the entire path of the machine within the lab like sandbox machine number one and like sub lab number three or i don't know what but it disclosed the entire topology of the lab in this header again ridiculous but it is what it is
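To make the via header leak concrete, here is a small sketch that splits a via header into its hops so the internal host names become visible. The host name in the sample is invented for illustration (the real value from the talk was not given), and the function name is an assumption; the general shape of a hop, a protocol version followed by a received-by name and an optional comment, follows the http specification.

```python
def via_header_hops(headers):
    """Parse the Via header (if present) into a list of hop descriptions.

    `headers` is a dict of header name to value. Each hop has the shape
    "<protocol-version> <received-by> [comment]".
    """
    via = next((value for name, value in headers.items() if name.lower() == "via"), None)
    if via is None:
        return []

    hops = []
    for raw_hop in via.split(","):
        parts = raw_hop.strip().split(None, 2)
        if len(parts) >= 2:
            hops.append({
                "protocol": parts[0],
                "received_by": parts[1],  # often an internal host name
                "comment": parts[2] if len(parts) > 2 else "",
            })
    return hops


# Hypothetical request headers from a sandbox sitting behind a proxy.
sample = {"Via": "1.1 sandbox01.lab3.example.internal (squid/4.10)"}
for hop in via_header_hops(sample):
    print(hop["received_by"], hop["comment"])
```

Running a check like this once against a scanner's own outgoing traffic would show whether internal naming is being disclosed.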
also another one of my favorites one of the largest email security vendors had this referer header kind of trying to explain it they synthetically added this header which means to the website you browse into that somehow you've found exactly the name of the link somehow magically in google i don't know how and from google after searching for exactly this url you reached the potentially malicious website no one actually ever looks for http whatever it doesn't happen moreover if you're clicking on an outlook link like in the desktop version of outlook there's no referer header at all so even the presence of this header is a telltale of something fishy going on here it was fixed as well by the way
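The synthetic referer described above has a recognizable shape: a google search whose query is exactly the url being requested. A minimal sketch of that check follows. The function name and the assumption that the search term lives in a `q` query parameter are choices made for this example, and the urls are placeholders.

```python
from urllib.parse import parse_qs, urlsplit


def referer_looks_synthetic(referer, requested_url):
    """Return True if the Referer claims a google search for the exact url.

    People rarely type a full url into a search engine, so a referer of
    this shape is more likely to have been added by a tool.
    """
    if not referer:
        return False

    parts = urlsplit(referer)
    host = parts.hostname or ""
    if not (host == "google.com" or host.endswith(".google.com")):
        return False

    # parse_qs percent-decodes the values, so they can be compared directly.
    searched_terms = parse_qs(parts.query).get("q", [])
    return any(term.strip() == requested_url for term in searched_terms)


print(referer_looks_synthetic(
    "https://www.google.com/search?q=http%3A%2F%2Fsite.example%2Flogin",
    "http://site.example/login",
))  # True
```

As the talk notes, for clicks coming from a desktop mail client the mere presence of any referer can already be unusual, so a fuller check might also consider where the link was delivered.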
let's go a bit deeper so if we've been so far in the application layer we'll go a bit deeper now to transport internet and link layers all of them are a bit more technical i won't dive too deeply into them but let's give it a go so there's an entity called as or asn autonomous system number it has something to do with a protocol called bgp large entities reside within this protocol like isps or enterprises for example but what if those enterprises are security companies like i don't know kaspersky palo alto akamai i don't know almost every large enterprise has an as of its own so let's say that the ip range which is associated with
this entity is the one scanning my server it is just like again a very distinct telltale that i should never respond maliciously you know why should i as someone who's trying to run a successful phishing or trying to kind of spread malware send my content for free to the security vendor when it is very simple to distinguish this association with the security vendor also there are many security like sorry cloud providers every cloud provider actually has their own as google azure like microsoft aws amazon and digitalocean you name it all of those are also again a very good indicator for someone who's running some machines in the cloud our victims
are almost never you know how many of you are using a cloud machine as their day-to-day daily driver machine i guess that's none which means for an attacker if it is a cloud machine don't attack it next some examples you see this this is what you'll see when you're hovering on top of an office 365 link by the way it is all mangled i've switched a lot of stuff so don't try to copy paste it or type it manually you'll only waste some time and just before you click it of course microsoft will check whether or not it is malicious right but how it will send an http request from this ip which surprisingly enough resides
within microsoft address space so yeah i don't have a lot of motivation to send malicious content back to microsoft i know that my victim won't browse to me through microsoft and by the way they also had this user agent chrome 79 is not that old but it is about a year and a half old today we're at chrome 91 since may so they might need to update it and also an interesting idea so they have windows as part of the user agent what if the target is an apple user if i have some intelligence about you know this organization i know they have a lot of macs so i expect the request to come from
something with a mac user agent not a windows user agent i will never respond maliciously to this user agent right and another tricky one dns ptr record we all know dns mostly the a record which is the one which is returned to you when you're doing a dns query like what's the ip address of google.com ptr record is the opposite of it it's what you get when you're doing the reverse dns query again with this nice animation let's say you get a request from the ip elite you send a query which might be as simple as nslookup trying to figure out what's behind this ip surprisingly enough you get the corporate's home page or a
scanner kind of url it actually happened to me twice and then you can decide how you want to respond but it allows you to identify again association of an ip with a security product really easily all you need to do is nslookup that's by the way one of the cases i had it is a bit redacted but it was actually i think partially fixed even but when you see it you have a web page with like the sole purpose of checking whether or not a url is safe and they're probably checking it from the same machine running both the ui and the actual module running you know the link scanning activity
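The reverse dns lookup described above takes only a few lines. Below is a minimal sketch using python's standard `socket.gethostbyaddr`, which does roughly what an nslookup on an ip does. The word list is an assumption for illustration (two entries echo names that come up in the talk), and the injectable `resolver` parameter exists only so the logic can be tried without real dns traffic.

```python
import socket

# Illustrative list; "mailcontrol" and "sucuri" are names mentioned in the talk.
BLOCKED_WORDS = ("mailcontrol", "sucuri", "scanner")


def reverse_dns_name(ip, resolver=socket.gethostbyaddr):
    """Return the PTR host name for `ip` in lowercase, or None if there is none."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None
    return hostname.lower()


def ptr_reveals_scanner(ip, resolver=socket.gethostbyaddr):
    """Return True if the PTR record for `ip` contains a known product name."""
    name = reverse_dns_name(ip, resolver)
    return name is not None and any(word in name for word in BLOCKED_WORDS)


# Offline demonstration with a stand-in resolver and a documentation ip.
def fake_resolver(ip):
    return ("cluster-a.mailcontrol.com", [], [ip])


print(ptr_reveals_scanner("192.0.2.10", resolver=fake_resolver))  # True
```

For a vendor the same lookup run against their own scanner egress addresses shows immediately whether a ptr record is giving the product away.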
so easily detectable when you run this nslookup thing you'll get the url of this web page like come on and they don't think it is a problem by the way i worked really hard to convince them it is a problem another nice case so yeah i had a request from this ip with this user agent remember osx 10.6 we'll go back to it in a sec so it is trivial to find that this ip is associated with as44444 forcepoint five times four but also if you do this nslookup trick you will see it is associated with something which ends with mailcontrol.com what's mailcontrol well it sounds a
bit like a mail security thingy i did google and it is associated with websense which was merged into forcepoint etc so again this ptr record discloses the fact that this ip is something you don't want to respond to moreover you can maybe scan the entire ip address space in this area and find more mailcontrol systems just like you know doing brute force dns queries or something and remember this 10.6 so mac os x 10.6 is also a decade old os who's running a decade old apple os on their devices no one it makes absolutely no sense again inexcusable do some i don't know devops stuff it is a couple of python lines to automatically
update this stuff to something which is at least reasonable to some extent but is it all purely theoretical and here's the answer i promised cheap animations this is taken from a real phishing kit and the first few lines is gethostbyaddr it's php which sucks but we can kind of understand it it will actually do something which is equivalent to nslookup and will check for stuff in a blocked words list like sucuri.net which is a brazilian firm doing mostly url related stuff that's exactly what i suggested to do the dns ptr method also you have similar checks for some ranges of ips and some user agents all can be found in a gist which is
available in the url in this qr code again you can check it out on your own very simple surprisingly enough i saw it only on phishing kits with very little similar stuff in the world of trojans and banking threats etc let's go even deeper now tcp link layer maybe a bit of ip protocols which are much more oriented in transmitting data between two clients efficiently reliably trying to maximize the number of bytes you can transmit without losing any it is tricky it is complex i won't dive into tcp too deep now but you need to have some understanding of the protocol and some of its flags at least and the number one reason to use this type
of fingerprinting is os detection because if i know that i have a user agent of let's say windows but the tcp fingerprint is more like seemingly associated with let's say an apple related os it is fishy so ttl between windows and the other two operating systems is different that's kind of well known it is even ip not tcp but also window size for example it is very similar and even identical in most cases between modern windows devices and apple devices but in linux it is all over the place so it allows you to have some kind of understanding what's the operating system really easily or not that easily as well about the css also you have other stuff like mss which
is if you add 40 bytes it is the same as mtu a size which should prevent fragmentation i have 45 minutes for my talk right yeah just making sure so mss and mtu it is something which should prevent segmentation sorry fragmentation in packets sent within your network different mediums have different limitations all kinds of clever engineers try to optimize the size of packets and you have this value it is different between different mediums maybe you're going to see some value which is an outlier associated with vpns or some cloud vendors have a very distinctive mss this should be very interesting and this might allow you to identify something which is wrong with
the client maybe this will allow you to identify it as a bot and this is why tcp stuff is not that simple that's me sniffing in wireshark me sending random packets over the wire it is a nightmare to go through all of those fields and try to identify what's exactly happening there but this is why we have this incredible thing called p0f again open source tool available in this link in the qr code it has tons of different signatures for tcp related stuff like window sizes and mss and much more flags which i won't even get into today it is so easy to use very nice piece of code even it has like very very exotic stuff
like did you know that google had a very specific mss value of 1470 maybe there was a phd candidate or something doing research for google optimizing by 0.7 you know like the efficiency of their network by manipulating this value but it is a very very strong hint that you're actually facing a google related machine right now very interesting and you can by the way test all of it with pstf squared passive security tools fingerprinting framework an open source tool i wrote for this purpose again available in this link it is fully dockerized really easy to use it's just like a fancy http server i customized a bit it includes fingerprinting for all of
the different layers i've mentioned so far and also it is easily extendable lots of configuration files available in yaml i really enjoyed working on it okay so we've spoken about fingerprinting web fingerprinting and we've even spoken about you know what's link scanners and how we can circumvent them using fingerprints but what's next i don't know how many of you are familiar with this lovely video of steve ballmer but in that spirit i'll just let go protocols protocols protocols protocols protocols protocols because that's the future of this area of research know your protocols and you'll have much more knowledge i've actually inspected 15 different vendors as i mentioned in my research and i wasn't like
required to go that far because i was able to evade detection from any of them but they might catch up one day so go and learn new protocols and see whether or not there's an opportunity in this area there are a lot of existing papers by akamai in this area i highly recommend going through them there's a nice research from 2017 by elad shuster and ori segal about one protocol there's also a really nice page called cipher stunting i highly recommend reading it too those are areas which should be explored if you want to improve your knowledge about protocols which might be usable and the funny thing is that actors are actually more aware of those protocols
than security vendors if i need to fingerprint today an actor which is writing a bot it will be actually more difficult for me to fingerprint him than to fingerprint a security vendor writing something which should fetch content over http actors are way ahead of security vendors in that regard or maybe we should go active but we've said it is noisy right so what about semi-active or maybe we should call it active by proxy or semi-passive what if i have a subscription for shodan for example and i proxy my request to check whether or not this client is kind of suspicious via shodan or any other equivalent service i am not sending any direct
request to the client and just querying shodan's offline existing database which should be kind of accurate and might be interesting for example i might see parameters or open ports which are not typical of the windows machine which this client tries to look like and i won't be active in an active way i'll just be active by proxy shodan will be active but not me so so far we've been a bit kind of pessimistic but how can we get better we're trying to fix stuff not only to break stuff right sadly most of the stuff you've seen here today will allow you to detect any tool scanning urls i've reported it
to so many vendors most of them fixed at least some of the stuff one of them actually had some legal threats against me which is a funny thing but yeah but how can we make this stuff better let's try to be actionable first learn your fingerprint run my tool inspect it and try to understand your fingerprint or your footprint whether you're a vendor or a blue team know your limitations know how you can be spotted and more importantly even manage your kind of footprint by trying to understand the level of effort for a red teamer to detect you or for blue team to fix this issue for example free fixes you're a blue team you should never have
a user agent which is over a couple of months old on the link scanning components in your system any of your systems any of your modules it should look like a human not like a security product i think it is a given but as it turns out it is not on the other hand some stuff is more expensive like going through proxies all day long so yeah it kind of sucks that microsoft scans you from their own ip range but is it actually feasible from the financial point of view or technical point of view to exit from residential proxies all day long they need a data plan maybe which is huge and they might be forced into reusing
proxies which is very interesting maybe it can lead to them being detected so it is not that easy to exit from different proxies and also there's this area of bad practices you should never disclose the internal structure of your lab by misconfiguring headers if you've even like inspected the outgoing http request once this would never have happened just like you know know your tools manage your stuff it makes no sense to just like you know forget about it and yeah let's have a ptr record which associates our link scanner ip with the home page of the company which was in some cases the homepage was so obsolete that it never existed like for the last
five years i was able to find it in the wayback machine but it was so obsolete that there was no homepage it referred to it was just like a ptr record that they might have forgotten about years ago that's negligence on the other hand understand that tcp is a challenge and not everyone knows how to operate p0f and to do all this correlation shenanigans you've seen phishing kits phishing kits don't use it they don't need to but it is also possible that they simply can't you have script kiddies and knowledgeable people you're trying to at least be better than some so maybe try to balance between the stuff you can and should fix like user agents for
example and other stuff which is much more tricky to conclude and this is a picture of my cat again just to grab your attention in case you fell asleep we know what fingerprints are we know what web fingerprints are we know what link scanners are we know how we can evade them now we also know how you can face palm when you see all the different fails of vendors and we've seen some potential stuff for the next big thing in this area and also we tried to fix some stuff and i don't think we have time for questions or do we have or i don't know no yes no okay no
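In the spirit of the "learn your fingerprint" advice, the lower-layer signals from the talk (source ip range and an os hint from ttl that disagrees with the user agent) can be combined into a small self-audit. This is a rough sketch under several assumptions: the network list uses documentation ranges as placeholders rather than any real vendor or cloud range, the ttl buckets rely on the commonly seen initial values of 64 for linux and apple systems and 128 for windows, and all function names are invented for this example. A real tool such as p0f looks at far more than ttl.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), standing in for ranges
# that belong to a security vendor's or cloud provider's AS.
KNOWN_SCANNER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def ip_in_known_networks(ip, networks=KNOWN_SCANNER_NETWORKS):
    """Return True if `ip` falls inside any of the listed networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


def os_family_from_ttl(observed_ttl):
    """Guess the os family from the ttl seen on arrival (assumed defaults)."""
    if observed_ttl <= 64:
        return "unix-like"   # linux and apple systems commonly start at 64
    if observed_ttl <= 128:
        return "windows"     # windows commonly starts at 128
    return "other"


def os_family_from_user_agent(user_agent):
    """Read the os family the client claims in its user agent."""
    ua = user_agent.lower()
    if "windows" in ua:
        return "windows"
    if "mac os x" in ua or "linux" in ua:
        return "unix-like"
    return "other"


def audit_request(ip, observed_ttl, user_agent):
    """Return the list of reasons this request stands out, empty if none."""
    reasons = []
    if ip_in_known_networks(ip):
        reasons.append("source ip is in a known vendor or cloud range")
    if os_family_from_ttl(observed_ttl) != os_family_from_user_agent(user_agent):
        reasons.append("ttl os hint disagrees with user agent")
    return reasons


windows_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/91.0.4472.124"
print(audit_request("192.0.2.55", 57, windows_ua))
```

An empty list does not mean a scanner is invisible; it only means these two cheap checks did not flag it, which matches the talk's point about balancing the easy fixes against the expensive ones.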