
So, first of all, who am I? We had a lovely introduction, but very briefly: I'm a data scientist at Cato Networks. Cato is a cybersecurity company that delivers a SASE platform; SASE stands for Secure Access Service Edge, which is the convergence of networking and security into a single global service. We have a global backbone of PoPs, and it's cloud native. I came to Cato Networks after working as a data scientist in diverse fields, as mentioned, and this talk will be about one of my main interests, which is network asset discovery. So what is the
agenda for this talk? We'll have two main parts: passive fingerprinting and statistical fingerprinting. I'll start with general motivation, focusing on the issues with IoT and unmanaged devices as a motivator. Then we'll dive into passive fingerprinting: we'll start with some definitions, talk about the setup of a fingerprinting system and which protocols can be used, go through a lot of examples, and summarize with the pros and cons. This will give us the motivation to talk about statistical fingerprinting. I'll introduce the concept of classification confidence, which will also be necessary for data preparation, and then we'll dive into statistical fingerprinting. So again, we'll
discuss some definitions, remind ourselves of the structure of MAC addresses, and then focus on the MAC lookup algorithm: starting with an overview, we'll talk a lot about the preparation of the lookup file and how classification is based on it, summarize with pros and cons, and connect it all together. So we have a lot to talk about. Let's start. As I mentioned, our motivation is the prevalence of unmanaged devices. We all know that the network is only as secure as its weakest link. The problem is that enterprises today are full of unmanaged devices. By unmanaged devices I mean both IoT and bring-your-own
devices. These do not run any agent and are not tracked by the company. Our solution to this problem is, of course, our hero: device fingerprinting, which enables us to manage our inventory, assess risk per device, and apply policies per device type. Let's dive deeper and focus on IoT security. So why does IoT security matter to us? As mentioned, it has become a very prevalent issue: there are massive numbers of IoT devices spread across company sites everywhere around the world. Do these devices pose some security weakness? Yes: they have weak encryption and security, and most of the time they
aren't patched, and so on. Are they commonly exploited? Yes, of course. They're used for breaching companies, creating botnets, and causing denial of service. Are these exploits prevalent as well? Yes: looking at the dark web, there's a thriving market for such hacking tools targeting IoT devices. So let's get the terminology precise. By passive fingerprinting I mean the identification of properties of network entities by only passively observing the network traffic. This is as opposed to active fingerprinting, in which we probe the network with crafted packets and analyze the
responses. An example of passive fingerprinting would be looking at the User-Agent and other HTTP headers, while a crafted packet could be, for example, a TCP SYN packet. The disadvantages that we'll see with these two methods can be somewhat remediated with statistical fingerprinting, which I'm here to sell to you: we can identify properties with statistical or machine-learning-based methods, inferred from large data sets. In this lecture we'll dive into statistics about MAC addresses and what can be learned from them. So let's continue with passive fingerprinting. An example fingerprint, analogously to the fingerprint of a
human being, is some specific collection of conditions which helps us identify a device. In this case, this is a fingerprint for the Amazon Echo, the smart speaker controlled by a voice assistant. We can see that by looking at specific fields in different OSI model layers, this device can be precisely identified: for example, the User-Agent in HTTP using this regular expression, some TCP option combinations, and the time-to-live at the IP level. Where does a fingerprinting system sit on the network? Well, this depends on the network. In simple networks, the modem router has visibility into all of the
devices, whether they're connected through Ethernet or wireless. So our agent, or classifier, can just sit on the router. Unfortunately, this isn't usually the case. At Cato we have about 3,000 customers. They have sites all around the world, connected by the WAN, and each LAN can be divided into subnets, for example a Wi-Fi guest subnet, printers, or workstations. So sitting on the router will not give us full visibility; for example, we won't see the MAC addresses of the workstations. This calls for a more sophisticated deployment: either putting
agents on each subnet or using specific agents on endpoints. When we have successfully deployed such a classifier, we have a plethora of protocols to use for classification. As a general rule of thumb, the higher we go in the OSI model, the more granular the classification can be, but the easier it is to spoof and the less reliable it becomes, as we saw, for example, with the User-Agent. Now let's dive into a bunch of examples, which show us which classifications can be made at which granularity level. Starting from the bottom up, looking at the MAC address: if you
recall, the first half indicates the organization, the OUI, and looking at this packet capture (I hope you can see it; if not I can zoom in, but I think it's clear from the signature) we can identify a D-Link device. Obviously not very granular. Continuing to the TCP/IP layers, and specifically looking, for example, at the first packet, the SYN packet of the three-way handshake: we can look at the options, which we already saw in the Echo example, and, for example, the time-to-live at the IP level. This allows us to classify devices such as this Android TV.
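To make the MAC-layer check concrete, here's a minimal sketch in Python. The two OUI entries are hypothetical placeholders, not the real IEEE registry; a real system would load the published registry file.

```python
# Sketch: vendor lookup from the OUI (first 3 bytes / 6 hex digits) of a MAC.
# The table below is a tiny made-up excerpt for illustration only.
OUI_VENDORS = {
    "28:11:A5": "D-Link",   # hypothetical mapping
    "F0:27:2D": "Amazon",   # hypothetical mapping
}

def normalize_mac(mac):
    """Upper-case a MAC and normalize separators to colons."""
    hexdigits = "".join(c for c in mac.upper() if c in "0123456789ABCDEF")
    return ":".join(hexdigits[i:i + 2] for i in range(0, 12, 2))

def oui_vendor(mac):
    """Return the vendor for the MAC's OUI, or None if unknown."""
    return OUI_VENDORS.get(normalize_mac(mac)[:8])  # "AA:BB:CC" is 8 chars

print(oui_vendor("28-11-a5-c3-00-1f"))  # D-Link, per our toy table
```

As the talk notes, this only gets us vendor-level granularity, but it needs nothing more than a single observed frame.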
It's hard to see, but trust me that the time-to-live is 64 and these options are found there. Continuing to the DHCP protocol: when IoT devices connect to a network, they want to get some configuration, for example what their IP is, what the gateway IP is, the DNS server, so they send DHCP packets, which can contain very useful information. In this case, this pattern can identify for us, looking at the vendor class (this is option number 60), the Lifesize Icon, and looking at option 12 we can even find that the hostname contains
a very specific name, in this case Lightify; these are smart bulbs. Continuing with the application layer: HTTP is also a very common protocol used for the communication of IoT devices with their servers. This can be for firmware updates, getting time settings, all kinds of updates. So it's still very common to see IoT devices using HTTP over the web, and in this case, just looking at the GET request, we can identify, for example, this motion sensor when it asks for a firmware update. Here we look at the host and the URI, and lo and behold, the host
tells us the vendor, basically, and the URI contains the specific model. Another example: identifying the Samsung Galaxy based on its User-Agent. Last but not least, let's look at the DNS protocol. IoT devices need to resolve domain names like every device, and in their case they need to know the IP of their vendor's server. So they send DNS address requests, which go through our router to the DNS server that is configured for them. They usually have some pre-configured DNS server on them, which is very helpful to us and contains
very useful information. So let's look at this Philips Hue bridge. This is a device that talks to smart light bulbs, and it is connected to the internet. We can see both the bridge here and the Philips company name. So we saw a lot of protocols, but using any one of these protocols alone won't give us the best result. We should obviously rely on a combination of protocols, and this way we'll have much stronger signatures. This combination can be thought of in two ways, with two methods: either looking at single flows or multi-flows, meaning using either a stateless
classification or a stateful one. In this case we see a large combination of conditions: starting from the bottom up, the MAC tells us, using the OUI, the vendor, Samsung; the TCP/IP layer tells us it's an Android-based operating system; and we can see in the User-Agent the exact model, revealing it's a Galaxy Watch. As for the stateful setup, we can start by figuring out from the TCP packet that this is an Amazon device with an Android OS, and then look, for example, at the DNS to understand that it is specifically the Amazon Fire TV.
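A stateless combination like the Galaxy Watch one can be sketched as a set of predicates that must all hold for a single flow. The field names here are my own illustration, not a real schema:

```python
import re

# Sketch of a stateless multi-indicator signature: the flow must carry every
# field the signature needs, and each value must pass its predicate.
signature = {
    "oui_vendor":      lambda v: v == "Samsung",
    "ip_ttl":          lambda v: v == 64,
    "http_user_agent": lambda v: re.search(r"Galaxy Watch", v) is not None,
}

def matches(flow, sig):
    """True only if all indicators are present and all conditions hold."""
    return all(k in flow and check(flow[k]) for k, check in sig.items())

flow = {
    "oui_vendor": "Samsung",
    "ip_ttl": 64,
    "http_user_agent": "Mozilla/5.0 (Linux; Galaxy Watch4)",
}
print(matches(flow, signature))  # True
```

Requiring all conditions at once is what makes the combined signature much stronger than any single-protocol indicator.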
So, wrapping this section up, let's further discuss the pros and cons of passive fingerprinting as motivation for the next section. Well, obviously it's very simple: once I configure some signature, it will just work (unless something changes, of course), and it can be very reliable, especially when we combine a lot of protocols and use different indicators. Also, as mentioned, compared to active fingerprinting it's much less intrusive: we don't fill our customers' networks with packets and watch them. As you can see, there's a much longer list of cons, but we work with what we've got. First
of all, as we discussed, there's limited visibility into the sub-networks. In addition, not every protocol allows inspection of its content. Two main examples: first, the subject of MAC randomization; as you all know, the MAC on mobile devices is randomized by default, and we'll touch on it later. Second, HTTPS, for example, limits our ability to perform packet inspection. Also, as we saw, traffic obviously isn't 100% reliable and can be spoofed, and unfortunately, as a rule of thumb, the low-granularity protocols are the more reliable ones.
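As a quick aside on MAC randomization (we'll come back to it): the check is tiny, because a locally administered ("randomized") MAC has the U/L bit of its first octet set, which means its second hex digit is 2, 6, A, or E. A minimal sketch:

```python
def is_randomized_mac(mac):
    """True for locally administered (randomized) MACs: the U/L bit of the
    first octet is set, i.e. the second hex digit is 2, 6, A, or E."""
    digits = mac.upper().replace(":", "").replace("-", "")
    return digits[1] in "26AE"

print(is_randomized_mac("DA:A1:19:34:7C:02"))  # True: second digit is 'A'
print(is_randomized_mac("3C:22:FB:10:A2:6F"))  # False: globally unique OUI
```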
Another issue is that we have to prepare signatures, which need to be more robust if you want more reliable detection, so it's more work for us. Yet another issue is scalability, and this is a big one. Just imagine that you have to prepare a signature for every device ever created in the world. Of course you can automate some aspects; we have AI to help us today. Yay for us, less yay for asset managers. The final con is that this method is delayed. In particular, when a device connects, or even after it has been connected for a while, we have to wait
for a specific packet, and only then, assuming this is the packet required for classification, will we be able to classify that device. These two issues will be remedied by statistical fingerprinting. Now I want to touch on the subject of classification confidence. Obviously, as we saw, signatures vary in the confidence we have in them. I won't get into formulas here, but again, some rules of thumb for how to calculate this confidence. First of all, we will use a combination of different classification indicators. By indicator, just to be clear, I mean things like the User-Agent or the DHCP vendor class. We will need to
assign different weights per indicator; strong ones will be given higher weights. Of course, it's kind of an art. The vendor class, as mentioned, is a strong one, and indicators that come from clients on devices are obviously given higher priority. In general, we will give higher confidence when multiple indicators classify a device with the same class and, more important than that, when there are no clashes between the classifications, no contradictions. If there are contradictions, we can obviously use some majority vote and set some threshold; for example, we will only finalize a classification if 95% of the indicators give the
same classification. Another reason to raise the confidence is that a single indicator has repeatedly given the same classification; you will agree with me that it is very strange if, based on the User-Agent, we keep seeing different devices. Similarly, we would like to see the same classification over multiple attempts with, for example, complex signatures. Why is this so important to us in the context of statistical fingerprinting? Because we can use these confident classifications as our labeled data. You have to set some threshold, and classifications that are, you know, I won't say 99% confident,
but you get the idea, can be used as data we can perform statistics on. So this brings us to statistical fingerprinting. We will define it, and it's a very general definition: the data-driven creation of signatures. But what does that actually mean? Basically, the creation of signatures using data can be put in simple terms as three steps. First of all, we have to collect our data set, which contains some features of some network entity. Based on this data set, we will create rules which generalize to unseen feature values. Our
feature, as I mentioned, will be the MAC address. So we will literally try to predict device classifications for devices with previously unseen MACs. The last step is to infer this classification and also calculate some confidence score. A quick reminder about MAC addresses: they are built out of 12 hexadecimal digits, they uniquely identify network interface cards, and they are assigned by the IEEE, the Institute of Electrical and Electronics Engineers. The first half uniquely identifies the organization, and this is publicly available and published; the second half is basically just a more or less random serial number. It is very important for
our cause: it basically makes it possible, to begin with, to understand that manufacturers buy MACs in bulk, and these MACs are sequential. So Samsung, say, can buy thousands or tens of thousands of MACs in sequential order, and as I mentioned, although the second half of the address doesn't hold any meaning, it is very crucial for us. So I have given you some hints; take a few seconds to think of any idea for how you would use this to classify a new device whose MAC you have not seen before. I trust that everyone has figured out the entire algorithm, but just for fun, let's go over it. Oh, I'm sorry,
just one note about MAC randomization. Basically, there is an issue with the uniqueness of MACs: it can violate privacy, especially for, say, a mobile device that travels around the world; we could geolocate it. The way to protect user privacy is by randomizing the MAC. This is done by default in mobile devices when they connect to networks. The randomization ruins our plans: we can't use these MACs for statistical fingerprinting, but it is very easily detected, and we just exclude such MACs from our data set; the second hexadecimal digit gives it away. And although this looks like a big
issue, and fingerprinting is limited for mobiles, it is still very practical for static devices. So let's now see these three steps we defined before applied to the MAC lookup algorithm. Step number one: we need to collect data. Data in this case is based on passively classified devices. For our purposes, let's assume we have, as we do at Cato, an engine or collection of systems which gives us classifications and their confidence. We will use this format for the data: we will have the MAC address in the first column, then a few layers of
device classification hierarchy (I'll touch on it in a second), and finally the passive confidence. It's important to differentiate this from the statistical confidence, which is calculated differently; we'll touch on that as well. The next step is creating some generalizing rules. In our case, we create a lookup table. It can be as simple as a CSV file. This file will be created periodically; we do it every week, based on our collected passively classified data, while the passively classified data, our source, will be a separate file, which will only keep on
renewing every so often. So, just to clarify: our lookup table will keep on growing forever, while our passively classified devices file is something that has some retention period, because new devices are classified for some period of time and we don't keep every device of every customer, etc. The lookup file will be used for a lookup, a search. We will base it on the first few hexadecimal digits of the MAC; I will call this the sub-MAC. As you can see in the left column, we use a sub-MAC length of, for example, 11 hex digits, ending with six and
seven, and then we use a sub-MAC length of 10, the one ending with five F. Each of these rows indicates a different classification, with a different confidence, for a different value of the sub-MAC. The third step is that, when given a new address, we calculate its classification by searching this file. So in this example, when given something that ends with six, we would obviously all guess that the correct classification is the first row. How do we prepare this file? As mentioned before, we have to collect a large data set. The data set contains classifications with
different hierarchy levels. At Cato we use the class (for example, laptop), the product (for example, MacBook), and the model (like Pro or Air). This will be our running example. We then create a sub-MAC table, which is based on just taking the original data and removing the required number of hex digits from the end. Notice that the more hex digits we remove, the more predictions are possible based on that rule, and the more devices we have, the more sub-MAC lengths we can use. We then prune (we will see a lot of pruning here), and by prune I just mean removing
the leaf sub-MACs that don't have enough device classifications for a specific sub-MAC length and value; we don't want to keep unneeded stuff that is covered by other rows in the table. In this first pruning, we require at least 2 out of the 16 devices for sub-MAC length 11, and at least 25 devices for sub-MAC length 10. This gives us the following table; well, actually we don't prune anything in the following table. We then explode, meaning add rows to the table. This is done for every layer per specific classification. So, for example, the first row, marked in
bold, has three layers; this turns into three rows after the exploding. The second row, marked in italics, turns into two rows. We then group these rows by unique sub-MAC and classification (classifications are per layer), and we can perform some aggregation. The goal of this is to be able to calculate the proportion of devices. So let me go over this carefully. We can see that there are three unique classifications per layer: laptop MacBook Pro; laptop MacBook, which will create the second row; and laptop only.
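The explode step can be sketched in a few lines. The sub-MAC value and the rows are toy stand-ins for the running example, not real data:

```python
# Sketch of the "explode" step: each classified row is turned into one row
# per hierarchy prefix: (class,), (class, product), (class, product, model).
rows = [
    ("3C22FB10A2F", ("laptop", "MacBook", "Pro")),  # the "bold" 3-layer row
    ("3C22FB10A2F", ("laptop", "MacBook")),         # the "italics" 2-layer row
]

def explode(rows):
    out = []
    for submac, layers in rows:
        for depth in range(1, len(layers) + 1):
            out.append((submac, layers[:depth]))
    return out

for r in explode(rows):
    print(r)
# The 3-layer row becomes 3 rows and the 2-layer row becomes 2, so grouping
# by (sub-MAC, classification) can then count how many devices support each.
```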
Looking at the column (sorry, not the row) of exploded classifications per specific sub-MAC and layer: for the first row there is only one corresponding row in the initial table, and for the second and third rows there are two corresponding rows after the grouping. We also add a column with the number of passive classifications for this specific sub-MAC. This is based on the original data, and in this case we have exactly two rows for this sub-MAC. The proportion is then the division of the exploded classifications by the passive classifications. So one over two gives us a half, two over two
gives us one. We prune again; this time we remove rows that have too low a proportion. So, for example, if the threshold for pruning is 0.6, we will remove the first row. We then prune yet again; this time we wish to keep only a single result for each sub-MAC value. If there are two rows with the same proportion, we just remove the lower layers, which are just less granular. In this example you see laptop MacBook and laptop, so we will remove the second row, the one with only laptop. There are of course cases with different
proportions. In that case, we will keep the row with the most granular classification only if its proportion is above some threshold, say 0.8. Of course, we have to repeat this whole process for all of the sub-MAC lengths. We got something for our length-11 example, so say we get something else for our length-10 sub-MACs. We then perform a step which we call neighboring, which basically means filling in rows between the known rows. We'll do this only for the shortest sub-MAC length, the length-10 sub-MACs, and we fill these intermediate rows with the most granular shared classification. The example
will make it much clearer. For sub-MAC length 10 we have these two bolded MACs, ending with 55 and 58, and we just fill in the gap of 56 and 57. We fill it with the lowest shared layer, in this case the product layer, so the classification is laptop MacBook, without Pro. Now we concatenate all of the sub-MAC length tables. We place the longer sub-MACs on top, so they are matched first: they can contain more granular classifications, and it is more likely that we have enough devices for them. So we just put the length-11 table before the length-10 one, and we prune again.
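The neighboring step from a moment ago can be sketched like this, using the toy sub-MACs and hierarchy values from the running example:

```python
# Sketch of "neighboring": fill the gap between two known sub-MACs of the
# same length with the deepest hierarchy prefix they share.
def shared_prefix(a, b):
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return tuple(out)

def neighbor_fill(lo_mac, lo_cls, hi_mac, hi_cls):
    """Yield (sub-MAC, classification) for values strictly between lo and hi."""
    fill_cls = shared_prefix(lo_cls, hi_cls)
    for v in range(int(lo_mac, 16) + 1, int(hi_mac, 16)):
        yield format(v, f"0{len(lo_mac)}X"), fill_cls

gaps = list(neighbor_fill("3C22FB1055", ("laptop", "MacBook", "Pro"),
                          "3C22FB1058", ("laptop", "MacBook", "Air")))
print(gaps)
# -> [('3C22FB1056', ('laptop', 'MacBook')), ('3C22FB1057', ('laptop', 'MacBook'))]
```

The filled rows get the product layer only (laptop MacBook), since the model layer (Pro vs. Air) differs between the two neighbors.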
I swear this is the last time, I think. In this case we remove longer sub-MACs when they are covered by a shorter sub-MAC, meaning that the shorter sub-MAC yields the same classification. So let's look again at an example: in the first case we delete the length-11 sub-MAC, and in the second case, because there are different classifications (there's the Pro), we keep both of them. Now, how do we classify? We literally just search and find the first row that matches a prefix of the new MAC.
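That search can be sketched in a few lines; the table rows here are illustrative stand-ins for the final concatenated table:

```python
# Sketch of classification: scan the concatenated table (longer sub-MACs
# sorted first) and return the first row whose sub-MAC prefixes the new MAC.
LOOKUP = [
    ("3C22FB10A26", ("laptop", "MacBook", "Pro"), 1.0),  # length-11 row
    ("3C22FB10A2",  ("laptop", "MacBook"),        0.9),  # length-10 row
]

def classify(mac):
    mac = mac.replace(":", "").upper()
    for submac, cls, proportion in LOOKUP:
        if mac.startswith(submac):
            return cls, proportion
    return None  # no sub-MAC covers this MAC

print(classify("3C:22:FB:10:A2:6F"))  # first match is the length-11 row
```

A MAC that misses the length-11 row falls through to the less granular length-10 row, exactly the "first match wins" behavior the sorted table gives us.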
So if our final table is as seen in this example and we're given something that ends with six, we will take the first row; this is the first match. In addition, we will calculate the confidence, similarly to the passive fingerprinting case but based on different heuristics. So what would raise our confidence? It's written, but don't look; think about it for a second. First of all, the sub-MAC length: obviously, the longer the sub-MAC by which we have made the match, the more confident we are. Second is the proportion: when more devices were used to create a
classification, we are more confident. Another point is: did we classify using a neighbor? If we did, we should lower our confidence; this is only relevant for the shortest sub-MAC length. A couple more points about classification. First of all, we have to integrate both engines, passive and statistical classification, together based on the confidences. I won't go into this too much, but I'll just mention that if the confidences are equal and the classifications differ, we will rely on the statistical fingerprint. How well does this work? So at Cato,
about 6% of all of our discovered devices originate from statistical fingerprinting, and here I include mobile devices and also take into account that other classification engines can override these conclusions. How do we improve upon this algorithm? Well, basically, we need more data, something every data scientist will tell you in every conversation, somehow; they'll always get to that. The more statistically fingerprinted devices we have, the more confidence we'll have, etc. And we could add even shorter sub-MAC lengths: length 9, etc. One small note: this is not a very efficient search, so turn your CSV into a dictionary.
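That last note can look like this: index one dict per sub-MAC length and probe longest-first, so each lookup is a handful of hash probes instead of a scan over every row. The entries are toy values:

```python
# Sketch: the lookup table as one dict per sub-MAC length. Probing the
# lengths longest-first preserves the "most granular match wins" ordering.
TABLES = {
    11: {"3C22FB10A26": ("laptop", "MacBook", "Pro")},
    10: {"3C22FB10A2":  ("laptop", "MacBook")},
}

def classify(mac):
    mac = mac.replace(":", "").upper()
    for length in sorted(TABLES, reverse=True):  # longest sub-MAC wins
        hit = TABLES[length].get(mac[:length])
        if hit is not None:
            return hit
    return None

print(classify("3C:22:FB:10:A2:7B"))  # falls through to the length-10 entry
```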
What are the pros and cons? First of all, we saw that this helps us increase coverage in cases that are harder for passive fingerprinting. It's scalable: we saw how the scalability issue is solved. We can just run this algorithm; we don't have to do anything. There's an Airflow DAG that runs periodically, and the system keeps improving. It isn't delayed: the moment I get some MAC, I don't have to wait for any other packets; I can just run the lookup classification. We can use a single feature, and it is not as intrusive as active fingerprinting. Of course, there is the
one main con, which is also just a requirement: we need a sufficiently large and sufficiently representative data set of confidently classified devices. So, summarizing: we saw the challenge in IoT asset management and understood that it is very crucial for security. We discussed many examples of passive fingerprinting and saw how simple yet reliable it can be. In addition, if you want good fingerprints, use multiple protocols. We then discussed statistical fingerprinting, which can join forces with passive fingerprinting and overcome some of its cons. As mentioned a moment ago, of course, we need enough data, and reliable data.
But given that data, we have a long list of pros that make it worthwhile. Thank you.
Thank you so much.