
Okay, so I'm Nil Ortiz, and with me is Albert Calvo, and we will be presenting OpenUEBA. It's a framework for user and entity behavior analytics (UEBA) and, as the name implies, it's an open source framework. But before we start, a little bit of introduction about where we come from. We work at i2CAT, a non-profit foundation and innovation center, so basically we do research and innovation on a lot of topics. These are our main areas of research: 5G, cybersecurity, artificial intelligence, blockchain. But we also do other kinds of projects, like virtual reality, and right now we are delving into the
new space area. These projects are mainly driven by collaborations with public administrations, with the European Commission, or with companies. We try to expand as much as we can and foster this kind of innovation ecosystem so the technologies can reach everywhere. And this is the team that is involved in this project. As you will see, I don't look like the picture, but it was taken a long time ago. And now a little bit about the project. OpenUEBA, as its name implies, is open, and that's one of the keys of our project, because we've seen that, sure, there are a lot of UEBA solutions and projects, and basically
almost every product is integrating UEBA technologies, but it's always inside this kind of black box where you don't really know what's going on. You don't know how anything is being calculated. They just tell you it's using artificial intelligence and that you should just trust it. But it's very hard, especially in the cybersecurity context, to really assess whether the output the AI is giving you is actually correct or not, and a wrong output can lead you to misinterpret the situation or to apply the wrong mitigations or the wrong solutions to your problems. So for us it's very important, also because AI has become the keyword that floats around every product, every technology,
everything right now is working with AI. Being able to have it exposed, so that anyone can just go and check the code, see the logic behind it and how it actually works, really helps build trust in the output of this AI. Plus, we are also integrating cyber threat intelligence. That way we deviate from the usual model, where all these tools that work with AI just focus on detecting anomalies, or on spotting variations over a specific time scale, or even just on implementing some static rules. We go beyond that and add this intelligence part, which we will see a little bit deeper later.
And now, to talk about the AI part, I will leave the floor to my colleague Albert.

Okay, so thanks, Nil. I will present the framework that we are working on. [Music] The framework that we present today is focused on user and entity behavior analytics, which is aimed at monitoring user behavior to detect anomalous activities and highlight malicious patterns. In marketing, behavioral studies have been done for years. For example, based on an analysis of which aisles the customer stops in, it is possible to predict whether the customer will purchase a certain item, or we can offer them personalized discounts. This behavioral analysis can be
applied to the cybersecurity domain: by analyzing the baseline user behavior, we can highlight deviations from the normal behavior. Let me give an example. If a user sends a periodic email to all coworkers, it could be normal activity, say if the user is from human resources. But if another user, for example from the IT team, shows this kind of activity, it can be flagged as anomalous. To this end, we conceived a framework for behavioral analysis. The framework is divided into three components. First, user profiling: the multimodal sources are joined into a feature vector representing the user behavior. Second, threat profiling: from the
different threat intelligence sources we extract indicators that represent the different attack vectors. And finally, risk calculation, which computes the alignment between the user profiles and the threats. In detail, the main benefits of our proposed framework are the following. Efficiency: the ability to scale up from small scenarios to complex institutions with hundreds or thousands of entities. Transparency by design: the framework is built on the principle of transparency, so it will be open source, allowing third parties to modify the software as they need. And standardization: we use the STIX and CAPEC frameworks to represent and share threat data.
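To make the standardization point concrete, here is a minimal sketch of how a phishing attack pattern with a CAPEC reference could be written as STIX 2.1 objects. It builds plain dictionaries by hand, following the general shape of the STIX 2.1 specification; the indicator name and pattern are illustrative assumptions, not objects from the OpenUEBA project, and a real deployment would more likely use a STIX library and validate the result.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_id(object_type: str) -> str:
    """Build a STIX 2.1 identifier of the form '<type>--<UUIDv4>'."""
    return f"{object_type}--{uuid.uuid4()}"


def stix_now() -> str:
    """Current UTC time in the millisecond 'Z' format used by STIX timestamps."""
    now = datetime.now(timezone.utc)
    return now.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def build_phishing_bundle() -> dict:
    """Return a small bundle: attack pattern, indicator, and their relationship."""
    ts = stix_now()
    attack_pattern = {
        "type": "attack-pattern",
        "spec_version": "2.1",
        "id": stix_id("attack-pattern"),
        "created": ts,
        "modified": ts,
        "name": "Spear Phishing",
        # CAPEC entries are attached as external references.
        "external_references": [
            {"source_name": "capec", "external_id": "CAPEC-163"}
        ],
    }
    indicator = {
        "type": "indicator",
        "spec_version": "2.1",
        "id": stix_id("indicator"),
        "created": ts,
        "modified": ts,
        "name": "Macro-enabled attachment (illustrative)",
        # Hypothetical pattern, chosen only to show the STIX pattern syntax.
        "pattern": "[file:name LIKE '%.docm']",
        "pattern_type": "stix",
        "valid_from": ts,
    }
    relationship = {
        "type": "relationship",
        "spec_version": "2.1",
        "id": stix_id("relationship"),
        "created": ts,
        "modified": ts,
        "relationship_type": "indicates",
        "source_ref": indicator["id"],
        "target_ref": attack_pattern["id"],
    }
    return {
        "type": "bundle",
        "id": stix_id("bundle"),
        "objects": [attack_pattern, indicator, relationship],
    }


bundle = build_phishing_bundle()
print(json.dumps(bundle, indent=2))
```

Because the result is ordinary JSON, any platform that consumes STIX could in principle ingest it, which is the interoperability argument made in the talk.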
Okay, so to build and evaluate the OpenUEBA framework, we have access to a real-life data set. In detail, it is data from a local university with around 20,000 users, including students, professors, and staff. For the first version of the framework we take into account the following multimodal sources: network data, which is information captured by on-premises hardware; endpoint data, which is information captured using an agent; and application data, which is log data or data forwarded from Azure.
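As an illustration of how one of these sources can become a behavioral feature, the sketch below counts mass emails per user from application logs and compares a user against a peer group, echoing the human resources versus IT example given earlier. The log record shape, the 50-recipient threshold, and the peer history values are all assumptions made for the example; the framework itself is described as using autoencoders rather than a z-score.

```python
from collections import Counter
from statistics import mean, pstdev


def mass_mail_counts(app_logs, min_recipients=50):
    """Count, per user, the emails sent to at least `min_recipients` people."""
    counts = Counter()
    for record in app_logs:
        if record["action"] == "email_sent" and record["recipients"] >= min_recipients:
            counts[record["user"]] += 1
    return counts


def peer_zscore(value, peer_values):
    """Standard score of `value` against the peer group's historical values."""
    mu = mean(peer_values)
    sigma = pstdev(peer_values)
    if sigma == 0:
        # No variation among peers: any difference is maximally unusual.
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma


# Hypothetical application logs for one week.
logs = (
    [{"user": "hr_anna", "action": "email_sent", "recipients": 300}] * 4
    + [{"user": "it_bob", "action": "email_sent", "recipients": 300}] * 4
    + [{"user": "it_bob", "action": "email_sent", "recipients": 2}] * 10
)

# Hypothetical weekly mass-mail counts previously seen in each department.
peer_history = {"hr": [4, 5, 3, 4, 5], "it": [0, 0, 1, 0, 0]}

counts = mass_mail_counts(logs)
z_hr = peer_zscore(counts["hr_anna"], peer_history["hr"])
z_it = peer_zscore(counts["it_bob"], peer_history["it"])
print(f"hr_anna z={z_hr:.2f}, it_bob z={z_it:.2f}")
```

The same raw count of four mass emails is unremarkable for the HR user and far outside the norm for the IT user, which is the kind of context a peer baseline adds.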
Okay, so the user profiling step extracts relevant information from the sources and structures it into feature vectors. To perform this analysis we use autoencoders, which allow us to build representations of the different data sources that are afterwards merged into the same feature space. These representations allow us to compare the user behavior at a certain moment against the historical behavior, and also against similar users in the organization. The next module that we are presenting is threat profiling, which is designed to extract and structure knowledge from threat intelligence sources. Here, what we want to compute are mutations, which are deviations from known threats. We compute these mutations using
historical data from threat intelligence sources. For instance, the indicators of a phishing attack are the following: the user receives an email with an attachment, then the user extracts and opens a document, and finally enables some macros. The proposed mutations of this kill chain are the following: a first transformation, where the chain uses a similar indicator, in this case receiving an Excel file; transformation two, where some optional indicators are missing; and finally transformation three, where some parts of the attack vector are replaced according to historical indicators. The final step of our framework is the alignment: we align the user profile with the different attack vectors that we compute in the threat
profiling module. With this, the output of our framework is a score: for each user, we calculate the exposure to the main threats, and we can see here an example of the output. Now Nil will explain an example of how cyber threat intelligence can be applied to behavioral analysis. So, Nil, the floor is yours. [Applause]

Thanks.
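The three kill chain transformations and the final exposure score can be sketched roughly as follows. This is a simplified illustration, not the framework's implementation: the step names, the choice of which step is optional, the substitution tables, and the set-overlap score are all assumptions, and the real alignment is described as working on learned user profiles rather than on event names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Step:
    indicator: str
    optional: bool = False


# Phishing chain from the talk; marking extraction as optional is an assumption.
PHISHING = (
    Step("email_with_attachment"),
    Step("extract_archive", optional=True),
    Step("open_word_document"),
    Step("enable_macros"),
)


def swap(chain, old, new):
    """Return a copy of the chain with indicator `old` replaced by `new`."""
    return tuple(replace(s, indicator=new) if s.indicator == old else s for s in chain)


def drop_optional(chain):
    """Transformation two: the optional indicators are missing."""
    return tuple(s for s in chain if not s.optional)


def mutate(chain, similar, historical):
    """Return the base chain plus one variant per transformation."""
    variants = [chain]
    # Transformation one: use a similar indicator.
    variants += [swap(chain, old, new) for old, new in similar.items()]
    variants.append(drop_optional(chain))
    # Transformation three: replace parts according to historical indicators.
    variants += [swap(chain, old, new) for old, new in historical.items()]
    return variants


def exposure(observed, chains):
    """Best fraction of chain steps that appear among the observed events."""
    seen = set(observed)
    return max(sum(s.indicator in seen for s in chain) / len(chain) for chain in chains)


# Hypothetical substitution tables and user activity.
variants = mutate(
    PHISHING,
    similar={"open_word_document": "open_excel_document"},
    historical={"email_with_attachment": "teams_link_clicked"},
)
observed = ["teams_link_clicked", "open_excel_document", "enable_macros"]

base_score = exposure(observed, [PHISHING])
mutated_score = exposure(observed, variants)
print(base_score, mutated_score)
```

In this toy run the unmodified chain explains only a quarter of what the user did, while the mutated variants explain half, which shows why matching against mutations can surface activity that an exact-match rule would miss.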
Yeah, so how do we apply this threat intelligence in our platform? Because there are many ways of consuming threat intelligence, right? You can just take your IOCs and feed them to the SIEM, or just write reports. But in our case we are interested in more than just IOCs. We are also interested in the TTPs and the indicators of attack. So to detect this behavior we go one level higher on the Pyramid of Pain. That way, even if the indicators of compromise change, we can still correlate the same behavior of some pattern or some attack to the same kind of threat actor, or even detect similar behaviors. So in this case we gather threat
intelligence from multiple sources. It could be FireEye or Recorded Future, this kind of premium source, which have very rich intelligence about how some malware operates, how a campaign is set up, or even how a threat actor works and all the tools that it uses. We feed this into OpenUEBA using STIX as the format, and once we have it in OpenUEBA we use it to do the threat profiling. That way we can build a model of the threat that we are trying to match against the model of the user profile that is being monitored, and we can see whether the activity of one user
resembles or is similar to a specific threat. Of course, the way detection or monitoring usually works is that whenever you see an indicator in your environment, you just raise a flag. But here we don't really need to have the full spectrum of the attack, or even to have seen all the steps succeed. This way, if we see that some user has a behavior that resembles some attack, like maybe they have performed just 80 percent or 60 percent of the actions that the threat or the malware or the phishing campaign usually performs, we can already assess that this user has a high risk towards this kind
of threat. And we can even use what we have observed about this user prior to the actual events to enrich our previous intelligence with further observables, right? So if we've seen that some phishing campaign usually uses certain domains, but in this case we see that the user followed every step except that, instead of receiving a mail, they reached this malicious domain from a Teams link or some calendar invitation, we can also add this to our threat intelligence and enrich it even more. And, even further, we can also add mutations to known attack vectors using IOCs from other feeds, like you would have on your MISP or your TAXII or your
YETI, or any other platform that uses this kind of feed aggregator, to benefit from different intelligence. Let's see an example. In this case we have the behavioral model of WannaCry. I hope you can see it properly. Here we have detailed every step that WannaCry usually performs: we see the names of the services that it uses, the names of the processes that it uses, which commands it runs with each process, what kind of actions each process performs, and what kind of traffic WannaCry generates. With a traditional SIEM or antivirus or any kind of standard solution, we would have to detect
something specific from this tree. But in our case, whenever we see a certain number of steps of this workflow actually being performed, we can already raise the flag and say, hey, this user is behaving like WannaCry, you might want to look into this further. And even if tomorrow this model were to change and, instead of using a specific process or instead of using SMB traffic, it were to use another protocol, we could just tweak this model very quickly, or we could maybe even detect it directly from our environment and enrich it further in our system. And of course, like everything, this can also be
mapped onto the MITRE ATT&CK framework. This gives us a huge amount of context to actually track and map every single one of these steps into a proper framework, and this way it's also very easy for companies, and for pretty much everyone, to integrate it and extract context out of every single step, right? You don't need to know exactly what everything is doing, like the exact command or what the exact process is, but you already know that it's related to a specific tactic of the MITRE ATT&CK framework, and this way you already have a very clear view of what's happening.
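A small sketch of that mapping idea: each step of a behavioral model carries an ATT&CK technique ID, so observed steps can be summarized by tactic without the analyst knowing the exact command or process. The model below is a hypothetical, heavily reduced stand-in for the WannaCry model shown in the talk, and the technique IDs and tactic names are given as commonly listed in ATT&CK Enterprise; they should be checked against the current matrix before being relied on.

```python
from collections import defaultdict

# technique ID -> (technique name, tactic); verify against the current ATT&CK matrix.
TECHNIQUES = {
    "T1543.003": ("Create or Modify System Process: Windows Service", "Persistence"),
    "T1210": ("Exploitation of Remote Services", "Lateral Movement"),
    "T1490": ("Inhibit System Recovery", "Impact"),
    "T1486": ("Data Encrypted for Impact", "Impact"),
}

# Hypothetical, reduced behavioral model; step names are invented for the example.
WANNACRY_MODEL = [
    {"step": "service_created", "technique": "T1543.003"},
    {"step": "smb_exploitation", "technique": "T1210"},
    {"step": "shadow_copies_deleted", "technique": "T1490"},
    {"step": "files_encrypted", "technique": "T1486"},
]


def summarize_by_tactic(observed_steps):
    """Group the techniques of the observed model steps under their tactic."""
    by_tactic = defaultdict(list)
    for entry in WANNACRY_MODEL:
        if entry["step"] in observed_steps:
            _name, tactic = TECHNIQUES[entry["technique"]]
            by_tactic[tactic].append(entry["technique"])
    return dict(by_tactic)


def coverage(observed_steps):
    """Fraction of the model's steps that were observed."""
    hits = sum(entry["step"] in observed_steps for entry in WANNACRY_MODEL)
    return hits / len(WANNACRY_MODEL)


seen = {"service_created", "smb_exploitation"}
print(summarize_by_tactic(seen), coverage(seen))
```

Even with only half of the steps observed, the summary already tells an analyst that persistence and lateral movement are in play, which is the context the speaker describes.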
And, for example, this could be the same kind of information as we've seen before, but as a graph. In this case we have a very small sample of the IMDDOS model. This was a DDoS service that was pretty popular like 10 years ago, but it uses a pattern that has been adopted by many other DDoS services nowadays. Here we only have, for example, the generation of some C2 traffic, which has a specific threshold on the traffic and a specific threshold on the instances, and we could have some indicators of some infected hosts. But translating this to behavior,
if we see some connections falling within these thresholds and following this pattern, we could already say that it's similar to this and start enriching this model, even if we start from something very, very small, right? [Applause] And in the end, this could be a representation of some tool or some malware. In this case it's Poison Ivy, on which we have a lot of intelligence. It's probably one of the most well-documented malware tools, so we have many indicators and a lot of information about it, and this gives us a representation of what we should see on our network. Of course, we will never see all the
hashes or all the samples, because it's not realistic to see them, right? But just seeing behavior similar to this can already give us a pretty good understanding of whether the user is, or might be, at risk from this kind of tool. And now we will speak a little bit about the ecosystem that we are building. As we said before, this is not a specific tool or a script or a piece of software that you can just install. This is a whole framework. In this case we're using the Elastic Stack with Graylog as log manager, and for the AI engine we're using Flask, TensorFlow, Keras, many AI technologies.
For the intelligence platform we're using OpenCTI to store all the threat intelligence information, but this could be exchanged with any other tool that you use for this purpose, as long as it uses the same kind of format for threat intelligence: if it can handle STIX, you can plug in anything else. For the telemetry, for example, we're using OpenNAC Enterprise, which is from a company named Open Cloud Factory that we're partnering with to develop this framework. Also, since AI usually needs some historical data to function properly, we're also developing a ticketing system collector, which, doing some natural language processing, will extract information from the past incidents that you have in tickets,
since most ticketing systems usually just have the information in plain text and you don't really have standardized fields. Or, in case you do, like with TheHive, you could feed this data directly to the core node, which is actually the keystone of the framework and the one that handles all the analysis being done between the AI engine and everything stored in the log manager. And this is just a little preview of the UI. Just by logging into the framework you will already see the groups of users that are most at risk, the threats that are affecting you the
most, a little bit of threat clustering of the current threats on your network, and also another way of searching for different threats or different groups within your network. Here, when checking a specific group, we would see a graph of the activity of the group of users, how they relate to each other, what the relationships are with other groups of users or between the users themselves, and a first-glance view of the state of this group of users. Similarly, we would see the same for threats, like which groups of users are most affected by this threat, along with multiple KPIs and graphs of the activity of the threat and how the
threat behaves.
And, basically, right now we're also trying to build a community based on this open model, since one of the main issues we find when trying to work with solutions that use AI in the cybersecurity domain is that the people specialized in artificial intelligence often have no idea of cybersecurity, and the people specialized in cybersecurity have no idea of artificial intelligence. So this framework is also a tool to help both sides join and work together, and since it's open source, both sides can see and try to understand what's being done on the other side. Well, this is our contact reference,
if you want to try this framework and schedule a proof of concept with us, or directly join our team. We're always looking for people, not only in the cybersecurity domain; we have many other areas where we're looking for people. Or, if you are doing a master's degree or a PhD and you want to do research with us, you can also send us an email at contact at opengba.org. And this will be it for the talk. Thanks, everyone, for staying here. If anyone has any questions, we will be very happy to answer.

Great, thank you, Nil and Albert. We'll give a couple of minutes to people, attendees
or people on YouTube who want to send you a question, again via Slack or here on Zoom. I have a question, actually. This is something interesting, and actually it was an area that I was investigating back then. I was wondering: you say that your solution basically requires a network agent, right, on top of an agent like osquery that runs on a computer. How would that work in a cloud environment, where, rather than analyzing users in the sense of physical users, you may want to analyze service accounts or users within servers, but in a cloud environment, like
admin or root activity, or accounts that are not proper human beings? And I was mentioning the network part because I feel like that's perhaps the trickiest part in the cloud, getting that kind of information. So I'm not sure if you have this use case, if you have thought about it, and if there is any way that we can do the same.

I mean, we're not treating this kind of use case right now. The agent and the sensor are one of the main ways that we extract information, because they are our own means, but
the idea is that you can input almost any kind of information about user activity that you have on your system, the same way you would do on your SIEM. If you can analyze these logs on the SIEM, you can just forward them to the Elastic Stack that we're using, and it will perform a similar analysis. Maybe you will have to do some feature engineering to alter the fields that you're using to comply with the Elastic Common Schema, which we chose to use to keep to the standards of the Elastic Stack. But yeah, I don't think it would change that much, since the same models can be applied
to, I don't want to say everything, but in this context, yeah, it would be the same, really, because we're also doing analysis not only on humans as users but also on entities within the network. That could be a service account, or maybe just a script that does the same actions every day and someday you see it deviating for some reason. I don't know if this answers your question.

No, yeah, absolutely. Again, my area of expertise back then, my research, was more about how I could identify behaviors in servers, mostly from admin activities, right? Because if I get someone into
my server as a root account, I understand that a legit admin would perform a typical set of actions, right? I don't know, running a specific set of commands; they could be ls, pwd, whatever. So I clustered those commands together and said, okay, these commands have some kind of weight. If you use, I don't know, pwd, ls, uname, id, in a specific time frame, it's a behavior that I may be okay with, and I assign it a weight. But then,
if in between someone runs a ps, or runs a netstat or an nc, then I definitely want to flag that, because the admin is going outside the behavior that I would expect. That was mostly what it was about. But this happens in the cloud, right, with pretty much no network visibility, and it's quite expensive to have that type of data in the cloud. So I guess what your answer was about is that as long as I comply with the format in the, you
know, Elastic Stack, the model is trained to work on that data, so as long as I provide it with that type of data, I should be good, right? So at that point it would be up to me to create some sort of data transformation layer in between, to prepare the data for your platform, if I understand correctly.

Yeah, because in this case you would detect it with a simple anomaly model.

Yeah, mine was a very simple use case.

Yeah. Our point here is mostly focused on not only doing this kind of anomaly detection on user behavior but also on introducing the intelligence aspect
and mapping these behaviors onto threat behaviors, to give more context. Also, this is something that's often overlooked, but in real-world scenarios you rarely see or have all the logs and all the data that you would like to have, and even if you are actually logging it, most of the time it doesn't reach the cybersecurity teams, since forwarding and storing logs is really expensive. This way, you don't really need to see the whole pattern or every indicator to assess that something is a specific threat. If you see that the behavior resembles what you are expecting to see from a threat, you can already tell that it could be affected by it.
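The transformation layer discussed in this exchange, together with the questioner's command-weighting idea, might look roughly like the sketch below. The raw record shape and the weights are invented for illustration. The output uses field names from the Elastic Common Schema (`@timestamp`, `event.category`, `host.name`, `user.name`, `process.name`, `process.command_line`), but which fields the framework's models actually expect is not stated in the talk.

```python
from datetime import datetime, timezone

# Hypothetical weights: routine admin commands score 0, recon and network tools more.
COMMAND_WEIGHTS = {"ls": 0, "pwd": 0, "uname": 0, "id": 0, "ps": 1, "netstat": 3, "nc": 5}
UNKNOWN_WEIGHT = 1  # assumed default for commands not in the table


def to_ecs(raw):
    """Map a raw shell-audit record (assumed shape) to an ECS-style document."""
    argv = raw["cmd"].split()
    timestamp = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
    return {
        "@timestamp": timestamp.isoformat(),
        "event": {"category": ["process"], "type": ["start"]},
        "host": {"name": raw["server"]},
        "user": {"name": raw["account"]},
        "process": {
            "name": argv[0] if argv else "",
            "command_line": raw["cmd"],
        },
    }


def session_score(docs):
    """Sum the command weights over the documents of one time window."""
    return sum(COMMAND_WEIGHTS.get(d["process"]["name"], UNKNOWN_WEIGHT) for d in docs)


# Hypothetical raw records from a cloud server, all within one time window.
raw_events = [
    {"ts": 1700000000, "server": "web-01", "account": "root", "cmd": "ls -la"},
    {"ts": 1700000005, "server": "web-01", "account": "root", "cmd": "pwd"},
    {"ts": 1700000010, "server": "web-01", "account": "root", "cmd": "netstat -tulpn"},
    {"ts": 1700000020, "server": "web-01", "account": "root", "cmd": "nc -lvp 4444"},
]

docs = [to_ecs(event) for event in raw_events]
score = session_score(docs)
print(docs[2]["process"]["name"], score)
```

Keeping the normalization separate from the scoring means the same documents could be forwarded unchanged to whatever analysis sits behind the log manager, while the weighted score remains a simple local check.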
All right, thank you, guys. I don't see any questions so far, so if you still have any questions, you can write on Slack; Albert and Nil are there. I would like to thank you guys again for giving this talk. It was cool and interesting, and I really enjoyed it.