
Thank you. [Applause] Hello everyone. So, first-ever presentation for me, so I guess we'll see how it goes. If you guys do want to follow along yourselves, this is a totally non-sus QR code that will lead to my GitHub, and there is a PDF there with all the slides and my notes. There's also a Jupyter notebook there so you can have a look at the code, but don't look too harshly, please. Originally I put together this presentation last year, so I didn't have a look at the code again until this week, and I was like, oh [ __ ]. So, who am I? I call myself an
unqualified gardener. I unfortunately found myself in the niche of working with SIEMs, and SIEMs only, basically. So I know a lot about getting weird, esoteric data and putting it into formats that I never really thought of. You know, getting binary data from an HMI that's been out of service since the 90s and somehow being able to shunt that into a system that we need. So that's where I found myself. Now I work for the tactics Center, just as a consultant, and currently work on autonomous trains, so that's kind of cool. Proud member of the Professionals Australia union; one of the people talking tomorrow is also a member. Look, there's a lot of layoffs in the industry
at the moment. We're a union that supports IT and cyber security workers. You know, nccs just dropped a whole bunch of people. If that's something you're interested in, come speak with me. And as Michelle says, why get back. So, why should I care about metadata? This really comes from a conversation I had with my mom last year on Australia Day. It's something that I think a lot of us are very familiar with. It's like, okay, there's people able to track what we do by using the data around the data that we have. And it starts from 2011 and the stuff about Target tracking people's purchases and being able to
identify, okay, this person is likely pregnant before the person even knows. And that's, you know, a decade ago. And then you have WikiLeaks, etc. We all understand these issues, but when you want to relate that to my mom, it's kind of hard, especially when it's like, theoretically, you know, these types of things are possible, but really, what can you do with it? How can I do this? And then fortunately, or I guess unfortunately, a big world event happened last year, right, with Russia, and this is Conti. This isn't a threat intel profile of Conti, but, you know, they're a ransomware-as-a-service group, active since 2011. When the
invasion happened, they said, hell yeah, we support the Russian government with this. And then evidently some of their members didn't, and leaked all of their data. That presented us with this wide range of data, which really gave me the opportunity to play with data as I wanted to play with the data. It gave us about 168,000 unique interactions in some chat logs which they dropped. That's 168,000 interactions between 465 actors, and that's a lot of data for any analyst to have a look at. Huge task. But if we can model the metadata about that data, it gives us a place to start and then target
our analysis efforts for the future. So, diving into the logs. The example log entry there is actually just saying hello, is this Patrick, but it provides us with enough information to start the modelling that we want to have a look at, because it has a timestamp and it gives us a couple of actors, a "from" and a "to", and we can use this to have a look at the relationships between the actors. By looking at the relationships between the actors, we can start building node graphs, and node graphs give us a lot of information, a lot of good things to chase down. Node graphs are used for all sorts of different things in our lives.
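As a rough illustration of pulling only the metadata out of a single chat-log entry, here is a minimal Python sketch. The JSON layout, the field names `ts`, `from`, `to` and `body`, and the sample values are all assumptions made for the example; the talk only says that an entry carries a timestamp and two actors.

```python
import json
from datetime import datetime

# Hypothetical log entry. The field names and values are assumptions for
# illustration, not taken from the real data set.
sample_entry = (
    '{"ts": "2021-06-01T09:15:02.123456", '
    '"from": "alpha@example.test", '
    '"to": "bravo@example.test", '
    '"body": "hello"}'
)


def extract_metadata(raw_entry: str) -> dict:
    """Keep the timestamp and the two actors; drop the message body."""
    record = json.loads(raw_entry)
    return {
        "ts": datetime.fromisoformat(record["ts"]),
        "from": record["from"],
        "to": record["to"],
    }


metadata = extract_metadata(sample_entry)

# One message becomes one relationship (edge) between two actors (nodes).
edge = (metadata["from"], metadata["to"])
print(metadata["ts"].isoformat(), edge)
```

The message text is discarded on purpose: everything that follows in the talk works from who talked to whom and when, not from what was said.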
Google Maps (I do have references at the end you guys can check out), PageRank, those are node graphs. The little linguistics one I think is quite unique: it's for tracking new words used by schizophrenic patients who aren't talkative, and what's the most likely other word that they're going to say depending on who they're talking to. So there's tons of different uses for it. And some uses that we might be familiar with in this room are, you know, BloodHound. BloodHound uses node graphs to look at the relationships between different domain objects. People have shitloads of SIEM data; you can build yourself a node-graph type of understanding of what's going on in your network.
Basics of node graphs. Let's just say Alice, Bob, Cody, Dan and Eric are all at a convention, right? Alice shakes Bob's hand, Cody shakes hands with both Bob and Dan, and Eric shakes hands too.
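The handshake example can be written down as a small undirected graph. This is only a sketch using NetworkX: the transcript does not make clear who Eric shakes hands with, so the Eric and Dan edge below is an assumption, as is reading the first handshake as Alice and Bob.

```python
import networkx as nx

# People are nodes, handshakes are edges.
handshakes = nx.Graph()
handshakes.add_nodes_from(["Alice", "Bob", "Cody", "Dan", "Eric"])
handshakes.add_edges_from(
    [
        ("Alice", "Bob"),  # assumed reading of the first handshake
        ("Cody", "Bob"),
        ("Cody", "Dan"),
        ("Eric", "Dan"),   # assumption: Eric's partner is not recoverable
    ]
)

# Degree is simply the number of handshakes each person took part in.
degrees = dict(handshakes.degree())
print(degrees)
```

An undirected graph fits here because a handshake has no direction; messages with a sender and a receiver call for a directed graph instead.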
Those are the links between some of these people. And we can do directed graphs to show dependencies, or we can do weighted graphs to show, okay, how long did they shake hands. Depending on the data you're collecting, it depends sort of on the amount of time that you have to, sorry, the weighting information that can be added on. So for myself, some of the tools that I used here, the three primary tools: Python, because it's better than doing this in PowerShell. And you might laugh, but that's kind of what I had to do for a while in my old job, because we weren't allowed to use Python on our computers, so I'd been doing all my data analysis in PowerShell like that. This one
specifically, that's a ball python. They're pretty cool. Pandas, a Python package that allows you to manipulate data. If you don't know what it is, I highly recommend it. You can take stuff from JSON, put it into XML or whatever; you can push everything to a standard, useful format. And then NetworkX, which doesn't have a cute animal, and they should be ashamed of themselves. This is a package made for basically graphing complex networks. Right, so we've got a ton of data. Basically, let's get modelling. I allocated myself about 24 hours to get cracking on this project. It was totally not because I had a university assignment due the next day. So I figured,
okay, all right, let's crack a Red Bull at home. [Music] Okay, so, 18 hours into the 24 hours, I finally complete the work I should have done before starting modelling, and my housemate apparently heard me crying in the shower. So, just a quick tangent lesson for people, and someone over here asked a question to Tristan before about sort of the process and things like that. If you aren't familiar with CRISP-DM: when you're dealing with data of any kind, check out this model. It's got a lot of lessons learned about how you take data and put it into something. Really, what I should have been doing before
going into data modelling (and no, I tried to just jump straight into it) is going through this process. Probably three key steps that you guys should understand: choose the right data, understand your data, clean your data. Collect the data that applies. Explore it, describe it, understand it; make sure you know what you're talking about. And then finally, clean the data. That's always been like this big task: it's getting rid of all the junk, it's making sure you know what you should be looking at. As a side note, there is a standard for JSON. It's ISO 21778. Developers who don't follow it and just make their own [ __ ] up that looks like JSON should be shot, and I'm not taking
any questions on that.
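To make the cleaning step concrete, here is a minimal pandas sketch. It assumes, purely for illustration, that the dump is a run of JSON objects written one after another rather than a single valid JSON document, and that each object has `ts`, `from`, `to` and `body` fields; the talk does not spell out either detail.

```python
import json

import pandas as pd

# Hypothetical dump: back-to-back objects, one exact duplicate, and one
# junk row with a bad timestamp and a missing recipient.
raw_dump = """
{"ts": "2021-06-01T09:15:02.123456", "from": "Alpha@example.test ", "to": "bravo@example.test", "body": "..."}
{"ts": "2021-06-01T09:15:02.123456", "from": "Alpha@example.test ", "to": "bravo@example.test", "body": "..."}
{"ts": "2021-06-01T09:16:40.000001", "from": "bravo@example.test", "to": "ALPHA@example.test", "body": "..."}
{"ts": "not a timestamp", "from": "charlie@example.test", "to": null, "body": "..."}
"""


def iter_json_objects(text):
    """Yield each object from a string of concatenated JSON objects."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def clean_messages(text):
    """Return a tidy table holding only the metadata columns."""
    df = pd.DataFrame(list(iter_json_objects(text)))
    df = df[["ts", "from", "to"]].copy()            # choose the right data
    for col in ("from", "to"):
        df[col] = df[col].str.strip().str.lower()   # normalise actor names
    df["ts"] = pd.to_datetime(df["ts"], errors="coerce")
    df = df.dropna(subset=["ts", "from", "to"])     # drop unusable rows
    return df.drop_duplicates().reset_index(drop=True)


clean = clean_messages(raw_dump)
print(clean)
```

Normalising the actor names before anything else matters for graphing, because two spellings of the same handle would otherwise show up as two separate nodes.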
[Music] NetworkX. Our first step is, okay, identifying the nodes for the graph, where we're saying, okay, everybody who's in the "to" and "from" fields, they're our unique actors. They're our nodes; they're the people who are important. Then we create the graph itself, and then, you know, we want the people that have had the most conversations to be bigger circles, so that's the bit at the bottom there, steps seven and eight. And so, great, now we're gonna have a pretty graph, and it's going to give us all the information that we ever want, and...
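The steps just described (unique actors become nodes, messages become edges, busier actors get bigger circles) might look roughly like this in NetworkX. It is a sketch, not the speaker's notebook: the actor names, the tiny message table and the scaling factor of 50 are all made up for the example.

```python
import networkx as nx
import pandas as pd

# Hypothetical cleaned metadata: one row per message.
messages = pd.DataFrame(
    {
        "from": ["actor_a", "actor_a", "actor_b", "actor_c", "actor_a"],
        "to": ["actor_b", "actor_c", "actor_a", "actor_a", "actor_d"],
    }
)

# Everybody who appears in "from" or "to" is a unique actor, i.e. a node.
actors = sorted(set(messages["from"]) | set(messages["to"]))

graph = nx.Graph()
graph.add_nodes_from(actors)
graph.add_edges_from(zip(messages["from"], messages["to"]))

# Count how many messages each actor took part in, as sender or receiver.
activity = pd.concat([messages["from"], messages["to"]]).value_counts()

# Bigger circle for more conversations; 50 is an arbitrary scaling factor.
node_sizes = [50 * int(activity[actor]) for actor in graph.nodes]
print(dict(zip(graph.nodes, node_sizes)))

# Drawing needs matplotlib installed, so it is left commented out here:
# nx.draw(graph, node_size=node_sizes, with_labels=True)
```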
Yeah. Graphs with this many nodes really become tricky to use purely as analysis tools. The magical answers that you really think you're going to get when you start graphing like this, you aren't really going to get from those nine lines of code. But there are some small tricks that we can look at. We can manipulate the data; there's tons of things out there. And this was before ChatGPT came along, so I was trawling Stack Overflow for hours trying to find the right things that I was supposed to do, but it's a lot easier now. And now we've changed to sort of a directed graph where, okay, we still have our larger nodes that are important, but we also
have weighted lines for how often people talk to each other. I mean, we can clearly say that, okay, there's a lot of these tiny dots which aren't actually relevant. We have some big players who are significant actors we should look at. We have some people in between as well, who might be team leaders or something like that. Now, graphing is really just the means to an end. What you really want to do when you're graphing stuff, when you're looking at this amount of data and doing an assessment of these relationships, is you want the statistics and you want to understand centrality. Centrality tells you who is important in a network.
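Before moving on to centrality, the directed graph with weighted lines mentioned here could be built along these lines. This is a sketch under assumptions: the actor names and the message table are invented, and "how often people talk to each other" is taken to mean a simple count of messages per sender and receiver pair.

```python
import networkx as nx
import pandas as pd

# Hypothetical cleaned metadata: one row per message.
messages = pd.DataFrame(
    {
        "from": ["actor_a", "actor_a", "actor_b", "actor_a", "actor_c"],
        "to": ["actor_b", "actor_b", "actor_a", "actor_c", "actor_a"],
    }
)

# Collapse repeated messages into one row per (sender, receiver) pair,
# with the message count stored as the edge weight.
counts = messages.groupby(["from", "to"]).size().reset_index(name="weight")

digraph = nx.from_pandas_edgelist(
    counts,
    source="from",
    target="to",
    edge_attr="weight",
    create_using=nx.DiGraph,
)

# Heavier lines for pairs that talk more often.
edge_widths = [digraph[u][v]["weight"] for u, v in digraph.edges]
print(counts)
```

Keeping direction means actor_a writing to actor_b and actor_b writing back are two separate edges, each with its own weight.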
And this gives you the starting location if you're going to conduct analysis. It doesn't matter if you're doing your analysis on a group of people, or on domain objects, or on network traffic or whatever; the centrality statistics will tell you, hey, this is probably a good place to start, and then I can build my hypothesis from there. Degree represents the exposure that a node, or an actor, a person, will have on a network. It essentially is that opportunity to directly influence the rest of the network's behaviour. Closeness really shows who diffuses information through the network. Betweenness is a good show of
informal power, so whether they're sort of in between other nodes. It's really good for dense networks, and this is where you start getting some really heavy algorithms that we don't have to worry about, because we have Python packages that do it for us. But that's really a great way to show informal power. Another way to show what that looks like is this. So here's a semi-complex network. The node with the highest degree, the most connections, that's the red node there. The node that's technically closest to all other nodes, that's the yellow node. And the purple node is the one that's in between, the one people have to travel through if they're going from one side of the
network to another. So if I was to have a graph like this with some data, these are the nodes that I'd be looking at to target my analysis. And maybe, okay, now I can go have a look at these people's chats; now I can go have a look at what data these nodes are generating, that type of stuff. So, getting our statistics. Again, trivial, because we just let a Python package do it for us and don't think about the maths, and we come out with some statistics like this. Now, this is from the actual Conti data, and we can start pulling out stuff like the degree, closeness and betweenness. There's a few nodes with very high
authority. On the degree, as you can see, Defender and Stern have huge authority and margins over everyone else. Really, that's showing these are the highly influential nodes; they're the centre of information. These are probably the bosses of the crew. It also shows the network probably isn't highly resilient. You know, if you were a law enforcement person looking at this data, you could say, okay, these are the nodes that I could (whoops, sorry) these are the nodes I could take out to try and take down this network. The closeness kind of starts middling but trails down rapidly, and kind of shows that there is not a lot of regular contact
across the network, right? It's probably highly in cells, where these individuals here have control of their own groups. Interestingly, it doesn't look statistically significant, because I had to cut it down to fit on PowerPoint and there were like nine digits there, but, you know, Bentley jumped up compared to his degree, and we have Professor coming in. Why are they more important when they had lower degrees? And that's now something that you as an analyst could say: okay, these figures don't match up; is this someone I should look at? Are they important? Maybe they serve as middle managers of some kind. Maybe they have larger teams than other people that aren't as important, though.
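The three statistics discussed here (degree, closeness and betweenness) can be pulled out of NetworkX in a few lines. The sketch below runs on a made-up toy network with two hubs, not on the leaked data, and it uses the plain unweighted, undirected versions of each measure; whether the talk's figures were computed that way is not stated.

```python
import networkx as nx
import pandas as pd

# Toy network: two connected hubs, each with three contacts of their own.
graph = nx.Graph()
graph.add_edge("hub_1", "hub_2")
for i in range(1, 4):
    graph.add_edge("hub_1", f"left_{i}")
    graph.add_edge("hub_2", f"right_{i}")

# Each call returns a {node: score} dict; pandas lines them up by node.
stats = pd.DataFrame(
    {
        "degree": nx.degree_centrality(graph),
        "closeness": nx.closeness_centrality(graph),
        "betweenness": nx.betweenness_centrality(graph),
    }
).sort_values("degree", ascending=False)

print(stats.round(3))
```

Sorting the same table by a different column is the quick way to spot nodes whose closeness or betweenness rank is out of line with their degree, which is the kind of mismatch worth a closer look.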
And then betweenness: Defender and Stern basically blow everyone else out of the water. This is a network where these two individuals have control of everyone else and no one talks to anyone else, right? To go from one side of the network to another, you have to go through these people. So already we're starting to characterise the Conti group and understand, well, based on these statistics, we can see what they're all about. More than that, we can start looking at trying to find communities within a network. This graph pretty clearly has three different communities, right? But it's a little bit harder when you have a dense graph with 460 nodes or
so. If you try and do a graph like that, it'll look something like this. I've pulled out all the names because it made it really confusing, so I've only included the top eight by degree, based on what we had. But it still gives us some very interesting data. Because we have so many nodes, community detection is difficult to apply. What community detection looks at is, okay, who does the node mostly speak to? Who do they mostly interact with? And that allows us to say these people are probably on the same team, the same community. The algorithms don't work well when you have sort of low for trainers and that type
of stuff. So we can start saying, okay, as an analyst, what happens if I drop certain nodes? What happens if I, like, drop Stern or Defender from this list? How does the community shake out? Well, I start to see more communities of lower-ranked individuals congregating together, because Stern and Defender heavily interact with every other node. It's really shaking out the data. Mango and Green, you can see at the top, are the only ones who aren't in either of the two core groups. So you can see Defender, off, Buzzer, they're all part of the same community; Stern, Bentley, refers, they're all part of the same community; but we have Mango and Green in their own ones. So that sort of lends
heavily to the theory that we put together before, that, hey, these are probably middle managers; they probably have their own teams that they manage, right? But because of the high authority of those few other individuals, it meant that they were all part of the same sort of leadership group. And then there's other interesting things. Okay, we have orange (I'm not actually sure how well that colour shows up in the back), but there's a couple of orange dots out there which don't have a name on them, right? Why are they there? Who's important in this group? And that's another thing that we need to start having a look at to analyse.
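The idea of dropping the dominant nodes and seeing how the communities shake out can be sketched as follows. The talk does not say which community detection algorithm was used, so NetworkX's `greedy_modularity_communities` is an assumption here, and the network (three small teams plus two bosses who talk to everyone) is invented for the example.

```python
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical network: three tight teams and two bosses linked to everyone.
teams = [
    ["t1_a", "t1_b", "t1_c"],
    ["t2_a", "t2_b", "t2_c"],
    ["t3_a", "t3_b", "t3_c"],
]
bosses = ["boss_1", "boss_2"]

graph = nx.Graph()
for team in teams:
    graph.add_edges_from(combinations(team, 2))  # everyone talks in-team
for boss in bosses:
    graph.add_edges_from((boss, member) for team in teams for member in team)
graph.add_edge("boss_1", "boss_2")


def detect_communities(g):
    """Return each detected community as a sorted list of members."""
    return [sorted(community) for community in greedy_modularity_communities(g)]


# First pass: the bosses interact with every node and tend to blur the teams.
with_bosses = detect_communities(graph)

# Second pass: drop the dominant nodes and look again.
trimmed = graph.copy()
trimmed.remove_nodes_from(bosses)
without_bosses = detect_communities(trimmed)

print("with bosses:   ", with_bosses)
print("without bosses:", without_bosses)
```

Working on a copy keeps the original graph intact, so the centrality figures from the full network and the communities from the trimmed one can be compared side by side.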
Fortunately, as I said, I had this talk ready last year, and since then there's been tons of analysis on this topic. So I went through and had a look. Krebs on Security did like a four-part massive deep dive into Conti. Forescout, Check Point and Rapid7 all released in-depth reports on the data from the leaks, and these are some of the takeaways from it. We can see that our analysis, based on statistics and on the nodes, shakes out pretty well with what people found when they were actually looking at the content of the chats, right? Stern, identified as a leader of the group, described as a CEO, right? Defender,
another senior member, described as the CEO, who works alongside Stern; they handle the finance, logistics, etc. These two are our leaders. Finally, Buzzer and Mango: high-authority individuals, shown to be sort of middle managers with their own community. So based just on what we looked at at a high level, with a couple of lines of code and a couple of statistical things (ignoring the crying in the shower), we were able to, you know, identify, hey, these are the important individuals, without going through all the effort of translating all the chats and reading through every single one, right? And we could have said, okay, I want to know more about these two leaders. I'm going to go
through and just translate that, or palm it off to one analyst and say, hey, you can deal with that problem; that's yours now, my work is done. Okay, so if there's any takeaways from this, what I really want you to think about is: consider these node graphs, and having a look at the metadata and things like that, when you're building your own analytical methods, right? Consider, when you have a ton of data, that the best thing to do is just play with the data, right? You want to explore what you're looking at. Add, remove, chop, change, try new things. Quite often you'll find yourself, hey, you know, I have all this data, I'm
coming up with a conclusion. Well, what happens if you remove the key data for that conclusion? What's the next thing that comes up? If you're going to graph stuff, centrality is the most important thing, not looking at the actual graph. Centrality gives you the metrics for you to conduct further analysis. If you remove key nodes, what new pathways are you going to discover? What new communities might you discover? What key individuals are showing up in the statistics that you can pull together? And then finally, try to understand the communities within the group. You can tear it apart. You can find, okay, who's the core of this group? What outside communities are there? And you can take
these little packages and consider, well, this is going to be one analyst's job and this one's going to be another analyst's job. It's a really easy way to divide the work, rather than going through all the data itself. References: again, if you were brave enough to scan the QR code, this is on there, if you want to have a look at that. Honestly, the 24 hours thing wasn't a lie. I learned how to, you know, go through and do all this in 24 hours. So you can imagine, if you are concerned about your metadata, what people whose entire job is to do this can know, what they can discover, right? But this is also
something that's very easily accessible, due to how much work is done by the standard packages out there these days. I hope this has been enlightening for some people, or at least shown a new technique or a new tool that you could start incorporating into your own workflow. Are there any questions?
yep yeah yeah communicate
I'm sorry, I don't know widely what other groups are doing this sort of purely data-analysis exercise on this group. Sorry, yeah? I think it's important to understand the fundamental assumptions. The analysis you're looking at sort of suggests that the CEO is, perhaps it might be the other way around, really: the CEO gives one instruction
and those communications get more chat. Yeah. Now, that would definitely be impactful for a lot of the things that we say, and, you know, the data doesn't always tell the full story; the statistics don't always tell the full story. So you would do the graph, you'd pull out the centrality and stuff like that, and rather than saying, okay, this person's the biggest node here, so they're the boss, right, you say, hey, this node has the biggest presence; they are someone important in the network. They might not be the boss. They might be a technical leader; they might be some sort of SME
or an expert or stuff like that, or they might just simply be a middle manager and get their instructions from somewhere else. But now you have a target to focus on, and you can focus your analysis on that rather than trawling through the data itself. Yeah? It's very cool, there's plenty everything into it and it will actually points
Yeah. We're becoming, you know, a more and more connected world. There's tons of data coming out there. For those of us who've been in security operations, collecting data from different places, you know, you're getting buckets and buckets of it. Understanding these techniques and tools, which might not be built into your standard security tooling, and the packages that you can incorporate into your analysis and into your security workflows, I think is important for everyone. And I really hope some people have found something interesting and can find time to look into it. Anything else, or I'll let you go to lunch? Yeah?
At the time when I did it, the original data was all in Russian, and I didn't have the effort to go through and actually do the translation, which is why I was focusing on the metadata itself, to say, okay, if I wanted to do further analysis on this, who would I focus on? But now there's tons of good GitHub repos out there where people have already gone through and translated it, so if you wanted to, you could do that analysis on all of them. All right. Thank you, thank you.