
The Contemplator Approach: A Tale of Data Enrichment

BSides Las Vegas · 2019 · 52:44 · 15 views · Published 2019-10
Category: Technical
Style: Talk
About this talk
Rodrigo Brenes and Pedro Rodriguez explore practical data enrichment techniques for security analytics using the Elastic Stack. The talk covers classification methods (format intelligence, labeling, correlation), logstash plugins for enriching log data from firewalls and network sources, and real-world dashboards demonstrating enrichment in threat detection, compliance monitoring, and behavioral analytics.
Original YouTube description:
CG - The Contemplator Approach: A Tale of Data Enrichment - Rodrigo Brenes & Pedro Rodriguez Common Ground BSidesLV 2019 - Tuscany Hotel - Aug 06, 2019
Transcript [en]

All right, good afternoon. This is Common Ground, it is 3 p.m. in the afternoon at the tenth BSides Las Vegas conference. Just a few announcements before we begin. First of all, as always, we have to thank our sponsors, who along with you, by participating, make this possible. This year our Inner Circle sponsors are Critical Stack and Valimail, and our stellar sponsors include a number of great companies, but to pick three we will thank BlackBerry Cylance and the Paranoids. Before we begin, just a reminder that these talks are streamed, so please make sure your phones are off, and if you have questions, use the microphones. Thanks for participating, and please welcome our

speakers, Pedro and Rodrigo. I guess... yeah, cool. So welcome, this is The Contemplator Approach: a Tale of Data Enrichment through Elastic. ¡Pura vida! I don't know if you know what that means, but whatever it means, we are from Costa Rica, so yeah, we like to feed our crocodiles. And this is Pedro. Yeah, Pedro graduated from the Tec in Costa Rica in technology, and I'm Rodrigo, I'm Latino, I love Dark Souls and Bloodborne, which is equivalent in hours invested. We work for National Instruments in Austin, Texas, but we live in Costa Rica. So what are we doing here? What we are going to start talking about is all the data sources, and I'm not

going to explain what a next-generation firewall is, what NetFlow is, what a CDR is. Look at those as data sources. Those contain fields; when you start getting the logs from each of those tools, those fields might be IP addresses, phone numbers, domains, agents, protocols, methods. What people normally do is use those fields to create visuals, reports, dashboards, alerts, and even machine learning jobs. What we are covering here is the green part, which is the enrichment, and we are using three buckets to classify the different enrichments that we are going to apply to the different data sets and data sources: for example format, intelligence and labeling. That's something that we are going to

talk about later in this presentation, and we are doing that for one, correlation, and two, to tell more than what the log is saying. All right, so when we are planning to send our logs to a certain platform, whatever platform you want to choose, we have to consider the diverse types of data sources. For example we have open sources, VPN, web filtering, next-generation firewalls, gateways for email security, etc. These are some of the examples, especially when we're talking about information security. So we get all of this as an input, and then the output is something you choose; in our case we are working with the Elastic Stack, so I will give a brief overview of how Elastic

works. The idea is that we have a central search engine platform, which is Elasticsearch. Elasticsearch will be the one in charge of all the data; you can do the search through there, maybe through an API, or using Kibana. Kibana is the front end where you can manipulate or generate analytics from the information you have in Elasticsearch. So Elasticsearch is what is central through the whole stack, but with the addition of Kibana it is pretty much what you are going to use for all the analytics later. Then, where do we get all this information from as input, and how do we send it through to Elasticsearch? We

have two things here that work for you. The first is Beats: Beats are single-purpose tools that, for example, grab information from TCP or UDP ports, even NetFlow ports, and send that information along. Then we have Logstash. Logstash is multi-purpose; it has lots of plugins where you can connect whatever kind of input you need, process all the information, and then send it through whatever output you want to work with. Logstash is not mandatory to use with Elasticsearch; Logstash can also send to other platforms. The data enrichment part will be focusing on using Logstash mainly, and the analytics part will be mostly related to Kibana. So, all right, we're
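As a rough sketch of how the pieces described above fit together, a minimal Logstash pipeline might look like this (the port and index name are illustrative, not from the talk):

```conf
input {
  udp {
    port  => 2055          # e.g. a NetFlow export port
    codec => netflow
  }
}

filter {
  # enrichment filters go here (translate, geoip, ruby, ...)
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "netflow-%{+YYYY.MM.dd}"
  }
}
```

The filter block is where all of the enrichment discussed in this talk happens; the input and output stay the same.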

sending all of our logs to those platforms, but when we are sending those logs we're just getting this: you open the interface of whatever analyzer you're using for this kind of stuff and you just get, okay, there is the flow of the data we're getting from the last 24 hours, here is the list of documents we're getting from it. You just expand the documents and see what's there, and okay, that doesn't say anything at all. We just have documents, but we are not doing anything regarding "this means this happened here and this is related to something". No, this has no correlation at all, this doesn't have any context at all. Some people even build some

dashboards, but not as advanced as the ones shown in the picture. Maybe they just have a list of things, maybe a bar chart with a date histogram, and that's it; they don't do more analytics than this, it's something pretty simple. What happens here is that all the tools may not speak the same language, or what is one thing for this tool has another name in another tool. So for example maybe you have a field called source-ipv4-something in one tool, but in the other one it's source-ipv4-something-else. They mean the same thing, but when you just receive those logs and you don't change the name, or you don't

find a way to find a common ground between them, you're just getting two different fields that maybe you cannot correlate at all, and that's not useful for you. And there are lots of cases; we have a huge list. For example, this list here is one of the exercises we went through before: trying to find a common name for each field by grabbing all of our sources and then classifying them. So why enrichment? What's the importance here? We want to correlate events that are happening in one source with another, which may help you with better incident response, or with having an idea of

what's going on. Then we have context: again, the idea is that the more you get out of your logs, the better panorama you have of what's going on. And then the analytics: if you need to do reports, if you have to search, or if somebody is going to use your platform, having those analytics ready, in a way they can understand, so they can do very advanced searches on it, is a huge plus to detect any kind of issue later.

We're going to cover this in three parts. The first one is theoretical, so it's going to be very boring, because I'm going to try to explain what we are trying to achieve in theory, correct? And then we move to the technical part, which is the magic that Pedro is making with the scripts and repositories and all the things that I don't understand. Okay, so the Contemplator approach. We have the data sources, we have the fields, we already covered the different values, and to start explaining this let's use three buckets: format, intelligence and labeling. Format is everything related to the format of the field, if this field could mean something that is

already a known value, using dictionaries, or even taking a string and splitting that string into different values. Then we have intelligence, which is external information we are going to inject into the logs we already have, into some fields in the logs. And then labeling is values known to us that we are going to start tagging: this log, for example, is coming from an administrator account in Azure Active Directory, so I'm going to tag this as an admin log. So let's start with format: common name. If you are ingesting into Splunk or Elastic or any other tool CDN logs, for example content delivery network logs from Cloudflare or Akamai, and if you are also

extracting logs from the web server, let's say Apache, they both contain the same true client IP, which is the IP of the client behind that is making the request. The CDN will call it client IP and the web server will call it true-client-IP. The meaning is the same, it is the same IP and you can correlate on it, but the field is separate, and this is really basic, but most of the time people are not renaming or modifying the field names. In this case, what we want is to call them the same if they mean the same. Known format: for this, if you are processing IP addresses, then make your

Splunk or Elastic understand that field as an IP address. There are plugins for that, because sometimes you are not interpreting that as an IP, it is just a plain string, and then you are not able to query based on CIDR notation, for example over IP ranges. The other part, a cool one that I don't know if you guys know, is the E.164 numbering; that's for phone numbers. That's basically taking the call detail records (CDR) from CallManager, taking all the phone numbers, and enriching those into a format that you can read easily: basically the plus, country, area code and the rest. Dictionary: this is about known values. I'm going to explain this with an
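The common-name idea above can be sketched as a Logstash filter; the field names here are made up for illustration:

```conf
filter {
  # unify the "true client IP" field across sources
  mutate {
    rename => { "clientIP"     => "true_client_ip" }   # CDN logs
    rename => { "trueClientIp" => "true_client_ip" }   # web server logs
  }
}
```

Mapping `true_client_ip` as an `ip` type in the Elasticsearch index template is what then enables range queries in CIDR notation, such as `true_client_ip:10.0.0.0/8`.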

example. If you play with NetFlow, raw NetFlow, you see that the NetFlow TCP flag is hexadecimal. So why keep the hexadecimal? I mean, it is easier for us to basically move from the hexadecimal value of TCP SYN to the actual word SYN; it looks pretty in the dashboards and it is easier to manipulate. Also another example in NetFlow is the IP protocol number, a number for the TCP protocol or the IP protocol, and you just see the number there. Just enrich that with what it means, which is TCP, ICMP or UDP. The other one that I want to mention in the dictionary part is that there are repositories that will take
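A minimal sketch of this kind of dictionary lookup in Ruby, the same logic a Logstash `translate` or `ruby` filter would apply (the dictionaries here are deliberately tiny):

```ruby
# IANA protocol numbers and TCP flag bits we care about
PROTOCOLS = { 1 => "ICMP", 6 => "TCP", 17 => "UDP" }
TCP_FLAGS = { 0x01 => "FIN", 0x02 => "SYN", 0x04 => "RST",
              0x08 => "PSH", 0x10 => "ACK", 0x20 => "URG" }

# Resolve a numeric protocol to its name, keeping the number if unknown
def protocol_name(num)
  PROTOCOLS.fetch(num, num.to_s)
end

# Decode a NetFlow TCP-flags byte into the list of flag names that are set
def tcp_flag_names(byte)
  TCP_FLAGS.select { |bit, _| byte & bit != 0 }.values
end

protocol_name(6)        # => "TCP"
tcp_flag_names(0x12)    # => ["SYN", "ACK"]
```

The same pattern covers any "number in the log, word in the dashboard" enrichment.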

your user agent from the web, from Apache let's say. They take the user agent string and convert that. For example, in this one you see a "Mozilla/5.0" and a string there that is not telling you much, but then if you enrich that you will resolve it as Chrome 18 on Android Jelly Bean, Samsung. So it's giving you the browser and it's giving you the platform. Decompose: in this case we are taking the URL and we are looking at the URL in different parts; we are taking the slashes, and each slash-separated part will now be a field. And perhaps you're like, why are you doing this? Because it is easier to spot: if

botnets are targeting specific sections of the website, you will see spikes in different URL sections; that's an example. The other one we will cover later. Intelligence, this is the second bucket. Geospatial is basically taking the IPs and the phone numbers and enriching those with the location. A lot of times people are looking at the logs and they see a public IP; they take that public IP to ARIN or APNIC or RIPE or whatever, and they look for the owner of the IP. Why don't you do that automatically for every single log that contains a public IP address? Identities, same: you're looking at an IP, you want to see who the registrant is. Instead

of doing that, just enrich the data, so every time you have a public IP in whatever source, add the geolocation and add the ASN number based on BGP; Pedro will cover how to do it later. Security and others: use the intelligence you receive from US-CERT, from the New Zealand CERT, from, I don't know, every country has a CERT. If you're subscribed to them you get all the IOCs, the IPs, the domains. Start creating a repository and start enriching your data with those. So if you see a hash and you are collecting hashes, then start enriching that. So if, for example, US-CERT sends you a list of IP

addresses that are related to North Korea, just take that, create a dictionary, and enrich your current data with it going forward, and if you can, also retrospectively, to see the past. Labeling: in labeling we have different approaches, and most of them are based on the previous enrichments. Flow direction: if I want to see inbound and outbound in Elastic or Splunk, what I'm doing is taking my private range, or even my public range, and if it is in the source IP field then I tag it as source: company, and if I see it in the destination, destination: company.

So, as an example, if I want to see source: company but remove destination: company, I'm asking to see outbound traffic, yeah? Okay. IOCs, same example: take Cisco Talos information, their blacklisted IPs. It is available, you can go to Cisco Talos and see the blacklisted domains and IPs, you can download that every day, every hour, and use that to label, let's say IOC from Talos, each time you see that IP in your NetFlow or in your next-generation firewall logs. Compliance: since you are enriching the data with the geolocation, now you can see traffic from embargoed countries, so
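A sketch of the flow-direction labeling in Ruby, using the standard IPAddr class (the company ranges here are made up, mixing RFC 1918 space with a hypothetical public range):

```ruby
require "ipaddr"

# RFC 1918 ranges plus a hypothetical public range owned by the company
COMPANY_RANGES = %w[10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 203.0.113.0/24]
                 .map { |cidr| IPAddr.new(cidr) }

def company_ip?(ip)
  addr = IPAddr.new(ip)
  COMPANY_RANGES.any? { |range| range.include?(addr) }
end

# Label a flow as inbound / outbound / internal / external
def flow_direction(src, dst)
  case [company_ip?(src), company_ip?(dst)]
  when [true, true]  then "internal"
  when [true, false] then "outbound"
  when [false, true] then "inbound"
  else                    "external"
  end
end

flow_direction("10.1.2.3", "8.8.8.8")   # => "outbound"
```

Filtering on these labels is exactly the "source: company but not destination: company" query from the talk.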

sanctioned destinations, everything related to Cuba, Iran, North Korea, Sudan and Syria. You can look for emails coming from those countries, flows coming from those countries, or even traffic on the website coming from those countries. Since you are adding the ASN number, which tells you the owner of the IP, you can even start checking for embargoed companies, the entity list. Business intelligence: this is just tagging based on competitors, investors and clients, and this is because you are now able to move from an IP to who owns that IP. If you start tagging competitors, investors and clients, you will start answering things like, what are they doing on

our website? So if you want to see a competitor, you just filter by the competitor and you see all the activity they are doing on the website. People, similar: in this case, since we know the email addresses and the accounts of the C-level and also the system administrators, we are going to start tagging as an admin log each time we see a log containing that string. Starting with this is kind of difficult, so I based everything on Excel at the beginning. For example, you see all the data sources there, and then the different categories, categories being what I want to achieve with each of them. So for example, if I want to say

IP: which data sources include an IP? All of those. And now let's see geospatial: which data sources can I enrich with geospatial information based on IP? And I start marking an X in all the data sources where it applies. This is just a map to start: okay, I want to enrich, where do I start? This is an example of the IP case: I have the different IP formats, the data sources, and the description of each field containing an IP. Then at the end I'm suggesting a change of the field name based on what it means. Look at all the different names a field might have, and in the last

column you can see that they might actually mean the same, and they need to be changed in order to correlate and to facilitate things. Then start marking with an X which enrichment applies to what. For example, geolocation: every single public IP. ASN: every single public IP. Flow direction: if the source IP is private, then mark it as your company source IP, same for the destination. If it is public and it is within the embargoed countries, then tag it as embargo. Same with competitors or major investors or clients. And the final one: if it is a public IP, also tag it using the intelligence you

are receiving from the external world, related to IOCs. Yeah, I told you, that was the theory part, so here comes the practical part. You see that Rodrigo simply designed everything that he wants; he is going to analyze this data later. So you have lots of requirements that you have to pick up and try to implement on your platform, but you see there are lots of sources, there is a lot to implement, and you think, well, this is very overwhelming, because where should I start? So you may feel overwhelmed, but in the end, if you go step by step, everything starts with a simple step, and then you think about what other choices you have to

implement this idea and improve it. That's how I felt the first days. Okay, so to start: everything you can find on the net can be useful for starting this up. Like when I was in college, everyone was pretty lazy to do the stuff on their own; if there was something on the net, okay, there is something done, we may try to use that. So we have lots of sources where we can retrieve information, like lists, products, or whatever you can connect with the platform. For example public repositories, be it GitHub or GitLab; they have lists or scripts that may help you to retrieve the

information, maybe save it in a file to later be grabbed by one of the Beats that can read a file, or just an API that you can contact through Logstash to retrieve the information. There are also newsletters or trustable sources where, for example, each day they update the list of IPs that are related to a certain malicious activity, mining activity. So maybe in this case you just have, okay, I just have to adapt this list, but it's a simple task: I just download this list and create a small script to adapt it to a format acceptable by Logstash, and that's it. Then, for example, for the GeoIP part we

use MaxMind for this. They have GeoIP databases where, with the IP, you can retrieve where it is located, what the ASN is; they tell you if this IP is from a company like Microsoft, Amazon, whatever. So if you can just narrow down what you need, like, okay, Rodrigo asked for this, maybe I should look at this, and try to narrow all your research before doing everything, you'll find very good results. Then again, Logstash: Logstash has lots of filters to apply, so it doesn't matter where you are trying to retrieve the source of what you want to use to enrich the data, Logstash may already have a filter for it. So for example, maybe you

want to retrieve information from a certain database: Logstash offers a way to connect with the database, you specify what kind of field you want to compare, and then it contacts the database, and from the results it will generate a new field with the result it found. So when you try to work with Logstash, it works in a very simple way, because generally it works through plugins: you get the plugin, you specify which attributes of the plugin you want to use, and that's it. Generally you have a set of plugins that work like a chain: load all the documents that are related to, let's say, this Apache information,

so I want to apply to this Apache information all these plugins to enrich the data, and that's it, it's very simple to use. There is enough documentation to understand how it works, so it's up to you to see what suits you for what you need and then apply it. And okay, there comes the point where Logstash didn't work for me, or what it offers isn't enough, or cannot do what I have been asked. Logstash offers something for that, which is the Ruby plugin. If you're not familiar with Ruby, the Ruby language is not a difficult language to learn. When I didn't know Ruby and I had to do some tasks

related to very complicated stuff that I couldn't do with just one plugin, I had to learn Ruby, and in two weeks I had the basics already down, and it was kind of easy to code from that point. All right, you may not have a background in coding; maybe you can get, whatever, I don't know, a programmer from another area and just ask, can you please help me with this project? It's not something complicated to do. So there are cases where we had to do something; for example, one issue with Elasticsearch, with nested fields, is that it doesn't process them well, or they're not that well supported. There was a time when, with

one of our sources, we got a list of a number of vulnerabilities after doing a scan of the machines, and we only got the score as a number, but we wanted to classify them through CVSS, like, okay, this vulnerability here has this score, it is moderate. This wasn't in the logs, so we had to do it. It was kind of easy to do because of how this plugin works: it works through simple CRUD actions, you get a field, you set a field, you can delete a document, you can clone them, and even tag them. So in this case we just needed to grab the

field where the score was, and from there we had to compare it with a range. So we just compare, okay, this number, and associate it with high. We just need to create a new field that will show the result of that script. So for example, there was a document that was showing a vulnerability of 9.1; it should be high. This script will just grab this number and then convert it to the CVSS severity, and it doesn't matter if it's a list of 100 vulnerabilities, it will do its job as expected. Also something to add: if you are kind of worried about,
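A sketch of that score-to-severity mapping as the logic a Logstash `ruby` filter script would compute (the bands follow CVSS v2 severity ratings, which match the "9.1 is high" example; field names are hypothetical):

```ruby
# Map a numeric CVSS score to a severity label (CVSS v2 bands)
def cvss_severity(score)
  case score
  when 7.0..10.0 then "high"
  when 4.0...7.0 then "medium"
  when 0.0...4.0 then "low"
  else                "unknown"
  end
end

cvss_severity(9.1)   # => "high"
cvss_severity(5.0)   # => "medium"
```

Inside an actual `ruby` filter this would read and write the event, roughly `event.set("severity", cvss_severity(event.get("score")))`, with `severity` being the new field the talk describes.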

okay, maybe this script doesn't work, or maybe it could crash the whole Logstash application: there is the possibility to just run unit tests on it, which is a good practice. Then you can say, okay, let me put this case here and see how it runs, whether it runs as expected or not, and then you can be sure, okay, this script works as expected, I think I can just run it on Logstash. So you can be really sure of what you're coding, which is helpful when it's working for you. So let's put it all together. I'm going to show you examples in dashboards with the

enrichments, and then Pedro will explain how he is doing it. Yeah, analytics. So let's play with NetFlow. Since I'm enriching the data, I'm removing the hexadecimal: I'm taking the numeric hexadecimal value for the TCP flags and resolving it as the words for which flags are there. And since I'm tagging private versus public IP addresses, I'm able to say, okay, in this case give me everything that is TCP, everything that has the SYN flag, and everything that is not coming from any company source. So I'm looking at our inbound traffic in NetFlow, the TCP SYNs. What I'm trying to do here is to check all the scanning activity:

people scanning our network, our public ranges. And I'm also enriching those source IPs with the country they are coming from, so if you see weird activity from a specific country, you will see the spike. Same with companies: most of them will be ISPs, but if you are under a DDoS or somebody is scanning your network, you will see a spike, not only in source IP addresses but also in who owns those IPs. We are doing the same thing for BlueKeep. We want to see if something is messing around with RDP, so after we enrich the data I'm asking, okay, in NetFlow, everything that

is TCP, everything that is SYN, anything that is not coming from our public or private IP ranges, and the destination port is 3389. So I'm able to see all the RDP activity. And if I switch this to give me just company source IPs targeting company destination IPs, I'm asking for all the internal traffic, so I'm looking at who is scanning RDP. Same for WannaCry: you can set it to SMB and you will see who is scanning on the SMB port, 445, and that is useful to detect malware outbreaks. So how do we do that? In this case we are comparing with a dictionary, so we

have dictionaries that include, for example, the code and the protocol it is associated with. Generally the logs come in showing the number for the TCP flag, or for the protocol being used, and all we have to do is simply compare with those dictionaries: okay, it came as 6 for the protocol, so it will show TCP as a result. The same happens with the flags; we have a list of flags there. When it's related to the flow direction, what we do here is that we know some IPs belong to our company, and what we do is, okay, for all the IP fields we have, we just need to say:

the IPs that start with this specific pattern, we know they are internal, so we just indicate, okay, please tag those IPs. Same for any other IP you want to start tagging: you just need to find a pattern for it, and then you tag it as you need. Basic data loss prevention, for example: what I'm doing here is I built a dashboard just to count, or to sum, how many bytes are going from my company to the external world, and since I'm enriching that data with who owns the external IPs, I can say, okay, Microsoft is normal here, Google is normal here, Dropbox is normal here. So I can

start monitoring which IP is sending data to the external world, and how much data. The cool part of this is that it is very basic, just checking the data that is going out, I mean the amount of data, but the purpose is to filter out, to remove known destinations that your company is fine with, and focus on the weird activity. Perhaps somebody is sending 30 gigabytes to an ISP in Hungary and we have no business in Hungary, so let's investigate that. So in this case, the magic of the GeoIP databases is that you indicate what field with an IP you want to check, you connect the GeoIP database to it,

and what you will get is this information right here: you get the name of the company, where it is located, the coordinates. If you like to work with visualizations, there are maps, and they are very useful, so you can pinpoint where specifically it is located on the map. That kind of information is also useful if you want to do analysis through the time zone, the region code, etc. Content delivery networks and web servers in e-commerce: this is security analytics for web. What I'm doing here is splitting the enrichment into format and decompose. What I'm taking is the URL,
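The GeoIP step described above maps to Logstash's `geoip` filter; a minimal sketch (field names are illustrative, and the database shipped with the plugin or downloaded from MaxMind is assumed to be in place):

```conf
filter {
  geoip {
    source => "source_ip"        # field containing the IP to look up
    target => "source_geo"       # city, country, coordinates land here
  }
  geoip {
    source                => "source_ip"
    target                => "source_as"
    default_database_type => "ASN"   # ASN database: owner name + AS number
  }
}
```

The `source_geo` coordinates are what feed the Kibana map visualizations, and `source_as` gives the "who owns this IP" field used throughout the talk.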

and I'm splitting the value into four different values. What I'm trying to figure out, or detect, is whether there is a bot or somebody focusing on a specific section of the website, for example the search part. If you start getting a lot of GETs to the search part, you will see the host, or that section of the URL, for example it could be the third section, with a huge spike in that section.

So this one is pretty simple: we have the URL, we know that each time there is a slash we just want to separate on it, so what we do is separate the URL into different pieces. The idea is that we are only interested in the first three; the rest we group into a single field, because it's not useful for us for doing a deeper search. But again, the idea is that you just grab the URL, separate it, and manipulate it as you need. So we continue in security analytics for web. We are using here the format dictionary: in this case we are taking the user agent string and resolving it as a
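A sketch of that URL decomposition in Ruby: keep the first three path sections as their own fields and group the remainder (the field names are made up):

```ruby
# Split a URL path on "/" into up to three section fields plus a remainder
def decompose_path(path)
  parts = path.split("/").reject(&:empty?)
  {
    "url_section_1" => parts[0],
    "url_section_2" => parts[1],
    "url_section_3" => parts[2],
    "url_rest"      => (parts[3..] || []).join("/")
  }
end

decompose_path("/search/products/shoes/red/42")
# => {"url_section_1"=>"search", "url_section_2"=>"products",
#     "url_section_3"=>"shoes", "url_rest"=>"red/42"}
```

With each section as its own field, a date histogram split by `url_section_1` makes the "spike in one section of the site" pattern visible at a glance.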

readable agent. Intelligence: what we are doing is downloading the list of bots, non-bots and crawlers, and we are comparing the web agent with that list of bots. The first chart is just the regular strings, and this one, in our web server monitoring, is the behavior from known bots. So what do we do here? Simply, there are lots of resources where we can grab those user agents; for example, there is a GitHub repo from monperrus where he updates the list of user agents frequently, so you don't have to worry about, okay, I have to do it by myself. No, I can just grab this repo and check

for updates, and then download it, or if I have to adapt it, I just adapt it to our needs and transform it. In our case we work with YAML dictionaries, so we just adapt it and generate the dictionaries, and then we connect them in Logstash and say, okay, this field here is a user agent, I want to compare this user agent with this dictionary of user agents I have, and then I will get a result from it. This is a simple example here, it's just a translate: we have the field agent, we indicate the destination, and then we indicate where the dictionary is located on our server. So this is labeling. Let's talk about email security gateways.
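The on-screen example described above is Logstash's `translate` filter; a hedged reconstruction (the path and field names are illustrative, and newer plugin versions rename `field`/`destination` to `source`/`target`):

```conf
filter {
  translate {
    field           => "agent"          # field to look up
    destination     => "agent_resolved" # enriched result
    dictionary_path => "/etc/logstash/dictionaries/user_agents.yml"
    fallback        => "unknown"        # value when the agent is not in the list
  }
}
```

The same filter, pointed at a different YAML dictionary, handles the bot lists, IOC lists and protocol tables mentioned elsewhere in the talk.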

Perhaps in this case you have sensitive, high-value targets in your company and you want to start tagging those. In this case we are taking our email security gateways, and each time the recipient is a C-level, we are going to tag it as C-level. What you can do with this is basically follow the behavior of the C-level, whether he or she clicks on a link, and also you can create Elastic Watcher alerts, because it's not the same thing when a regular employee hits a phishing link versus your C-level, say the CFO or CEO, clicking a business email compromise link, for example. This case is kind of complicated, because this is an internal source and

we have to get the list of all C-level personnel. We have to be aware of when somebody is new at the company, a new CFO or CTO; we have to update this list, and also update it when people are not working there anymore. But the point is, we have this list, and we indicate it again through a dictionary. So for example, if there is a click reported for some C-level, the logs will indicate, okay, the recipient was C-level and this was a click, please tag it as urgent, so when we receive the alert from this we can start acting immediately. Okay, same as with C-levels, now we know our admins in Azure, or in the next-

generation firewall, or even in Active Directory. So with this I can start tagging. I ask ELK to show me every single log related to an admin, and it will show a lot of logs, because the service desk, for example, will be enabling accounts or disabling accounts every day. But then with that tag you can tell your machine learning, okay, machine learning, start looking at each individual admin over time, what's the behavior? Because we are able to filter those admins by this simple tag and start looking at just what they normally do, and alert each time they do something that is not normal for them. This one is pretty much

similar to the previous one, but in this case we are just working with admin access. Again we have a list of people that are considered admins, and from them we just monitor the activity from each source and detect, okay, in this case we have this login, or this information modified by this admin, so it gets tagged. Okay, blacklists: in this case I'm using the tag that says the source IP is ours, it's our company. I'm looking at NetFlow, and I want to see every single internal IP that matches a tag of destination: Talos. That means we are downloading the blacklisted IPs from Talos, we are enriching the destinations, and I'm asking ELK

to show me every single private IP inside my company that is contacting blacklisted destinations. Perhaps one or two TCP connections is not enough, but if I see a spike in either bytes sent and bytes received, or the amount of packets exchanged, then that's when we start investigating; most of the time it is malware. Other ideas: CERT reports. If you are subscribed to CISA or US-CERT, you will start receiving a lot of reports from them. They contain IOCs, indicators of compromise, in this format, so you can start building your own dictionaries, taking those values, and enriching every hit in Elastic. Yeah, as I said before, the idea is that we have lots of dictionaries

As I said before, the idea is that we have lots of dictionaries that we collect and try to update daily, and the point is that we are just tagging by comparison: okay, this URL is on this list, so we tag it as malicious activity, or coin miners, for example, and that's it. We simply have a list of plugins that are in charge of checking them all. Okay, compliance: in this case I'm taking the email security gateway, Active Directory, Office 365 let's say, and call detail records, that is, phone numbers, and I want to see every single login success from an embargoed country. If you see a login success there and take that information to your company's lawyers, it will definitely catch their attention, and they will start investigating.

And the same with phone numbers: if I see a call to an embargoed country that lasts over one or two minutes, that's something that needs to be investigated. This one is pretty simple, because we have a list of countries. Generally you may say China, but these databases don't show "China" but "People's Republic of China", for example, so what we use in this case are the two-letter codes that represent each country to do the detection. Simply, if there is traffic coming from one of these countries, just tag it as an embargoed country. Okay, this is the same principle, but splitting the URLs to see the content.
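The country-name normalization can be sketched like this (the alias table and the embargo list are illustrative, not a real policy list):

```python
# Sketch: normalize country names to ISO 3166-1 alpha-2 codes so that
# "People's Republic of China" and "China" both map to "CN", then tag
# events whose code is on an embargo list. Lists here are illustrative.
ALIASES = {
    "china": "CN",
    "people's republic of china": "CN",
    "iran": "IR",
    "islamic republic of iran": "IR",
}
EMBARGOED = {"IR", "KP", "SY"}  # illustrative only, not a legal list

def tag_country(event):
    code = ALIASES.get(event.get("geo_country", "").lower())
    if code in EMBARGOED:
        event.setdefault("tags", []).append("embargoed_country")
    return event

print(tag_country({"geo_country": "Islamic Republic of Iran"}))
```

Matching on the two-letter code, as the speakers do, sidesteps the problem of each database spelling country names differently.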

I'm asking Elastic to show me, for the embargoed countries, which content of the website they are going to. The same can apply for business intelligence: I have the country and I have the owner of the IP, so instead I can start enriching Elastic by tagging every single competitor or every single major investor. Then I use that tag and start looking at which specific content on the website they are looking at; you can even start looking at the different products and the currencies they are trying to get. It is normal activity, everybody does it.
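Splitting URLs into host and path segments is what makes the "which content are they viewing" aggregations possible; a sketch (the example URL is invented):

```python
from urllib.parse import urlparse

# Sketch: break a URL into host and path segments so dashboards can
# aggregate on which site content tagged sources are viewing.
def split_url(url):
    p = urlparse(url)
    return {"host": p.hostname,
            "path_parts": [seg for seg in p.path.split("/") if seg]}

print(split_url("https://shop.example.com/products/oscilloscopes/ni-123"))
```

Indexing `host` and the first one or two `path_parts` as keyword fields gives the product-level breakdowns described above.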

There is also stuff that is still not out there, or that Elastic hasn't done yet, so you can just start building it yourself. In this case I'm trying to create a filter that detects E.164 numbers: it detects whether they comply with the standard, and if they do, you get which country the number belongs to, the region, the local number, and the national number. This would be useful if we're doing geolocation analysis; it would help, for example, in detecting scam calls from other countries. This is roughly how it looks: the idea is that when you get a number, it will just tell you, okay, this belongs to this country, this is the country code, and so on.
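A minimal sketch of such a filter (real country-code extraction needs a full ITU numbering-plan table; only a few codes are hard-coded here for illustration):

```python
import re

# Sketch of an E.164-style check: "+" followed by up to 15 digits,
# first digit non-zero. The country-code table is a tiny sample.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")
COUNTRY_CODES = {"1": "US/CA", "44": "GB", "506": "CR"}  # illustrative

def parse_e164(number):
    if not E164.match(number):
        return None  # does not comply with the standard
    digits = number[1:]
    # Country codes are 1-3 digits; try the longest prefix first.
    for length in (3, 2, 1):
        cc = digits[:length]
        if cc in COUNTRY_CODES:
            return {"country_code": cc,
                    "region": COUNTRY_CODES[cc],
                    "national_number": digits[length:]}
    return {"country_code": None, "region": None, "national_number": digits}

print(parse_e164("+50622223333"))
```

Given `+50622223333`, this returns country code `506` (Costa Rica) with the national number split out, which is the shape of output the speakers describe feeding into geolocation analysis.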

Here are other ideas that you might find interesting, or that I'm curious about. The API at ftc.gov will give you early terminations, which is when a company is acquired by another: it gives you the acquiring company, the acquired company, and the parties involved. That is business intelligence, because sometimes that is not public yet, or the press has not released notes on it, so if you have your major investors or your competitors tagged, then you can use this to accelerate sending that information to the business intelligence group.

So if I have a major investor or a competitor that just acquired a company, and it is publicly available there but not yet in the news, then I'm going to use that information, send it to the business intelligence group, and they will start looking into which industry it is, what the sector is, and why they invested in that place. Another idea: you can take the phone numbers from the CDRs, coming from the call manager, and start using Whitepages, though you have to pay, to enrich each phone number with the type and the carrier. The type could be mobile, home, or business, and the carrier is the provider in a given country.

The other thing that I would like to do, and it is really difficult, involves the content delivery network, the CDN, the web server, and e-commerce. You have the ASN: using BGP you get the owner of a given IP. I want to remove all the ISPs and then start looking at traffic coming to your website, or through the CDN or the web server, and see that traffic based on sector and based on industry. That would be a huge query, a great query, to see: okay, this is the list of products, these are the companies that are looking at those products, and these are the different sectors, industries, and sub-industries that are looking at them.
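A sketch of the "remove ISPs, bucket the rest by industry" step (the IP-to-ASN mapping, org names, and industry labels below are invented; in practice this data would come from BGP or whois lookups):

```python
# Sketch: given an IP -> (ASN, org, kind) mapping, drop ISP-owned
# sources and bucket the remaining organizations by industry.
ASN_DB = {
    "198.51.100.1": {"asn": 64500, "org": "ExampleTelecom", "kind": "isp"},
    "203.0.113.5": {"asn": 64501, "org": "AcmeRobotics",
                    "kind": "enterprise", "industry": "manufacturing"},
}

def by_industry(src_ips):
    buckets = {}
    for ip in src_ips:
        info = ASN_DB.get(ip)
        if not info or info["kind"] == "isp":
            continue  # remove ISPs, keep named organizations
        buckets.setdefault(info.get("industry", "unknown"), []).append(info["org"])
    return buckets

print(by_industry(["198.51.100.1", "203.0.113.5"]))
```

The hard part the speakers allude to is not this aggregation but building a reliable ASN-to-sector mapping in the first place.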

Well, basically what we covered is this: we talked about several different data sources, and I'm pretty sure you have some of those, or more than those, and how, using three different buckets, we are trying to enrich that data. And with that, we are open to questions, if you have any.

On your ELK stack, did you see an increased performance need while enriching this data, and if so, what was the performance hit it took to enrich this data? No, we haven't seen any hit on performance when applying this. Generally it is mostly related to the input you're sending to Logstash: if there is too much data and you're trying to do all this enrichment, it can get complicated. We have seen some cases we had to discard; for example, one was trying to interpret from NetFlow all the internal IPs related to the internal DHCP server and get the hostname for each of them.

That was such a complicated task for Logstash that it simply crashed it; we just had to discard that one. But that specific case was because of the input we were getting: it was too much data that needed to be processed per document, and that's not a good idea. It looks like in your Elastic Stack you have a lot of dictionaries that you're manually maintaining and updating. Can you talk a little bit about the automation, or some of the CI/CD or user interfaces you've developed, so that other teams and groups can go in and help you enrich that data? Or is this really a small team that's managing a whole bunch of dictionary files?

Right now it's a small team that is managing everything, doing it the simple way. For example, again, we get those files from different repositories and we just look: okay, this is JSON, it provides these fields here, but we're only interested in this field, so we created a wrapper that says, okay, we retrieve this JSON and we want to generate a YAML file saying, for this IP, please tag it with this name. It's very simple; there is no complicated task being done here, just a wrapper, and we try to automate the schedule around it.
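That wrapper can be sketched in a few lines (the feed structure, field names, and tag are invented; the output shape matches the kind of YAML dictionary Logstash's translate filter consumes):

```python
import json

# Sketch: take a JSON feed, keep only the field we care about, and emit
# a YAML dictionary mapping each IP to a tag.
feed = json.loads('[{"ip": "203.0.113.9", "score": 97, "source": "feedX"},'
                  ' {"ip": "198.51.100.77", "score": 88, "source": "feedX"}]')

def to_yaml_dict(entries, tag):
    # Emit "ip": "tag" lines by hand to avoid a YAML library dependency.
    return "\n".join(f'"{e["ip"]}": "{tag}"' for e in entries)

print(to_yaml_dict(feed, "malicious_activity"))
```

Regenerating these files on a schedule, one per feed, is the whole of the "automation" described here.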

There are also sources where you have limited access within a 24-hour window, so you have to regulate it: okay, I have to fetch this one time per day, or one time per hour; it depends on the source. Some of your use cases involved NetFlow data, and I was wondering how the NetFlow ingestion works, because NetFlow is always changing dynamically: is there an ELK database that also changes dynamically, or is it doing a log-line-to-record kind of mapping? Yeah, I get it.

With NetFlow, what I like about Logstash is that at the beginning, when the Beats hadn't progressed that much, they offered a codec for NetFlow, and it kind of did the job. I mean, it has support for several providers; it didn't have support for one particular vendor at first, but you just get in contact with the plugin developer: okay, here is information and documentation on how to interpret the flows coming from that vendor, maybe you can apply it in the code. After that, they release a new version of the plugin, and that's it, those logs are now recognized by the plugin. So Logstash already offers this, and now Filebeat also has the ability to recognize NetFlow data.

In the version they have today, since NetFlow sends so much data, they even offer new features: you can trim the received data down to just the information you need from each NetFlow document. So on the question of storage, it is not compression as such, but it makes queries much easier: you are just keeping what you need, and you don't have to go through every field of every document; you can just filter it and get what you need.
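The field-trimming idea amounts to pruning each document to a keep-list before indexing; a sketch (the field names are typical NetFlow keys, not a specific schema):

```python
# Sketch: keep only the NetFlow fields the dashboards actually query,
# dropping everything else before the document is indexed.
KEEP = {"src_ip", "dst_ip", "bytes", "packets"}

def prune(doc):
    return {k: v for k, v in doc.items() if k in KEEP}

doc = {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.9", "bytes": 1200,
       "packets": 4, "tcp_flags": 27, "engine_id": 0}
print(prune(doc))
```

As the speakers note, this isn't compression, but smaller documents mean less storage and simpler queries.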