
So, hi everyone. My name's Alex, I'm a software engineer at PwC. I work in the threat intelligence team, and a lot of my job recently has been building a TAXII server, since it looks like a lot of people love TAXII. So that's what this talk is going to be about: how you build one and why it's cool. To kick things off, I appreciate not everybody might be familiar with TAXII and STIX, which are the two things I'm going to focus on today, so this is why I think they're worth paying attention to and hopefully might be of use to you. If you're an analyst looking at alerts, or maybe you work in something like detection engineering, TAXII is probably your friend, because it helps get threat intelligence data into a SIEM, something like Splunk or Sentinel, pretty easily, so you can get alerts on IPs, URLs, things like that. The other thing you might find quite helpful is that when you do get an alert, with a TAXII server you get much richer alert data. Instead of just the IP that triggered the alert, you might also get the confidence on that alert, when it's valid until, when it's from, things like that, which will hopefully help reduce alert fatigue and make your life a little bit easier. You also get richer context. Suppose you're in Sentinel, an alert pops up, and it's your job to figure out what to do with it: is this important or not? With a TAXII server it's very easy to then look at everything else associated with that campaign, so from that IP you can get the hashes, the domains and the URLs associated with that attack campaign. You can also look up richer data: in the way I've implemented this server, you can go straight to a threat intelligence report. I think this is cool because it flips things around: instead of going from report to indicators, you can go from indicator
straight to the context, and that's all automated in your SIEM. So, just to break down the two main concepts here. First there's TAXII, which is an acronym for Trusted Automated eXchange of Intelligence Information. That obviously sounds quite abstract; I would think of it as just how you share and send data to people. Then you also need STIX, which is the format. If TAXII is how we share and send data, then STIX is the format we have to send the data in, and it's really just a way of making indicators machine-readable. Say you've got something like an IP: how can we wrap that up into basically a flavor of JSON, so that all computers read it the same way? That's still sounding a little bit abstract, so I'll move on to a more concrete example on the next slide, but first a little bit more on STIX. It stands for Structured Threat Information Expression, and I would really think of it as a serialization format, again just a flavor of JSON. It's developed by OASIS, which I think is an American NGO, together with MITRE, who will crop up a little bit later in the talk. The point being, it's a recognized, standardized format. In Python, this is what it looks like to create an indicator in STIX: you're just instantiating a class and giving it a bunch of keyword arguments, and this can all get exported as JSON later. You might already see here some of the things I talked about for richer alerts: the valid_from value, when it was created, when it was last modified, and then things like a description and a name. You can build on top of this and do much more complex things: have relationships between different types of objects in STIX, give them TLP:RED or TLP:AMBER markings, lots of fun stuff like that. So, now that hopefully all makes sense, a quick recap: STIX is the way we format the data, and TAXII is how we share it and send it to people. In the next 12
minutes or so of my talk, I'm going to run through how you actually build this. TAXII is a very big topic, so this is going to be just the tip of the iceberg, but these are the three main challenges I had in actually implementing it. The first was getting your data into STIX. In my experience, a lot of organizations have threat intelligence data in quite an unstructured format: you've probably dumped it into some kind of NoSQL database, something like Elasticsearch. That's quite nice at the time if you're an analyst, because you can just chuck it in and forget about it. It's a real problem, though, if you have my job and are trying to get it into STIX, because STIX is very opinionated and needs a specific format. That's challenge one. For challenge two, say you've got 100 gigabytes of data to convert. Ideally you don't want to convert all of that and store it in another database, because then you have to pay for that extra database. A neat way around this is to convert the data into STIX from something like Elasticsearch on the fly. That way you keep your old database, which presumably integrates with all of your existing systems, you don't have to create a new one, and you can send the data straight out as STIX. For that to work, though, the conversion has to be pretty reliable, because, again, STIX is very opinionated. It also has to be performant, because a TAXII server, at least in my case and presumably in most cases, is something people will want to hit fairly regularly if they're pulling data into something like Splunk or Sentinel. So how do you do that in a way that scales? Then challenge three: by this point you've only got your data into STIX; you haven't actually got it into TAXII, which is the protocol for sharing it. So that's the third one. I'm going to start with that first challenge, how to get data into STIX. This is a sort of made
up example of the kind of thing I deal with at work; this is roughly what an Elasticsearch document looks like. You might have forgotten, but if you remember the slide where I was producing a STIX object in Python, this looks pretty different. First of all, we've got some nested keys under that _source key. We've got a bunch of keys like _index and _type that don't exist in STIX. That ID string isn't UUID-compliant, and it's pretty long. So there's a whole bunch of things we have to do to convert this into something reliable, but we do have the data we want to keep there in the values. Pydantic is the Python library I'm using here, and I would really recommend it for this. It does quite a good job of solving challenges one and two: getting data into STIX, and doing it reliably and in a way that scales. The main reason it's good for converting data into STIX is that Pydantic does something very cool: it makes Python check types at runtime. Python is normally not a statically typed language. You can put in type hints, but when your code runs in the Python interpreter, it doesn't check those type hints. If you use Pydantic, it does check them, which means you get a lot more correctness. That's useful for STIX because we need data to be reliably in STIX, or it's going to break and won't work in the TAXII server. The other very cool thing about Pydantic is that it is super performant, so it really solves the problem of converting on the fly: getting data straight out of something like Elasticsearch into STIX, which removes the need to create a new database. You can also see here, in that validator-decorated method, that you can do quite a lot of pre-processing, which also helps with converting data into STIX. So we've now hopefully solved challenges one and two, getting
stuff into STIX reliably and at scale, and now we have to get it into TAXII. The TAXII spec is very, very long. I haven't counted the pages, I've just looked at the HTML version, but it's probably several hundred pages and quite hard to read. So I would basically recommend: don't read the spec. Instead, have a look at the code of people who've written TAXII servers. OASIS, who came up with the STIX spec, have one called medallion that I would really recommend taking a look at. Then you can copy and paste the responses from that server and write your TAXII server until it produces those responses. So the idea here is: skip reading the spec, take what you know are good responses, and write your code until it hits those. That's basically what I'm doing here. I've copied and pasted some responses to manifest queries, which are part of the TAXII protocol, from someone else's server. I know they're good, so I've written a test suite that hits those endpoints on my server, and I just kept writing code until these tests passed. That isn't all of the tests on my TAXII server, and there are a few things here we don't need, but the idea is you can save a lot of time, and you also get to call yourself a test-driven developer, because this is test-driven development. If you get into trouble later, you can go and read the spec and debug things, but I personally found this a lot more fun and a lot faster. So now that we've got an idea of how we can get data into STIX and share it over TAXII, what does this actually look like if you're a customer? Going back to the bit I was saying at the start: how do we get rich alerts with more context, what's it like to actually use, what's the point of this? This is what it looks like in Sentinel if you're adding a TAXII server. You go into Sentinel, go to the Data connectors page, and from there you can pick TAXII. It's pretty simple: you just have to put in
these four or five values. The first one is a name you come up with; I've just called it campaigns, but you could name it whatever you want to remember it by. TAXII servers typically use HTTP Basic authentication. You can change the auth method if you want, but if you use that default it makes life pretty easy: you just put in the URL you want to hit, a username and a password, and then how often you want to poll for indicators. And really, that's it. It should take about two minutes to configure if you're a customer and the TAXII server has been built correctly for you. What you get at the end is this: this is what the view looks like in Sentinel. On my left, your right, you'll see some of that richer alert context. You can see there's a confidence value of 80 for that IP, you've got when it was created and when it's valid until, and there's a description value, which is the name of the threat actor my team tracks for that IP, but that could be anything you want it to be. The main thing that I think is pretty cool here is the names on the far left of the screen. Those names refer to our threat intelligence reports: each of those CTO-TIB values is a unique identifier for a report. So suppose that IP has triggered an alert and you want to pivot from it to the campaign it's associated with: you can literally copy and paste the CTO-TIB value for that IP into Sentinel, and you'll get all the IPs, URLs, domains and hashes, all the indicators associated with that campaign, straight off the bat. If you're one of our clients, you can then hit our API or our portal and read the report to see what's in it. So it's a very fast way to get context for the person monitoring Sentinel, and that should make their life a lot easier. So that's really what we've built. I'm just going to wrap things up now.
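To make challenges one and two a bit more concrete, here is a minimal sketch of converting an Elasticsearch-style document into a STIX 2.1 indicator with Pydantic, assuming the Pydantic v2 API. It is not the speaker's code: the document shape and its field names (`value`, `type`, `confidence`, `first_seen`, `valid_until`, `description`, `report_id`), the `ID_NAMESPACE` constant and the `to_stix_indicator` helper are all hypothetical. Because the data is converted on the fly rather than stored, the sketch derives each STIX ID deterministically from the Elasticsearch `_id` with `uuid5`, and it uses the report identifier as the indicator's `name`, mirroring the Sentinel view described in the talk.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

from pydantic import AwareDatetime, BaseModel, Field, field_validator, model_validator

# Hypothetical namespace used to derive stable STIX IDs from Elasticsearch _id values.
ID_NAMESPACE = uuid.UUID("5e6c3a9a-8f1d-4a57-9a43-2b6f0c1d7e11")

# Maps the (hypothetical) source "type" values onto STIX patterning object paths.
PATTERN_PATHS = {
    "ipv4": "ipv4-addr:value",
    "domain": "domain-name:value",
    "url": "url:value",
    "sha256": "file:hashes.'SHA-256'",
}


class SourceDoc(BaseModel):
    """The nested _source of a hypothetical Elasticsearch indicator document."""

    value: str
    indicator_type: str = Field(alias="type")
    confidence: int = Field(ge=0, le=100)  # STIX confidence is 0-100
    first_seen: AwareDatetime  # reject timestamps without a timezone
    valid_until: Optional[AwareDatetime] = None
    description: str = ""
    report_id: str

    @field_validator("indicator_type")
    @classmethod
    def supported_type(cls, v: str) -> str:
        # Pre-processing: normalise case, then reject types we cannot express as a pattern.
        v = v.lower()
        if v not in PATTERN_PATHS:
            raise ValueError(f"unsupported indicator type: {v!r}")
        return v

    @model_validator(mode="after")
    def check_validity_window(self) -> "SourceDoc":
        # STIX requires valid_until to be later than valid_from.
        if self.valid_until is not None and self.valid_until <= self.first_seen:
            raise ValueError("valid_until must be later than first_seen")
        return self


class ElasticHit(BaseModel):
    """Top-level hit; extra keys such as _index and _type are ignored by default."""

    es_id: str = Field(alias="_id")
    source: SourceDoc = Field(alias="_source")


def stix_time(dt: datetime) -> str:
    """Format an aware datetime as a STIX timestamp (UTC, millisecond precision)."""
    return dt.astimezone(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def to_stix_indicator(hit: dict) -> dict:
    """Validate one Elasticsearch hit and return a STIX 2.1 indicator as a dict."""
    doc = ElasticHit.model_validate(hit)  # raises ValidationError on bad input
    src = doc.source
    # Escape backslashes and single quotes for a STIX pattern string literal.
    literal = src.value.replace("\\", "\\\\").replace("'", "\\'")
    first_seen = stix_time(src.first_seen)
    indicator = {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid5(ID_NAMESPACE, doc.es_id)}",
        "created": first_seen,
        "modified": first_seen,
        "name": src.report_id,
        "description": src.description,
        "pattern": f"[{PATTERN_PATHS[src.indicator_type]} = '{literal}']",
        "pattern_type": "stix",
        "valid_from": first_seen,
        "confidence": src.confidence,
    }
    if src.valid_until is not None:
        indicator["valid_until"] = stix_time(src.valid_until)
    return indicator
```

The deterministic ID is the design choice worth noting: if a fresh random UUID were generated on every request, a SIEM polling the server would see the same indicator as a new object each time. STIX 2.1 generally recommends random (v4) UUIDs for objects like indicators, so treat `uuid5` as a deliberate trade-off and choose your own namespace.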
These are a couple of resources I was speaking about. The first two are example TAXII servers. This is the bit where I'd say skip reading the spec, have a look at how they've done it, and write code until you hit their responses. The first one is by OASIS, who came up with the STIX specs, so it's very reliable, and that's the one I used. OpenTAXII is also quite good; the EclecticIQ team are very good at intel. Then if you want to look a bit more at STIX, I would recommend going to the examples there and just having a look, probably in Python; I find that a bit easier than reading the docs. Just to summarize the three main challenges: how to get data into STIX, how to convert it reliably and at scale, and then how to get it all into TAXII, which is quite a big job in itself, getting it into that protocol for sharing. And that's about it for me. Thank you very much for listening. [Applause]
I guess, does anyone have any questions?
Thanks for the talk, it was interesting. How would you contrast something like STIX with [unclear] in terms of sharing threat intelligence information?
Obviously [unclear] is a bit older, but there's a draft of an updated version. I don't know that spec that well, to be honest; I've mostly focused on STIX. It seems to me that the benefit of STIX is that it's what a lot of people want, because you can integrate it into Sentinel. That's been the main use case for me at work: people want a way to pull data in straight away. I don't know if that's possible with the other format, but I know that most of the SIEMs people are using have a TAXII connector, which is the easiest way to get data in, and for that you have to use STIX. So I don't really have a more thorough answer there. Thank you very much.
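As a sketch of the test-driven approach from the talk (paste a known-good response from a reference server such as medallion into a test, then write server code until the test passes), here is a minimal example. The expected response is hand-written for illustration in the shape of a TAXII 2.1 manifest resource (`more`, plus `objects` entries with `id`, `date_added`, `version` and `media_type`); it is not real medallion output, and `build_manifest` is a hypothetical handler. For brevity the test calls the handler directly rather than sending an HTTP GET to `<api-root>/collections/<id>/manifest/`.

```python
import json
import unittest

# Hypothetical known-good response. In the workflow from the talk you would paste
# this from a reference TAXII 2.1 server (e.g. medallion) instead of writing it.
KNOWN_GOOD_MANIFEST = json.loads("""
{
  "more": false,
  "objects": [
    {
      "id": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
      "date_added": "2023-01-02T00:00:00.000Z",
      "version": "2023-01-01T00:00:00.000Z",
      "media_type": "application/stix+json;version=2.1"
    }
  ]
}
""")

STIX_MEDIA_TYPE = "application/stix+json;version=2.1"


def build_manifest(objects: list[dict], date_added: dict[str, str]) -> dict:
    """Hypothetical handler body: build a TAXII 2.1 manifest for stored STIX objects.

    `date_added` maps each object ID to when this server first made it available.
    """
    return {
        "more": False,  # no pagination in this sketch
        "objects": [
            {
                "id": obj["id"],
                "date_added": date_added[obj["id"]],
                "version": obj["modified"],  # a STIX object's version is its modified timestamp
                "media_type": STIX_MEDIA_TYPE,
            }
            for obj in objects
        ],
    }


class ManifestMatchesKnownGood(unittest.TestCase):
    def test_manifest(self):
        obj_id = "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f"
        objects = [{"type": "indicator", "id": obj_id, "modified": "2023-01-01T00:00:00.000Z"}]
        got = build_manifest(objects, {obj_id: "2023-01-02T00:00:00.000Z"})
        # Keep writing server code until this passes; read the spec only if it breaks later.
        self.assertEqual(got, KNOWN_GOOD_MANIFEST)
```

You could run this with `python -m unittest`. When pointing a test like this at a live server instead, send the TAXII 2.1 media type in the request header (`Accept: application/taxii+json;version=2.1`) and compare the decoded JSON body with the pasted response.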