KaiIyer

BSides Calgary, 35:48, published 2024-03

[Music] Yeah, so, introduction. Let's read the definition we have for CTI: CTI is knowledge that helps us understand malicious threat actors, their motivations, goals, targets and attack behavior. So CTI is about collecting information, sharing that information, processing it and using it to make sure that we are ready to defend against malicious threat actors. The value of CTI increases when it arrives in a timely manner. Suppose I have CTI about threat actors that are going to target, say, the oil and natural gas industry. If I have it in time, then I'll probably be able to defend against the attack; if the attack has already taken place, the value of that information goes down. So sharing CTI in a timely manner is very important, and consuming it in a timely manner matters as much as sharing it.

Then one of the things we see with CTI is quality standards and parameters. If I give you 100 IoCs, but out of those 100 only 10 are relevant and valid, and the others are probably a cloud provider's IP or some DNS IP, that's not going to be very relevant for what I have in my organization. So it's very important that the quality of the shared Intel is up to standard in order to be effective.

Then, blockchain. A blockchain is pretty much a digital ledger that's visible to everyone. I say immutable, meaning very hard or practically impossible to change. Any transaction that I do on a blockchain is always written into the chain. Suppose I write something on the chain and then want to delete it: that delete operation is another transaction. So anything that you do on a blockchain is effectively immutable; let's keep it at

that. Then, yeah, we'll talk about Lo, which is the dApp that I'm going to present today. So, data and CTI standards: how to use blockchain to store the CTI information. While storing the information, I make sure that I'm not storing just any IoC or Intel information, but only the IoC information that satisfies my standards, my minimum required criteria to be in that blockchain. The data in this system becomes a trustworthy source of information, which can help assess the reputation of the sources. Suppose I have 10 different sources giving me Intel information. I want to make sure that all 10 sources are up to the standard and that I trust those sources. So how do you build a trust mechanism, so that you don't have to revalidate again and again but can use those IoCs in production right away?

Zero-knowledge proof. This is an amazing concept that has taken off in blockchain in the last few years. What it says is: I have a statement, and I'm presenting that statement to someone else, but I don't want to reveal its content; at the same time I want to prove that my statement is true. So how do you prove a statement to someone without revealing the contents of that statement? Suppose you have your national ID card, but you don't want to share sensitive information or your PII with someone; you just want the other person, the validator, to know that you are exactly the person you claim to be. That's zero-knowledge proof, and we'll see how we implement it in our application to respect privacy, share Intel and maintain the quality of that Intel as well.

Limitations of the traditional CTI model, or the current existing CTI models. Inadequate threat actor understanding: I'd say that any

model you can come up with is not going to be fully up to date on CTI information. I cannot predict what's going to happen tomorrow. I can make assumptions, but I cannot tell you for sure that this is what's going to happen tomorrow; I'm not the FBI or NSA to give you 100% accuracy. A limited understanding of threat actor behavior, motives and attack techniques hinders effective defense. Unless I know what my threat actor looks like, what attack vectors that actor is going to use, and which areas of the attack surface the attack will focus on or which vulnerabilities it will exploit, unless I have a complete picture of my attacker, I might not be able to defend efficiently. Because you cannot provide 100% defense, it's more important to understand the threat actors that target your industry and adapt your defenses accordingly.

Data overload and scattered information. As we know, we are getting into a digital era of information. We have terabytes of data generated by endpoints in a day. How are you going to process that volume of data, take valuable insights out of it, and then use them for your threat analysis or your defense? Handling large amounts of data is a problem in any industry, not just CTI; whether I talk about normal cyber security operations or about artificial intelligence and data science, the problem exists everywhere.

Lack of timeliness. As I said before, providing threat intelligence in a timely manner is crucial for it to be effective. And limited quality assurance: I am not sure about the quality of the information provided by my source, whether it comes in STIX or TAXII format or whatever. The sources that I'm

receiving Intel from are mostly open source, and I'm not sure the quality of that threat Intel is what they say it is. I'll quote one incident. We had the Kaseya supply chain incident, and a lot of Intel was going around. One of the very good Intel sources, the FBI, was providing a lot of IoCs, and one of the IoCs among their hundreds was 8.8.8.8. The organizations that just plugged that in had a lot of false positives coming in, and their L1s and L2s were saying: this is way too much, we are not able to handle this volume of alerts. So making sure that the quality of the Intel being provided is up to the mark is crucial when we deal with CTI.

Lack of trust. Say I'm an organization that has some Intel; I recently dealt with an incident, either for myself or for a client. How am I going to trust the community or the platform before sharing that Intel? Is that Intel being utilized properly? The trust mechanism is important because you do not want to share information with someone you do not trust.

Inadequate incentives. Okay, I'm going to give you some information, but what am I

going to get in return? I know the open source community is very giving and doesn't expect anything in return, but at the end of the day the crucial Intel sits with organizations in the private sector, and in private sector organizations we talk about profit: what am I going to get in return if I give you something? So it's not just about sharing, but about what incentive I get if I share Intel with you.

Then resource constraints. Small organizations, like a startup or a small or medium organization, might not have a threat Intel platform like MISP or any other tool in which they can put all of their IoCs, tag them properly and share them with the community or with open source organizations. So resource constraints are one of the problems. One more problem, which I've not mentioned here because it's not very relevant, is retaliation: I'm a CTI company giving out very valuable information about a particular threat actor in the open; what if that threat actor, or some other threat actor, targets me and comes after me? That's retaliation, but it's not very relevant here.

Reluctance in sharing. Why am I not sharing CTI? What's the problem in sharing CTI? Data sensitivity and privacy concerns, right. I'm an organization with compliance obligations: I cannot share client data or some crucial data, because I have to comply with the policies that my organization has laid down, or with some ISO certification saying that you are not allowed to share this level of detail outside. That's one of the reasons. Then competitive advantage: suppose I have more information about a threat actor than my competitor. Suppose I'm a service-based company, an MSSP, and I go to a client and say that I can defend your organization better than my competitor because I know the threat

actors better than them. So that's a competitive advantage.

Why blockchain for CTI? What are we going to do with blockchain? The problems we discussed: the demand for CTI is growing really fast, and CTI is scattered. I know there are CTI sharing standards, the STIX format is there, TAXII is there, but is everyone following those standards while sharing CTI? Do we have a proper standard, a common way of sharing CTI? Why share CTI without incentives? How can CTI be shared without compromising privacy? And the quality of threat Intel data can vary widely and often lacks quantifiable metrics. These are the problems that we are going to solve with blockchain. There are 10 or 20 problems in the current CTI model, and we cannot solve all of them with blockchain; blockchain is not a silver bullet for all of your problems. But some of those problems can be solved with blockchain, and we are going to see how.

So the proposed solution is blockchain offering a comprehensive solution to the challenges in CTI sharing. Now, what does it do? First, ensuring data integrity. Say I have seen an IoC in the wild and I want to share it. The most important thing when

sharing that is timeliness: how do I know when it was seen for the very first time? So immutable timestamps are one thing I really want with my IoCs. It utilizes a reputation mechanism to build trust within the CTI sharing community. I am a source providing CTI information, but how does my consumer trust me? What is my reputation, and how do I build that reputation and trust? It enforces quality standards for CTI feeds: are my CTI feeds up to the mark, satisfying a certain quality? And it combines five factors that are crucial for CTI: reliable sources, sufficient context, a consistent data model, a defined process and automation. We'll talk about automation because at the end of the day we are moving towards an automated model, and we want to see how to churn through a large amount of data, get Intel out of it, share it with the community, and scale it up. Scaling is very important when you want to do CTI for various organizations and not just within your own. So we'll see how we can achieve all of this using the dApp called Loy.

Just a flowchart of this; this is a high-level view of what we are going to see. I have my source,

which is the submitter. Suppose I'm a submitter, a source, and I have some Intel. I log in to the portal, the app, and submit my CTI. After that, it is sent across to the validators. The validators are anyone who has some reputation in the CTI community and can validate those submissions; we'll talk about how we select the validators and how they score that Intel. Then we have an Intel quality evaluation, which uses something called proof of quality. Proof of quality, proof of work, proof of stake and proof of value are different concepts in blockchain, so we'll see what proof of quality is and how Intel is evaluated against it.

After evaluation, if the mean score given to that Intel is greater than the accepted score, the minimum required quality standard, it is written to the blockchain. Then from the blockchain you take some information about the validators, because we want to make sure that we are not always using the same validators, and we also want to check whether the validators are good enough, or whether some validators are just playing around and not giving the right information. So validator information is also extracted from the blockchain and used to select validators the next time; that's an iteration.

That is the high-level overview. Once the IoCs are in the chain, you can log in to the portal as a consumer who just wants some IoCs, and extract those IoCs or Intel.
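As a rough illustration of the acceptance step in that flow, here is a minimal Python sketch. It assumes scores out of 10 and a made-up threshold; `MIN_ACCEPTED_SCORE`, `is_accepted`, `submit` and the in-memory `chain` list are hypothetical names for illustration only, not the dApp's actual smart contract.

```python
from statistics import mean

# Hypothetical minimum quality score (out of 10); the talk does not give the real value.
MIN_ACCEPTED_SCORE = 6.0

# Stand-in for the blockchain: an append-only list of accepted entries.
chain = []


def is_accepted(validator_scores, threshold=MIN_ACCEPTED_SCORE):
    """Return True if the mean validator score is strictly greater than the threshold."""
    if not validator_scores:
        return False  # nothing was scored, so nothing is written
    return mean(validator_scores) > threshold


def submit(ioc, validator_scores):
    """Append the IoC to the chain if it is accepted; otherwise discard it."""
    if is_accepted(validator_scores):
        chain.append({"ioc": ioc, "score": mean(validator_scores)})
        return True
    return False
```

For example, `submit("198.51.100.7", [7, 8, 6, 9])` would be accepted under this assumed threshold, while an entry scored `[1, 2, 1]` would be discarded.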

The app roles. Any app has some set of users who can log in to it and use it, so we'll talk about three primary users. The first one is the submitter, who submits the Intel information with an attribution and proof. I have some Intel: I did some research or some incident response, and I want to share what I got. I have to share it along with the proof or attribution saying this is how I got it, because the validator needs the proof to validate how you got that Intel and whether it is the right Intel to share with the community.

The validator evaluates the Intel and assigns a score. If the mean score is greater than the score required for acceptance into the blockchain, the Intel is written into a block; otherwise it is discarded.

Then the consumer: any user who just wants to use CTI information can log in to the application and see the first seen date (when was this Intel seen for the very first time), the score of this Intel, out of 10 maybe, and then the tags: what is this Intel doing? Suppose it is associated with Cobalt Strike, or with some ransomware group, or maybe with

a brute-force scanner, something of that sort.

The smart contract and validator selection. When we talk about blockchain it's always about smart contracts, and how a smart contract is going to evaluate all of these things. First, the IoC will have the following features. It will have the address of the submitter, which is the submitter's blockchain wallet address, because after this IoC is submitted and written into the chain we also want to reward the user. That's the incentive I talked about earlier: the user gets rewarded for everything that is written into the chain, and that is why we take the user's address. Then the IoC type, probably three types: an IP address, a domain or DNS query, and hashes, SHA-256, MD5, whatever. Then a Boolean value for validated: true if this is validated and written into the chain, otherwise false. Then the quality parameters: the parameters required for the quality evaluation. That is roughly how the structure of an IoC looks in the blockchain smart contract.

Then the selection of the pool of validators. The selection of validators is performed by something called a validator selection mechanism, also called VSM. Then the initial validators. I know how to select a validator if I have hundreds of validators and also their reputation in my chain, but how do I select validators at the start, when I don't know their reputation or trust? The initial set of validators is based on trustworthiness outside the chain network: you are someone who has been working in the CTI industry for a few years and you have a good reputation in your company and in your community. That's how

we select the initial validators. But regardless of how you select your initial validators and who they are, once you have all of those validators in, the validator selection mechanism just selects the validators with the highest reputation. So it doesn't really matter who the initial validators are, because as we progress with the chain, the validators who get in are the ones who actually have a reputation in the chain.

Intel quality evaluation. How are we going to evaluate Intel quality? Suppose I have an IoC; what are the parameters against which I evaluate it? First, false positives: the frequency of feed invalidations. Suppose I say that I had a ransomware incident recently, the attackers hosted the C2 on an Amazon EC2 instance, and I give you a random Amazon IP as the IoC. Is that valid? Is that enough for me to say that this Amazon IP is a ransomware C2 host? No, that's not enough. So, false positives, or the tendency of that Intel to be a false positive.

Verifiability: connection to the primary information sources. How can I verify that this IoC actually came from this particular incident, or matches whatever tag it carries? Is it possible for me to verify that?

Then intelligence: added value through associations with other data. It's about enriching the IoC with additional information that you provide alongside it. Suppose I say I have a file name, say w cry. 60; is that an IoC? But if I say that I have an MD5 hash of a JavaScript file that is going to encrypt all of the hosts in my network, and I provide the contents alongside it, that helps me understand that this is something very relevant that I can add to [inaudible].

Interoperability: adherence to specific data format standards. If the submitted Intel is in a given format,

say STIX or a TAXII feed, then it's easy for me to put it in the chain. If it's in a format that is not easily understandable, that's harder. If you give me a CSV file, fine; if you give me an HTML file, I have to extract the data out of it, not that great; an XML file, not that great. So it's about how you submit and how that format aligns with the standards used for your CTI model.

Then syntactic accuracy: feed compliance with standards, meaning how accurately it follows a particular syntax. Suppose I want to ingest all of this in a nested JSON format: how syntactically accurate is it? Is it following key-value pairs, or just giving random values with no structure?

Originality: the uniqueness of your feed. Suppose someone submitted an IoC a year ago and you try to submit the same thing again. It's not your original thing, it's not unique to you; it has already been submitted by someone else. So how original or unique is your Intel?

Timeliness: how quickly you release your Intel. I had an incident with a client last month and I'm releasing it today, well and good. I had an incident with a client two years ago and I'm releasing the Intel today, not that great. So how early you release that Intel also matters.

Impact. Suppose I give you an IoC which is a brute-force scanner, and then an IoC which is a C2 that hosts, say, Gootkit malware. How impactful is it to an organization? Brute forcing and port scanning, not that great; I'd give a score of two or three, because I don't really care about open source scanners, Nmap scanners or Qualys scanners. But if I have something like a hash of a script or an executable that maps to particular tactics, and I just

want to give that as Intel, I'll probably value it more than my brute-force scanners.

Validator performance. We have talked about how Intel is submitted to the app and the quality parameters on which it is evaluated. Now I want to know how the performance of a validator is analyzed. Say I have 10 validators and one IoC that has been submitted. I want to make sure that I'm not using the same validators again and again; I'm iterating over them. But how am I going to measure their performance?

First, the validators submit their scores. Suppose for an IoC A there are 10 validators, each submitting a score out of 10. The first one submits, say, three out of 10, the second four out of 10, then seven out of 10, eight out of 10, until all 10 have submitted a score for IoC A. Now I calculate the mean score of that IoC; suppose it comes to 5 out of 10. Then I calculate the deviation of each validator's rating from that mean, which is basically a root mean squared error or whatever you want to call it; it's basically a standard deviation. If I'm a validator who submitted five out of five and the mean came out at five out of five, good enough. If I'm a validator who submitted one out of five, what is the deviation? The larger the deviation, the lower your reputation is going to be; the smaller the deviation, the closer you are to the mean, which means you are right on the money. Suppose everyone has given a score of nine and the mean comes out at nine, and I'm one random guy submitting two out of nine. I'm deviating a lot from the mean score, so my reputation is probably not going to be that great for that particular IoC.

Now use this deviation to update the validators' reputation scores, and use a voting procedure among the validators to finalize the status of each IoC. Say I have 10 validators and I know that six of them have a high reputation: those six validators are the final decision makers on whether this IoC is accepted. If it satisfies the quality criteria, write it to a block and update the chain.

What happens after Intel is published? Once the IoCs are on the chain, they become accessible to all the

users for the intended use. Now I have a database of Intel information, and any user can search it. The number of qualified IoCs is used to provide rewards to the submitter. Suppose I'm a submitter who submitted 10 IoCs and all of them got accepted and are in the chain. That means the IoCs I submitted have met the minimum quality standards and are on the chain, so now it's time for me to get rewarded. The validator who validates the IoC also gets rewarded, because why would I go and validate IoCs if I'm not getting rewarded for it? So the submitter gets rewarded and the validator also gets rewarded. Initially we'll split it around 50/50, and after that we'll adjust it, up to a point where the validator no longer exists in the chain and the validation process is completely automated.

Now, how are you going to automate validator performance and validation? It's more or less like a random forest. Okay, a bit on random forests: you have a collection of trees, every tree comes up with some score, and those scores are combined to give you the final answer. Then there is a clustering mechanism over the reputation of the validators: the cluster with the highest reputation is selected and the outliers are not. That's how you choose your validators and measure their performance as well.

The consumer. I'm giving out rewards to my submitter and my validator, but where does this money come from? When I talk about money, it's about gas, or a fee: whenever you do a transaction on a blockchain you have to pay a gas fee. So where does this gas or fee come from? It comes from the subscription model. So as a subscriber

you'll have to pay a small fee to use this. You can use whatever model you want: a pay-as-you-go model, or gold, silver and platinum subscriptions. You pay an annual or monthly fee and subscribe, and that reward is distributed between your validator and your submitter. After a certain point the validator will not get anything, because validation will be automated, so all of the rewards go directly to your submitter.

Takeaways. We have seen the challenges with the traditional CTI model, and we have seen how a blockchain-based CTI approach addresses some of those challenges, not all of them. Enhancing the quality and authenticity of Intel feeds: how do you make sure that the Intel meets some standard of quality? Automated CTI sharing: how do you automate the process and make sure that quality assurance is still there while automating? Sharing Intel while preserving privacy: this is where we use zero-knowledge proof. When you submit Intel or IoC information, you do not want to disclose the entire context, how you got it and what you did. Maybe you have your own PoC, maybe you have your own research, and you do not want to submit that. That's where zero-knowledge proof kicks in: you just submit the IoC, and because we are using zero-knowledge proof it can be validated by a validator without actually looking at your content. So your PoC, your tool or your research stays with you; only the IoCs go into the chain.

Ease of adoption and integration: it is a blockchain-based model and pretty much a dApp, so you can connect it to your SIEM solutions, like Splunk or ELK, or integrate it directly into your CTI tools like MISP. Yeah, I'll take some questions.
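To make the IoC record and the validator scoring described in the talk more concrete, here is a minimal Python sketch. The field names, the starting reputation of 0.5 and the update rule (moving each reputation toward an "agreement" value based on absolute deviation from the mean) are assumptions for illustration; the talk only says that a larger deviation from the mean should lower a validator's reputation, and the real logic would live in a smart contract.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class IocRecord:
    """Rough shape of an on-chain IoC entry; field names are hypothetical."""
    submitter: str            # submitter's wallet address, used for rewards
    ioc_type: str             # "ip", "domain" or "hash"
    value: str
    validated: bool = False   # True once written to the chain
    quality: dict = field(default_factory=dict)  # quality parameter -> score


def update_reputations(scores, reputations, max_score=10.0, rate=0.1):
    """Update validator reputations from their scores for a single IoC.

    scores: validator id -> score given to the IoC.
    reputations: validator id -> reputation in [0, 1]; updated in place.
    Assumed rule: agreement = 1 - |score - mean| / max_score, blended into
    the old reputation with an exponential moving average.
    """
    avg = mean(scores.values())
    for validator, score in scores.items():
        agreement = 1.0 - abs(score - avg) / max_score  # 1.0 means exactly on the mean
        old = reputations.get(validator, 0.5)           # assumed neutral starting value
        reputations[validator] = (1 - rate) * old + rate * agreement
    return reputations
```

With scores such as `{"v1": 9, "v2": 9, "v3": 9, "v4": 2}`, the outlier `v4` ends up with a lower reputation than the validators who scored close to the mean, which is the behavior the talk describes.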

[Audience question, inaudible]

You can do this with Ethereum. Which chain you want to select is your call, but Ethereum-based chains are easy for this one. You can use Avalanche; we actually used Avalanche for this one. But then again, it's your call where you want to deploy it. Any other questions? Yeah, please.

[Audience] Can you go to that diagram?

Sure. Yeah.

[Audience question, partly inaudible] ... and the second one: your validator will get the reward, right? From proof of work we have seen that the miner who solves the hash gets the reward, and in proof of stake we saw that the highest staker gets it. How about here, who will get the reward? If you divide it equally, will it be fair to others? Say I have more resources, I have spent more time on this one; will it be fair for everyone if it's split equally among everyone?

Yeah, so your first question was this: a blockchain app is decentralized, and there is always this problem that if I have the majority I can manipulate the results. Like, I

have a 60% majority, so I can produce whatever results I want: I have my friend submitting IoCs, I make a group of my own validators, they validate all of those IoCs, and the rewards go to my friend. But the validators being selected are always changing. The mechanism always selects validators from the clusters with high reputation, and, to keep it fair, it also gives one or two validators from the lowest-reputation clusters a chance, so that they slowly build their reputation. So it's a reputation-building thing. It's not that you can decide to make a group and get that group; it's the machine deciding what the group is going to be. For every different piece of Intel coming in, the validator group is different; it's not the same set of validators again and again. It's always randomized, and I always make sure that out of 10, two of the validators come from a cluster with lower reputation, to ensure fairness.

And your second question, how are the rewards distributed? For validators, it goes by the most reputed validator in that pool. Suppose I have 10 validators and two of them have a really good reputation: the validator with the highest reputation will always get a slightly higher reward than a validator with a lower reputation. But since the validators are always rotated, it's going to be more or less fair. Then again, you cannot have a 100% fair system in a decentralized blockchain, right? There is always some chance of the chain being manipulated here or there. I think that makes sense, somewhat. Any more questions? Oh yeah.

[Audience, partly inaudible] I really don't see any benefit to having this on the blockchain. [inaudible]

Yeah. One is incentives, for

providing your Intel. Then ensuring quality. Immutability, definitely. And quality again: I don't accept just any Intel, only the Intel that satisfies my quality bar.

[Audience] With the validators, it seems like before the automation happens there's a bit of a scalability issue. Do you think the gas fees would be enough to handle someone trying to DoS the system by just continually submitting things?

I can keep on submitting; I really don't have a limit on the number of IoCs that I submit. But the gas fee thing: in order to sustain this model you need some sort of gas fee or funding to get started, and after that it's a distributed model, where you collect from your consumers and pay your submitters. But yeah, you have to make sure that you have enough gas to get started on this model. Yeah?

[Audience question, partly inaudible] ... and who would select them? It seems like it's rather centralized. [inaudible]

The selection process: the initial selection happens outside the chain. You are reputed in your CTI community, so you get selected. But once you are selected, you are in a pool of validators, say 100, and for each IoC only 10 of them are selected: the reputed ones, plus some random selection.

[Audience, partly inaudible] So when you say you select [inaudible]

No. Yeah.

[Audience question, inaudible]

If some of them are not available, you just go for another iteration and select another round of validators. Because you are selecting randomly out of a pool, not all of the selected validators are necessarily available, so you run another iteration and select others. Yeah, there is a waiting time: if you are selected as a validator, you have X amount of time to validate and submit your evaluation. If you don't submit, in the second iteration someone else is selected. That's something you'll have to adjust according to how your model scales, how many IoCs are coming in and how many validators you have.

That's all, guys. Thank you. [Music]
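The validator selection discussed in the Q&A (a pool of about 100, 10 picked per IoC, mostly high-reputation validators plus two slots for lower-reputation ones, and another round if someone is unavailable) could be sketched as follows. This is a minimal Python illustration under stated assumptions: the talk mentions clustering by reputation, while this sketch simply splits the pool at the median; `select_validators` and its parameters are hypothetical names.

```python
import random


def select_validators(reputations, k=10, low_slots=2, unavailable=(), rng=None):
    """Pick k validators: mostly from the high-reputation half, a few from the low half.

    reputations: validator id -> reputation score.
    unavailable: validators to skip (for example, ones that timed out last round).
    The median split stands in for the reputation clustering mentioned in the talk.
    """
    rng = rng or random.Random()
    skip = set(unavailable)
    candidates = [v for v in reputations if v not in skip]
    ranked = sorted(candidates, key=lambda v: reputations[v], reverse=True)
    half = len(ranked) // 2
    high, low = ranked[:half], ranked[half:]
    n_low = min(low_slots, len(low), k)
    n_high = min(k - n_low, len(high))
    # Random sampling within each group so the same validators are not reused every time.
    return rng.sample(high, n_high) + rng.sample(low, n_low)
```

If some of the selected validators do not respond within the waiting time, calling the function again with those validators passed as `unavailable` gives the "second iteration" described above.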
