
So, for those who don't know IPFS, I'll explain it in a moment. But just as a quick introduction to where I'm coming from in all of this: I'm in what's called a forward-looking threat research team, which means we're not really that interested in what's happening now. We're interested in what's happening in the future. Far future, like, you know, three to five years. Unfortunately for you, you have me, and I'm one of the people whose task was looking even further into the future. So I do a lot of AI stuff, I do a lot of quantum computing stuff, and I'm not going to talk about that at all today, because I think most people couldn't care diddly-squat about that. Maybe AI, but not the other stuff.

But anyway, our team functions as scouts for Trend Micro. And if you don't know about Trend Micro, you're in a good club, or a fairly large club, because even though we're one of the biggest security-focused companies in the world, we're not so big in Europe. Our team focuses on the e-crime aspects, the user behavior changes that will lead to different threat landscapes, and also technology changes. These are the things that we focus on.

As part of that, a couple of years ago we started looking into the metaverse, because we thought, hey, this could be a new platform for doing business on, which means we have a new attack surface, et cetera. Well, it really wasn't quite there yet. There's no clear architecture. But we did discover that there are a lot of things that lead into it, like the whole cryptocurrency and blockchain thing, the distributed application ideas, smart contracts, and you've probably heard of NFTs and things like that. And they kind of all (let me see, does this actually work? No, it doesn't. Oh, it does kind of work. So there we go) led to the IPFS idea, and this kept on popping up.
We also heard about this through other people as well, and so we decided maybe we should look into that a little bit more.

Just backing up even further, though: IPFS, a lot of the cryptocurrency things, and also the metaverse kind of bring in this idea of Web3, which you may or may not have heard of. A lot of that is about ownership and control of your own data, leading to a kind of trustless, permissionless environment with a lot of decentralization ideas built in. And the part that we are mainly interested in, in this whole landscape (what happens when you press the wrong button? Let's see. Yeah.) is that area right in the middle here, the data distribution protocols, which is very much IPFS and Arweave, and it used to be Siasky. Sia Sky? I don't really know how to say it. Doesn't matter, they don't exist anymore.

Just to clarify, though: there is Web 3.0 and there's Web3, and they're two different things. This is just to keep your lives more interesting. Web 3.0 is based very much on Tim Berners-Lee's idea of the semantic web, and how we can pull a lot of our data back into little data containers (I think they're based on Solid) that we can then use and interoperate with. Kind of a neat idea, hasn't really caught on yet. Web3 is similar in that they're also trying to decentralize things, but it's based a lot more strongly on using blockchains as a kind of trust layer, as well as decentralized storage in the form of things like IPFS, as we'll see in a moment, instead of little personal containers that you can either host yourself or have hosted at some other company. And in Web3 this monetary aspect is everywhere. It doesn't seem to be separable from the idea. Both, at least, are concerned with getting rid of these data silos.

OK, but what is IPFS? Who's heard of BitTorrent before? I'm sure everybody's used BitTorrent as well. Anybody who didn't raise their hand, I do not believe you. It's similar. It's not the same thing, there are some significant differences, but it's similar in idea.

So let's say you're called Alice. Anybody called Alice here in the audience? I don't know. So you're Alice and you have this document, and you basically create this hash, this content-based address, for that document, and you just let your document sit there. But then somebody else comes along, let's say Bob. Any Bobs in the room? This being Germany, that's probably less likely. So Bob comes along and says, I'm interested in that document, who has pieces of it? And we'll ask the Kademlia network, which is the foundation for IPFS, as it is for BitTorrent, for peers who may have pieces of it. Then, through the Bitswap protocol, these pieces are distributed to Bob. Bob says, I want this piece, you have it, give it to me, and then he assembles the entire thing.

Central to everything is this content identifier, the CID, which looks like this. It's kind of a structured hash. In the end it contains this hash over here, which is a real hash, and the rest is kind of boilerplate. It's kind of neat. But the problem is, they discovered after a while that this wasn't going to be flexible enough, and so they added another version, another type of CID. Now the annoying thing for us is that these are pointing to exactly the same object. So we have to realize that even if we have one hash, we still need to calculate the second one from it. A little bit annoying. You can go from v0 to v1. It's not always possible to go from v1 to v0, and that's just because (you can see at the bottom, well, actually you can't see at the bottom, but you can imagine) they've added more flexibility. So once you've gotten beyond what v0 can do, you cannot convert it back.

What is an IPFS object? On the command line, if I say ipfs cat and then all that gobbledygook, I will just get a file like that. So here's a text file as an example. It could be anything, it doesn't really matter. Under the hood, though, you can also do something like ipfs dag get with the same thing, and then you start seeing that it's actually a data structure. Now, this looks like JSON. It's not really JSON. In fact, that's also one of the parts in v1 that has more flexibility. It can be quite a complex structure, but it's mostly represented as JSON-like data.

Now, if the object is too big, and by too big I think 256K is the limit, you have to start chunking it up into smaller pieces, and this is where IPFS is actually quite clever. So here, as an example, it has some of the data right at the front, over here, but that wasn't enough, so it continues on over here with more data. And these things point to separate objects that can be called up and addressed separately. (Yeah, I'll go back to that in a moment.) And then you can even go as far as putting full directories in there. This is in fact how you can host a website: you create an IPFS directory structure with all the different objects in separate files with separate file names, which are put into the main root object, and that way you can create decentralized websites. Kind of neat. We'll see how neat it is later.

So CIDs are not quite cryptographic hashes, but they do have the integrity guarantees that you need, because buried in them is a cryptographic hash. So if you're wondering, OK, how about TLS, I thought I secure everything through TLS? Well, you don't need to, because in theory you should be taking that object
and then checking that object against the hash code that's inside, once you have everything completed. It's a little more complicated than just checking over the entire thing, because of all this blocking: you have to do every block separately, and then you do the entire object. Most of the clients do this for you, at least the ones that we've looked at, but we'll see some exceptions later.

Mutable addressing. One of the problems that you might have noticed is that we're using a hash, and a hash can only refer to one single file. What if you want to change it? Let's say we want to update that document I showed before. Well, what they've invented is IPNS, the InterPlanetary Name System. I don't know why they call it that, it's not really a name. It's a cryptographically signed pointer that can be redirected from one version to the next. You just have to re-sign the next document, or the next version of it. And that's great. But it turns out there are other ways of doing the same thing, because IPNS can be very slow. Very slow. In fact, one of the problems is that they push these IPNS records out into the Kademlia network, and then they're basically gone after an hour. They kind of circulate there and then they're gone, so the client has to be pushing them out every hour at the very least. I think the Kubo client, that's one of the standard ones, does it every four hours, and that's not always enough, I've discovered.

So what they've done instead is invent new ways of trying to find these mutable objects, and one of them is through DNSLink. DNSLink is basically just a DNS entry. And now you will kind of scratch your head and say, I thought we were trying to decentralize these things, and now we've centralized it again on DNS. Was that the point of it? Not sure.
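
A minimal sketch of what a DNSLink lookup involves, assuming the commonly documented convention that the link lives in a TXT record on the _dnslink subdomain of the domain, with a value of the form dnslink=/ipfs/<CID> or dnslink=/ipns/<name>. The function names and sample values are hypothetical, and no DNS query is made here: the TXT values are passed in as plain strings.

```python
from typing import List, Optional, Tuple

DNSLINK_PREFIX = "dnslink="


def dnslink_record_name(domain: str) -> str:
    """Name of the TXT record a resolver would query for `domain`."""
    return "_dnslink." + domain.rstrip(".")


def parse_dnslink(txt_values: List[str]) -> Optional[Tuple[str, str]]:
    """Return (namespace, identifier) from the first usable DNSLink value.

    `txt_values` are raw TXT strings, which may hold unrelated entries
    such as SPF records. Returns None when no DNSLink value is found.
    """
    for raw in txt_values:
        value = raw.strip().strip('"')  # some tools quote TXT data
        if not value.startswith(DNSLINK_PREFIX):
            continue
        parts = value[len(DNSLINK_PREFIX):].split("/")
        # Expected shape: /ipfs/<cid>[/path] or /ipns/<name>[/path]
        if (len(parts) >= 3 and parts[0] == ""
                and parts[1] in ("ipfs", "ipns") and parts[2]):
            return parts[1], parts[2]
    return None


if __name__ == "__main__":
    # Made-up TXT data standing in for a real DNS answer.
    records = ["v=spf1 -all", "dnslink=/ipfs/bafyexamplecid"]
    print(dnslink_record_name("example.org"), parse_dnslink(records))
```

An "ipfs" result points at a fixed CID, while an "ipns" result points at another mutable name that still has to be resolved. Either way, whoever controls the DNS zone controls where the name points, which is the centralization trade-off mentioned above.
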
But this works, and this approach is actually what you see quite often now. You can also use the ENS system on Ethereum: you put it on the blockchain, query the blockchain, and you get, so to speak, a domain-name-to-IPFS-object resolution. Unstoppable Domains also provides that same sort of functionality. But not ideal.

So, just to summarize: what are the main differences between the World Wide Web and IPFS? Well, the World Wide Web is very location-based, so we need to know the server and the path in order to get what we want. Here, all we need is this content ID, and we just ask the network: hey, I want that content. The World Wide Web is a very strongly client-server system, whereas IPFS is peer-to-peer, like BitTorrent. And of course the World Wide Web is very IP-centric: your address for everything is some IP address, whereas IPFS, we say, is Merkle DAG-centric.

What does that mean? If you know anything about blockchains, you will also know that they use a Merkle tree internally, and it's the same thing for the documents here as well. I should really have a better diagram for this, but a Merkle DAG allows you to do hashes of hashes. So when we saw this directory structure before, or the sub-documents in the document, all these things have to percolate up to that top CID. And the reason it's a DAG, and DAG means directed acyclic graph (there we go, I was just blanking on what the G was. By the way, there will be a test on this later), what's nice is that if you have sub-parts, like sub-document parts, that are repeated very often, those don't have to be replicated more than once. So if, let's say, I push a website and the 404.html document has not changed since last time, it also does not get a new hash code, or CID, because it doesn't need one. That way we save a lot of space, and this makes it a lot more useful for professional use.

So how great is this? Well, as a computer scientist I think this is really interesting and fascinating stuff. Unfortunately, the attackers have also found it quite interesting. But in any case, the reality is that it's actually still quite a slow system. Maybe it's just not big enough. I think that's our suspicion. The DHT and the Bitswap protocol do take a long time to retrieve documents. I've sat there for like three quarters of an hour waiting for some document to come in, so that's not so great. You'll have had the same experience with BitTorrent. It does require an IPFS client. The Brave browser, though, does have support for IPFS, which is kind of interesting. There is also a little bit of support in the Opera Crypto Browser, but at least on the Mac they both seem to require that you're running the IPFS client locally.

So one of the results is that sometimes you wait for these objects forever and they just time out. They're never there, because they've become globally unavailable, as we call it. To combat this, a lot of people use Filecoin to try to make these things more permanent. Filecoin is a way of paying people to host my content, and it's interesting because it's starting to see professional use. We'll go into that in a moment, maybe, if we have time.

So IPFS and IPNS gateways are the consequence of all the slowness. It turns out that people don't want to wait more than ten seconds for their website to load. So instead, what people have are IPFS gateways, and there are a whole pile of public ones. But it also turns out that every node that you run can also
be a gateway. So if somebody in your infrastructure, in your company, is running IPFS, they probably have a local IPFS gateway inside your company. As an example of how this would work: on the first line I'm trying to get it through ipfs cat, and that didn't work. I try to go through one of the gateways, and that was obviously overused. I could then go on to a different gateway, and then finally I get it through Cloudflare. Cloudflare also has an IPFS gateway. So I load it, and it turns out that it's a virus. Well, not a virus, it's a Trojan. And that's actually the good news there, by the way: it's not avoiding antivirus solutions or EDR solutions in any way. It will still trigger them once it hits your file system.

The problem, though, is blocking it at the URL level when you're talking about gateways, because there are a lot of public gateways. This is a list that somebody maintains, and it changes all the time. And as I said, you still have the problem of local gateways as well.

But we want to know: how big is this problem? Is this really something that we should be worried about? So we looked into our telemetry data, and since we have a lot of it from all over the globe, that gives us a decent view of what's actually happening. We've seen that IPFS had been growing a little bit and then kind of slipped. It's backed down to, you know, early 2020 levels. And it's still very tiny. This looks similar, but what we're looking at here is the percentage of all of the URLs that we see, and we can see it's actually a very small amount of our traffic. That seems like good news. However, we did notice a decent amount of spear phishing based on IPFS, using URLs like this, and there's a whole pile of variants of this. So basically you have the spear phishing attacks. You're not going to see
it, it's nearly invisible, but here, for instance, it's actually going through the fleek.com page, for some Redmond-based product. We looked specifically into this type of spear phishing because it's very easy to find, and we saw that while the traffic of IPFS itself was going up and down, IPFS spear phishing is steadily growing. It had a little bit of a blip at the end of last year, and we call that the Siasky effect, because what happened was: this is like an alternative to IPFS, but they shut down last November, and so we suddenly saw a spike of the phishers basically moving their attacks to IPFS during that time. Just to illuminate that a little bit more: you can see the blue lines here kind of going down. That's the Siasky phishing. (I don't even know how to pronounce it. Sia Sky? No idea.) But in any case, they're gone. You can see their traffic going away and, at the same time, the IPFS phishing rising.

IPNS is growing very slowly, but it's still growing. IPNS phishing kind of had its moment and then went away again, probably because of Siasky, and the attackers probably realized that, oh, this is just not loading fast enough, nobody's going to wait that long.

The interesting thing from my point of view was, I wanted to know: given, let's say, one CID, how many servers are hosting it at the same time? This is what I found in the statistics. It's a relatively fat tail. I'm dropping the one-offs, of course, because obviously there's a huge pile of servers that just host one object at a time. But then I noticed that up to about 100 servers it still stays pretty high, and that's a problem, because it means that once we have identified one CID that's malicious, we need to block it on all of the servers. We can't
just say, all right, we'll just do that one server and be done with it. No, we have to do it for every single server that we know about at the time, or we have to create a pattern. But patterns can be dangerous, you know, patterns without the domain name.

Blocking. So, as I said, just blocking that one domain isn't enough. Takedowns, as you should realize by now, are totally impossible here. What we've done is try to hack the protocol a little bit to get at the people who are hosting it. We did that in a simulated network, and we never got to that last person who actually originated the content. So the OG content host was not reachable in our experiments. We'll keep on trying, obviously, and see whether we find some method for doing it.

Could you block 4001, which is one of the ports used for the peer-to-peer protocol? The problem is that there seems to be legitimate use for IPFS, unfortunately. Otherwise I would say, yeah, just get rid of it, like most people block BitTorrent, because, hey, why would that be a thing inside a corporation? Well, there are two things. First of all, there is a report that Netflix is using IPFS to distribute their containers. Software containers, not these big metal boxes. So there was that. And then, very recently, I heard of various governmental and academic research institutes using IPFS to distribute their data, mainly using Filecoin. That was an eye-opener, because I had been seeing all of this data that did not look like ordinary data. I'm not sure whether it was encrypted or not, but that stuff seems to be coming more from these research institutes. So can you block it? Well, you'd have to figure out what your usage of IPFS is internally.

The other good news is
that malware is nearly non-existent. A colleague of mine did a study last year and monitored it for, I don't know, a month or so, fou