
Flaying the Blockchain Ledger for Fun, Profit and Hip Hop - Andrew Morris

BSides Las Vegas, 46:44, 345 views, published 2016-09
About this talk
Flaying the Blockchain Ledger for Fun, Profit and Hip Hop - Andrew Morris - Ground Truth - BSidesLV 2016 - Tuscany Hotel - Aug 02, 2016

ah hey everybody so yeah welcome back to the ground truth track uh this is andrew morris from endgame you guys already clapped so that's taken care of uh i just wanted to take a moment to thank all of our sponsors who make this conference happen they're awesome you should all take some time to visit them i also want to mention this is being live streamed and recorded if you guys could turn off the ringers on your phones that would be great and also please nobody we don't have anybody back there but don't stay in the back because that's a fire lane so we got to keep people out of the back okay without further ado i will turn it

over thanks all right let's get this party started so this is flaying the blockchain ledger for fun profit and hip hop and that title will make a lot more sense shortly my name is andrew morris before i even get started i want to go ahead and acknowledge a bunch of people that helped me make this talk happen my buddy colin did a lot of this research with me it all kind of came from us sitting in my room drinking and like oh this would be a funny thing if we did it chris stoner is one of my co-workers who helped me a bunch with a lot of the database queries because i suck at that

kind of stuff dr richard seymour esquire junior md uh one of our data scientists um and a friend of mine who provided a lot of moral support and just helped me get through a lot of the really hard stuff my co-worker bobby who probably doesn't want his last name on this um my roommate andrew who like in a fit of disaster drove me to his office because i needed a faster internet connection than my shitty third world country south carolina internet um a guy named edward iscandarov who wrote the blockchain parsing library that i actually ended up relying really heavily on and checked it into github like a

month ago which is great um oh damn it i didn't do that todo there's a guy on youtube i'm the worst uh i'm gonna have to follow up with his actual name he has a series on explaining bitcoin from beginning to end it's like a five hour youtube series where he basically like reads through the bitcoin paper and eli5s everything on there um a guy named tom gebhard i don't know how to pronounce his last name but he wrote a really awesome tutorial on like how to parse the blockchain ledger um and i'll link to all this stuff kurt barnard uh my friend and roommate who helped me again with uh making a lot of

this stuff happen getting through some of the problems i ran into satoshi nakamoto whoever the hell he is this bitcoin is genius and obviously none of this would happen without him inventing it or them or whatever the bitcoin documentation the developer documentation is exceptional the two guys that got in a fight two shmoocons ago that led to me making the worst joke of all time and endgame the company that i work for just being supportive of the research and letting me do it and with that let's get started so um i tweeted a link just now my twitter is andrew underscore underscore underscore morris i tweeted a pastebin link that's going to have all of the

hashes and links and queries that i'm going to be referencing here just in case you guys want to actually follow along thank you amanda for making that recommendation this is the actual layout of the talk that i'm going to give today a little bit about me my name is andrew morris this is my twitter handle i work on the r&d team at endgame my background is in offensive computer stuff i was a pentester for a long time did a lot of red team work and stuff like that i've been doing computer stuff for a really long time my life considered and it's surprising how bad i still am at it i dropped out of high

school a couple years ago and i don't really feel like doing that anymore and my favorite things are computers music and tweeting stupid stuff um just to kind of like get it completely out there today i'm just talking on behalf of myself this is my own recreational research i'm not talking on behalf of my employer i'm just talking about technical observations um endgame doesn't do anything with blockchain or bitcoin so again this is just me um i am not a bitcoin expert which is i guess a little weird because i'm playing one right now but bitcoin and blockchain are massively massively massively complicated um i've read a lot about it and i understand it a decent

amount but it's still very very complex um you're probably not going to walk away from this talk with a really intricate understanding of blockchain you're going to have a better understanding of it but i really highly recommend that you actually read about it yourselves and hopefully that'll um i'll facilitate that happening and please let me know if you see anything that's inaccurate you can just shout it at me right now i don't care don't worry about embarrassing me or anything like that um so just to give this a little bit of um a little bit of framing this guy um martin shkreli tweeted on august 26th or on august 26 2015 he bought the wu-tang

clan the exclusive wu-tang clan album that they had been secretly recording for two million dollars he bought it exclusively so he bought it he got a physical copy of the record he bought it in secret at the time and it later came out that he was the one who purchased it um on february 11 2016 the same guy tried to buy the new kanye west album the life of pablo for 10 million on twitter by tweeting at kanye west and writing these letters to kanye trying to buy this um and then on february 14 2016 martin shkreli tweets who the [ __ ] has my 15 million dollars i need my [ __ ] money back oh i need

my money back this isn't a [ __ ] joke wtf somebody named daquan said he was kanye's boy and i signed the deal to buy pablo and sent the bitcoin call the police this is [ __ ] and i saw this tweet at the time and i know just enough about bitcoin to know like scheme hands went on i can probably figure this thing out so then i had the question um is it possible to actually find this transaction and figure out like find it on the ledger right 15 million dollars is a lot of money it's a non-trivial amount of money and i knew just enough at the time about

bitcoin to know that all transactions are broadcast everywhere um so the thought is why don't i replicate and the again the ledger's unencrypted it's everywhere i knew just enough to know that it was maybe possible um so i'm going to replicate the entirety of the blockchain ledger and i'm going to search it somehow and i'm going to find this transaction i'm going to see if i could locate this specific transaction in the ledger so the query is basically going to look like given a date range find any transactions that fell under a certain dollar amount at the time so my approach is actually pull apart the blockchain like get the ledger get the actual blockchain ledger

parse out all of the raw bitcoin transactions put it into a format kind of like this um shove the data into a database that i can actually ask questions of write some queries and figure it out from there so before i go any further oh wait and hang on this is actually like pseudo query language of like what i actually am trying to do to locate this thing um so raise your hand if you're familiar with bitcoin if you've heard of bitcoin before okay all right great so most people have heard of bitcoin so just to give like a quick overview and i mean like really quick um bitcoin is a cryptocurrency uh it's kind of the first cryptocurrency

as far as i know at least the first of its kind it's peer-to-peer so there is no trusted third party which means there's no paypal there's no venmo there's no one brokering the transactions everything actually happens from one person to another person it's distributed it's kind of like cash except on the internet whereas if you have 100 dollars in your wallet and somebody runs up to you and they punch you in the face and they take your wallet you can't go to the us mint and ask them to refund your hundred dollars that you got stolen just the same way that if somebody takes your bitcoin private keys or if they spend your bitcoin you can't unspend the bitcoin it

is cryptographically infeasible for a number of different reasons um the ledger is distributed amongst a network of people who want to be a part of the network so that means that the actual the itemized ledger of all transactions is held by everyone um it uses something called proof of work to prevent double spending which basically just means that you can't spend a coin that's already been spent unless you put in the actual computational like they call them votes but unless you crunch the cpu and you actually brute force the proof of work in order to do so and a proof of work is basically something that is hard to get the first time but easy to verify

so basically given a hash or something like that like it's easy for me to say oh this this thing this proof of work is good once i already know the answer but it's hard to be the first one to actually achieve the proof of work i'll go into that a little bit um everything's auditable back to the genesis block which is the first ever blockchain block that ever happened all the way back to the very beginning coins are actually mined with cpu so you actually cryptographically bring them into existence the same way kind of like how you mine gold you put in work you put in cpu power and you actually are rewarded with coins um
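The hard-to-find, easy-to-verify property described above can be sketched in a few lines of Python. This is a toy under stated assumptions: the helper names `verify` and `mine` and the leading-zero-hex-digit rule are made up for illustration, whereas Bitcoin itself compares the double SHA-256 of an 80-byte block header against a numeric target.

```python
import hashlib


def verify(data: bytes, nonce: int, zero_hex_digits: int) -> bool:
    """Cheap check: one double SHA-256 and a prefix comparison."""
    payload = data + nonce.to_bytes(8, "little")
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).hexdigest()
    return digest.startswith("0" * zero_hex_digits)


def mine(data: bytes, zero_hex_digits: int) -> int:
    """Expensive search: try nonces until one passes verify()."""
    nonce = 0
    while not verify(data, nonce, zero_hex_digits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    # Each extra zero digit multiplies the expected search work by 16,
    # while verification stays a single hash computation.
    found = mine(b"toy block", 3)
    print(found, verify(b"toy block", found, 3))
```

Raising `zero_hex_digits` plays the role that difficulty plays in the talk: the search gets slower, the check does not.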

there's actually also a non-turing complete bitcoin scripting language as a part of it which as if it couldn't get any more complicated there's a programming language that goes with it new blocks are mined every 15-ish minutes which basically means that um it takes 15 minutes to validate a transaction and if you want to know more about it i really highly recommend that you read the paper so when you send somebody bitcoin you are um signing a coin and you're letting them know like hey you're basically saying now you are the one who is going to be able to sign you are the one who is going to be able to spend this coin the

transaction is broadcast everywhere and everyone knows everything that happens and the network regulates itself with something called difficulty which basically just allows the blockchain network to continue to replicate every 15 minutes when there's more people on the network it's harder to mine bitcoin when there are less people in the network it's easier to mine bitcoin and the dollar value fluctuates because that's how economics works is bitcoin anonymous kind of it's anonymous in that you don't know who's making transactions but everyone knows that every transaction takes place so in some ways it's more anonymous than cash in other ways it's less anonymous than cash um it can be tumbled to be harder to track and it basically depends on the opsec of the

person when you're talking about spending bitcoin wallets can be created offline which basically just means um you can receive bitcoin without actually having to talk to the internet which is weird but you can't spend it bitcoin is actually very secure it uses known secure cryptographic protocols it uses things that are kind of like widely accepted as being secure in the crypto world um attacks are possible there's something called like a 51 percent attack which is where half of the people on the blockchain ledger half of the members collude together to change something all at the same time it's very very unlikely um and

most of the actual attacks on bitcoin involve just having shitty operational security so talking about the actual blockchain itself the blockchain ledger is a list of every transaction that's ever happened it's a giant linked list that constantly grows in size and the longest chain is the one that's trusted which is basically the widely acknowledged the cryptographically verified blockchain ledger the one that everyone agrees on that's the longest is the one that's trusted um everything is hash on hash on hash so you can't change anything that's ever happened in the past without affecting everything moving forward which basically means once something is on the ledger as soon as another block

gets on top of it you can never change the thing that happened unless 51 percent of people agree to roll back um the blockchain is made up of blocks as you could probably imagine blocks are made up of transactions transactions are made up of inputs and outputs an output is a wallet sending coins to somebody else it's basically you saying this person is going to be able to spend these coins at some point and an input is somebody acknowledging that referencing your transaction then making another output to somewhere else this is relevant at some point so if you want to actually access the blockchain this is like the plebeian way to do it and i say that like kind of

joking but you can go to blockchain explorers like blockchain.info stuff like that which are ways that you can click through the ledger and actually go and browse wallets browse transactions stuff like that there's lots of them there's even an offline one that you can get that actually parses the ledger yourself i'll get on that a little bit this is what it looks like um it's not terribly clear here but the height basically means like which block it is in order the first block was block number one it had a height of one and then as it increases so on and so forth the age self-explanatory transactions is the amount of transactions that take place inside of that given block total

spent is the usd value of all of those transactions added up relayed by basically means like this is the this is the person this is the um group or this is who advertised the advertiser of whoever mined that block whoever brought it into existence and then the size is just the data size of the block itself uh you can look at bitcoin addresses which is bitcoin wallets um you can look at given transactions on websites like these blockchain explorers but unfortunately you can't actually do what i needed to do my use case which is basically i need to know all transactions that happen in this value at this given time there's no way to ask that of

blockchain explorers that currently exist there's no api exposed for that so again this is a reminder this is the kind of query that i need to do ish but there's no way to do that on blockchain explorers because they don't expose the data to you in that way there's some potential cheat codes that you could use like some shortcuts webbtc.com actually has an entire dump of their database it's like once every day and they have the previous four days that you can download or you can generate it yourself with abe which i think stands for a blockchain explorer you can actually also download a dockerfile that does the entire thing for you

because it's a little bit of a pain in the ass to get running um somebody dockerized it and it's mega mega easy to get running i did it um but when you're parsing out the entire blockchain ledger it ended up just being a shitload of data and i was like i don't necessarily want to do that so let's talk about actually ripping the ledger apart yourself so the ledger is made up of dat files uh currently there's about give or take 600 128 megabyte dat files um they're named as such each dat file that you get when you install like bitcoin core and you replicate the ledger each dat file is a serialized

binary blob each dat file contains blocks each block contains transactions and each transaction contains inputs and outputs the data structure is complex only if you don't know anything about data structures which i did not so this is like my military grade powerpoint um skills of like actually visualizing to you what the ledger looks like you've got the block header which is just some metadata about the block i'll talk about what's in each then you've got this big transaction thing an output section and an input section and then itemized inputs and outputs here and there's blocks on blocks on blocks on blocks so if you actually want to get the

ledger the good way the right way to do it is to install bitcoin core or install like a bitcoin mining client and let it sit there and replicate the ledger from the blockchain from the bitcoin network so it means actually like pulling in um like actually building the ledger from all of the other members of the network you need to have about 100 gigs allocated for this currently at this very point in time the ledger's about 80 gigs total the easy way is i'll just upload a torrent file of all this if any of you guys actually want it just pay attention i'll tweet about it or something like that um it'll be a lot faster so this is what it

actually looks like this is a hex number it's like i don't know what any of this is but the answers are in there somewhere so i set out to actually try and learn i tried to actually parse it out myself and this is me like mapping out the actual like byte structures and it was really really hard and i was basically it was just like this is so much i don't necessarily know that i'm going to be able to do this um so i started writing my own parser via tom gebhard's guide and then i realized that i suck at programming and so i searched github

really really hard and i found this guy wrote a blockchain parsing library called pyblockchain like literally a month before i started searching for this so i was like ah all right that works but these two other people wrote like this guy um wrote an ebook and he published the code blocktools it's a really good reference and then znort987 wrote a like c++ compiled blockchain parsing tool that is good but is a little dated because at this point it was built for when the ledger was like one gigabyte and now the ledger's like 80 so it doesn't work super well so the stuff that i actually need i need for the sake of my use case

which is figuring out where this tweet happened or where the transaction happened that the tweet is referencing i would ideally like a transaction id i want the payer wallet receiver wallet when the transaction happened how much bitcoin was moved and the equivalent u.s dollar value of how much was moved there are a lot of problems with this because of the way bitcoin transactions work there's inputs and outputs so there is no straightforward like um given transaction it's not like when you hand somebody twenty dollars and you lose twenty dollars and they gain twenty dollars that's not how it works there's these complex sets of inputs and outputs so there's like a form of almost a kind of state that

you have to build where everything is referencing something that's previously happened so you have to have everything or you can't do it at all which i realized the hard way there's another thing uh the notion of change in blockchain in bitcoin so when if i have a bitcoin wallet i think actually i talk about this at some point so i'll come back to it the payer wallet is not explicitly stated in the ledger there's no there's no payer wallet there's no payer public key you have to derive it yourself so that was another problem that i ran into um the exchange rate there's no place in the blockchain ledger where it says this is how much

usd a bitcoin was worth at this time that's a piece of metadata that is an economic thing so i am going to have to introduce that myself um big data whatever it was it's 80 gigs and it's a lot it's more than i could do in a text editor with a set of bash scripts and then transaction patterns this dives into why parsing addresses from the blockchain is hard because multiple different bitcoin clients uh have different implementations at different points in time of how to specify the address that's happening so some early on in version one would say i'm going to push this data to the stack

and then i'm going to verify it later and it's like three op codes or something like that and then later it's upgraded so there's a number of different patterns and it sucks it's a huge pain in the ass to actually parse out um the transaction id getting this is actually really easy all you do is you just sha-256 the data itself of the transaction done all right that one is actually very very easy bam take take the transact oh [ __ ] this is wrong don't don't look i need to fix that [ __ ] um sorry guys uh time this is actually also really easy the in the block header there's actually up here in the block header

there's an epoch timestamp that is um the timestamp of when exactly the block started to be mined the receiver wallet is a little bit of a pain in the ass it's basically this is the thing i was talking about where there's different patterns it's in the script payload which basically is here somewhere in the output and basically what it is is it's bitcoin telling this bitcoin script engine that it's executing some op codes in a way to verify um a public key and there's again like i said lots of different patterns uh basically almost every transaction that i found fell under about six different patterns
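As an illustration of one such pattern, here is a minimal sketch that recognizes the common pay-to-public-key-hash output script (OP_DUP OP_HASH160, a 20-byte push, OP_EQUALVERIFY OP_CHECKSIG) and turns the embedded hash into a Base58Check address. The function names are hypothetical and this is not the parsing library's code; it handles only this one pattern and returns `None` for everything else.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check(version: bytes, payload: bytes) -> str:
    """Encode version + payload + 4-byte double-SHA-256 checksum in Base58."""
    raw = version + payload
    checksum = hashlib.sha256(hashlib.sha256(raw).digest()).digest()[:4]
    data = raw + checksum
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = B58_ALPHABET[remainder] + encoded
    # Leading zero bytes vanish in the integer, so each becomes a literal "1".
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def p2pkh_address(script: bytes):
    """Return the receiving address if the script is the P2PKH pattern, else None."""
    # 0x76 OP_DUP, 0xa9 OP_HASH160, 0x14 push 20 bytes,
    # 0x88 OP_EQUALVERIFY, 0xac OP_CHECKSIG
    if len(script) == 25 and script[:3] == b"\x76\xa9\x14" and script[23:] == b"\x88\xac":
        return base58check(b"\x00", script[3:23])  # 0x00 = mainnet address version
    return None


if __name__ == "__main__":
    script = bytes.fromhex("76a914" + "62e907b15cbf27d5425399ebf6f0fb50ebb88f18" + "88ac")
    print(p2pkh_address(script))
```

A fuller parser would add one matcher like `p2pkh_address` per pattern (raw public key outputs, multi-signature, and so on) and try them in turn.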

fortunately tom gebhard the guy that i talked about really broke down i think all six of those patterns um version one of bitcoin multi-signature transactions stuff like that um so i ended up actually using pyblockchain which was developed by um edward the guy that i was referring to earlier he had not yet implemented all of the patterns he'd only implemented one and so to get good coverage i actually had to go through and implement two other ones i did end up missing data about a fraction of a percent of all transactions ever by not writing parsers for every single address pattern format but whatever um i didn't really need to and then the change

problem that i was talking about before so if i have a bitcoin wallet with 20 bitcoin and i want to send you i don't remember writing that send five bitcoin to kanye west what i'm actually doing is i'm i'm going to sign an output to you of five bitcoin and then i'm gonna sign one to myself of 15 bitcoin so if you're just looking at outputs then this may look like two different transactions where someone is getting five bitcoin and somebody's getting 15 bitcoin so you have to actually build the state to figure out where the coins came from which is non-trivial you need the entire chain you need all the inputs to actually solve this um
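The state-building described here can be sketched as a dictionary of earlier outputs keyed by (transaction id, output index), which every later input is resolved against. The tuple layout and the names `Output`, `TxIn`, and `link` are assumptions for illustration, not the talk's actual schema or query. The change flag is a simplification too: it only catches change sent back to an address that also funded the transaction, and wallets often send change to a fresh address.

```python
from collections import namedtuple

Output = namedtuple("Output", "address satoshis")
TxIn = namedtuple("TxIn", "prev_txid prev_index")

BTC = 100_000_000  # satoshis per bitcoin


def link(transactions):
    """Resolve inputs to the earlier outputs they spend.

    transactions: iterable of (txid, [TxIn], [Output]) in chain order.
    Returns rows of (txid, payers, payee, satoshis, looks_like_change).
    """
    seen_outputs = {}  # (txid, index) -> Output; the state the talk refers to
    rows = []
    for txid, inputs, outputs in transactions:
        payers = set()
        for tx_in in inputs:
            spent = seen_outputs.get((tx_in.prev_txid, tx_in.prev_index))
            if spent is not None:  # missing means the earlier data was not parsed
                payers.add(spent.address)
        for index, out in enumerate(outputs):
            seen_outputs[(txid, index)] = out
            rows.append((txid, sorted(payers), out.address, out.satoshis,
                         out.address in payers))
    return rows


# Made-up chain: a wallet receives 20 BTC, then pays 5 and returns 15 to itself.
EXAMPLE_CHAIN = [
    ("a", [], [Output("payer_wallet", 20 * BTC)]),
    ("b", [TxIn("a", 0)], [Output("kanye", 5 * BTC), Output("payer_wallet", 15 * BTC)]),
]

if __name__ == "__main__":
    for row in link(EXAMPLE_CHAIN):
        print(row)
```

Without `seen_outputs`, the two outputs of transaction `b` would look like unrelated payments of 5 and 15 bitcoin, which is exactly the ambiguity the talk describes.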

and it's this thing exactly like a wallet that has a hundred million dollars worth of bitcoin like if i buy a coffee for five dollars on the ledger if you're parsing it without looking at the state it's going to look like i'm sending somebody 999 million dollars and somebody else five dollars so you have to have the state to really understand where it's coming from and where it's going to um so the sender is the public key that signed the bitcoin before you get it so basically if i send you a bitcoin what i'm doing is i'm signing the transaction with your public key authorizing you then to be the one who

spends it technically the wallet doesn't actually exist the wallet id is just a way to represent a public key it's a different data format of the public key and it's basically like the public key chopped up hashed hashed moved around a little bit and then base58 not base64 because that makes too much sense base58 um which is basically the same thing as base64 except without all the punctuation um so like you need to link everything together to make any sense and my buddy and co-worker wrote this giant big old bastard of a query that

actually makes all of those connections i'll get to that um so i was trying to actually build all of these things together and like make all of these different connections but i didn't actually have enough ram on my machine to do all this so i overnighted 32 gigabytes of ram into my desktop and i still didn't have enough so that was the stupidest like 200 i've ever spent in my entire life so i'm like freaking out because i have to link all these inputs and outputs together and i have a really shitty internet connection at home and i don't have enough ram in my desktop to do it so my roommate both of my roommates actually saved the

day and my roommate like a couple days ago drove me to his office where they have gigabit fiber and i uploaded the 100 gig data set of parsing stuff out spun up this ridiculously beefy aws server that cost me like five dollars an hour and so i was like oh jesus christ i gotta really make sure this thing goes down um made the giant frankenquery and then like ripped everything back out so the actual amount of bitcoin i'll come back to that stuff and then the amount of bitcoin is in satoshis that's actually pulled from here in the outputs this is the amount of bitcoin in a transaction and it comes in the form of

satoshis which there are a hundred million satoshis in a bitcoin that's the colloquial term named after the developer of bitcoin then comes the problem of the historical usd exchange rate so like i said the dollar value of a transaction at a given time is not in the ledger there's no field that says hey this you know bitcoin's worth dollars this day the way it actually works is the economy and currency trades actually decide how much money a bitcoin is worth at a given day so i had to build a lookup table this is really actually probably the easiest part of this whole thing but i had to build a lookup table where given

an epoch timestamp and an amount of bitcoin look up how much bitcoin was worth at that time convert it and return me a dollar value so i downloaded um all of the historical bitcoin pricing data from blockchain.info this link which i have in the pastebin thing that i set up and i wrote just a function that basically took an amount of satoshis and a unix epoch timestamp and it returned however much usd um the transaction would have been at the time and that was everything so i then had everything that i needed and i parsed out kind of this this is just the

outputs um there's also the inputs but uh um then my parser source code is going to be here it's not there yet it's private right now so i have to unprivate it i'll probably do that today it's really really simple because i offloaded the majority of the work to the guy edward's library that i told you about so it's like i mean it's like i don't know like 100 lines of code it's not a lot but i did have to do some work on adding code to his library to make it parse the other address formats it requires python 3.5 it will not work with python 3.4 it will definitely not work with python 2.7 and i tried to multi-thread it but i

suck at programming so that's not done yet and the code sucks sorry so then it was actually like getting it into a database because now that i have this i need to get it into a place that i can actually query it and it's actually split apart so i chose to use a really hipster database technology called clickhouse developed by yandex the russian social media company internet provider search engine search engine that's right that's i knew it um and why well it's because one of my co-workers recommended it to me and that's basically the end of that um it allows views which is really cool i don't know if this is a common thing in

databases but a view is where you have a query and then it basically saves that query or it treats the output of that query as a separate table so i used the views to actually link together the inputs and the outputs to create the state that was the thing that like ate all of the ram in my machine clickhouse is really really fast it does everything in ram but if the query that you're doing is not going to fit in however much ram you have on your machine then it's just straight up not going to execute it so you're just like no [ __ ] okay and it writes everything to

disk which is nice so after parsing the entire blockchain ledger i ended up with 98 gigabytes of plain text csv it took about 13 hours to parse on my like i7 or something took about 15 minutes to etl one time into the database um and i just use this basically just read it shove it into the database this is where i'm linking the outputs and the inputs i'll show you the big nasty query that i ran into but that is one sql query so that's just one giant big ugly query but i end up with the transaction id the payer wallet the payee wallet the amount of satoshis us

dollar epoch timestamp and the given date um this is the query i'm just like oh please don't ask me to explain anything about this because i cannot um so once i actually got everything shoved into the database this is what it ended up looking like i have outputs i have inputs outputs inputs formatted for me all nice and then when i like mash everything together in the view i have something like this so then i started actually asking questions now that i had this nice parsed out blockchain ledger and i wanted to know okay how many transactions have happened on the ledger over one million dollars a shitload sixty thousand transactions that were over one million dollars at the time
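Questions of this shape (how many transfers exceeded a dollar amount at the time, and which were the largest in a date range) can be sketched as below. This is not the ClickHouse query from the talk: it swaps in Python's built-in `sqlite3` as a stand-in, and the table layout, prices, and transfers are all made up for illustration. The `usd_at` helper is a hypothetical version of the lookup-table step, valuing an amount at the most recent known price at or before its timestamp.

```python
import sqlite3
from bisect import bisect_right
from datetime import datetime, timezone

SATOSHIS_PER_BTC = 100_000_000


def epoch(year, month, day):
    """Unix epoch seconds for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Hypothetical daily prices as (epoch seconds, USD per bitcoin), sorted by time.
PRICES = [(epoch(2016, 2, 9), 375.0), (epoch(2016, 2, 10), 380.0),
          (epoch(2016, 2, 11), 378.0)]
PRICE_TIMES = [t for t, _ in PRICES]


def usd_at(satoshis, ts):
    """Value an amount at the most recent known price at or before ts."""
    i = bisect_right(PRICE_TIMES, ts) - 1
    if i < 0:
        raise ValueError("no price known at or before this timestamp")
    return satoshis / SATOSHIS_PER_BTC * PRICES[i][1]


# Made-up transfers: (txid, payer, payee, satoshis, epoch timestamp).
TRANSFERS = [
    ("tx1", "walletA", "walletB", 21_000 * SATOSHIS_PER_BTC, epoch(2016, 2, 10) + 3600),
    ("tx2", "walletC", "walletD", 5 * SATOSHIS_PER_BTC, epoch(2016, 2, 9) + 60),
    ("tx3", "walletE", "walletF", 3_000 * SATOSHIS_PER_BTC, epoch(2016, 2, 11)),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (txid TEXT, payer TEXT, payee TEXT, "
             "satoshis INTEGER, usd REAL, ts INTEGER)")
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?, ?, ?, ?, ?)",
    [(txid, payer, payee, sat, usd_at(sat, ts), ts)
     for txid, payer, payee, sat, ts in TRANSFERS],
)


def count_over(usd_threshold):
    """How many transfers were worth more than the threshold at the time."""
    query = "SELECT COUNT(*) FROM transfers WHERE usd > ?"
    return conn.execute(query, (usd_threshold,)).fetchone()[0]


def largest_between(start_ts, end_ts, limit=5):
    """Largest transfers by dollar value inside a half-open time window."""
    query = ("SELECT txid, usd FROM transfers WHERE ts >= ? AND ts < ? "
             "ORDER BY usd DESC LIMIT ?")
    return conn.execute(query, (start_ts, end_ts, limit)).fetchall()


if __name__ == "__main__":
    print(count_over(1_000_000))
    print(largest_between(epoch(2016, 2, 1), epoch(2016, 3, 1)))
```

Storing the dollar value alongside each row, computed once at load time, keeps the later threshold and date-range queries to simple filters.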

how many transactions have happened over ten million dollars four hundred and fourteen how many transactions have happened over a hundred million dollars seven transactions that at the time were over a hundred million dollars which is ridiculous so then i asked like well what was the biggest one so what i actually did is i just said like show me all the biggest ones ever and then just graph it out for me the biggest one ever was 127 million dollars on november 22nd 2013. somebody actually wrote an article about it i don't have the link but you can just google the date um and they were like i actually think they wrote like quote somebody just moved a shitload of money

in bitcoin what a title um and so then i was like well how many transactions have ever happened 140 million bitcoin transactions have ever happened um how many un unclaimed bitcoin transactions have ever happened which means what's the largest what's the i'm sorry let me rephrase that what is the largest us dollar transaction that's happened that has not been spent that's just been sitting there waiting to be claimed there is a 34 million dollar bitcoin transaction that happened two years ago and it's just sitting there it's just sitting some whoever has the private keys for it has not spent it in two years and it's just hanging out and at the time it's now worth i think it's now worth 34

million i don't know how much it was at the time but that's the biggest wallet that's been moving around what are the days when the most u.s dollars have been traded in bitcoin so i don't have a very good like way of actually looking at this but i will read to you this was 27 trillion dollars on uh january 24th 2016 and leading that was 11 trillion dollars on uh january 23rd the day before um and then yeah there's some other ones so any any kind of queries like this um a lot a lot a lot of money moved um i didn't actually finish the slide i'm sorry um but i can also ask how much like how

much money if given like a public bitcoin wallet you can do this on blockchain.info anyway so this is a little not as exciting but how many how much money how much usd has been moved to wikileaks 125 thousand dollars in donations same thing pirate bay three thousand dollars in donations um so now we can actually ask questions like what's the average amount of transactions per month what is the most what is the reason what is the average amount of u.s dollars moved in a given day what is the most usd lost by somebody this is the saddest query of all time and i haven't actually run it but it's who bought the most bitcoin for

the most money and sold it for the least money and i'm just like i don't even know that i can bring myself to do that because i would just i would feel sad for them and using this kind of stuff you can find automated trades if you see basically like anything that happens of a certain amount in a certain like in a fixed time period you can find stuff like that and you can find bitcoin tumblers which is just people trying to hide where they got their bitcoin from something like that and of course most importantly who the [ __ ] has my 15 million dollars so i asked my system show me transactions that took place in

february and give me the biggest ones and i did not find a 15 million dollar transaction the closest that i found was a transaction that at the time was worth about eight million dollars it was four days before said tweet um it looks like it may have gone into a tumbler the day that it was bought but i don't know it jumps around a lot and is systematically cut in half a lot i highly recommend you pull this up at some point and look it up because you can see it for yourself it exhibits some strange behavior but it may be a tumbler it may be some legitimate behavior that i don't know about so i don't know that

it went into a tumbler i just it it's moving around a lot in a programmatic way i don't know this is where it actually came from i cannot confirm that it's going into a tumbler i cannot confirm that that's the transaction that he was referencing i have no idea but there was no 15 million transaction this was the closest thing i pulled it up here you can look at it this is a misnomer it says 13 million dollars but blockchain.info calculates what the value would be today the value at the time was about eight million dollars so then of course the question it's obviously it would be logical to ask well what happens if the transaction was

split into multiple smaller transactions what happens if there were 15 one million dollar transactions that happened to the same wallet this would trigger this would avoid triggering the analytic that i just built um so i built a query show me the top 50 wallets that received the most money and i ended up oh this is nasty sorry the formatting got screwed up um i ended up with a lot of things that were really big but when i actually ended up digging in and investigating them all of these wallets have existed for quite some time and they have a high cash flow a high like bitcoin flow so they were getting bitcoin moving bitcoin very quickly but

i did find one specific transaction that happened two days after the tweet that was about it was about i think 14 million dollars at the time but that was after the tweet so i don't know but it did receive yeah this i'll have to actually go back and look at that um so um then kind of came to how do i make this thing better so some of the things left for me to do on this system is i actually want to open up like maybe a public website to allow anybody to ask any of these queries that they want maybe build a front end for it i don't know the right way to do it in terms of architecture is

not to do a one-time etl of the data and shove it in the right way to do it is to have a blockchain miner something that sits on the network that just flows into the database i really kind of missed out using the graph databases for this which would have made a ton of sense but i um i did that wasn't how i ended up doing it it's something that's kind of left on the list but graph databases would help me find relationships better and stuff like that and uh my parser sucks so i got to make it a little bit better but that's because i suck at writing code um it would be really cool to watch

everybody like watch what happens to coins when they get donated to somebody because that's nice because you actually get a label at some point for the data you know that coins that go to wikileaks donation link are then held by wikileaks or whatever um it would be cool to write some analytics signatures to figure out when things go into known tumblers it's possible to signature tumblers because tumblers are programmatic they're software some are better than others some tumblers have like jitter so some will move transact move money on like a on a uh not on a fixed schedule but instead they'll move them around kind of erratically to look like a person i need to build an api for this because right

now everything's just straight up database queries and an alerting engine would be kind of cool like have like uh let me like send me a text when the when this much money moves this place or when this wallet does this thing or whatever um there's a number of different use cases for this kind of thing i mean i feel like maybe law enforcement or intelligence community people investigation investigative purposes i don't know fintech kind of stuff like just having a good grasp on um the amount of money that's actually moving every day how it's changing how it can be correlated with other stuff so with investments stuff like that i feel like hedge funds and currency exchanges

would have a lot of benefit with this data and then there's some this is stretch but anti-money laundering use cases for this um and then there's the evil ones like i can find well i guess that's not that evil but i can like find people like i can try to identify people making evil making purchases on the dark net which is a little bit harder than it sounds um i can use it as a targeting platform to find rich people and steal their money or i could violate people's privacy and that doesn't really make a lot of sense here um so to conclude i guess what i did is um the blockchain's cool it's really cool

the way the data structure the way the data is structured is it's really good for validating integrity and making sure that everything's rock solid but there's no way to ask it the kinds of questions that i wanted to ask it so i built basically just a little teeny you could even call it a search engine that just allows you to ask more intelligent questions of the blockchain to get more intelligent responses i did this based on a post that i saw to try to figure out if i could identify something that happened on a post or a tweet and tried to correlate it to some ground truth that happened on the on the ledger itself and i identified a

number of transactions i have no idea if any of those are him no clue but i do know to answer the question if anyone moved 15 million dollars at that given point in time i did not observe it i did not see one transaction or a number of transactions to a single wallet that were 15 million dollars at the time is my time range off i don't know there's a lot of details that i'm missing but i do know in that period of time i did not actually see a 15 million dollar transaction i did not observe it i don't feel like that i'm de-anonymizing blockchain somebody asked me about that um a week ago aren't you

worried about de-anonymizing blockchain not really just because if you think about it google was de-anonymizing if you use that logic google was de-anonymizing the internet when they built their search engine the data is already there so i don't really feel like i'm screwing with anybody's anonymity there's probably other people doing this a lot better than me i have no idea but it's good for people to kind of think about these kinds of things when they write about it or when they make transactions it is important to know that any transaction that you make on the blockchain ledger is going to be recorded forever and everyone can see it it's harder to figure out who owns a

wallet but it is certainly possible that everyone that people are going to write analytics like this for for whatever purpose so this database allows us to ask any question of the blockchain it allows us to figure out who has the most ends point bitcoin where is the bitcoin app who has the most money how much money was spent in a day what is the average money it allows you to correlate with stuff like the stock market allows you to correlate with stuff like current events or any other economic issues and uh that is basically it um oh crap anybody have any questions yes
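[editor's sketch] The talk describes checking whether one large payment might have been split into several smaller transactions to the same wallet by ranking wallets on total USD received. The speaker's own query is not in the transcript, so this is only a guess at its shape: SQLite stands in for the real database, and the `outputs` table, its columns, the wallet names, and all values are invented for illustration.

```python
import sqlite3

# Hypothetical schema: one row per transaction output, with the receiving
# address, the block's epoch timestamp, and the USD value at the time.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outputs "
    "(tx_hash TEXT, address TEXT, block_time INTEGER, usd_value REAL)"
)
# Made-up sample rows, not real ledger data.
conn.executemany(
    "INSERT INTO outputs VALUES (?, ?, ?, ?)",
    [
        ("t1", "wallet_a", 100, 8_000_000.0),
        ("t2", "wallet_b", 110, 5_000_000.0),
        ("t3", "wallet_b", 120, 5_000_000.0),
        ("t4", "wallet_b", 130, 5_000_000.0),
        ("t5", "wallet_c", 900, 50_000_000.0),  # falls outside the window below
    ],
)


def top_receivers(conn, start, end, limit=50):
    """Rank addresses by total USD received between two epoch timestamps."""
    return conn.execute(
        "SELECT address, SUM(usd_value) AS total_usd FROM outputs "
        "WHERE block_time BETWEEN ? AND ? "
        "GROUP BY address ORDER BY total_usd DESC LIMIT ?",
        (start, end, limit),
    ).fetchall()


for address, total in top_receivers(conn, 0, 500):
    print(address, total)
```

Summing per address means three 5 million dollar outputs to `wallet_b` rank above a single 8 million dollar output to `wallet_a`, which a largest-single-transaction query would miss.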

uh two things for the time stamps first of all how detailed how in depth does it go nanosecond does it i'm just curious so the question is do i repeat the question or do people have the question i don't have to repeat the question um so the timestamp is in unix epoch timestamp which is seconds from when the block was mined that the transaction took place on so how precise i would give it a margin of error of 15 minutes because that's the average time period so it's harder to correlate exactly when it happened but it's you basically have about a 15 minute window of being right wrong and that's just what i because that's the

average time that it takes to have a new block oh it's 10 [ __ ] it's 10 minutes thank you um and you said you had another question yeah um does it does time zones like east like have anything to do with it so time zones do not have anything to do with it because the way the unix epoch timestamp works is it's only the number of seconds that have taken place since july first 1970. okay so oh but i guess what did i just say july january i'm sorry january 1st 1970 but i don't know what time zone gmt gmt okay yeah gmt so second it allows some variance it allows in the variance as long as it's not too

far okay for uh when you start mining like on the blockchain or with the uh if you were to do the transactions too okay okay cool thank you cool thank you yeah no problem uh any other questions yes so when you were showing the hex dump is it is it like a like you know like looking at a packet header where you know these particular parts are addressed these particular okay so that's always static and you can parse that out could you then if you wanted to track a specific person's transactions socially engineer them by just saying you know hey i heard about you losing your 15 mil and wanted to give you some bitcoins and then now you've got their

address to query against them can you so you're saying like like tell him you're donating to him because you feel bad for him losing his 15 see what address you just donated to and then use that to then run against his data to say who else took from or gave to this address um so you're saying just asking him for his bitcoin wallet i want to give you some money yeah yeah you you could do that but the thing is it's the it's trivial to introduce new bitcoin wallets so it's it's you don't have to reuse and it's actually best practice to use lots of different bitcoin wallets for anonymity and emit whatever purposes okay yeah thank you
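[editor's sketch] Going back to the timestamp question from a moment earlier in the Q&A: a Unix epoch timestamp is a count of seconds since January 1st 1970 UTC, so it carries no time zone of its own. The snippet below is a small illustration using only the Python standard library. The ten minute window reflects the block interval mentioned in the answer, and the function names and the idea of comparing an outside event (such as a tweet) to a block time are assumptions made for this example.

```python
from datetime import datetime, timezone

# Assumed window of roughly one block interval (about ten minutes).
BLOCK_INTERVAL_SECONDS = 600


def block_time_utc(epoch_seconds: int) -> datetime:
    """Convert a block's Unix epoch timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def could_match(event_epoch: int, block_epoch: int,
                window: int = BLOCK_INTERVAL_SECONDS) -> bool:
    """True if an outside event falls within `window` seconds of the block time."""
    return abs(event_epoch - block_epoch) <= window


print(block_time_utc(0).isoformat())
print(could_match(1000, 1500))
```

Because both sides of the comparison are plain second counts, the local time zone of whoever runs the query never enters into it.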

a new ad a new address for every transaction yep is the right way for to do it best practice according to experts yes based on that question there is like a known attack vector that i guess the dark net operators look out for and it's basically where somebody gives dust transactions so a few satoshi's so that you can pin that address and see where the money flows so where it ends up so there's like complicated addresses that take out like basically tainted inputs and basically let those not move out because like when you saw the inputs and outputs where it's basically by satoshi uh satoshi's like basically how many satoshis were in there because there is

people who basically spam the network with satoshi's yeah hopefully being able to track it and then um i uh yeah the base 58 was basically so you could double click the address and then be able to copy and paste it without it breaking with punctuation that makes so much sense yeah that makes so much sense i figured it was either that or something with the qr codes where like maybe q but that yeah yeah it was basically that and i i i don't know why i'm thinking this too but i might i don't remember exactly the copy and paste i do was so when you had capital o's like you didn't you had lower case and no zero so you don't

confuse them oh yeah yeah sure so it was like back in the original design but it is a pain to work with yeah then um there's this project out there called blockseer that does like something similar but with graphical uh relationships yeah yeah input outputs okay and then my question was how big was the actual data set after you put it in the database so like that how big was it on disk after it was in the database yeah oh like how much did the database reduce it yeah like what is the size of like the database like yours if you were to basically focus on keeping it as an on-chain or sorry like it's basically

what's the current status of it yeah type model like how big is that i don't know um i remember it being very close to the size of the csv the actual so the database that i used didn't seem to compress it at all or do anything like that so basically um i had an 80 gig cs or i had 100 gig csv and when i loaded it in and i looked at that directory as it was loading in i have to get back to you on specifics but i remember it being about the same size as this do you remember it was it less than the full 80 gigs i don't remember all right i'd have

to yeah i'd have to go back and look i was just curious yeah yeah i know there's other stuff like i mean cassandra would reduce the crap out of it if you if you like really if you had really rigid data types and stuff like that but clickhouse i didn't have as much experience with and also i tried to read a lot of the docs it's mostly in russian so like i would go and i'm like i like search like is there compression and i find like one google talk post of a bunch of russian i don't know
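[editor's sketch] On the base 58 point raised a little earlier in the Q&A: the Bitcoin Base58 alphabet leaves out the digit zero, capital O, capital I and lowercase l so that look-alike characters cannot be confused, and it contains no punctuation. The encoder below is a minimal illustration of the plain encoding only. It does not add the version byte or checksum that real Bitcoin addresses carry, and the function name is made up for this example.

```python
# Bitcoin's Base58 alphabet: no 0, O, I or l, and no punctuation.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode raw bytes as Base58 (no version byte, no checksum)."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is written as a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


print(len(B58_ALPHABET))
print(b58encode(b"\x00\x01"))
```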

so is there anything in the blockchain protocol or whatever to stop like maybe me having two wallets and just transferring one satoshi back and forth uh there is a big there is a transfer fee okay so the way the way that works is basically um this is gonna get into a lot of other stuff but basically there's a finite amount of bitcoin that can be mined and eventually when all of those by design eventually when all of those bitcoin are mined the bitcoin network will be powered and bitcoin miners will be rewarded specifically by fees so a fee a transaction fee is extracted from every transaction and it pools together to actually then reward the miner once

coinbases are gone and once like the uh once mining new bitcoin once all the bitcoin are in circulation so to answer your question yes there is something preventing you from doing that cool any other questions no i'm done my name's andrew thank you guys so much for coming
