
Hello, hello. Virus coming through, Virus is coming through. Frank too. What's going on, baby? And now we have the table, not for flipping this time, I promise. Well, maybe just... stop, stop, stop that. Okay, you've gone too far, you've crossed a line. That's good, yeah. All right, so, ready to get started? Ready. We're ready. We're going to talk on things. Look at him but understand me. So if you don't know, we're DC949, and we [ __ ] around. But not really. We don't [ __ ] around. Fine, [ __ ] it, we don't [ __ ] around. Exactly: we don't [ __ ] around when it comes to [ __ ] around. So, has anyone
watched the LayerOne talk from a while back, a couple of months ago? So two, three, four... half the audience. Got it. So this is the next steps of what we did after that talk. We're going to go over a couple of the same things, but mostly it's going to be new and unique content. With that being said, let's get started. Oh, if you don't already know, I'm CP, this is Adam, this is Jeffball, and let's get started. Next slide. Talk louder. Talk louder! Nice. That's you. That's me. So does anyone not know what a CAPTCHA is? Good. Next slide. All right, so we didn't really just decide one day that we were going to
break reCAPTCHA. But that's kind of not true: we basically did just decide to break reCAPTCHA one day, but there's a reason for it. Initially there was a Twitter contest at ShmooCon where, if you had the most followers and you tweeted a certain thing, you would win an iPad or a television or something. These guys decided that they wanted to do that, and all they had to do was break reCAPTCHA to make Twitter accounts. How hard could that possibly be? Well, after a couple of days of working on it they're like, "You know what, let's just do this later." I had recently moved to a new state and I was kind
of bored, and I said, "What could I do to have some fun? I know, I'll break reCAPTCHA." And about a week later I found out that they were also working on breaking reCAPTCHA, so we of course joined forces, because that's what we did, and we rolled out this fun
shenanigans. Every... [ __ ] Christ. So, yeah, go ahead and play it. Do a sound check real
quick. "Four..." Anyone hear that? "...Wednesday, cup, iron." Can we get some louder
sounds? All right, well, what you would have heard was the initial version of reCAPTCHA that we broke. If you're interested, go watch the LayerOne talk on YouTube. There are 58 words. They're all basic words like "boat", and kitchen supplies like "kettle" and "pot" and "table". The whole CAPTCHA was just six words, took seconds, and then it was over. I mean, it's quiet, no big deal. Next
slide. Too little, too late. Well, there's more, there's more later. Okay, so this is for the first version; we're going to go over that. This is a spectrogram. What it is, is frequency versus time, and the darker it is, the louder it is. So this is the loudest part, down here at this frequency. But what you'll notice is that the background noise, which is this, was not in the same frequency range as the words. So if you're trying to pick out the six words in that, can you see where they are? So yeah, that was kind of easy to split. The next couple of rounds are a little bit harder, but this one: super
easy to split. Just look right there. Ding. All right, so we took all that data and we had our individual words, and what we needed to do was categorize them so that we could associate: okay, if the data looks like this then it's this word, if the data looks like this other thing it's that word, and so on. We had a few different solutions in the beginning, and then someone and I just got the idea: well, why don't we just throw it in a neural network, have that sort everything out, and then, you know, we'll have all the answers. So we decided to go ahead and do that. Question: single layer or multi-layer? Single layer. We'll get to
that. So, machine learning. The machine learning we're using is supervised. It's a neural network. It is similar to linear regression, for any of you who know much about machine learning. There's a lot of linear algebra, matrix math, calculus, all kinds of really cool math stuff, and there's not really enough time in an hour to explain how neural networks work, but that's what we used to solve it. So this is how our neural network works. You can see a nice little diagram there. The left-hand side is our inputs, so we'll have how much bass is in the audio sample, how much mid-range, how much highs. And instead of
having three bands we have 2,048 different bands, and that just tells you kind of what the word sounds like, but it's all done in numbers instead of, you know, humans listening to audio. We have 1,536 hidden nodes, which I'll get to in a minute. And then, since there were 58 different words, we knew that every sample is going to be one of 58 words, so we have our 58 different output nodes. Each is correlated with one word, so the first one might be "red", the second one's "blue", "train", etc. "Boat." Yeah. So, next slide. So this might be the word "red", and we solved all these by hand
first, so that we knew, okay, this word is definitely "red". And then you look at the spectrogram and you have not a whole lot of bass, a bit more mid-range, and quite a bit of highs. The next word is "blue", and you can see it's a little bit different: about the same amount of bass, similar mid-range, but not so much in the high end. The next word is "green", and you notice the high end drops way off there, but everything else is pretty similar to "blue", more or less. So yeah. And let's say we get another word, and this is when we don't know what it is, so this is unlabeled data. Can anyone tell me what word
they think that is, if it's either red, blue, or green? "Black." Black? Wrong. This is one of those three words. "Red." Yeah, it's closest to red. So what you basically did there is you just take the difference between the black and each one of the others, and you add it up and you see how much error there is. So how far off is it from each of these different colors? And then you pick the one that matches best. That's simply how the neural net works: it just says, how well does this unknown word match each one of the words that we know about, and then it just picks the best one. So we won't have a 100% match, but
we'll be close to it. Next. Next. So when we're fitting this word, we need to draw a line that goes through each of those points as best we can, and then we measure against the line instead of each individual data point. The reason we do that is we're going to have a whole bunch of different samples of "red". Some are a little higher pitched, some have different background noises in them, stuff like that, so they're not all going to be identical. So what we're going to have is a whole bunch of different samples of "red", and then we try and draw a line that fits the data as best
we can. This would be one way to do it. It's just a straight line, and that fits, well, nothing very well, to be honest. So, next one. We could curve it a little bit. That's a little bit better fit, still pretty far off on that middle column though. Next. Okay, so we're getting a little closer now. That's actually looking pretty good. And we could just go all out and say, "Oh well, look, that line fits all of our data, isn't that great?" And that works extremely well for the stuff that we trained it on. However, for some new sample that's not quite perfect, this is going to be... I mean, you just look
at that and you're like, well, I don't think that's really a good representation of what "red" looks like. That's kind of silly, is the bottom line. It fits really well, but it's contrived. It fits this sample well; it doesn't generalize very well. Exactly. We made it up, and if we get the exact same sample again it'll match perfectly, but for any other "red" that has some background audio in it, it's not going to match very well. Next. So those different squiggly lines are all valid options, and it's hard to know which one's best. In the samples I showed you, obviously the first and the last ones
are not optimal, but of the middle two, which one's better? Well, it's hard to say. That's where the hidden nodes come in. Basically each one of those hidden nodes is like a different line that we're drawing through there, so they're all squiggly and they're all a little bit different, but they're all similar. So what we do is we use some linear algebra to get from the input nodes to the hidden nodes, and then we use the same thing to get from the hidden nodes to the output nodes. What the output nodes are going to have is a combination of all these different squiggly lines, which says, well, how well does that match the word "red"? And we're getting like 90
some odd percent that we're sure that it's red, and like 0.2% sure that it's blue. So we just take those numbers and we say, okay, the highest probability is red, so we'll pick that. Next. Yeah, there you go. All right, so for our first round we got 99.1% accuracy using that splitting and then the machine learning solver. 99% accuracy, a lot of them, so very accurate. Not much more to say about this slide, but it worked out really well. Ding. Oh yeah, and the interesting thing is they rate limit you if you get less than 60%, but after you get more than that they just let you go as fast as you want.
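As a rough illustration of the scoring step described above (band levels in, hidden nodes in the middle, one output per word, pick the highest probability), here is a toy sketch in Python. It is not the speakers' code: the sigmoid and softmax functions, the hand-picked weights, and names such as `classify` are assumptions made for illustration, and it uses 3 inputs, 2 hidden nodes, and 3 outputs rather than the 2,048, 1,536, and 58 mentioned in the talk.

```python
import math


def sigmoid(x):
    """Squash a hidden node's weighted sum into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def layer(inputs, weights, biases):
    """One weighted sum per node: weights has one row per node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def classify(bands, w_hidden, b_hidden, w_out, b_out, labels):
    """Inputs -> hidden nodes -> one probability per known word."""
    hidden = [sigmoid(z) for z in layer(bands, w_hidden, b_hidden)]
    probs = softmax(layer(hidden, w_out, b_out))
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Hypothetical, hand-picked weights (a real network learns these).
LABELS = ["red", "blue", "green"]
W_HIDDEN = [[0.0, 0.0, 4.0],   # hidden node 0 reacts to highs
            [0.0, 4.0, 0.0]]   # hidden node 1 reacts to mid-range
B_HIDDEN = [-2.0, -2.0]
W_OUT = [[4.0, 0.0], [0.0, 4.0], [-4.0, -4.0]]
B_OUT = [0.0, 0.0, 2.0]

# Made-up sample: little bass, some mid-range, lots of highs.
unknown = [0.2, 0.5, 0.9]
word, probabilities = classify(unknown, W_HIDDEN, B_HIDDEN,
                               W_OUT, B_OUT, LABELS)
print(word, [round(p, 3) for p in probabilities])
```

With real data the weights would come from training on the hand-labeled samples; only the final "pick the highest probability" step is taken directly from the talk.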
And so an 8-second CAPTCHA could be answered in half a second and it'd be all right. So, pretty bad Turing test. "We are as human." Yes, we are as human. Oh, a good one. Oh yeah, this is a good one. So the other thing that we noticed was there's only so many CAPTCHAs, period. There's a set number. That set number was between 20 and 25 million, but there's a set number, and so if you just have a lookup table of MD5 to answer, you could solve the CAPTCHA with a lookup rather than in 0.5 seconds. You just have to pre-solve it and then ask Google if it's correct. If it's correct, you save it, and
so we did that for 15 million CAPTCHAs. That came out to about a 500-megabyte file that was just answers. Rate limiting is important. Yeah, and so if they had rate limited us we probably couldn't have done 15 million in a month and a half. Yeah, and that's 15 million unique. We only recorded unique ones, because we already had the answers for everything else, so we were doing like 2 million a day between all our servers. Yeah. Oh, and 15 million gives you 61% accuracy just using the lookup table. Yeah. Oh, so yeah, and that was everything we did prior to the last talk. And now, if you saw the last talk, this is going to be the
interesting portion for you. It's all new stuff. It's not so much a prerequisite as maybe just an experience to go and view our talk from LayerOne. We went over a lot more detail of the initial round, and we had a lot of fun doing it, because, I don't think we mentioned it in this one, but an hour and a half before our talk they patched it. Yeah. So it's not like we were being stealthy: millions of CAPTCHAs solved a day. And we've been trying to open a line of dialogue with them since that happened, and it just seems that the Google and reCAPTCHA guys don't want to talk to us, and, um, can't
imagine why. But once again: come on, let's have some drinks, let's talk about this, it's fun. But they don't want to talk to us. Who's doing round two? You? I'm doing round two. All right, so round two is the digits. The digits. All right, let's listen to round
two. All right, if you could actually hear that, maybe in the front, good, maybe not in the back: it's three sections of four numbers. So it'll be like "7 4 2 1" and then a pause, and then another set of numbers, and then another set of numbers. This is a really weird move, and we didn't really expect it, because they've done digit-only CAPTCHAs before and it didn't end too well. Before we broke it, someone else broke it, because they only used numbers, and here we are, running back to using numbers. Yep. So the thing they were hoping for is that if they smashed all the words together, like these three sections of four really
quick digits, then we wouldn't be able to split it. So it would be really hard to split, and since you can't split it, you can't solve it. But then CP had an idea. Wait, explain. Oh, all right, yeah, sorry. Once again this is a spectrogram, frequency by time, for round two. And all this stuff right here, you'll notice it's not as dark as the orange right here. This is where the actual words are, and so you can see the three groups of four digits each. But you look in there and you can't really split the digits up too well. It's pretty hard. And so that's where CP's idea... Your idea. Oh,
all right. My idea was you take each one of these orange lines and then you try and group all the orange: that orange spot, and then this orange spot, and that orange spot. Sometimes they bleed together and sometimes they don't touch. So I had like 700 lines of code to try and do this, and it was working marginally well. And then CP had a much better and simpler idea. For like three days I'm saying, "Why don't we just do this one thing? It's so easy, let's just try it." "No, that'll never work, that will never..." Now I'm doubting myself: that could never possibly work. Let's just take that chunk of audio and cut it into
four even pieces. [ __ ] it. So here's the [ __ ] it splitter, where we take those chunks and we just cut them into four even pieces. What I've learned through this whole experience is that neural networks, and machine learning in general, are the closest thing to magic that science has invented. Back in round one we were initially trying to remove the background audio, the fuzzy kind of backwards radio broadcast that's there to break up your ability to, you know, speech-to-text it. We were looking at how to remove that audio. That's tough. And we thought, why don't we just not
bother? Why don't we just ignore it? I mean, that's what humans do. We don't really listen to the background noise. There's a hiss in a room and a general conversation in the next room over; you're not really listening to that, you're listening to us up here. You are able to parse it out. Why can't we just do that? And it turns out we could. We didn't do any kind of noise removal for round one or round two, or ever, basically. So the [ __ ] it splitter: never thought it would work, and it turned out it did, and saved us a bunch of trouble. Really funny, I think. So there is some bleed-over from one digit to another, but the neural
network still works, same as before. Didn't change any code. Like, the actual code is identical. All we changed is the number of... we didn't even change the number of hidden nodes. We changed the number of outputs, because there was a different number of words: 10 instead of 58. And this is their improved system, right? Brilliant. So, I don't know who's supposed to do this slide. Is that me or you? [ __ ] it, I'll do it. Round two: we got 63%, and honestly that's because we just couldn't be bothered to work it back up to 99%, because we didn't really change anything. We used the same code and we said, "Oh, we broke it again, that's
neat." And it was live for about 28 hours, and compared to the previous negative one and a half hours I'd say that's an improvement. And they rolled a different version out, which we promptly broke. And if we just got more samples we could have gotten... oh yeah, more samples typically means more accurate, and we didn't really feel like solving 150,000 samples again like round one. But don't worry, there are some interesting improvements we'll go over later about that. Oh no, this... oh yeah. So remember the whole MD5 thing, the limited number of CAPTCHAs? Well, in round two they stuck in ID3 tags, which looks like something, and it turned out that all we could figure
it was for was to keep the MD5 sums of the MP3 files from matching. So you just remove them and they match again. So, really... All right, next slide. Round three. So, round two... yeah, this is the ridiculous one. Round two we released June 30th, and then round three we released July 4th. The reason is that round three is actually round two: this is the version that they switched to right before our last talk. And then, right as we were about to release this version, they switched versions to the digit one. So we looked at that and we're like,
"No, they couldn't have done that." So we broke that real quick, forcing them to go back to this version, which we had a break for already. Then we released that. So yeah, this one was on the fourth and it was live till the seventh. So yeah, again an improvement. Yes, this time it was for three days and five hours or something like that. 59 [Laughter]
words. Kill it, kill it, kill it, kill it. I'm done. Oh God, I can't listen to that anymore. That will send you into the madhouse. All you've got to do is validate a couple of samples. Just validate a couple of samples. A couple thousand. A couple thousand samples. I'm actually thinking about trying to start a study to see how fast we can drive a person insane by forcing them to listen to reCAPTCHA. And that's great: that's not even the current version. Just wait. Oh, all right. Ding. So now you'll see the spectrogram. You can't even see where the words are supposed to be. All right, you can a little bit, but not nearly as easily
as round two or round one, obviously. So we looked at it for a while, tried a couple of different things, and then we noticed: hey, what about down here? So, just the low end of the spectrogram. Ding. So yeah, look at the low end of the spectrogram. Can anyone see where the words are on this one? The orange parts. Yeah. Actually, this one we had to solve... I think we trained on like 600 before they switched it, and then we did another 800 CAPTCHAs, and so we solved this one with about 1,500 samples. Yeah. With this version, which was really annoying and which you just heard a
second ago, humans were able to get about 30%. Kind of the industry standard is that humans should be able to solve 70% for a decent CAPTCHA system; if they can get better, then that's cool. And it should also stop bots: less than 1% is what the bots should be able to solve. So those are kind of the industry-standard metrics across all CAPTCHA systems. Theirs did not live up to that, and we still beat it. So, anywho. This is the spectrogram with a high-pass and a low-pass filter to filter out everything except 80 Hz to 160 Hz. And so then you get a
picture that looks like this, and you can just easily parse this by looking for the orange sections, and that's where the words are. So, very easy. It wasn't perfect, but it still beat it. Good enough. Oh, Adam. So after we split it: neural network, same [ __ ], different day. We changed the number of output nodes again, and that was our code change for the neural net. Like he said, black magic. So this one went live July 4th, about midday. In our test run we did about 1,500 and got 911; that's 59.5% accuracy. Which... what? Sure. I don't know the reference. But yeah, 60% accuracy, better than humans could have done. And once again
we were just trying to release quickly. We probably could have done more CAPTCHAs, if we didn't go insane, and gotten back up to the 90-some-odd percent accuracy. Same thing as round two: it's not that 59.5% was the best we could do, it's just that we got bored. We're like, yeah, [ __ ] it, just get it out there, and we'll just wait for them to fix it and we'll just [ __ ] do it again. We were also kind of hoping they didn't have another CAPTCHA system on standby, because for round two they had one just sitting there. For round three we were hoping
they didn't. Oh yeah, this happened again. We released the code with, you know, a nice comment saying "remove the ID3 tag", and it's [unclear] again, and they didn't fix it for round three. Oh yeah. So, I mean, go. All right, so round four, the current one. Yes, please, play it. Listen to this one, this one's
great.
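The MD5 lookup table and the ID3 workaround mentioned earlier can be sketched as follows. This is a hedged illustration, not the released code: the talk does not say which ID3 version was used or how the tags were removed, so this assumes a standard ID3v2 header at the start of the file and an ID3v1 trailer at the end, and the function names (`strip_id3`, `audio_key`, `remember`, `lookup`) are hypothetical.

```python
import hashlib


def strip_id3(data: bytes) -> bytes:
    """Return the MP3 bytes without ID3v2 (leading) or ID3v1 (trailing) tags."""
    # ID3v2: "ID3", 2 version bytes, 1 flag byte, 4-byte synchsafe size
    # (7 usable bits per byte). The size excludes the 10-byte header.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = ((data[6] & 0x7F) << 21) | ((data[7] & 0x7F) << 14) \
            | ((data[8] & 0x7F) << 7) | (data[9] & 0x7F)
        footer = 10 if data[5] & 0x10 else 0  # optional 10-byte footer
        data = data[10 + size + footer:]
    # ID3v1: fixed 128-byte block at the end, starting with "TAG".
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        data = data[:-128]
    return data


def audio_key(mp3_bytes: bytes) -> str:
    """MD5 of the audio with tags removed, used as the lookup key."""
    return hashlib.md5(strip_id3(mp3_bytes)).hexdigest()


answers = {}  # hex digest -> answer already confirmed as correct


def remember(mp3_bytes: bytes, answer: str) -> None:
    """Store an answer once the CAPTCHA service has accepted it."""
    answers[audio_key(mp3_bytes)] = answer


def lookup(mp3_bytes: bytes):
    """Return the stored answer, or None if this audio is new."""
    return answers.get(audio_key(mp3_bytes))
```

The design point is that the key is computed from the audio payload only, so two downloads of the same clip with different tags still hit the same table entry.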
So yeah, we're coming up with the number of words. Right now it's a question mark, and no one knows what they are. There's one that's like "black" or "block" or [unclear].
Oh, all right, down here. So let's just put it this way: after the last couple of months of the three of us doing this, if there's anyone in the world, besides spammers and crackers who are making money doing that stuff, who could solve this by hand, it's the three of us. And we can't. It's so completely, utterly [unclear]. The joke so far internally has been that we defeated it; they broke
it. So yeah, round four, the current version: 6 to 12 words, between 16 and 36 seconds long. I tried 30 of them, and between the three of us we've already done 60,000 CAPTCHAs on the old ones. Well, across all three systems we've done 60,000 of them. I got zero out of 30. You have a 30-minute time limit for each one. I loaded it up in Audacity and sat there for like 10 minutes just replaying it, and I'm like, "All right, that is it." Google told me no, it is not. So yeah, it's pretty hard. Ding. Oh yeah, so this is kind of funny as well. This is the low
spectrum splitter again. It's 80 to 160... no, this one's 0 to 200 Hz, I changed it a little bit. You look at it and they must have looked at our code and thought, "Oh, they're looking for the orange spots, so we'll add more orange spots. That way they'll never get it." Ding. Unless, of course, you do a SoX noise reduce, in which case... So SoX has this great little ability to remove noise, such as all the stuff they tried to add, in one command. Yeah, one command that runs in about a quarter of a second. So that's a word, that's a word, right, there's one. It goes on. I had to actually crop this picture; it
goes on for 36 seconds. But it's pretty easy to see where all the words are. Some things like this, this is a compound word, so it could be like "bookshelf", and then each one of those parts. But you can look at how close the words are and combine them into one, so it's not very hard. The amazing thing is, even though we know where the words are and we can reduce the noise, we still can't understand what the actual words are, even when we're
cheating. Oh, so if anybody knows how to solve these and can give us some... oh, we have a [Music] volunteer.
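The splitting idea described above (find the loud "orange" stretches in the low band, then combine pieces that sit close together, such as the two halves of a compound word) can be sketched like this. It is an assumption-laden toy, not the speakers' splitter: it takes a list of per-frame energies that have already been band-limited and noise-reduced by some other tool, and the `threshold` and `max_gap` values are made up.

```python
def find_segments(energy, threshold):
    """Return (start, end) frame ranges where energy >= threshold.

    `end` is exclusive, so a segment covers frames start..end-1.
    """
    segments = []
    start = None
    for i, level in enumerate(energy):
        if level >= threshold and start is None:
            start = i                      # a loud stretch begins
        elif level < threshold and start is not None:
            segments.append((start, i))    # the loud stretch ended
            start = None
    if start is not None:                  # still loud at the last frame
        segments.append((start, len(energy)))
    return segments


def merge_close(segments, max_gap):
    """Join neighbouring segments separated by at most max_gap quiet frames."""
    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], seg[1])
        else:
            merged.append(seg)
    return merged


# Made-up low-band energy per frame: two bursts close together
# (think "book" + "shelf"), then a separate word later on.
low_band = [0, 0, 5, 6, 0, 7, 8, 0, 0, 0, 0, 9, 9, 0]
pieces = find_segments(low_band, threshold=3)
words = merge_close(pieces, max_gap=1)
print(pieces)
print(words)
```

Choosing `max_gap` is the judgment call here: too small and compound words stay split, too large and separate words get glued together.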
Can we get a microphone over here? I mean, we might as well... come on up. Yeah, come on. And this is how we do things: pour her a drink, please. So repeat everything you just said, and welcome. Okay, all right. So I used to do linguistics in grad school, and the guy that was the professor for the professor I took phonetics from is a guy named Peter Ladefoged, L-A-D-E-F-O-G-E-D. He would do stuff like be an expert witness at trials, where he would have to identify whether the guy on the stand was the same person as the guy whose
voice was on this wiretap, by comparing the spectrograms of both voices. He also could read spectrograms like they were English, because as it turns out, what you see on that spectrogram for every phoneme in a word is essentially a ratio between where the formants, the bright lines, fall. And so when you learn to recognize those ratios, you can pick out individual phonemes. And I think you could really drop the size of your feature space if you were to essentially do sort of a rule-based principal component analysis. You are 110% correct, but that's...
but that's more work. Yeah, but it's work I know how to do. Very true. I actually forgot to preface this talk by saying we don't know what the [ __ ] we're doing. We've never done audio analysis or machine learning or neural networks or any of this crap. We were bored. But you're right, we've thought about a generalized phoneme system that will just detect the phoneme... phenome... pheromone. What's interesting about the current version is that the background noise and the foreground noise seem to have the same amplitude and seem to have come from the same text-to-speech synthesizer. So while you may
hear a correct word in the foreground, like "snowflake", you might also hear [ __ ] "bookshelf" in the background from the same exact speech synthesizer. It's really hard to do that. And by all means, let's talk after this. Let's break it again, because we just keep doing this. Yeah, no, I think the amplitude dropout thing that you're doing is going to end up completely correcting for that. The SoX noise reduction thing? Yeah, no, I was writing that down in my notes going, "I hope they're doing this," and then, boom, next slide. So we just kind of did some other things
which we'll go over a little bit later in the talk, quick bit, I think we're running out of time. But we're just kind of holding our horses until Google releases the next version, because there's no way in [ __ ] hell that they're going to be able to keep this version around, considering all of the complaints they've been getting. Their forum, their Google group on reCAPTCHA, is absolutely stacked from top to bottom with things like "Oh, you guys are terrible, worst [ __ ] in the world", and ADA complaints, and this and that. So we're just waiting for them to release a new version, and then we'll break it again. But you know what I
actually hope they do? One thing. It's one thing we put in our last release. We said: unless you change a significant amount of how audio CAPTCHA systems are built, Stiltwalker is going to steamroll them every
time. Uh-huh. Yeah, what do we do to piss off Eric Schmidt? I don't know. But I'll say it again, I said it at the LayerOne talk: they got us so good, releasing a new version an hour and a half before the talk. That's awesome. It's such a dick move. It's exactly what I would have done. So yeah, moving on. We beat them in, you know, our one version 28 hours, and then the next one three days. So yeah, anyway, moving on. Ding. Oh yeah, and so after we couldn't solve any of the reCAPTCHAs... we had it split and we have a solving method that'll work, we just can't get
sample data. So what are we going to do? We got bored, and so we found all this other stuff, like NuCaptcha. We're like, "Oh, these are terrible." So yeah, play. "Please type the letters 9 Z V. Once again, 9 Z B." After you heard round three, and then you hear this one, and we broke round three, so why couldn't we break this? So we did. Now go back, I wasn't done with that one. All right, so they have some type of behavior analysis, which is hilarious because it's all client side, so you can lie to it and they'll never know. All the form fields are like, when did you enter the text box, when did you
leave. You can just fill in what the normal amount is and then they will never know to give you a different one. But if you spam them really fast they'll start giving you a different number of words. So they start off with three and they get up to a maximum of four, and so it gets really hard when you have to do four. To be fair, they may have done more, but we answered them correctly enough that they thought we were human, I guess. So maybe it would get up to seven words, but it didn't for us. Oh, and they repeat them
twice, so you get two samples, which makes it even easier. All right, now... oh yeah, the splitter. Low spectrum splitter again. This first part you see here is "please type the words", then word, word, word, and then "once again". I cropped the rest of it, but on the other side there would be these three words repeated again. So you can pick out all the words. You have to know to start one and a quarter seconds in and just go until you see a bunch of words that are close together. Not very hard. Oh yeah, and Adam: neural network, same [ __ ], different CAPTCHA. Well done, sir. Oh, so our test
run here. It's a little bit annoying to test, because the hilarious part is, I don't know if they're trying to protect their CAPTCHAs by not giving out a lot of them, but each site only has so many CAPTCHAs that they will give out. And you'd think that number would be nice and large, like Google's 25 million. It's about a thousand. Less than that. So for each site, if you solve a thousand CAPTCHAs from that site, that's it, and they just keep giving you repeats. It's much less than a thousand, I just say that to be generous. But once you solve them all you never get a new
one, and so you just keep going. If you wanted to spam a site you could spam it all you want. Oh yeah, anyway, our test run was that. Demo time! Demo time. Demo. We have plenty of demos. Oh God. So, for those of you who saw the LayerOne talk: considering the fact that they fixed it, or broke it, depending on your point of view, an hour and a half before the talk, our live demo failed. But of course we took a video. As everyone who gives a speech or talk or rant or drinks in front of people should know: always take
a video of your demo, because [ __ ] gets [ __ ] up a lot. "Please type the letters J T..." So here's the NuCaptcha demo, once again. So because they only give out so many per site, including their demo page, and we needed ones to train on, we got all of ours from their demo page. Their demo page gave us 250 and that was it. Yeah, I know. If we were going to test against the demo page, we'd trained against it... "Once again, K Z..." and you can't test against the same data you trained on. That's kind of
cheating. So we had to go somewhere that wasn't their demo page to test. So, sorry, that site: you were at the top of the Google list for the search "powered by NuCaptcha", so you may have a lot of comments that look similar. We really should have censored that website name. There you go, good. Now just hold your arm there during the demo. Well, here's the best... So NuCaptcha is pretty interesting. They actually came out with a pretty novel concept for image CAPTCHAs. They've got dancing red letters in an animated GIF, except you can just take the first frame of the GIF and you have a still image, and then
you can take the next frame and you have another image, and the next frame you have another image, so you have, you know, 30, 40, 50 times more data than a regular CAPTCHA to break. We've got to get going, we've got 20 more minutes. Who gives a f... All right, so we're going to move faster, because we've only got 20 minutes. Anyway, that was NuCaptcha. They only give out 250 on their demo site, so we had to test against something else, because we got 99.5% accuracy against the demo site, since that's what we trained against. So yeah, they just gave us the CAPTCHAs we trained against anyway. Ding. All right.
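The point about not testing on the data you trained on can be shown with a deliberately extreme toy. The sketch below is an assumption for illustration only: the "solver" simply memorizes its training clips, and the clip names and answers are invented. It scores perfectly on what it has seen and fails on anything new, which is why a score measured on a site that only ever serves the training clips says little about a different site.

```python
def accuracy(solver, samples):
    """Fraction of (clip, answer) pairs the solver gets right."""
    correct = sum(1 for clip, answer in samples if solver(clip) == answer)
    return correct / len(samples)


def make_memorizer(training):
    """A 'solver' that only knows the exact clips it was trained on."""
    table = dict(training)
    return lambda clip: table.get(clip)


# Invented data: identifiers stand in for audio clips.
training = [("clip-a", "JT4"), ("clip-b", "9ZV"), ("clip-c", "KZ2")]
held_out = [("clip-d", "N7T"), ("clip-e", "5KL")]

solver = make_memorizer(training)
print("seen clips:  ", accuracy(solver, training))
print("unseen clips:", accuracy(solver, held_out))
```

A trained network generalizes far better than this memorizer, but the measurement problem is the same: only the held-out score reflects how it will do on clips it has never heard.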
Oh yeah, and like I said, you can just MD5-solve this, because there's only like 250 to 1,000 per site, and so if you wanted to attack a specific site, you solve a thousand and you're done. Ding. Oh yeah, and then there's more. PayPal, because they were awful. It's 31... it's digits and letters, some of... Oh, demo time. Oh,
listen. Stop doing that. "J N 7 N T." It's five words. Used the [ __ ] it splitter, which CP came up with, which is awesome, and then neural network, same [ __ ], different CAPTCHA. Anyway, next slide. Back, back. All right, yeah, so that's the accuracy, and that was our test run. Now we're going to do a quick demo. Hold this. Go ahead. So maybe PayPal wants to have drinks with us. Yeah. Or NuCaptcha. Or NuCaptcha. I don't know, I mean, NuCaptcha seems kind of boring. Dancing red letters, are you serious? Yeah. So I didn't make a Selenium version, but this is Bash. It downloads it with curl and
then submits it with curl, because we're lazy. You can actually notice each of them gets solved in less than a second, which is funny because they take way longer than that to play. But yeah, three out of three, and it's going to keep going. We ran it for 2,000 and it got 95%. So, PayPal. Hooray, users. Oh yeah, we tested against their "forgot your username" page, and if you just fill in a username that doesn't exist then they'll still tell you whether you got the CAPTCHA right, so we didn't actually have to make new PayPal accounts every time like with some other things. Anyway, I think you're next. I'm always next. Securimage. This is
fantastic. Well, we were, you know... all right, so we got reCAPTCHA, and that's broken, so we'll wait for that, and we got NuCaptcha, and we're like, all right, let's just Google for audio CAPTCHA systems. And we found this guy, and this guy is pretty funny. This guy apparently sells to things like government agencies to have CAPTCHA systems, and their audio, well, I think you can kind of guess what happened. What happened? So yeah, it works fairly well. So this one is actually PHP, you can run it under Apache, so rather than having to solve them, I just generated a bunch
of CAPTCHAs, and we used that to train. And so then we used the [ __ ] splitter, and... T 5 K L 5 G. Yeah, so the neural network tears it apart as usual. Same [ __ ], different CAPTCHA. We have a demo for this one. Once again, more data would equal better accuracy, but I couldn't be bothered to solve it again. We don't have much time, so we're going to skip this demo, because we've got a lot of stuff. Just roll through it. We just roll through it, we just keep going. Oh, this one's funny. This is you. What, me? [ __ ], what did I do this time? Oh, Slashdot. All right, so while we
were just kind of [ __ ] around and looking at CAPTCHA systems, I remembered that we were watching Slashdot, because, like, oh cool, a Stiltwalker article on Slashdot, we're cool and neato now, and people like us because we're on Slashdot. And I realized they had an audio CAPTCHA system. Like, yeah, all right, let's look at that. And it's good. In fact, I would say Slashdot has done more to increase the vocabulary of users than any other audio CAPTCHA system, which really isn't that big a claim to fame. Play it, we're going to play it. "Prosper. P-R-O-S-P-E-R." So it not just says the
word, it then spells it. But the vocabulary it uses is rather intelligent. I mean, if we don't want tards talking on our comment forum, like, it's Slashdot, which I love, it's cool. We didn't do nothing. But I decided that we should take a little bit of a different approach, because we've already broken audio CAPTCHAs up, down, left, right, A, B, B, and start, select, which I [ __ ] up because we always talk while drinking. So we remembered that a while back I was playing around with this Perl script that takes some audio and submits it to an API that converts sound into text, so it's speech-to-text. And this API just happens to be
owned by Google, because it's their Chrome API. So I guess if you have Chrome (I've never actually used it, to be fair) you can do speech-to-text things like "go backwards in the browser" and "find my porn", but maybe that's unrealistic, because I've [ __ ] never used it. But on the API: we have a Perl script that submits to it. We actually just download the audio from Slashdot and pipe it directly to Google, and Google says, "I think that word is banana hammock," and we just take "banana hammock" and pipe it right back to Slashdot, and, well, it works. The interesting part is the API actually spits back information, things
like confidence, and because it spits back confidence, it seems like they're using a neural network. So we couldn't even be bothered to do Slashdot, because quite frankly I love Slashdot. I wouldn't want to submit spam on the forums, or spam on the comment sections, just to test our [ __ ]. It's actually the only thing that we didn't bother to religiously and violently poke. Slashdot got solved (go back) with 56%. We didn't even do anything, we just sent it to Google and they're like, "Uh, yes, that is this answer." So the API solver is also included in the next Stiltwalker release, because as it turns out it
doesn't just work with Slashdot, it works with other shitty audio CAPTCHA systems, like this guy. In fact, I misspoke earlier: this is the guy. What? Yeah, yeah, this is the guy that apparently sells the government agencies their audio CAPTCHA system, and the Chrome solver gets 99.95% accuracy. "Please enter these four numbers: 6, 4, 0, 0..." No background noise, no nothing, just really clean. This guy probably recorded it on his MacBook. "6, 5, 7..." But you know, [ __ ] works, go figure. And obviously with four digits there are only 10,000 possible CAPTCHAs, so the MD5 solver comes back into play, and that's game over again for this one. We actually ran out of audio CAPTCHA
systems to [ __ ] with, so if anyone knows a different audio CAPTCHA system, talk to us and we'll get back to you in, like, a couple of days. A couple of hours, maybe. I mean, it really comes down to... oh, so this is really fun. "Yo dawg, we heard you liked artificial intelligence, so we put some genetic algorithms in your machine learning so you can evolve while you're training." We decided at some point that we'd got tired of solving thousands and thousands and thousands of CAPTCHAs, and we said, all right, let's take a step back, let's look at the whole picture. What are we playing with? All right, all right, [ __ ] that, I am tired to
hell of listening to CAPTCHA systems. Let's just create a genetic algorithm that trains the neural network and increases its accuracy after a base set of just a few. So we solve a couple and we give it to our genetic algorithm trainer, and it goes, "Oh, I'm going to train a neural network. There it is, it's higher accuracy than the previous revision," and that just keeps trucking along, day after day. The next slide will explain more. That's you. So we've got this flowchart that's probably hard to read for the people in the back, but start here, and just continue until you've got great accuracy. Start at the... all right, next slide, we'll go later. All right, no,
really, what you've got to do is solve a minimal amount, and for our experiment we decided, all right, we're going to solve one of each letter and number for PayPal. So PayPal was our... this is what we wanted to beat with the automatic trainer. We solved one letter and one number from each. Thank you. Priorities. So yeah, there are 31. They don't use the full letter and number spectrum, they only use 31 of them, so we had 31 audio samples of each letter and number, uh, sorry, 31 total, one of each letter and number. And that got us 12% accuracy, which is hilarious, that even that gets you any accuracy. But it gets
12, which isn't great, but it's not awful. So what you've got to do is split it up into two sets, because you can't train and test on the same sample set: if you train on it, then you will obviously be able to test and get 100% on it. So you can't test on yourself, and that's something that's generic for all machine learning. If you train on these samples, you shouldn't test on them, because you'll obviously be able to recognize them. It's like saying, let's have a Kevin Costner lookalike contest, and then you always pick Kevin Costner for the lookalike. Yeah, you can't really do that. First, I... hang on, I got...
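The point about not testing on your own training data can be sketched as a simple held-out split. This is a generic illustration, not the project's code: the function name `split_samples`, the 25% test fraction and the fixed seed are all assumptions chosen for the example.

```python
import random


def split_samples(samples, test_fraction=0.25, seed=0):
    """Shuffle labelled samples and return (train_set, test_set).

    The two lists never share an item, so accuracy measured on
    test_set says something about samples the model has not seen.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Scoring a model on `train_set` instead would be the lookalike contest where the original always wins: the number looks great and tells you nothing about new CAPTCHAs.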
What? That is a fantastic thing. Yeah, like you said, like we were looking at the data before, when I had that ridiculously squiggly line: if you test on your training data, you're going to have a ridiculously squiggly line, and it's going to match, and it's not going to work overall anyway. So because of that, we actually need to set up siblings: sibling B, sibling A. You could actually have as many as you want, but for our test we only had two. So what you do is you download and split a lot of CAPTCHAs. You don't need to solve them by hand, you just need to be able to download them. So you download and split them, then you
split that set into two sets, or however many siblings you have. You have sibling A solve sibling B's CAPTCHAs and sibling B solve sibling A's. Then you use those, combined with the previous round's samples, and you train on them, and then you test against some known CAPTCHAs that you've already solved. We had to do 25 for that. And if you get better accuracy, you just replace the previous solver with the current one, and you just keep going through this. So what you'll do is you solve a set with the opposite theta value, or theta file, which is the
neural network's source, then you combine that with the parent, and you take the best CAPTCHAs, the ones it's most certain are the correct answers, and then you train based on that, and you come back around and test, and if it's better, it just keeps going. So, slide. Next slide. So the genetic algorithm was a bit of an interesting story, because I'm sitting around, I'm out on my patio at my house, I'm smoking a cigarette, I'm thinking: so many times during the process of this project, we've all had to take a step back and kind of look at everything and say, we've focused too much on this, let's just take a... let's think about that.
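The sibling training loop described above can be sketched roughly as follows. This is only an illustration of the steps as stated in the talk (each sibling labels the other's unsolved samples, the most confident labels are added to the previous round's samples, and a retrained model replaces the old one only if it scores better on the hand-solved test set). The nearest-centroid classifier on plain numbers is a toy stand-in for the neural network, and every name and parameter here (`evolve`, `keep`, `rounds`) is an assumption.

```python
from statistics import mean


def train(samples):
    """Toy stand-in for the neural network: one centroid per label."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(model, value):
    """Return (label, confidence); confidence is the margin over the runner-up."""
    ranked = sorted(model, key=lambda label: abs(model[label] - value))
    best = abs(model[ranked[0]] - value)
    second = abs(model[ranked[1]] - value) if len(ranked) > 1 else best + 1.0
    return ranked[0], second - best


def accuracy(model, labelled):
    return sum(predict(model, v)[0] == lab for v, lab in labelled) / len(labelled)


def evolve(seed_a, seed_b, unlabelled, holdout, rounds=5, keep=0.5):
    """Let two siblings label each other's samples, keeping only improvements."""
    siblings = [{"samples": list(seed_a)}, {"samples": list(seed_b)}]
    for sib in siblings:
        sib["model"] = train(sib["samples"])
        sib["score"] = accuracy(sib["model"], holdout)
    batch = len(unlabelled) // rounds
    for r in range(rounds):
        chunk = unlabelled[r * batch:(r + 1) * batch]
        halves = [chunk[0::2], chunk[1::2]]  # one pile of unsolved samples each
        for i, sib in enumerate(siblings):
            other = siblings[1 - i]
            # The *other* sibling solves this sibling's pile.
            guesses = [(v, *predict(other["model"], v)) for v in halves[i]]
            guesses.sort(key=lambda g: g[2], reverse=True)
            top = guesses[:max(1, int(len(guesses) * keep))]
            candidate_samples = sib["samples"] + [(v, lab) for v, lab, _ in top]
            candidate = train(candidate_samples)
            score = accuracy(candidate, holdout)
            if score > sib["score"]:  # replace only when the holdout score improves
                sib.update(samples=candidate_samples, model=candidate, score=score)
    return max(siblings, key=lambda s: s["score"])
```

Because a candidate is only kept when it beats its predecessor on the held-out set, a bad round of guessed labels is simply discarded, which is why the hand-solved test set has to stay separate from everything the siblings train on.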
You know, [ __ ] the background noise, let's do a [ __ ] splitter, stuff like that. We're focused too much on something. And while I'm thinking, and while I'm smoking, and while I'm drinking, I have this little moment of clarity that I can't exactly put into words, so I instantly call Jeff Bal and we hash out this idea for doing the genetic algorithm. And we know that there's so much potential here, because it saves us from having to listen to all that [ __ ] for hours and hours and hours and hours. And this is a tremendous diagram. I actually only saw this flowchart today, and it's spot on, it's perfect. We have the siblings... oh, hang
on, priorities. I'm very happy with how this turned out, as we'll see in the next slide. Ding. All right, so like I said before, we started off with each of the two siblings having just one example, a single example of each letter and number. So that's very few CAPTCHAs. We had to do 25 to test, so that it could keep checking its accuracy and feed back into itself. But anyway: before, starting off with one sample, 12% accuracy; afterwards, 48% accuracy. That's in a matter of 24 hours of it just training itself over and over again. So we no longer have to do 100,000 CAPTCHAs, we only have to do maybe 35, and so that's a very big
improvement, because it drives us crazy to do them. So yay for genetic algorithms. Oh yeah, and we're releasing all the code. Go to that site, it's all there. One, two, three, and... one, two, three is already, uh, yeah, one, two, three is already there. This release, with all of the random CAPTCHAs that we just talked about, will be up there as soon as we get time to put it up, later today. Today, today. Oh yeah, seriously, Google, let's have some drinks, because, uh, come on. Questions, comments, complaints, [ __ ]
you? Speak a little bit louder, I'm kind of deaf right now. Round four. So the current round. All right, so what we didn't mention was that in the first round the background audio was reversed radio broadcasts. Oh yeah. So initially audio reCAPTCHA was to transcribe radio broadcasts, and apparently they abandoned that and just decided to use it for background noise. But it does sound like backwards, and it's backwards of the same speech system as they have forwards, at the same amplitude, so it's pretty difficult to tell apart. We're running out of time, so quick question in the back. Comment? The URLs? Go back a
slide. It's pretty easy: it's dc949.org/projects/stiltwalker. On the main site there's a side nav thing, and there's HTML, and you can click on things called hyperlinks. Next question, comments, [ __ ] you, because we're just going to jet out of here. And I love you too, baby. One additional comment for the guy over there that talked about the backwards audio: in round three, one of the hilarious things was that it was just random phonetics for the background noise, and so what ended up happening is those phonetics sometimes aligned into hilarious audio. So one of the samples has "the beauty and death" as the background noise, and our brains are so powerful
in a way that we want to find patterns in everything, and if you just align random phonetics, you're going to hear weird and scary things like "the beauty and death". Quickly: Google? Does it look like we're in it for the money? Why don't you buy us a drink and we'll talk. I don't know, I mean, is there someone from black hat here that can tell us how to make money? Because we just... A printing press? A printing press. All right, anything else? Anything else? I'm getting angry looks, so I gotta go. Oh, we can keep going? Excellent. So more [ __ ] yous, please. Please. [ __ ] you. [ __ ] you. Thanks, Irongeek. Irongeek, if
anyone does not know, does a tremendous [ __ ] job at conferences doing the videos. Yesterday's videos are already online, aren't they? That's a tremendous thing. Round of applause. And he keeps getting better at it, way better than anyone else who does this [ __ ]. Oh, you know, so, I mean, we can just keep chatting. Who wants to come have a drink? Oh hey, we need to find a volunteer who wants to be hit with smelling salts. Does anyone want to be hit with... you? Pronto. Yep, here we go. Now sit down. Yep. No, right now, right now. We've got plenty of time anyway. It's in my bag right there. So what are...
Oh, one. I got it. So the question is, what is it that they could be doing better to, you know, not fail so much? Nothing. All right, so there is very little they can do to make it human-understandable but not machine-understandable. If the words were longer it might be slightly harder, or if we couldn't gather as many then it might be slightly harder, but that's about it. Audio CAPTCHAs need to be fundamentally changed in order to not be broken. The only thing I could think of is if they used, like, palindromes or something. That would be, you know, basically, if you take a sample of it, it'll be the same frequency
forward and backward, but it'll be two different words. They'd need to give you different words, but they'll look the same if we take the average, uh, base, and, you know, so that would be one way to do it. And we have some ideas on how to beat that if they did do that. And CP has something to say. First of all, you know the word I'm about to say, because I haven't said it yet. We could actually solve that with a little bit of minutia: if we shrink the sample size, we can just defeat that anyway. Question, comment, [ __ ] you? Just adding on to
the discussion about what they could be doing better: one thing I just thought of that they could do to [ __ ] with people would be homonyms, like, you know, "they're", "there" and "their". If they gave people a sentence in context and they had to spell everything right, but nobody can [ __ ] spell "there", "their" and "they're" right anyway on the internet, so, you know, I'm not sure how well that would actually work. Absolutely, excellent, because we didn't go over this. We went over this in the LayerOne slides. We cut so much content, because we've never given the same talk twice, ever, hands down, that's it. This is the
closest thing we've come to giving the same talk twice. In the original round that we did, there was a problem where, how do you spell "blue"? B-L-U-E, B-E-W, B-O, B-L-U: well, they all match. They all matched. And the slides that we cut are my favorite slides, because we spent so much time developing word merges, where one answer would validate against multiple challenges, including the word V-A... oh, we have the old slides. Yeah, you keep doing that. Yeah, so there's one particular word, one of the first ones I came up with, was the word V-A-G-N. Who could think of multiple ways to pronounce that
word? All right, so, "vain"? No, wrong. All right, so, like, "vagan", "vagen", "vagin" is pretty close to "wagon", but also in the English language we have this wonderful thing called a silent G: when it's in front of an N, it looks like "van". So the word that we spelled V-A-G-N matched both "wagon" and "van". So we shrunk... the other one's great. Keep going, keep going, keep going. This is the LayerOne slide, so we're just going to... there, I am back. Go back one. Yeah, no, go back one. It's the previous one. One more. Where's the branch-off slide? "I am so smart, S-M-R-T." I mean, so this is round one, this is
what we found. You could combine words like "fork" and "four" into one merged word, and that word would match both "fork" and "four". So we then had a solver that would say, if I'm confident it's this, but I'm less confident than this, it would just make a merge and submit that, and that would get it right. So "spoon" and "teaspoon" is really interesting, because we have T-S, like "tsunami", but also "tspoon", "tspoon". "Seven", "oven": "seven". Here's a great... we started getting others because we developed a way to automate word-merge finding, and we found "weon" and "wayon" match "van" and "wagon", and of course "Friday" matches "Friday", "fairy" and "four". So we
significantly reduced the key space of possible answers by just [ __ ] coming up with merges. I couldn't believe this worked. Go back one slide. No, no, no, academic papers come out in PDFs, we're all about flat HTML. So yeah, go back. Oh yeah, so brute-forcing merges, that was fun, but we're not going to talk about that. So, "I am so smart, S-M-R-T." Homer would have loved this, because who gives a [ __ ] about the A? We found that sometimes vowels just don't matter in the initial round, so you could do "sp..." and it would be... that's "spoon", but it's also S-P-N, "spoon". And of course I found that...
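One way to picture the merge-submitting solver described above is a table of known merges consulted when the top two guesses are close. This is a hedged sketch, not the released code: the only merge used is the V-A-G-N example from the talk, while the table layout, the `margin` threshold and the name `choose_answer` are assumptions.

```python
# Known merges: one spelling that was accepted for either word in the pair.
# "vagn" for wagon/van is the example given in the talk.
MERGES = {frozenset({"wagon", "van"}): "vagn"}


def choose_answer(scores, margin=0.2, merges=MERGES):
    """Pick what to submit from a {word: confidence} mapping.

    If the two best guesses are within `margin` of each other and a merge
    covering both is known, submit the merge; otherwise submit the best guess.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_word, best_conf = ranked[0]
    if len(ranked) > 1:
        second_word, second_conf = ranked[1]
        merged = merges.get(frozenset({best_word, second_word}))
        if merged is not None and best_conf - second_conf < margin:
            return merged
    return best_word
```

For example, `choose_answer({"wagon": 0.5, "van": 0.45, "spoon": 0.05})` returns the merged spelling, because the solver cannot confidently separate the two words and one answer covers both.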
That's actually an animated GIF that I found the day before our LayerOne talk, and it wasn't animated, and I was sad. But really, I mean, merging the words, it's almost like Soundex, but it's not quite, because we thought it was Soundex, but it doesn't behave like it. Of course, now all of this is gone, they don't do any of this anymore. That's why PJ, that's why... oh, MD5. What do we... what? Uh, holding on the line... in the raffles, the raffle ticket: if you're not present, you don't win. I love you, baby. I'm not going to repeat that on the microphone. More questions? No? Hey, really,
you're checking your phone? What? We're talking. It's just sad. Or... you had a question?
Okay, do you want the mic too? Come on, come on, come on up. Come on, everyone else did. Come on, clap. And we've got plenty of time, apparently. Come on. I was just saying that the one thing that is probably the hardest to replace is the context that a human can figure out (and loud, um), the context that a machine could not figure out. So the minute you would pose an example, you know, even without the background noise, whatever, pose a question, and the answer would be easy to give as an answer to the CAPTCHA, then that is an improvement that's very hard to
break. Cheers. So the question was... well, all right, the question/comment or input was, why couldn't the CAPTCHA just ask a logical question that a computer wouldn't be able to solve too easily but a human would? First, the response to that is there's Dr. Watson, the Jeopardy computer, the computer that won Jeopardy. And also you have to have unlimited samples, otherwise you could get it, solve it, and then just record the answer. But there's actually a good paper we saw on a logical CAPTCHA called eggbot. Eggbot posed the question, yeah, eggbot posed the question "fork ___ food", and something like "eat with" would be the answer, and you'd have
to fill in "eat with". Or "shovel ___ hole", so the answer would be "shovel makes hole", and you'd have to fill in "makes". So eggbot was this type of logical CAPTCHA that you're talking about. What they actually found is that they could search Google, use the phrases found by Google, and that would get 35% accuracy. This wasn't us, this was another paper. But the other finding, which is much more hilarious, was that if you use the word "make", because "make" can mean pretty much anything, you'd get 96.5% accuracy. You answer every CAPTCHA with the word "make" and you're almost always right. And so that was a poorly
designed one. But the Google one, the Google solver, or Wolfram Alpha, they all can be used to beat logical CAPTCHAs. Is it an improvement? Yes. You want to talk about economics? Economics. Who wants to learn about CAPTCHA economics? Is anyone interested? One, two... please just raise your hand. We could keep talking the whole day. All right, CAPTCHA economics. Adam, because you read that paper. I skimmed it. Again, this wasn't our paper, but I read a paper on the economics of CAPTCHA breaking, because I'm like, well, we're not the only people doing this, right? I mean, I've got a lot of spam, so
I've seen this stuff before. And it turns out the cost of breaking a thousand CAPTCHAs is about a dollar. Their accuracy is actually surprisingly good, it's about 90-some-odd percent. The reason is that they're using humans to solve it, and in some cases we've actually done better than this slave-labor humans, so that's entertaining. What else about economics? Macroeconomics. How, like, Death by Captcha, if you're a solver, how much do you make? Oh, by the way, okay, some of them have their own, like, you know, private office in some foreign country, where you just have a bunch of people: we'll give you food and a place to, you know, sleep,
and you do work for us, and that's kind of how they work. Other systems will have quote-unquote volunteers, where if you solve a thousand CAPTCHAs you get paid 50 cents or maybe 75 cents, and they're charging a dollar, so they just take the difference. Sure makes our 15 million that we solved of round one look good, so that would have been a decent amount of money. I did some math, by the way: during round one we were solving CAPTCHAs 24/7 in order to get unique CAPTCHAs, and if we had sold the service for the same price as Death by Captcha, it was something like $200,000 a
year. Didn't, because that's boring. I don't know, I think we're done. We're drinking. Questions, comments? Yes? Oh, oh, I'm so glad you asked this. You've got to say it this way, man: "Do you want to play a game?" So the question was, how do we feel about the game, the CAPTCHAs where you have to play a game. Park the red car, like, park the red car in the parking space, put all the food in the refrigerator, put all the objects that belong on the ground on the ground. "Put the helicopter on the ground." I was like, no, [ __ ] you, helicopters don't go on the ground, they fly. First of all,
tell them about the five. I will. So here's the fantastic part: there's the main one, and I forget the name. Do you know the name? I know... well, he'll look it up, but I forget the name. Well, the fantastic thing about that CAPTCHA is they also use an audio CAPTCHA provided by reCAPTCHA, so we didn't even bother with them, because we already had them. However, that being said, that is a very novel idea, because it's not just "drag it into the place". They do some... PlayThru, yeah, PlayThru is the name of it. So what's interesting is you can automate this. There's macros, like, some guy made one on
a MacBook. He's like, "this is how I'm going to do the thing," and he did it, it was great, except that sometimes he'd be caught as a bot because of how he dragged the boat into the ocean, because it was rigid and it's not how a human would do it. So they... robot. So this is a really cool system, but they use reCAPTCHA for the ADA compliance, and that's audio, and that's [ __ ], and that's done with. But the idea of running a game, like, "put the pancake on top of the shoe"... Yeah, remember round four? Welcome to round four again. Am I seriously the
only... I hate to be outsmarted by CAPTCHAs. I hate CAPTCHAs. That is all. I actually monitor Twitter for complaints against reCAPTCHA, because it's so funny watching them [ __ ] about the audio thing. Yeah, you know, if you go to the Google Groups for reCAPTCHA, there's a tremendous amount of complaints, like we spoke about earlier, but one in particular was put together very well, and he made a fantastic statement. Let's see if we can pull it up here. The statement is just amazing. Yeah, that's it. So at the bottom he makes a very passive-aggressive... that's fine: perhaps the audio CAPTCHA is not
designed for use by humans. So, I love reading that. We done good. Excellent. So the next speaker is here. Thank you so much for letting us run over our time and drink in front of you for no [ __ ] reason. I hope you guys enjoyed Stiltwalker. We had a tremendous amount of fun doing it, and apparently it's going to be easy for us to keep doing it, so stay tuned. End of line.