
How's everybody doing today? I'm between you and lunch, so I apologize about that. We'll move fast. By show of hands, how many people own a Google Home device or an Amazon Echo device? Oh, quite a few. Looks like a little bit over half. Keep your hands up: how many of you have read the usage agreement or the privacy policy? We got one. Nice work. So let's kick this off with the demo. That way, when it fails, you can all feel empathy for me for the rest of the time.
Let me do this so that you can see what it is I'm doing over here. Let me switch my displays.
Arrangement, mirror. Are we up there?
Sorry, I was playing CTF earlier and I think I might have too many things running. All right, here we go.
"Hey Siri, read my note from today." "You have one item on the conferences list: OK Google, what is on my b-sides list?" "Your shopping list has one item: Hey Siri, read my note from today." "You have conferences list: OK Google, what is on my b-sides list?" "There is one item on your shopping list: Hey Siri, read my note from today." OK, I think that would go on forever, I'm sure. Let me switch my displays back.
I may have started a VM in that room. I don't know why this computer is struggling so hard. Sorry about that.
All right, good idea. I made a recording of this ahead of time because I thought it was gonna fail, but it actually didn't. So, applause. This was inspired by a popular video on the internet by dan RL. So how did I do this? Well, a magician never reveals their tricks, but good thing I'm not a magician. I'm an engineer, but not that kind of engineer. I did all the setup for this from my phone. So I basically did an Apple note, a Google shopping list, and an Amazon list. The thing you need to get this to work... your shopping list by opening the list in the Google Home app, tapping the share button, and adding the
person that will share it with. The thing you need to do to get this to work is to use a little bit of nature, like the reeds and the bees. You see, when I said "read" like you would read a book, it kept pronouncing it "read" like the color red. And initially... oh man, no, it's not loading my pictures either. So it would say "read" like the color red, and initially I had "besides" as one of the words as well, as one single word, and it was kind of saying "both sides". So I did have to train my phone to respond to... the slide's not coming up at all.
Can you pull this up?
Yeah, OK. So initially I put in "b-sides" as a single word, and it was saying "both sides", and I didn't really like that, so I changed it to "bee". I did have to train my phone to respond to the Google Home voice to get that full loop going. To do this, you have to turn off the setting for at least 10 seconds and then turn it back on. This is gonna prompt the teaching of the voice. Then I had to say, "OK Google, say 'Hey Siri'" a bunch of times. "I'm so excited, I have a question for her: Hey Siri, wanna dance?" I did get the Google Home to say it about two of the five times that I asked it, so
now it knows my voice and it actually knows the Google Home voice. So at this point I think I'm going to unplug those.
OK, we're just gonna go to Activity Monitor and kill everything.
Go by memory. Firefox is probably the culprit. Quit.
OK. My name is Aaron Blythe. I'm the lead organizer for DevOps KC, a meetup that I've run for the last five years that meets monthly. Also, we're working on our third DevOpsDays KC, which is a conference coming up this October, which you should check out. You should also check out the videos from last year's event at devopsdayskc.org. I currently work at New Relic, which is an awesome software-as-a-service company that gives you incredible insight into your software. So the most common use case in my house is to ask these devices to make animal sounds. This is my two-year-old daughter, Lila.
[Music]
"OK, what sound does a dog make?"
[Music] "OK, what sound does a sheep..."
[Music] So before I get too deep into the analysis of the traffic, I do want to go over the functionality of these devices. For my common use cases, the Google Home probably hits the mark the most. As far as price point, the phone costs way too much, but we all have to have them. For Siri, when you ask, "What sound does a pig make?", it doesn't respond with any voice or sound; it just gives you a list of a bunch of web pages. For the Alexa, the price point's around fifty dollars, but if you get it on a Black Friday sale it's about thirty dollars, and when you ask something like
"What does the pig say?", it responds back with, "I'm sorry, I only know English at the moment. I'll try to learn some animal languages," which is annoying to the kids. For the Google Home, same price point, it's about fifty dollars, and when you ask, "What sound does a pig make?", it works. You can also say, "What does a pig say?", and that works too. My finding so far is that the Google Home does a little bit more natural language processing, and you have to be very precise about what you put in for the Alexa. So let's go to my test. The way I view this is to start with my devices; data flows through to my router,
but there's also an agreement that I have with these companies. The data then flows to the companies that are providing the services. This also completes the transaction that's part of the agreement for the data. So I'm going to start with the router and move counterclockwise around each of the components. The first question I'm asking is: what traffic is leaving my house? I'm going to start with the Echo Dot. I have many Amazon devices: I've got a couple of Echos, some Kindle tablets, and Fire Sticks for watching TV. I don't have cable; I only have over-the-air and streaming devices. All these devices connect to my Wi-Fi, so I had to figure
out which device was the one that I wanted to hone in on. I believe the first device I ever connected to my network was listed as "amazon", and then all the devices after that say "Amazon", then a dash, then nine numbers or characters. I don't know what those numbers or characters mean. I don't think it's a MAC address or anything, but since this is a security conference, not a hacker conference, I did black or white all that out so that you can't see it. My initial analysis starts with the access time. I did things on the devices to see how often it's sending things back, going by the log on my router. What I observe with the Echo is that, for the most part, it only
seems to be sending out requests when that keyword is used. Then I moved on to looking at the domains. The first set of domains I'm mostly fine with: here we have the amazon.com domains, Pandora, and Spotify. I did have to look up amazoncrl.com; however, this is listed as being owned by Amazon. So, a small tangent on WHOIS: this tool is essential when you're trying to do analysis such as this, as far as what's going on. If you're a fan of Krebs on Security, he's been talking recently about how the GDPR, that's the EU's General Data Protection Regulation, is kind of going after WHOIS. It sucks. ICANN is scrambling on this, and possibly it's
going to take about a year to get the changes made, and over the course of that time they're gonna have to pay a bunch of fines. So that sucks. Anyways, the ones I was not sure about were these three. We'll start with CloudFront. I did check that the URLs that were being used from the device actually need authentication, and this is owned by Amazon; this is simply their CDN. I still have not been able to figure out why the Echo is actually sending data to Google, and that's one that I'm still trying to figure out. So if you want to come see me in the CTF room after this, maybe we can try to figure
that out. And then the final domain is also registered to Amazon. So let's move on to the Google Home Mini. The Google Home Mini sends information much more frequently, and at this initial level it does not seem to be bound by whether or not I'm using the keyword. As I just had it sitting there and I was updating this log, I would see more traffic go out. These domains don't surprise me that much: we've got Google domains, we've got Pandora. These are the ones that were of interest to me. Starting with the first one, which is redirector.gvt1.com: a quick Google search would make you think something was seriously wrong. So, as you read
some of these things, I've got "possible infection", "sign of a virus", "how to remove this". Usually when I do this quick Google search I don't actually click on any of these links. But it turns out that the domain is actually registered to Google, and you would think that Google would have better SEO on this, so that I didn't get freaked out when I did the search. Anyways, let's move on. For p-cdn, I did a quick lookup to find out that that was Pandora, and this was followed by Triton Digital. I did a WHOIS on Triton and noticed that some of the information was masked, but the
organization is called StreamTheWorld, it's all one word. When I go to Crunchbase, which is where you go to look up startups and tech companies, it looks like the organization StreamTheWorld was actually acquired by Triton Digital. Both of these companies seem legit; it's just surprising the traffic is going to them. Triton calls themselves the industry standard for digital audio, so it's surprising, but it's not that surprising. We also had TuneIn; traffic was going there. And so that leaves one more place where the traffic is going, and this is sb.scorecardresearch.com. I read up on this site, and they say that they try to gather
data in aggregate and not on individuals. There were some places that say that the data is only stored for about 90 days. However, what if I want out? What if I don't want to participate in this? They do have an opt-out policy; however, it appears to be very browser-based. Their documentation says: please note that the browser opt-out mechanisms linked above are cookie based; if you delete or block or otherwise restrict cookies, the opt-out might not be effective. Additionally, because different computers and different internet browsers require their own versions of the opt-out cookie, you may need to perform this opt-out process on any computer and browser that you want to be opted out of. These are listening devices.
I don't really have a browser where I can go in and set these cookies. Not only that, this is just super annoying, because any time I use the internet and I'm logged in, I have to go and actually connect and set up all these things, and if I'm trying to protect other things by blocking cookies, then I can't block this. Pretty annoying. So the Echo Dot wins this round. I'm still not sure why it's sending information to Google, but other than that it's pretty clean. Next up we have the clients, and the question I'm asking here is: how much data is being sent, and how often is it being sent? To do this test, I used my Mac as a Wi-Fi
access point for all of the devices. I plugged the Ethernet cable into my Mac, and that frees up my network card so that I can take the devices and connect them in through this Wi-Fi, and then that will go out to the internet. Then I used Wireshark to look at the data that was flowing through this. We'll start with the Echo Dot. For my first test I asked, "Alexa, what is the weather?", which is an appropriate question given what's going on up here in the north. What we see is that it spikes up, it makes that initial request, and the packet sizes are a little bit on the bigger side. I used dig to look up some of the IP
addresses that it was sending to, and all of them ended up being Amazon assets, like you would expect. Then our favorite question in my house: "Alexa, what sound does a pig make?" Very similar pattern: it's recording my voice and sending that out to Amazon, I'm seeing a response come back, and we've got decent-sized packets. Then I wanted to see kind of a baseline: what happens when it's just running and I'm not interacting with it? On these screens you see that the traffic is much lower. What I did was, for three minutes, I was just silent and let it run and looked at all the traffic, and then I turned the TV on for three
minutes to see if it was picking up any of that chatter. And then, not shown here, the other test I did was I read, in my normal voice, some hardcore rap lyrics to see what would happen, you know, to see if it would pick up on anything. It didn't; very similar results. So, back to the Google Home for a similar test. Again, with "What's the weather?" we see a spike up, it sends the traffic off, and we're at much higher packet sizes. One thing I noticed that was interesting was that where it was sending the traffic to is 1e100.net. You can look this up on Google. They're pretty excited
about it, right? Because if you know the history behind Google, like when I was a kid we would say "googol", that's the number 1 with 100 zeros after it, right? And that's spelled G-O-O-G-O-L. So they actually have this domain registered to Google, and they've got a whole story about it on their site. The rest of the traffic was going to places where I would expect it to go. Then I did my test where I said, "OK Google, what sound does a pig make?" Very similar traffic. Then I repeated the tests for the Google Home, both being silent and with the TV on, and while it does have more traffic being sent than what I've seen on the Echo, it's still
not as much traffic as when I was actually interacting with it. So Google's sending on the order of about five times more; however, it's still low. For this round, let's look at the numbers. These are the packet bytes that were sent. In my ten-second tests, where I actually interacted with it, it was sending quite a bit more bytes. The Google Home sends more on average than what Amazon is sending, but you can see that in ten seconds... What I'm concluding is, I don't think that it's actually listening in all the time. It's listening for that keyword, and it's only sending when that keyword is used. So let's move on to the
agreement. First off, the Echo Dot. These agreements read like the central theme is to use the device to purchase things. What I found in the agreement is that it actually creates a voice profile, but it does allow you to delete that voice profile and the voice recordings that you have. So, a few things about the Alexa Terms of Use. My lord, are these things boring. This was updated two days before I grabbed this screen capture; however, I can't find an earlier version anywhere. And one of the things that it says directly in the agreement is that we may change, suspend, or discontinue Alexa, or any part of it, at any time without notice, and we can amend this agreement at our sole
discretion by posting the revised terms on the Amazon website. So basically, whatever you're agreeing to, they reserve the right to absolutely change, and I can't find the earlier versions to figure out what I agreed to previously. The only way I can find them is to use the Wayback Machine, which isn't something that I really trust that much, and the company, they're not going to show me the previous versions. Similarly, on the Google Home Mini, this agreement actually reads like they're covering themselves to track you so that they can sell ads. It's much easier to find the archives here, because they have a link directly to all the previous versions. For the Google Home Mini itself,
the privacy concerns... I hate these types of documents, because you can't really search them unless you go through and expand everything. So on this round I really wasn't happy with either of them as far as the agreement. I did like that it allows deletion. When we go into the next round, we'll find that maybe deletion doesn't mean what you actually think it does. So now we're moving on to the server side and what's actually saved on the servers. The Amazon service actually creates a voice profile over time, and they save all of your voice recordings. What is weird is that where you actually delete these is over in your consumer Amazon
account; it's not in the alexa.amazon.com app. I had to find this through that earlier legal document; it wasn't very easy to find where to do this. And once I got there, the only option is to scorch the earth: you have to delete your entire voice profile. You can't delete a single recording; you can just delete the whole thing. I also want to note that I'm an AWS user, and one thing I think is weird about the AWS agreement is that you really can't fully delete your account. They keep your name, your address, and your payment information in case you ever want to reopen it later. So this to me is kind of scary,
I guess. So for the Google Home Mini, Google has quite granular saving of all of your stuff. You can go and see your activity on My Activity in a timeline type of format. Here we can see the questions that we were asking: "What sound does a pig make?", "What does a horse sound like?" And I can mark a single recording for deletion. Don't get that confused with what Google also has, which is the Maps Timeline. If you're allowing Google to track you on your device, you can go and basically see an entire timeline of everywhere you've ever been. I don't have this particular feature on. This is not mine; I grabbed
this picture from the internet. So back to My Activity: it's not only your voice stuff here, it's all of the data related to your Google profile: your searches, your browser usage, everything. In fact, you can download your archive of 35 different products, and the default is 2-gigabyte zip files. So I went ahead and did this, and nearly 6 hours later I was presented with seven archives, so roughly 15 gigs' worth of data on the last 10 years about myself. And one of the things that you'll find here is that deletion doesn't necessarily mean what you think it does. People have found that emails that they've deleted show up directly in these archives. So it's not even like
it's marked for deletion, where they're not gonna give it back to you and it's just on their servers. It is still completely retrievable. So if you want to freak yourself out, this Twitter link is awesome. You can go to a bunch of things and see all the things that Google is storing about you. This particular Twitter user, Dylan Curran, is actually pretty funny, and he's got a great podcast that I recommend. So in this round I'm not happy with either of the devices. The deletion is not really clear, and it doesn't mean exactly what I would expect. So now that we've been through the entire ecosystem of how everything fits together, I do want to look at one more
last thing: the way that we expect this to work, based on what we've been told. So, March 2016, in the New York Times, what we've been told is it doesn't stream anything without using the wake word, right? And it has a physical mute button that electronically disconnects the microphone. But as with all groundbreaking technologies, there's no doubt we're entering new territory here. About a year later, March 2017, from the Tech Times: at present, the third-party Alexa device only performs a single verification check to determine if someone indeed uttered "Alexa", but thanks to the new verification feature, the device can now
send the audio to Amazon's servers to make sure that it really heard the right wake word. So what this means is, initially what we thought was it's just listening for "Alexa" on the microphone here, and it's not going to send anything unless it hears that. Now it's listening for "Alexa", and if it thinks that it heard something close to "Alexa", because they're having a lot of issues with things similar to "Alexa" being the wake word, it's now sending the wake word, and possibly whatever happens right after that, up to the servers to verify, and then not responding unless it's actually verified that it's the wake word.
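The two-stage check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the scores, and both thresholds are hypothetical and are not taken from Amazon or from the articles quoted in the talk. The point is simply that a loose on-device match is enough for audio to leave the house, even when the server later rejects it and the device stays silent.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
LOCAL_THRESHOLD = 0.5   # loose on-device wake-word check
CLOUD_THRESHOLD = 0.9   # stricter server-side verification


@dataclass
class Decision:
    audio_leaves_device: bool  # was a snippet uploaded for verification?
    device_responds: bool      # did the device actually answer?


def handle_audio(local_score: float, cloud_score: float) -> Decision:
    """Model the two-stage wake-word flow with made-up confidence scores."""
    if local_score < LOCAL_THRESHOLD:
        # The device did not think it heard the wake word: nothing is sent.
        return Decision(audio_leaves_device=False, device_responds=False)
    # The device heard something close enough, so the snippet is uploaded.
    verified = cloud_score >= CLOUD_THRESHOLD
    return Decision(audio_leaves_device=True, device_responds=verified)
```

For example, `handle_audio(0.6, 0.4)` models a near miss: the snippet is uploaded, the server rejects it, and the device does not respond.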
I don't know what that was. US Patent Office: three months after the last Tech Times article, this was filed, June 12, 2017, and it was reviewed by the Patent Office on November 9th, 2017. This is what's referred to as the voice sniffer algorithm patent. So I'm gonna go ahead and read this section: in one embodiment, the voice sniffer algorithm can cause a snippet or portion of the audio including and/or immediately following the trigger word to be captured for analysis. The audio snippet can be of any appropriate length or size, such as corresponding to an amount of time, e.g., five seconds, an amount of data, e.g., up to five megabytes, up to a pause of voice
data in the audio stream, or any other such determining factor. In some embodiments, the rolling buffer or other such data can be used to also capture a portion of the voice data immediately prior to the trigger word, to attempt to provide context, as discussed elsewhere herein. In some embodiments, these audio snippets... I read the wrong one. Anyways, what it's doing is, it's not listening for "Alexa" anymore. What it's sniffing for is words like "I like" or "I love" or "I hate", basically sentimental words, and then it's looking for anything that happens five seconds or up to a pause after that. So this is where we step into "this is listening all the time": it's listening
for what you're actually saying, and it's actually picking those things up and looking to create future ads for you based on that. I also want to bring into the discussion the size of these companies. In 2017, Google employed 73,000 people. Amazon employs 566,000. Compare that to the population of Des Moines: Des Moines has 215,000 people, so Amazon has more than twice as many people actually working for it. A couple of years ago there was a lot of commotion about smart TVs, that they were constantly listening in and trying to figure out what it is that you were saying, and sending that information back to a separate nation
state from the state that you are all currently in. I believe it was Samsung, and I believe that the data was going overseas. So I don't want to be an alarmist about any of these things. I tried not to use a title that was as outlandish as the ones that I've seen quite a bit. I just wanted to analyze the actual traffic that's being sent. At this point I feel like the devices are for the most part safe; they're not overstepping their boundaries. But I think that it's going to escalate over the next couple of years. There's this analogy of a frog and something about boiling water, which I've heard is totally not true. I think that frogs are
a little bit smarter than that; they do end up eventually jumping out. However, it is interesting that, in what we've seen in the media and what's been brought through the tech community, initially we were told only the wake word, and now we're moving on to, well, we'll take the wake word and a little bit of information after that, and now we're moving towards a patent where we'll take sentimental words and then take a little bit of information after that. That being said, I'm not really that scared. We're probably still going to use it in my house. Here's my daughter again.
"OK Google, make a sheep sound." "This is a sheep." [Music]
You didn't hear it at the end; she said, "Oh no, it's broken." So I think they have a long way to go on trying to pick out the way that a two-year-old talks, and that's actually what I hope they move on to and solve, because that would help our family out a lot. You should be able to find me in some way on aaronblythe.org. I'm on LinkedIn and Twitter and all those things, and that's all I have. Anybody have any questions, thoughts? Right, blocking domains, that's the next step. With the sb.scorecardresearch.com one, I do want to just block that using a Pi-hole or something, so that my stuff
doesn't go out there, because they provide a way to not send it, but the way doesn't work. So I'm just curious... I don't know, that would be interesting... before the voice analysis, could it really be possible to do that on a device and make it affordable? Is the reason that they're sending a lot of it not just to collect the data but to do the voice analysis on servers? Yeah, that's an interesting thing about all of these. I don't know that they picked the best words for the wake word. With Alexa, the interesting part is, when I first started working on this I worked on a team, and one of the four people on that
team was named Alexa. She's the only person I've met in my life named Alexa, and it was interesting because at that time we all had to change our wake word to "Echo", which was the only other possibility. Now I think there's four: there's "Computer", "Echo", "Alexa", and, if someone can help me, I don't know what the other one is. I think that they can do the analysis of that pretty well, but they did run into problems where other words were triggering it. So I don't know what it is that they're doing on the server that they can't do locally. The other thing that I have noticed is I haven't really run updates on this. I don't know if it's
pushing updates directly, because I've never... like, with every other device I have, across all the tech stuff, you have to sit there, even with my Fire TV, and go do the annoying updates. So that's another thing I want to look into. What's that? Pull you back in... just nobody ordered.
That's no question.
No, I didn't. I hid that slide. The specific company that had the one that I went into was Amazon. Funny you should say that; I just wanted to focus on the one patent instead of reading you a bunch of patents, and obviously I just read the wrong section. This tweet here actually goes through the names of them. I did have some trouble actually tracking that down. I usually go to Google Patents, which is where I try to read all patents, just because I like the format a lot better than the USPTO site. The problem with Google Patents is that I think it only shows, or it only seems searchable, if the
patent's been granted. If you've ever been to the USPTO site, on the left-hand side they have the patents that have been granted, on the right-hand side they have the patents that have been submitted, and you can do the search that way. The only way I could get to it was the USPTO site. So in this thread by Dylan Curran, he goes through a bunch of the patents, but he doesn't have direct links to them; from those, though, I was able to go read the patents. Google does have a bunch right now. The other interesting thing I saw when I was reading through on Google Patents was that voice and VoiceXML has been something,
going all the way back to like 2001, that Amazon has been having patents on. Now, the technology has changed; it's just been in the last couple of years that this has exploded for the voice analysis for these types of devices, but they've been trying different things for over 15 years now. All of the domains that I was looking at, the data was flowing over HTTPS. I did look for plain HTTP, because I thought that would be interesting, but I didn't see anything that was just over HTTP. Now, I didn't do any type of analysis where I tried to decrypt anything, but what I saw in Wireshark was that it was mostly encrypted data when I was
just kind of going through all the requests that were made.
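One way to double-check an observation like this is to total the captured bytes by destination port. The sketch below is a minimal example under stated assumptions: the packet list is made up, the `(dst_port, length)` tuple shape is a hypothetical export format rather than anything Wireshark produces directly, and treating port 443 as HTTPS and port 80 as plain HTTP is only the usual convention, not proof of what is inside the packets.

```python
from collections import Counter


def bytes_by_transport(packets):
    """Total packet bytes per category, keyed by conventional destination port.

    `packets` is an iterable of (dst_port, length_in_bytes) tuples.
    """
    totals = Counter()
    for dst_port, length in packets:
        if dst_port == 443:
            totals["https"] += length   # conventionally TLS-encrypted
        elif dst_port == 80:
            totals["http"] += length    # conventionally plaintext
        else:
            totals["other"] += length
    return dict(totals)


# Made-up sample data, not measurements from the talk.
sample = [(443, 1200), (443, 800), (53, 90), (443, 400)]
summary = bytes_by_transport(sample)
```

With the sample above there is no `"http"` key in `summary` at all, which is the shape of result the talk describes: nothing going out over plain HTTP.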
I haven't. The price point on that's still a lot. I have the big Echo from the beginning. I think the only reason I got that was because I was reimbursed, 'cause I was doing research for work on writing what they call skills. For home personal use, I wait until the second- or third-generation ones come out and they're cheaper. Oh...
"Alexa, update." "Here's your Flash Briefing." "Alexa, stop. Alexa, stop." So that reads your Flash Briefing; you would get a bunch of NPR articles here in a minute.
Anything else, or is it lunchtime? Yes?