
Hi everyone. I'll introduce myself and talk a little bit about what I'm going to talk about. My name is Hannah Zhao. I'm a staff attorney at the Electronic Frontier Foundation. I actually studied computer science in undergrad before I went to law school. At EFF I work on digital surveillance, privacy issues, and unreliable police technology, so those are my areas. I'm also part of the Coders' Rights team, where we consult with and help security researchers, and I'll talk a little bit more about that towards the end of my talk.

As you can see, this talk is called "Scraping After hiQ versus LinkedIn." hiQ versus LinkedIn obviously was a big case in the Ninth Circuit. What we're going to focus on is that in the past couple of years there have been reinterpretations, and the Supreme Court has had a case on the Computer Fraud and Abuse Act, the CFAA. So we're going to talk a little bit about what kinds of things, when you do scraping, are more likely to bring you legal risk, and what kinds of things you might do to make sure you're less likely to get sued by other people.

First I have to give the disclaimer: this is not legal advice. Please don't take it as legal advice. If you have legal questions, please do not ask me right now whether something is legal or whether you're in trouble. If you come to the EFF booth, we'll give you a card. You can email info@eff.org. I promise you somebody reads all of the emails that get sent there. It'll get referred to the attorneys who work on the specific issue that you're asking about, and then we'll read it. Please also don't vaguely disguise your scenario during the Q&A, saying "I have a friend who maybe did this thing." Please don't do that. Again, email us so that we can have attorney-client privilege and nothing you say will be held against you.

Law is dynamic. I'm sure we have seen a lot of changes, a lot of things that we thought were law or rights, and then they get overturned by courts and
they're no longer law or rights. So I'm not going to use really definitive terms like "don't do this," "do this," "this is legal," "this is illegal." I'm going to use terms more like "this will give you higher risk," "this will lower your risk," things like that.

Lastly, and somewhat importantly, nothing will keep companies from acting like spoiled little children and trying to harass you or sue you. They can't go beyond certain lines, they can't cross them, but in terms of filing a case against you, if they really want to put all their resources and time into doing that, and they are paying a lot of money to do that, they can do that. Now, it's unlikely unless they're really, really mad about something or trying to make a point. But that's another reason why your risks are always existent, because even if you end up winning those lawsuits, they might be annoying to deal with.

What we're going to talk about when we talk about scraping in this talk is specifically the act of scraping: the web crawling, the gathering of data from websites. We're not going to be talking about actually publishing that data or using that data in other ways, putting it on your own website or anything else. We're just talking about what happens if you use these automatic scrapers to get information from websites. The other things I was talking about are outside the scope of this talk. Those actually entail a whole pile of other legal risks, including copyright, trademark, trade secrets, all that stuff. That's not going to come into play. We're going to talk mostly about the CFAA, with a mention of some of the state law that comes into play as well.

So, the CFAA, as I mentioned before, is the Computer Fraud and Abuse Act. It was enacted in 1986, expanding on a 1984 law, because President Reagan watched WarGames, that movie. It was originally focused on protecting government computers from hacking, because if
you watch WarGames, that's what that movie was about. Now a "protected computer" is much, much more expansive. It's basically any electronic device that has processing or storage. I guess we can call this fact fun, but the 2021 TI-84 Plus calculator is more powerful than most 1986-era computers. So this is how the definition of computers has expanded.

Like I said, we're going to touch a little bit on state anti-hacking laws. All 50 states, Puerto Rico, and the Virgin Islands have their own computer crime laws. A lot of them are structured similarly to the CFAA, and the type of things that they address would be unauthorized access and also what is loosely called computer trespass. They borrow this theory of trespass from normal property law, like when someone comes onto your property without permission, and they use that legal concept in these computer crime laws. Now, at EFF we're really against this. We think it is a really outdated way of thinking about what it means to access a computer. But unfortunately there are some state laws that have that, and this is part of the reason why these laws are even more restrictive than the CFAA.

So this is one of the dangers when you do scraping. Obviously EFF's position is that public data is public. You should be able to access it, whether it is through web crawling or whether it is you handwriting notes on every page that you visit on the internet. All of that should be okay. But because of these other concepts that come into play in the state laws, there can be some risks if you're in the states that have more restrictive state laws.

One example of such a state is, in fact, California, unfortunately. California Penal Code section 502 was interpreted in the Friends for Fullerton versus Fullerton case, where the Friends for Fullerton were charged under the California state statute, section 502, for their
scraping of data, and it was a really bad decision. EFF filed some briefs in that case arguing against that interpretation of 502, but unfortunately it resulted in some restrictions. That's not going to come into play for the public data that we're talking about here.

A little while ago there was a case, Van Buren versus United States. Now, this case was not about scraping at all, but it was the first case the Supreme Court decided that was about the CFAA. If you remember, the CFAA passed a while ago, and this is the first case the Supreme Court has taken to interpret this law. It took them a really long time to get there, but because of that it's a very, very important case when we think about whether what you're doing is going to get civil or criminal suits against you under the CFAA.

The way that Van Buren talks about it is that the CFAA prevents unauthorized access, or the kind of access that exceeds authorization. So what does that mean? Well, the Supreme Court took a gates-up versus gates-down approach. If the gates are up, if it is open and you're allowed to access it, then even if you weren't supposed to, but you could and you do, that's not a violation of the CFAA. It might be a violation of other laws.

What that means is, in this case for example, there was a police officer who was looking in a database for some information that he was selling to other people. It's kind of a complicated thing, but basically he had access to this database. He was authorized to use this database as a police officer. He wasn't authorized to use that database to get this information to sell it to someone else and do all of that. So the government argued this was a violation of the CFAA, that the way in which he accessed it and the purpose for which he did it were inappropriate, and therefore it was without authorization. Now, EFF's position is no: it might be a violation of other laws, but under
the CFAA he had authorization. He was authorized to look at this data, and the Supreme Court agreed that he was in fact authorized to look at this data, so it wasn't a violation of the CFAA.

What this means is basically, if you have access to data, so for example public websites: anyone can visit public websites, well, almost anyone can visit public websites. You have that ability, therefore you have the authorization to visit that website, therefore when you take information from that website, that is okay. This is a way of thinking about the law that EFF agrees with, that we liked. But there's this issue, which is footnote 8, which suggests that there can be a withdrawal of authorization. So when we're talking about scraping, if you're authorized to access the data, like a public website, you can scrape it. But what does it mean in the context of a website to withdraw authorization? The Supreme Court did not explain. Footnote 8 confounds a lot of lawyers. That's probably going to fund a lot of lawyers' salaries for the next couple of years as it is litigated out exactly where the confines of this withdrawal of access are. But that is one of the things that you would have to worry about.

So what do we mean when we say withdrawal of authorization? How can you withdraw authorization from a public website that you're scraping? And then in comes hiQ versus LinkedIn. hiQ versus LinkedIn began before the Supreme Court took up Van Buren, but it also continued after Van Buren. hiQ was basically a data analytics company that was scraping LinkedIn's website for information so that they could use it in their data analytics and sell it to businesses. LinkedIn had kind of been letting hiQ do this, but then decided they basically wanted to make the same kind of product. They were like, "We have all this info, why don't we just make money by building data analytics off of it?" So then it sent a cease and
desist letter. So the question is: when you are scraping public information but you get a cease and desist letter, does that withdraw the authorization that you originally had? EFF's position is obviously, as I mentioned before, no, because this is a public website. The gates are up for everyone, the gates are not down, so therefore this cannot be a violation of the CFAA.

But LinkedIn argued this, and the Ninth Circuit, prior to Van Buren coming out from the Supreme Court, actually agreed with EFF and said this is not a violation of the CFAA. But after Van Buren was decided, hiQ was also up on appeal at the Supreme Court, and the Supreme Court said, "Hey, we just decided this case in Van Buren. We're going to send this back to the Ninth Circuit. You re-decide it based on what we said in Van Buren." So the Ninth Circuit looked at it again in hiQ II. They said that Van Buren's gates-up-or-down inquiry only applies when authorization is required, and as I said before, if it is public data, then authorization is not required. So that means that Van Buren basically reinforced what the Ninth Circuit had originally decided. It again said that scraping information from public websites is not a violation of the CFAA, and we're really, really happy about that.

If you think about data, there's basically three categories, right? There is a category where you only get access to it through credentials, so you have to log in and things like that, so that's password protected. Then there are ones where authorization is given. But there's a third category, which is what is open to the general public. So it's not about whether you had authorization to log in; it's that you don't need a login at all.

They're still litigating this, in fact, but what they said at the end of last year was that hiQ is likely to win on the scraping-violating-CFAA
issue, because it was scraping public profiles.

Now, there's an issue, because hiQ ended up hiring turkers to scrape, basically to build fake accounts on LinkedIn and scrape the data that you would only get after you log in and have an account with LinkedIn. That is a different issue that did entail higher risk. The court said that was a violation of the user agreement. So what you can see here is: the more authorization, the more steps you need to take to get that data, the more closed off it is, the higher the legal risk that you're taking on when you decide to scrape it. Public information that is just on the website: totally okay. But if you get down to where you have to log in to access that information, that is a little bit riskier, and especially so if, to log in and to create that account, you had to click "I agree to these terms" or whatever. Often in those terms there will be prohibitions on scraping data.

LinkedIn had argued that this prevented hiQ from scraping the data even of its public profiles. hiQ was scraping, but it was a public profile, and LinkedIn said hiQ still created these accounts, they still clicked "I agree" to these contract terms, these user agreements. The Ninth Circuit said that didn't apply to the public stuff, but to the private stuff it did. Also, LinkedIn prohibits fake accounts in its user agreement, so in creating this account in order to do scraping, hiQ violated the user agreement. So again, when you're scraping data from websites, if it's the kind of website where you needed credentials, and there was a user agreement where you clicked "I agree to the terms and conditions" to continue, that might entail a lot more risk when you are accessing that data.

As I said, these turkers did the quality assurance while logged into LinkedIn, by viewing and confirming the
information. The district court granted LinkedIn's motion for summary judgment on this as a breach of contract. That entails contract law, which is different from the CFAA. The CFAA is both civil and criminal. That means LinkedIn could sue hiQ under it, and also, if a prosecutor was really gung-ho about it, they could bring criminal charges under the CFAA. That's why the CFAA is really worrisome, because we're not just talking about people losing money or having to pay money; we're talking about possibly jail.

So, the lingering concerns we have after this huge development, leading up to even November of last year. First, there are no technology barriers for the type of access that you need to create an account for. So in theory the gates are still up under Van Buren's idea of gates up versus gates down. Gates down means they have technology that is preventing scraping or preventing access to the information, not just requiring authorization that anyone can get. Everyone can create an account on LinkedIn with real information. So when there are no technology barriers, does that information count as public? Obviously EFF's position is that it does, but this is unclear. So, like I said before, when you do have to create an account, that entails more legal risk.

Another thing is that the DOJ has a charging policy, to tell its prosecutors when they should actually be charging someone with a crime, and that charging policy considers cease and desist letters to revoke authorization. But it's unclear if that's just talking about non-public data or also public data. So when I talk about your risk going up or going down: if you are scraping even public data and you get a cease and desist letter, EFF would argue that letter is no good, but it is not completely clear whether that is no good, and whether the Department of Justice considers that to
be no good or does it consider it to be a valid revocation of the authorization that you had. So scraping after you get a cease and desist letter, that increases your legal risk.

I mentioned the breach of contract claims, and also the trespass claims, which came up in hiQ, but LinkedIn didn't really argue it, so the Ninth Circuit said, "We're not deciding this." Again, EFF's position is very much against interpreting those types of trespass laws to apply to computers and accessing computer information, but this is also kind of a gray area.

So what should you do when you actually come across these types of issues? Well, EFF has a Coders' Rights Project. I'm part of it. It is a team, a subset of our lawyers, who provide free counsel to security researchers on legal issues. So perhaps you're conducting some research and you come across something, either you want to disclose bugs, or you're like, "I'm doing this thing and I'm not sure what my rights are," or you get into really murky territory. If you are, please email us at info@eff.org. Like I said, someone will read it, I promise. We also have guides up at eff.org/coders, and we are able to consult with the security research community and provide you with some free advice and counseling on your situation. Even if we don't represent you or do anything for you, that conversation will (a) be privileged, and (b) we will be able to point you to someone who might be able to help you.

So that's the end of the talk. Do we have questions? Yes?

That is a really good question. Does adding one of those CAPTCHAs, to make sure that it is a human and not some code that's trying to access the data, revoke authorization? In terms of revoking authorization, it does not. In and of itself, the existence of that does not necessarily suggest that there's a revocation of authorization, and therefore shouldn't
change the legal risk that you're taking on, whether or not that exists. But if you have signed a user agreement, and the user agreement says things like you cannot use automatic scrapers, this might be something that comes into play in the breach of contract claims.

Hi there. Can terms of service deny authorization to access a public website?

Based on the law right now, it is probably unlikely that it can be used for that. There may be issues with, like I said, the user agreement, like the EULA issues that come into play in the breach of contract claims, but right now it looks like that is not very effective. Now again, this can be wrong. Like I said, there will be a lot of litigation about what footnote 8 meant, and that is one of the worries, although the direct opinion of Van Buren actually suggests that that's not effective, that having that in the ToS is not going to be effective.

Yeah, so his question was: what if the terms of service include that you can't scrape data from the public website? My answer was: if you clicked the "I agree" thing, there might be contract issues, but it looks like, from Van Buren, that should not be a problem, that you should still be able to scrape it even though the terms of service says no.

Any other questions? We have time for one more question.

Have there been any legal cases or research into, for instance, the New York Times, where they allow the Google crawler to see all of their articles, and there are websites that'll go and grab them for you?

Yeah, so I'm not aware of the legal cases on that, but both in the DOJ charging policy and in Van Buren there's some suggestion that essentially the type of frame that the New York Times has is: there is no authorization, but we will grant authorization to specific parties, like for Google crawlers to get it. So it's okay for Google to, but it is not okay for someone that the New
York Times didn't say it's ok