
or information that he'd like to share with you. A little bit more about Michael, from the five minutes I've gotten to know him: he's been a really amazing person. He's also the co-founder of Corporate Blue, which serves clients in their pursuit of mitigating cyber threats. Cyber: you should drink. And he says it quite a bit. He holds a lot of credentials, and he's responsible for delivering information assurance by means of vulnerability assessments, risk management, training, and all of that really cool stuff. So without further ado: your taxes are being leaked, and Michael Wiley will let you know how. Thanks. Thank you very much. Appreciate it.
Well, thank you very much for joining me early in the morning, for us hackers, here at the beginning of Hacker Week in Vegas. This presentation has changed a bit since I submitted it. I originally found a vulnerability that I wanted to share with the community: how I found it and what we can do better. But as I did more research and talked to some of the security teams, I found more and more, and it started snowballing. Up until about last week I was on the phone with some of these vendors, and they're still struggling to fix some of the vulnerabilities. So the talk has taken a bit of a turn, and I've tried to defang and anonymize some of it, as they're still rolling out patches. Tomorrow they're rolling out more patches. Some of them haven't even addressed some of the vulnerabilities yet, and it's a long road to fix this. So I hope this can do good; I've got this talk booked later in the year, and hopefully I can release more of the details then. I've had to remove some of the specifics just to protect all of our tax data. I think they did a great job introducing me, but basically: I'm a co-founder of Corporate Blue, I do cybersecurity consulting, I'm a Department of Defense contractor, and I do training
for the U.S. Department of Defense, some universities. I've built some pen testing programs for colleges and I teach cybersecurity as well. So this is kind of an outline of what we'll go through. We're going to do a little bit of introduction to the tax prep industry and what I've kind of found on that. We'll do some look at the research and some of the case studies into breaches, specifically in California where I reside. We'll look at some cybersecurity laws and how they apply to CPAs and tax preparers. Then we'll look at some common breakdowns in the CPA security or tax preparers and what they're doing wrong. Then we'll analyze tax prep software and some of
the ones I looked at, what the tax prep and CPA people are using. And then we'll go into a systemic issue that I found when testing the software, what I've been doing working with the vendors to fix some of these issues, and then what you and your CPA can do to protect your taxpayer data. So, the tax industry. I learned a lot doing this research. It's a large global market: $11 billion is going to these CPAs and tax prep people, and I don't think this even includes a lot of the advisory services that firms like Ernst & Young and Deloitte are doing; this is just for tax preparation. In North America, they're generating $4.6 billion. There are about five data breaches per week, according to the IRS in their 2017 report: 177 tax pros reported breaches from January to May of 2017. The IRS paid out $5.8 billion in refunds to fraudulent filers in 2013, and the IRS claims they stopped another $24.2 billion in fraudulent returns that same year. So they did a good job stopping a lot of those, but there are still a lot of fraudulent returns going on, and our refunds are going to the bad actors. So how do Americans file their tax returns? We can see that some people are going to H&R Block or those brick-and-mortar places.
Others are self-filing; if you do that, kudos to you. 9.2% don't file taxes at all, apparently. About 11% have a friend or family member who might know the tax laws file for them. A lot of people, around 34%, are using TurboTax, and almost 30% are using a CPA or tax preparer. I also broke this down by age, because I wanted to understand it, and I think we're missing a piece: what about businesses? I think most businesses are going to use a CPA or tax preparer, but we can see that higher income earners, and people a little bit older, are probably going to be using a CPA rather than filing themselves, using TurboTax, or having a friend or family member do it. So, the IRS. I saw a couple of articles where people were saying, "Yeah, we're doing great as tax preparers. We're protecting your data. We've stopped a lot of these fraudulent filings." But even the IRS, in 2016, 2017, and 2018, was sending out reports to tax preparers and CPA firms warning that they're seeing an increase in fraudulent filings and taxpayer data being leaked. Thieves are able to access tax professionals' computers, use remote technologies to take control, accessing client data,
and completing e-filing returns on behalf of all of us. So essentially, there's different ways, and I'll go into some of these that the attackers are doing, but they are able to get into the CPA or the tax firms, and they're looking for your and my extensions. And when they see that you filed for an extension, then they go ahead and file a return on your behalf, and they send your refund check to their PO box or some other place, and they take your taxes or your refund check for you. A special agent for the IRS, he's saying that the bad actors tend to go for the low-hanging fruit. I thought this was an important quote
because a lot of the smaller CPA firms I work with say, "We're not really a target. We're not EY, we're not Deloitte, we're not KPMG." But even the IRS and its special agents are saying no, we're seeing that attackers go for the low-hanging fruit. Deloitte, EY, and so on are big players; they should have better security in place. But your little 5-, 10-, 20-, or even 100-user CPA firm is low-hanging fruit. They don't have the security controls in place. The IRS is also seeing that Wi-Fi security is one of the biggest and most common threats for these small firms. So let's look at some of these
case studies. What I did is go through the California breach notifications, and it's a little hard to dig into what I wanted, because a lot of these CPA and tax prep firms use the partners' names, kind of like a law firm: Wiley & Sons, LLP, or whatever it is. So I had a hard time figuring out which were CPA firms versus law firms, and I had to use some keywords. Unfortunately, I only used "CPA" and "tax" as my keywords; I probably could have used "accountancy" or "accounting" as well. But I went through those keywords and looked at every breach
that I could find in the California database since 2015, which is when California set up the database and started recording this information online. So all the data I have is 2015 to present. Each state has its own data breach laws and reporting requirements, so you can check out your own state, but I wanted to focus on what I knew, and that was California. One caveat: most states only have reporting requirements for electronic breaches. I don't know about you, but I've been to plenty of CPA firms
doing security assessments, and when I go there I see piles and piles of paperwork, right? They've got tax documents everywhere, and I get a little concerned, because I can see other people's income and tax returns sitting right on the CPA's desk, and I wonder how many other people come in here and see my tax return sitting on that desk. Almost all states do not require you to report if your physical files get taken; it's only for electronic files. So these 2015-to-present breaches that I've researched are only electronic breaches as well. We also have to assume that even though firms are required to report, it's kind
of difficult to get caught unless, you know, if someone broke into a CPA's firm and they got the digital records, they may get noticed, they may not, but even if they did get noticed by the CPA firm, it's not like the police are going to swoop in or the FBI or the CIA or the IRS and see that. So a lot of these places may just say, well, we're not really going to get caught, so why report it? We also see that in California, if it's less than 500 records, so if I'm a CPA firm and I have 499 of your records that got stolen, social security numbers, home addresses, spouses information, child, dependents, etc.,
I do not have to report this as a breach. Also, if it's over 500 records but it's "encrypted," I may not have to report it either. Say I lost 9,000 records from my firm, but they were "encrypted" with DES, or maybe just hashed with MD5, or maybe they were really just Base64 encoded; I may not have to report that. So, as I mentioned, I used "CPA" and "tax" as my queries, and I was a little surprised that the records I found were very vague. California provides a template of what you can send out to your customers if you have a breach.
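To underline the point about Base64: encoding is not encryption. Here is a minimal, purely illustrative sketch (the SSN is made up) showing that a Base64-"obfuscated" value is recovered with a single call by anyone who obtains the file:

```python
import base64

# A record "protected" the way some breach reports describe: the SSN is
# merely Base64-encoded. This is encoding, not encryption; no key exists.
stored = base64.b64encode(b"123-45-6789").decode()
print(stored)      # MTIzLTQ1LTY3ODk=  (opaque-looking, but not secret)

# Anyone with the file reverses it trivially:
recovered = base64.b64decode(stored).decode()
print(recovered)   # 123-45-6789
```

A safe-harbor argument built on this kind of "obfuscation" protects nobody; only real encryption with a properly managed key does.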
And one of the things that I saw in common was that you have to obviously say, "Well, we're going to pay for Experian or whatever to monitor your credit for a year." But it was very, very vague on what they actually had to say that happened. And a lot of times I felt like they didn't even know what really happened. So these are -- these statistics are kind of my assumptions based off of reading every single report. I see that there was about 12% of them I would categorize as email compromise. So that tells me that they're not using multi-factor authentication, they're using weak credentials, they're susceptible to phishing attacks. And in the most recent
spear phishing campaign that we did for a local CPA firm, we had over an 80% open rate. We sent it out on the morning of April 17th, because we knew they'd be busy with taxes and last-minute filings, and we got a large open rate and a large click-through rate. Phishing like that can then lead to some of these other attacks: malware, unauthorized remote access, portal compromise, or any of the others up here. That 12% is too high for our CPAs and tax prep people. If they have public-facing sites or authentication such as webmail, Office 365, or Google
Apps, they need to have multi-factor authentication; we shouldn't see email compromise there at all. The other category I saw was malware. We don't really know what the malware did; the reports were very vague. They just said a system got infected. I don't think they knew, or really spent the time investigating, what the malware did. They might have just seen a pop-up come up and said, "Okay, I've got malware or ransomware." I even read one report where the CPA firm said, "We got ransomware, so we're reporting this breach." I have no idea why they reported ransomware as a breach; on its own, ransomware just encrypts your
data. So was there something they weren't sharing, that the data was also exfiltrated and they're not really letting us know? Or did they not understand what ransomware did and think they had to report it? There's a lot of confusion in these reports. The other one that really surprised me was physical security. I was thinking, okay, a CPA firm, they know what they have, they've got these physical records, they probably have fireproof safes, and they're taking care of our data. But 25% of the breaches stemmed from a physical security issue: a laptop being stolen, someone breaking into the office. And this isn't just someone throwing a brick
through the window. In these cases data was actually taken out of the environment; we'll go into a couple of examples of those as well. Okay, and then the other big one was unauthorized remote access: almost 40%. What does that mean? I have no idea; that's just how they classified it. Did they have remote desktop open and their domain controller public-facing on the internet? Were they using some remote software? Was their IT team doing something? Did they have VNC running? We don't really know. They just classified it as unauthorized remote access. And what were the attackers actually after? This was really difficult for me to
decipher because they don't say. They just kind of said, this large amount of data might have been stolen, and this is what we think might have happened, and this is the date range we think there might have been someone in here. That's it. That's all we have. So a big part of that 56%, I could not distinguish what the attacker was actually after. Was it to file extensions and get a refund? Was it to take your social security number or medical records or whatever it was? I really don't know what they were trying to do. I don't think we ever will. 25% was to file fraudulent returns. So that seems to be one of the
targeted attacks. The goals of the attackers are to get your social security number and find out you're on extension and file the returns on your behalf to get that money. Okay, so I've got a couple of local case studies in California, but I also have a couple of larger companies. So even Deloitte, right? So we thought they were secure, E&Y is secure, these big companies. We even see that Deloitte had some of these issues as well. It's very difficult for me to find out what exactly happened in these private companies because the reporting requirements are so vague and subjective that you don't really have to say a lot of information. So if you look at
Deloitte's website about the breach, they said that very few customers or clients were impacted and they think it was only a couple. So they did not even put the breach in the California database. But then there's other sources that say all administrative accounts and all internal systems were compromised. So I'm not really sure which it is. We have conflicting information. The one that said that all administrator accounts and internal systems were compromised was supposed to be an internal employee, like a whistleblower. So we don't really know exactly what happened there either, but it may or may not have been a couple clients or all the clients. They also had, they got in a little bit
of trouble because the breach happened in 2016, and it wasn't until late 2017 that they actually reported it. So your social security number or your data could have been leaked out for an entire year. Here are some more local ones in California. Wheeler and Egger had a breach in mid-to-late 2016 and reported it about a month later. They basically said a bad actor e-filed 45 returns for clients on extension. This one was actually reported to me by one of our clients who knew this company, and they mentioned that it was a sophisticated attack: the attackers were actually on the network quite a bit longer, and the firm didn't find out until
about that month later. The attackers had looked for specific customers who were high income earners and had filed an extension; out of thousands of records, they targeted those 45 unique individuals, because they saw they would be able to get a substantial refund check given the income level. They filed those returns, got the refund checks, and walked away. Then, when the CPA firm went to finish the extensions and file the paperwork with the IRS and the state, they kept getting rejections saying, "You've already filed your return." That's a little strange: we filed an extension, no one in the office knows about this. Then they did a couple more returns, and found another one, and another one. They thought, "This is unusual. So many people are being told they've already filed their returns, and they haven't left our firm or gone somewhere else. We know they're our clients." So they called the IRS, and the IRS said, "We were about to come visit you. We know exactly what happened." So the IRS is looking for some of these things; they're aware of it, and we're seeing more and more of these e-filed returns by bad actors. And as usual, most of these notices say, we don't know everything that was taken, but we assume it's your name, your gender, your date of birth, your telephone number, your address,
social security numbers, EIN if applicable, your employment, W-2s, 1099s, K-1s, et cetera, investment information, and more. In this specific scenario they assume it was malware. What I assume happened is that it was some kind of targeted attack, with something like a reverse shell giving the attackers access, because I don't think generic malware is looking for tax data and then e-filing a return. So malware might have been the entry point, but this was a much more sophisticated, targeted attack. Next: Jeffrey Born, CPA. This is from the end of 2017, so you can see these are recent scenarios. Two unencrypted, password-protected laptops
were stolen. I kind of chuckled at this one. I added the word "unencrypted"; their letter just says "password-protected." To all of the customers affected by this breach, whose names, dates of birth, telephone numbers, social security numbers, 1099s, tax data, and insurance information were leaked, they sent a letter saying, "We'll give you a year of identity protection." And what really happened is: we had two laptops that were password protected, so it's okay, and they were stolen. But if you remember the reporting requirements, you generally only have to report if the data is unencrypted and over a certain number of records.
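To make concrete why a login password alone doesn't protect a stolen laptop, here is a purely illustrative sketch (made-up file contents, hypothetical SSN): an attacker who pulls the drive reads the raw bytes directly and never touches the OS login screen at all.

```python
import os
import re
import tempfile

# Illustrative only: simulate a client file sitting on an UNENCRYPTED drive.
# An OS password gates the login screen, not the bytes on disk; pull the
# drive, mount it in another machine, and those bytes are readable as-is.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as f:
    f.write(b"Client: J. Doe\nSSN: 123-45-6789\nRefund: $4,200\n")
    path = f.name

raw = open(path, "rb").read()                   # raw read, no OS login involved
ssns = re.findall(rb"\d{3}-\d{2}-\d{4}", raw)   # carve anything shaped like an SSN
print(ssns)                                     # [b'123-45-6789']
os.remove(path)
```

Full-disk encryption (BitLocker, FileVault, LUKS, and the like) is what ties the data at rest to a secret, so a pulled drive yields only ciphertext.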
So this is basically saying: yes, they were password protected, but the CPA is not doing basic security and encrypting those laptops. Once those laptops were taken, as a bad actor I can pull the hard drive out, throw it into another machine, and see all that taxpayer data. Okay, Friedman Perry CPA. In this one there's about a six-month span where potential bad actors were sitting on the target network, and it took the firm about two or three months to report the breach. They were able to determine that the bad actors came in from a foreign IP address, so they weren't using any kind of GeoIP filtering or protection against those foreign countries. And from what I can read, the access was over Remote Desktop Protocol. This tells me that on their network they had remote desktop port 3389 open and someone was able to get in. I'm also going to assume they probably didn't have lockout policies or thresholds or anything like that, so someone was probably brute-forcing their way into a system. And I can only imagine it was a file server or a domain controller or something like that. And
they filed a ton of fraudulent 2016 tax returns. So data that was accessed, again, full name, date of birth, telephone number, address, social security number, tons of other data relating to your taxes. And then this one is a little bit different. I got two more case studies here. So this one was TaxSlayer. There was a huge breach. Almost 9,000 tax records were released. And TaxSlayer, if you're not familiar with them, I believe it's a little bit more like TurboTax, where you can go on there and you can file your tax return, get help with your taxes. And so they had, from what the FTC said in their complaint, is that there was weak security measures
that were involved that led to this breach of over 9,000 accounts. Usernames and passwords were compromised, and the attackers were able to get in and see tax returns from 2014: social security numbers, names, and addresses of 9,000 people. And case study two: TurboTax. I wanted to include this one because, according to the research, a lot of people are probably using TurboTax. Tim Tomes is actually the one who brought this to my attention as I was telling him about this talk. So, from about 2010 to 2015, and the reports I have here are a little bit vague, because a lot
of it has been kind of suppressed. From 2010 to 2015 there are allegations, in court cases as well as articles from whistleblowers, that TurboTax was knowingly seeing fraudulent returns come in and basically ignoring them to keep revenue high. There are a couple of court cases that people have filed: class action lawsuits and individual suits. These followed two ex-employees, Shane and Robert, who were in Intuit's security department, I think, and who notified management that there were clear indications that certain accounts were filing bad tax returns. For example, you see a single TurboTax account filing over 100 different tax returns. That doesn't really make sense; why would one account have 100 different social security numbers in it? So they brought that up. Another indication was duplicate social security numbers being filed from different TurboTax accounts; why would a user try to file their taxes three, four, five, ten different times? They reported that to management, and management, allegedly, decided not to do anything about it. TurboTax makes about $50 to $150 per filed return from what I've seen, so the thinking, allegedly, was: if we're getting this money, why suppress these? I say these are allegations because they're in court documents; I've got the references in my notes afterwards and can share the link. But every court case that I
saw filed against Intuit regarding this issue, after the whistleblowers reported it, was shut down with arbitration agreements. I'm assuming that when you sign up for TurboTax and check that box, you're agreeing to arbitration. So every time someone filed a lawsuit and said, "You're leaking our data, you're negligent," TurboTax and Intuit came back and said, "Look at that arbitration agreement. You can't do this in court." They basically pushed it under the rug, and we don't know exactly what happened in arbitration. Someone at Intuit, I think it was their CISO, said in one of the articles I read that it's the IRS's job to catch fraudulent returns. So they're saying, "No, it's the IRS's business. We're just filing returns. It's not our responsibility." Intuit was granted dismissal, closing the courtroom door by citing those arbitration agreements. And the victims, all these people who had fraudulent e-filed returns because TurboTax allegedly let it happen even though they could have stopped it, are no longer allowed to e-file. The IRS has said, "Someone has your social security number; you can no longer do that." So now they're inconvenienced and damaged by this. And from what I saw in the court
cases, it was over 30 hours per person spent dealing with these issues, all the problems with the IRS and the states, and trying to clear their names. So there's a lot of pain, suffering, and financial impact to these individuals because their data got leaked, and Intuit allegedly had the ability to stop it. Okay, so let's look at some of the laws. I thought, there have got to be laws for this, right? We've got HIPAA, we've got PCI DSS, we've got all these regulations; there have got to be some cybersecurity laws here. I talked to some CPA firms and tax prep firms about this. I asked them, "What are you bound by?" And they said, "Well, the IRS tells us all the stuff we have to do." I thought, okay, so they've got some rules. They said, "There's the AICPA, they've got some rules." And they said, "State laws." So I started doing some research, and I like to think I'm pretty good at Google searching; that's how I got where I am in IT and security. But the AICPA says there are actually no uniform federal laws on business cybersecurity, and this is relating to CPA firms. So there's nothing federal. Okay, maybe the IRS has something. I looked at the IRS, scavenged through their site, and I see
that there's a quote: the IRS recommends tax preparers create a security plan. That's pretty much what they say; that's their "law." GLBA, or GLB, was probably the closest thing I found to a law that applies to CPA or tax prep firms. Its Safeguards Rule says you need to designate one employee to coordinate InfoSec, identify risky PII, and design, implement, monitor, and test safeguards. This law was enacted in 1999. Has cyber changed since then? I like to think so. So I think this is a little outdated and not really providing a whole lot of protection. I've gone through, and built, so many security programs and policies for customers, and they'll say, okay, it's a requirement from a customer or a regulation that we have someone designated as our InfoSec person. Office manager, that's you. They don't really know what they're doing; there's no training involved. So these are checkbox items, like "identify and assess risky PII." I guarantee if I went to a CPA and said this is a requirement, they'd say, well, tax returns, that's it; that's all I have. But they're not really doing anything to protect them. There is also the Financial Privacy Rule requirement, and there are a couple of sections of IRS code that provide for
disclosure and penalties for unauthorized disclosure. But again, it's not really anything preventing breaches or keeping CPAs in check. Right now I'm using the CPA term, but it's really tax preparers: you don't have to be a CPA to do tax preparation, you just have to take a quick test. The IRS does have some requirements I found for e-file security. So if you're e-filing, or your tax person is e-filing, or you use TurboTax (because they're e-filing for you), there are six requirements the IRS gives out. One of them is to have an Extended Validation SSL certificate. It doesn't even have to be installed properly; you just have to have one. Another is weekly external vulnerability scans. I
have seen so many vulnerability scans and I have rarely seen someone actually take their own vulnerability scan and do something about it. The Department of Defense is probably one of the closest ones that do it pretty well. But a lot of people will get those vulnerability scans and they look at it and they say, "I've got a lot to do. I'm overwhelmed. We don't have the resources. So we're checking the box. We have a vulnerability scan. We know that there's vulnerabilities, but we don't have time to fix it. Or I don't really know how to fix this." It says that my server is accepting remote desktop, but I need access to remote desktop. Or it's
using weak ciphers, but I'm not really sure how to fix that. And we're not going to pay for an InfoSec person to come in. So: I have the vulnerability scan, I did my job, but that's it. Also, notice that it's only an external vulnerability scan. Externally, you probably have a firewall, maybe a Palo Alto, a SonicWall, a Cisco, whatever it is, and it's doing a fairly good job. When I do security assessments and pen tests, I don't try to attack the external gateway; that's very difficult for me to do. I'm not smart enough to break into one of those. But I can phish a user and
I can go on the inside, or I can drop a USB drive, or I can do other things internally where it's much easier, lower hanging fruit to get in. They need to have information policies and safeguard policies, but as much as I like policies, a lot of people just, they've got it in place, it's a checkbox, they don't look at it, they don't update it, it's not really doing its job. You need to have a website with CAPTCHA. You might have to accept tax returns, something like that, so you have to have CAPTCHA. It doesn't have to be properly implemented or tested, you just have to have it. Your domain has to be registered in
the United States. Then you have to report security incidents, but that's also a state and federal requirement. So there's not really a whole lot they're mandating, even for e-filing. They do have this, though: if you've got a CPA, or maybe you work for a tax prep firm, here's a little snippet from a PDF the IRS posted that I actually thought was pretty good. I think June 2018 was the last revision. It's called Safeguarding Taxpayer Data; it's pretty lengthy, and it's got some really good checklist items. So you can go through there and
you can mark items ongoing, done, or not applicable, and maintain it; it's a pretty good security checklist geared at tax preparers and CPA firms. Okay, and again, to counter what some people are saying (oh, we're getting better, we're not seeing as many fraudulent returns, we're not targets, blah, blah, blah): the IRS commissioner himself has said the threat remains and we need to help tax professionals take basic steps. So they're not even taking basic steps to safeguard systems and taxpayer data. There's a site, dwt.com, that was awesome; it prepared this graphical map for
me, showing the breach notification statutes in the different states. We can see that in most states, the ones in yellow here, you only have to report electronic breaches. So if your taxpayer data gets out in any state except a couple of the green ones, and it's physical copies (say your whole CPA office gets ransacked and they take all the file cabinets), they don't even have to let you know. Right? So most of it is just electronic data. There's also another way to get out of reporting breaches: a harm threshold. The states in yellow here have a harm threshold where notification is not required if, in a good-faith and prompt investigation, the covered entity determines that the breach is not reasonably likely to cause substantial harm to you. So there are a lot of subjective terms: "reasonable," "substantial," "good faith," what counts as a "prompt investigation." There's a lot in there that lets them weasel out. There's also, as I mentioned, the encryption safe harbor. If the data is encrypted, truncated, or obfuscated (say you take a social security number and convert it to Base64), they may argue, well, we don't have to report this because it's obfuscated. Okay, so California has a couple of laws. You've got the breach notification
laws, and that's where I got a lot of my data. And there's finally another one. It's been signed into law, but it isn't enforceable until 2020: the California Consumer Privacy Act. This is one I found out about recently. It allows consumers to sue companies for unauthorized access, exfiltration, theft, and disclosure of their information. So in the future we should have this; it's supposed to be somewhat similar to GLB. So, California breaches: here's the problem I found. In all the breaches I've read and all the CPA firms I've done security assessments for, I see a common theme. It's a broken record.
There's improper basic security controls. There are file servers with taxpayer data where everyone in the office has full control — full control on the shares, both the NTFS and the share permissions. The IT people just don't understand that, or they're not locking it down for whatever reason. There are other things like shared passwords in the office — these basic controls that are not put into place. We see a lack of logging, so in all these breaches it's a lot of "I don't know what happened, I'm not really exactly sure, we think this happened." So we're seeing they're not logging properly, they're not notifying. There's no encryption on taxpayer data. Laptops are being stolen, backups are being stolen, there's unauthorized access to these systems, and — what I'll show you towards the end of my talk here — the actual tax software that they're using to file your tax returns isn't even encrypting data at rest, in process, or in transit. We're seeing that there's no clear understanding of what data was taken. It's kind of like: we weren't logging properly, we don't have an IR policy, we're not really sure — so we know someone got in because something happened, but sorry, we don't know what. And then they'll give you one year of identity theft protection. So me as the attacker, thinking as the bad person: if I breach a CPA firm or tax prep firm, then I'm gonna wait 13 months
and then I'm gonna go ahead and do identity theft after that point. So you're still impacted after one year, but they're only required to do the one year of identity protection. And they play down the severity of what happens. I'll show you that in a second. Some of them are paraphrased, some of them are exact quotes from these breaches, and I'll translate that into what it really means. So they're playing down what happened, and for the end user, not you who are at a security conference, but your parents, your grandparents, your relatives who are in different industries, they might see that and say, "Oh, password-protected laptop was stolen. We're safe. It's okay. We're good." They
don't understand what this actually means, and there's no requirement to educate the users. And then, reading the breach reports, it's not clear who, how, or when, or what they're even doing to protect this in the future. There's nothing that requires them to actually put better controls in place or make sure that they've remediated all the security issues. So let me translate some of these reports for you. Statement: a password-protected laptop was stolen. That means the laptop was not encrypted, your sensitive data has been accessed by someone else, and they could just plug the drive into another laptop and see everything that was on it. Another statement: "We found unauthorized access to our
secure network." If there's unauthorized access, is it really secure? "We immediately contacted an IT consultant and promptly hired an IT security expert." So our IT consultant doesn't really understand security and we didn't have the proper controls in place. Another one: "The attacker managed to hack into our systems despite the use of firewalls and antivirus software." So really they're just checking a box, or they didn't implement it properly — they don't understand the security controls. Another statement: backup hard drives were stolen, though they require proprietary software to be readable. I downloaded every single one of those software packages except one, and within a matter of about 10 minutes in AWS I spun them up and was able to read the database files. Also — and I won't disclose exactly which vendors — most of
the tax prep software packages, I found the file where they're actually storing all the PII data and your tax data — pretty much everything in the software. They are in some proprietary format, but if you've ever used the strings tool in Linux (you can download it for Windows too), all your data is right there in clear-text strings. So you can just read it with that. So that proprietary software — I don't even need to install it; I can just run strings against the file. I'll grant that some of them did make the file a little difficult to find, and I had to dig around in directories, but it didn't take too long. Also, none of that
data is encrypted — if it were, they wouldn't have to file these reports at all. Another statement: "We take aggressive steps to protect your information to ensure all records are securely locked." Except this is in a breach notice. So the data was not securely locked; it was unencrypted and someone already got your data. But I think these are important, because non-security people are seeing this and they're saying, "Oh, okay, they take aggressive steps. They securely lock this thing. It wasn't their fault." Right? And maybe they did do a good job — maybe I'm being too hard on them. But from what I've seen in doing more and more research into this, everyone's passing the buck as far as who's responsible for
this. The CPAs aren't responsible, because they're buying secure, protected software and they're buying firewalls and antivirus software. And the vendors I've spoken to say they're not at fault, because it's clear-text protocols that the CPAs are using — and anyway, it's the IRS's responsibility, not theirs. So everyone's kind of passing the buck. And the IRS says CPAs need to take basic steps to secure this stuff. So who's really responsible for our tax data? Let's look at some of the software that the CPAs and tax people are using. There's this great survey from the Journal of Accountancy that I grabbed — they provided this awesome information. There's really only a handful of software
that the tax pros are using. Again, this is not TurboTax — and H&R Block, I think, has their own proprietary software. If you go to a CPA firm or a tax prep firm, this is what they're using. And if we add up all the different categories except for "other," we're going to see that it's about 84-ish percent, if I did the math right, using the top five pieces of software. Even if we take just the top two here, UltraTax CS and ProSystem fx, that makes up pretty darn close to 50% — a little less than 50% — of that. So these are major targets. The 2017 survey
did a little better job than the 2016 survey and they broke it down by number of preparers. And this is one of my favorite slides for when I started talking to these vendors that make the tax software and they would say to me flat out, and this is even as of Monday of this week, they told me, "Well, you know, we're working on securing software still, we're putting a patch out soon, but our software is not meant for large CPA firms or our software is not meant to be shared amongst multiple CPAs. It's for one user and a client only." And I look at this and I would show them this and say, wait a
second — and I won't name the vendor — but I'd say, you know, let's just take CCH, not to pick on them, and say, okay, well, if you're saying that, then look: at CPA firms with over 100 users, 30% of the CPAs are using your software in a 100-plus-user environment. So even though you say you don't recommend it and it's not designed for that, that's not how it's actually being used. And you have KB articles — you help people install it in server-client models and topologies. So even though you're saying one thing — that it's not your responsibility — we can see from the data that it's being used a certain way, and
you as the manufacturer of this tax software that's holding sensitive data, you are responsible for protecting all of our tax data inside of it. Okay, and this one, I'm going to kind of skip through this, but I've got it again in these slides later on. I tried to find, looking through surveys of why CPAs are using one software over another based off the size of their firm. I didn't find a lot of information that kind of correlated it. I could see ease of use was a big one. It's funny, but price was a big reason that a lot of people either switched software or bought one software. So price and ease of use seemed to
be the reason that one piece of software was more popular than another. And support — I didn't find too much difference on this. But one thing I also noticed looking at all these different tables was there was nothing on security. They didn't survey how important security is to you — what measures the provider of the software is taking for security, whether that's important or not, anything like that. It's not even part of the survey at all. Okay, so let's look at some of the testing of the tax software. I want to go into a couple of definitions before showing some screenshots, because I've had a lot of
debate with different tax software vendors and people, and they're trying to argue whose fault it is. So I thought Troy Hunt's quote about Pwned Passwords was pretty good. He says the entire point is to ensure that any personal info in the source data is obfuscated such that it requires a concerted effort to remove the protection, but that the data is still usable for its intended purpose. So we need to protect the data, but also make sure it's usable. When I looked at MITRE, I thought, well, how do I start registering these CVEs? Because I'm not really a software person. I don't do exploit development or security research. This is my
first project in this; I'm really focused on securing businesses and doing security assessments and pen testing. And so as I looked through this, I thought, okay, well, how exactly do I classify this? And I kept getting pushback from the vendors saying, no, no, it's not a vulnerability, it's an exposure. It's not this, it's not that, it's not our fault — it's really the protocol. So I kind of want to go through these. And while I do think that some of these are exposures, I also think there's some vulnerability to it. So an exposure is a system configuration issue or a mistake in the software that allows access to information or capabilities that can be used
by an attacker as a stepping stone into a system or a network. So I thought, well, that kind of makes sense, but is it a stepping stone? If you're an attacker and you're after social security numbers, names, dates of birth, tax data — is it a stepping stone, or did you hit the jackpot right there? Then when I look at vulnerability, it's a weakness in the logic found in software and hardware components that, when exploited, results in a negative impact to confidentiality, integrity, or availability. Well, these exposures do have a negative impact on confidentiality. I would assume you want your social security number, your home address, your phone number, and your spouse's information to remain private. And then,
if we look at the definition of improper access control, which I registered most of these CVEs under, it's defined as: the software does not restrict, or incorrectly restricts, access to a resource from an unauthorized actor. And so I went back to this — that's the definition of a vulnerability; it's one of the subcategories you can register a CVE under. And the manufacturer of one of these software packages kept saying, it's not really a weakness, it's not our responsibility, it's a clear-text protocol, or there are other servers or things that are leaking the data — it's not us. But again, this here — and I underline it — the software does not restrict or incorrectly restricts access from unauthorized
actors. So I do think that they're responsible for protecting that. So in looking at these different CPA firms, I found that if you have more than one CPA or tax preparer — even if it's one CPA and an office administrator — the office administrator still needs access to the tax software, because it's the CRM as well. They need the phone number, the name; they need to print documents. So even if it's one CPA with an assistant or an office manager, you're still probably doing a client-server topology, even though most of these vendors, when I spoke to them, were saying that it's not meant to be that way — it's only supposed to be a single-client setup.
So we're not seeing that, even though they're saying that's how it's supposed to be. And they also said, well, if you're going to do client-server — if you need more people to access the software — you need to set up Terminal Services or Remote Desktop Services. Great. Now we've got remote desktop running around on our network that we have to control as well. Also, the default setup when you start installing the software in a client-server relationship is using SMB version 2. Many of the software vendors don't recommend SMB 3 due to performance. So when I first started this, in my first disclosure, and I contacted the vendor — almost slipped and said their name — they said, no, no, no, we
don't recommend SMB version 3. There's a performance hit. We don't even support that. And I kept going up the chain, up the chain, eventually to security managers and their InfoSec team and developers. And finally, at the end of four months of working with them, they said: our recommendation is to use SMB version 3 with encryption. Okay — I thought there was a performance impact? No, no, there's not a performance impact. Okay, so are you recommending that now, or are you requiring that? Because going back to the slide here, the software needs to restrict unauthorized access. And they said, well, we can't force people to use that, so we're just going to recommend
that, and that's our solution to the data exposure. Well, if you're not restricting it, you're not really doing your job and not being responsible for this. So vendors claim they don't recommend the client-server relationship, since the data is unencrypted in transit and at rest — at least from what I found. I didn't even look at it in memory and in process, because I'm making the assumption it's already in clear text anyway. I'm still working with vendors on the patches. Out of all the tax software I've tested, no one has fixed the problem yet. And my report four months ago was the start of this. I'm still working with them — I had a call as
of Monday with one of the teams, also with their compliance people, which I think is a fancy word for their legal counsel. And one of them is releasing a patch tomorrow, which I'm very grateful for. They were actually the last vendor I found the issue with, and I reported it to them and they're releasing a patch tomorrow. So kudos to them. But still, there's a lot of issues, and they're only fixing one of the two issues. So I've got some screenshots here for illustration. Originally I had exactly which vendor, the CVEs, how I found this — and I had to tweak this, because I'm shocked that they still haven't fixed
these issues. My CPAs — every CPA I've ever used — their software is still vulnerable. So I don't really want to give you all the details, or you'd go find my CPA and then you'd have my tax data. A lot of security pros that I know, I've asked them, "Hey, what does your CPA use?" And then I've gone and tested that software, and it's vulnerable too. So I feel like all of our data — if we use a CPA, we're probably still vulnerable to this. So I'm trying to keep it a little bit under wraps for maybe another month or two, and then I'll post a blog. And then I'm also gonna be talking at
ISACA in Nova Scotia, Canada, and I'll release all the details — pending that they please, please fix these issues. So these screenshots I have coming up here are defanged, and some of them don't exactly map one to one — I'm trying to kind of hide who was vulnerable — but you'll still get an illustration of what was leaked. So I first did a little bit of research: okay, well, maybe someone else already found these CVEs and they just haven't been fixed. So I looked at UltraTax, which was the most popular one. MITRE and Exploit-DB: zero. No vulnerabilities disclosed. How? It's the most popular tax software, a billions-of-dollars industry, and
there's not a single vulnerability in the software? Software that was made a long time ago? This is incredible. So I looked for ProSystem fx: one vulnerability, but it's like a DLL buffer overflow — not really related to this at all. Intuit Lacerte — and by the way, Intuit owns quite a few of these as well. Lacerte is the one I'm most familiar with, because I've had a lot of customers with it and I've worked with it when I did IT in the past. But Intuit owns a couple of these. Nothing. Not a single vulnerability that I could find out there. Drake Tax: nothing. ProSeries vulnerability research: there was
one — it was a DLL buffer overflow, like the QuickBooks one, and it applied to a couple of them. And that was 2007 — quite a bit of time ago as well. Okay, so let's look into this. So I grabbed the software — and again, this isn't specifically the software shown — but for every piece of software I used, I put in fake data. Obviously my name is still there, but I put in a fake social security number of all ones, an occupation of "security guru," a date of birth that made me apparently 106 years old, all this other stuff in there, and my fake wife's information and her social security number, so I could see
what's coming out. So I added myself in as a new customer. For the other information I added, I just kind of incremented all the numbers so I could see what was what, and I documented it. There's more tax information — what state I'm filing in, et cetera — driver's license information, spouse, dependents, and so on. Some of the software even asked me to upload a picture of a driver's license, and also bank account numbers so your tax payments and refunds can be routed to and from the IRS. I'll show you why that's interesting in a second. So once I added the record, and a bunch of other fake records in here, we can see on the
default view — this is like a customer list of everything that was in the database: name, customer number, primary social security number, the status of the return, and what state they're filing in. So basically, with most of the software, when you launch it, you've got the server that's running with the database file. Then you've got the client, and when you open it up, sometimes there's a password and sometimes it doesn't require a password at all — which is a whole other issue — but you log in. And so what the client then does is send an SMB request for a file — I've redacted that — and it does a read request
and then gets a read response. At first, when I did a pen test, I saw tons of SMB and thought, wait a second, why is this tax software doing this? And then I saw it: okay, this is not good. The ones were my social security number; the twos were my wife's social security number. So in clear text, I'm seeing that come across in transit over the wire. Then, to my surprise — this was one of the worst ones that I found — the client... So in red is the client's request, and we don't see the full conversation; blue is the server's response. When the CPA logs in, or they launch the client on their workstation,
the server then sends an entire copy of the database of all customers — including social security number, home address, name, occupation, you know, pretty much everything you could need here: spouse's number, account number, mobile phone numbers, email addresses, where they work, PO box. And that's just from logging in — they didn't even request the file. In this first case study, it basically sent over a thousand records with social security numbers in clear text over the network. Here's a little closer view of that. And I thought, okay, so what about when I make a change inside a customer record? So fine — when you log in, all that data is just exposed in clear text,
we see all that — that's fine. But what about actual W-2s when you file? So I open up the software here and I put in a fake wage — I wish this was a real amount that I made every year, but nine million, nine hundred ninety-nine thousand dollars. And I watched with tcpdump and Wireshark what was going to happen over the network. And in clear text, I can see not only the record that showed up, but — again — the client sent all that taxpayer data (social security number, name, home address, et cetera), everything, to the server. The server responded with the same thing; I guess they both wanted to share the information over the network. And
then there was an entire record of every number I had in my tax return — W-2s, 1099s, et cetera. So you could map this out and basically reconstruct, essentially, the customer's tax return. I thought, okay, that's bad. Now, I tested a different piece of software — they can't all be like this; this has got to be one in a million, I just stumbled upon it. Nope. The second one I tested had the same kind of issue. It didn't send the entire customer database — so thank you, whoever made that software — but you did still send the client ID, the client name, social security numbers, who filed the
tax return, the taxes they filed, what state, whether it's federal — and then they also sent a nice piece of information for the attacker: the bank name, and the bank account number as well. So now you can pretend to be them, go to their bank, and try to steal money from them too. I thought, okay, so let me look at these database files. That was in transit; now let's take a look at rest. Surely these proprietary formats are encrypted — the data sitting on disk has to be protected, and being able to read the format must have been the only barrier.
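The strings-style check described here can be sketched in a few lines of Python — this is my own rough approximation of what the Unix strings tool does, and the file contents below are invented for the demo:

```python
import re
import tempfile

def extract_strings(path, min_len=4):
    """Pull printable-ASCII runs of at least min_len bytes out of a binary
    file, roughly what the Unix `strings` tool does."""
    with open(path, "rb") as f:
        data = f.read()
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]

# Demo: a fake "proprietary" database file -- binary framing around clear-text PII.
with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
    tmp.write(b"\x00\x01\x02SSN=111-11-1111\x00\xff\xfeName=Jane Doe\x00\x03")
    db_path = tmp.name

found = extract_strings(db_path)
print(found)  # ['SSN=111-11-1111', 'Name=Jane Doe']
```

If the data were actually encrypted at rest, a scan like this would turn up framing noise at most — not field names and social security numbers.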
So I ran strings — and yes, you can get strings for Windows — against these database files (file name redacted). And this one was interesting, because the ones were my social security number. They really wanted to share that with someone. I don't know if that was like a placeholder, if it was blank, or what was going on, but they essentially just dumped my social security number, and then the twos were my wife's social security number, plus other information such as phone number, job title, names, et cetera. And it just kept going on and on. So basically, yes, you can't open these database files with Notepad or something like
that, but if you run strings against them, you're going to see everything inside without authenticating. Here's another one — in this one you could see everything else. A lot of weird characters and stuff in there, but essentially jobs, titles, names — this is basically the whole database file as well. This one was one of the best ones that I saw. Okay, so, some discussion with the vendors. It was somewhat difficult to reach them — I had a hard time getting hold of security teams. Eventually I even had to try to threaten them on Twitter and other places and say, "I have a vulnerability in your software, please respond." The worst one: with one vendor, I only
found the support team, 'cause sales and other people wouldn't respond to me. And their support team said, "You need a customer ID and PIN number." And I'm like, "No, no, no — I'm a security researcher. I have this information for you for free; I'm helping you." "Sorry, you need a customer ID or we will not respond to you anymore." So I posted a link to this talk in an article I wrote, and I said, I will be exposing this to hackers this week if you do not respond to me. That's why I had the call with their legal counsel and a bunch of other people on Monday. Apparently, my tweets where
I said, hey, you've got this vulnerability, blah, blah, blah — they kept getting deleted. I still have to look into that. I don't know if Twitter was deleting them, or if there are keywords, or what was going on, but they kept just disappearing. The vendors claim they didn't delete them and aren't responsible for that. Some of the vendors have private bug bounty programs with Bugcrowd and HackerOne, but I'm not part of them, so I couldn't get in there and tell them the issue. They need to make these public — accessible for people who want to responsibly disclose. They denied initially that it was their responsibility, and they said the issue
was SMB version 2. I told them, if I'm a bank and I serve your login over HTTP, it's not my responsibility then either — that's yours, right? You should be using a VPN. They didn't quite agree with that either. So I said, no — you can't count on someone else to set up SMB version 3 with encryption. It's your responsibility as a vendor to make this secure at rest, in transit, and in process. They said the software wasn't meant to be client-server. They say there's legacy software going back to 2015 — we can't patch those issues, we don't support anything that old. So some of this stuff is never going to get patched. They say encryption would
break integration with third parties — that's why they can't do anything. Because there are other third parties where you can link your bank and this and that, encryption would break too many pieces of software, and they don't want to make it harder to use. So that's why they're not fixing it, or they're having a difficult time doing so. One vendor patched their vulnerability — or is doing that tomorrow — and I really appreciate that they're taking it so seriously. I reported it two weeks ago and they're actually doing something about it. I'm not going to name them, because they haven't fixed all of it, but I really appreciate that. And anyone who
is a software vendor here or works for a company: as presenters and security researchers, we're trying to do the right thing, so go ahead and take our free help, and please patch your software when we disclose issues. So now what? Wrapping up here. Interview your CPA, right? Say, "Hey, what are you doing? Are you encrypting your backups? I'm seeing a lot of data that's backed up but not encrypted at all — how are you encrypting it?" Interview them a little more: "When was your last pen test or vuln scan? What are you doing about that?" Don't just say, "Hey,
you're the cheapest price, I'm going to use you." What exactly are you doing? Eventually I'm going to turn this into a checklist, similar to the IRS's, so people can use it to interview their CPA. What's your incident response policy? Your IT person probably doesn't know security or incident response, so who do you have? Do you have someone on retainer? Do you have insurance? What are you going to do about this? And what policies do you have in place? Make people aware. Share the link to this. I have blog posts; I have the slides on our website that you can send out or give to people. Make them aware, and let's try and
get this fixed as well. I really want this problem fixed for the general public. Help test tax software: if you're a security researcher in AppSec, please help test it. I am not an AppSec person, so I'm sure there are a lot more problems. We saw there are no vulnerabilities posted except for a couple of buffer overflows and mine. So please try to test the software. Your CPA can do defense in depth — you can help them out, build security programs. Don't use wireless on the production network. Have them use multi-factor authentication for everything that's public facing. Have them encrypt everything at rest. I see FTP servers open, remote desktop servers — have them encrypt and lock things down.
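One small, concrete helper for that kind of lockdown work: a crude scan for SSN-shaped values sitting unencrypted on disk. This is a hedged sketch of my own — the directory, file names, and data are invented for the demo, and a real DLP tool does far more (archives, context checks, OCR):

```python
import os
import re
import tempfile

# SSN-shaped pattern: three digits, two digits, four digits, dash-separated.
SSN_RE = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")

def find_ssn_files(root):
    """Walk a directory tree and count SSN-shaped matches per file --
    a quick way to spot taxpayer data sitting around in clear text."""
    hits = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # unreadable file; skip it
            count = len(SSN_RE.findall(data))
            if count:
                hits[path] = count
    return hits

# Demo with a throwaway directory and fake records.
root = tempfile.mkdtemp()
with open(os.path.join(root, "clients.dat"), "wb") as f:
    f.write(b"header\x00123-45-6789\x00more\x00987-65-4321")
with open(os.path.join(root, "notes.txt"), "wb") as f:
    f.write(b"no sensitive data here")

hits = find_ssn_files(root)
print(hits)  # one entry: clients.dat with 2 matches
```

Anything this turns up on a file server share is data that should be encrypted at rest or removed.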
Make sure the tax prep workstations that access this software are dedicated and isolated. And use endpoint protection — maybe EDR, buzzword. Who's doing it right? This is my last slide. Drake Software — I'm going to name one vendor I think is doing a pretty good job. They might not have things perfect, but these are some pop-ups I got. They notify you when you're looking at sensitive data. They pop these up when there's an update and say: you cannot use your software unless you patch these security issues. Thank you. They also require a password — it's mandatory — and if you don't touch the keyboard, it
locks out. All of these are default settings; I didn't touch them. Every other tax software package didn't have a password when I set it up — it didn't require one, and it didn't require updates. So they're doing a fairly good job. They're also encrypting at rest and in transit, from what I've seen. So those are my findings. Thank you again for coming out here early in the morning. If you want to contact me, my contact information is up there — Twitter, LinkedIn, an email address. And if you go to corporateblue.com/blog, I have the slides for this talk, and you can download a PDF with all my notes and research, case studies, surveys, all
that kind of stuff, so you can look at it a little bit more. Happy to take — are we doing questions? Outside? Apologies about that. So if there are any questions, I'll be standing outside, happy to answer them — related to this or other topics, I'm happy to chat. Thank you for your time. I appreciate it. All right, good morning. Welcome to B-Sides Las Vegas. Before we get started, I have a few announcements. First off, thank you so much to our sponsors — Rapid7, Amazon, Oath, Semly, Endgame, Telos — every one of you who is here, who donated, and who is participating in B-Sides. We wouldn't be the same without you. Welcome to Breaking Ground. In this specific track, we
cover groundbreaking research — something that's new, something that we haven't seen or heard before. Today, we have the pleasure of having a talk by Michael Gianarakis: an iOS runtime hacking crash course. This is being live streamed and recorded, so please make sure your cell phones are off before we start the talk. Otherwise, please enjoy yourselves and give a warm welcome to Michael. There we go. Perfect. All right. Yeah, I am losing my voice, and if I cough, it's not because I'm sick, it's just because of poor life choices, so don't stress. Cool. So just to start off, a little bit about myself. My name is Michael Gianarakis. If you can't tell from my accent, I'm
from Australia. I started a company recently, Assetnote. I'm not going to talk about it, because it's not that kind of con, but if you want to chat to me about it afterwards, that's cool. Before that, I was the director of SpiderLabs in Asia Pacific. I've spoken at a bunch of conferences before, mostly on mobile stuff along the way. I also organize a much more disorganized conference than this in Australia called TASCON, and also a local meetup in Brisbane. And for anybody who's part of Duxec, I am a flat duck enthusiast — I'll have a few beers and I'll tell you about that. Cool. So just a bit of an overview of what we're going to talk about today, and really where this
talk is coming from. It's just a bit of a crash course in messing with the runtime of iOS applications, mostly for pen testing and bug bounty purposes. I did a similar presentation at a local conference in Brisbane a few years ago, but a lot has really changed since then. The big one, obviously, is that Swift was introduced, and that's changed a lot of things. Apple has pushed 64-bit only as well — I forget at which exact iOS version they stopped supporting 32-bit apps, but they don't anymore. There's also been a rise in cross-platform frameworks: frameworks that allow you to develop in a particular language and then access native functionality across multiple platforms without having to write separate native apps.
And the tooling has evolved, as we'll discuss in the presentation — though in some areas it hasn't really kept up. So it's really just an updated presentation; if you've seen the old one online or whatever, this is an updated version to cover all of that. It's focused on iOS app testing, so no mad iOS kernel stuff today, sorry — I probably wouldn't be presenting it to you if I had that. Cool. So, setting up your environment. I won't go into too much detail about this, because there are plenty of guides on the internet and we've got limited time — I honestly haven't timed this talk, I was working all night on it, so we'll see how we go. But the main tools that you will need — and we'll cover it in
the presentation-- is obviously a jailbroken device. Currently up to iOS 11.3, I believe, can be jailbroken. Some things you can do on a jailed (non-jailbroken) device, but it's not as easy and it's not as straightforward, and you can't do as many things. So for the purpose of this presentation, we'll be using a jailbroken device. In terms of some of the tools that we'll be using: Frida, Cycript, Mobile Substrate, class-dump, SSH, and a disassembler of some kind. That's essentially all you need. There's a bunch of recent tools that are really good, like Objection by SensePost and Needle by MWR. They're really nice, and they abstract away a lot of what I'm talking about here and make it easier to use. But I want to discuss a
little bit more of the techniques at a lower level so you guys can get a feel for how you could do it yourself, but also how those sort of tools work if you do use those. Cool. So we'll start with Objective-C apps. So most iOS apps are still written in Objective-C or at least have some Objective-C component. The trend is definitely moving away from Objective-C, particularly for consumer apps, but Objective-C frameworks will still be around for a while. Apple's got a bunch of internal frameworks that they supply as part of iOS that aren't likely to be updated to Swift anytime soon. So it's still something that you should know when you are doing iOS hacking. It's not that bad. A lot of people don't like
Objective-C, but it's not too bad once you get used to it. So that's a bit blurry, so sorry about that. We're using VGA. But here's just a little bit of a primer on the Objective-C syntax. So on the left here, you've got your header file with the interface. You can see there's the @interface keyword. You've got the class name, and after the colon you've got the superclass, so what it inherits from. You've got your properties: just @property, some attributes (you don't really need to bother with those), the type, and then the property name. And then you've got your class methods and your instance methods. So the class methods are denoted by
the plus, and the instance methods are denoted by the dash. And in the brackets, you've got the return type. And then you've got the function name. You can see here for the instance method with parameter, that's the syntax if you've got parameters. An interesting thing to note, and it will come into play a little bit later on: it's pretty common as a design pattern in Objective-C apps and iOS apps in general to have the function name indicate what the first parameter is. This is just a really contrived example, but it might be, I don't know, loginWithUser, and then the parameter will be a user, something related to the user or whatever. That's pretty common. And then you've
got on the right hand side, you've just got the implementation. So it looks pretty much the same. With the properties, you synthesize the properties; all that does is, at compile time, it just generates the getter and setter methods for that property, so you don't have to write them out yourself. Then you've got your class methods and your instance methods, pretty straightforward. It doesn't look too different, right? That's sort of the basic syntax of Objective-C. You really don't need to know all that much Objective-C to be dangerous for most pentesting tasks. So: just basic object-oriented principles, like the difference between a class and an object, and the difference between a class method and an instance method; a very rudimentary understanding of the MVC design pattern, which iOS apps adopt. Not all of them. There are some hipsters who like to do reactive kind of stuff. But for the most part, Apple pushes the MVC design pattern as a standard. When I say rudimentary, I really mean rudimentary. Just think of the M being the model, which is data; the V being the view, which is the presentation UI; and the C being the controller, which is kind of the logic. If you get that, then you can kind of understand what's going on from what we're going to discuss. How to call methods, the syntax to call methods (we'll go over that), how to read and write variables, and then just that
basic syntax, class syntax, that we saw in the previous slide. So you don't really need to know that much. You don't need to be an expert programmer to be able to be dangerous for this sort of stuff. So let's get into reverse engineering Objective-C apps. Objective-C executables need to have a bunch of class information in order to run and to support the dynamic features of the language. That's great for us as pen testers, because we can extract this information and it gives us insight into how the application is architected, how it runs, and how it functions. For a pen tester or a bug hunter, this gives you a map of the application to help you with finding potential vulnerabilities and attacking the runtime. So, back in the day, class-dump-z was the go-to for this, as it had better iOS support than some of the alternatives. But it's not actively developed (you can see there, like, copyright 2009), so it doesn't really work on 64-bit apps or any kind of Swift or mixed apps, so that's a bummer. It hasn't really kept up. But for Objective-C apps, the original class-dump utility by Steve Nygard is still probably the best. It's also not really actively developed, but it still works fine for pure Objective-C apps. You can of course use something like otool, which comes with the Mac. It gives you a lot of different ways to sort of mess with the binary, extract information out of
the binary, but it's not presented in a really easy-to-digest way, in the same way that class-dump is. You could also get this information out of a disassembler, but for the first pass when you're looking at this, a class dump from the class-dump utility is really the way to go. That's the command that I use. I'll go through in the demo what those options do; they're just formatting, right? And usually I just output it to a file. You can see I've got the .h extension on that. The reason I do that is because when you load it up into a text editor, which you'll see, it just automatically does the syntax highlighting. So that's the only reason I do that. You're not going to be able to see this, but that's fine. I do have something in the demo. That's a class dump of Instagram I was doing for a bounty thing not too long ago. I'll probably just skip that, wait for the demo, and then go through the class dump and some of the stuff that you'd look for when you're analyzing this stuff. So before you can actually do any of these tasks, you need to decrypt the binary. So iOS apps that are downloaded from the App Store are
protected by Apple's DRM and the binary is encrypted. So to be able to analyze the binary, whether it's for getting a class dump or disassembly or whatever, you need to decrypt it first, right? The way that you do it, or the kind of methodology, is pretty simple. You just work out the correct offsets for the encrypted portion of the binary, extract that encrypted portion after it's loaded at runtime (because obviously it needs to be decrypted to run), and then you take that now-decrypted portion and you shove it back into the binary, you patch it back in, and you're good to go. You can do this manually and there's plenty of guides
on the internet on how to do this, but there are heaps of tools out there to automate it, so I wouldn't bother doing it manually. I prefer to use Clutch for decrypting binaries. I guess this is probably the point where I should say be responsible, and say don't use this for, like, pirating apps. There's a big warning when you go to their GitHub page. You know, not cool, but it's really useful for security analysis, right? Those are the options, but really what you want to do is the -d option, which dumps the bundle ID into an IPA file. You could also do -b, which just dumps the binary portion. You could just do the binary for your class dump and disassembly and stuff like that, but I like to get the
whole IPA file because it also has a bunch of other interesting files. So if it's a cross-platform app, often they'll have all of those, say the JavaScript files or even some of the compiled DLLs for Xamarin apps and things like that. It also has some interesting settings and whatever. So it's useful in a broader context, but specifically for what we're going to talk about today, it's not necessary, but that's what I do normally on a pen test. So once you've got the class dump, the next step is really just to go through and start analyzing the class dump and seeing what you can get out of it. So essentially, once you have that application, the
class dump, it becomes like a map of the application. You can sort of see where everything fits together. So the first thing you want to do is look for interesting functionality. So authentication, in particular local authentication and other sorts of local checks, is definitely something that's interesting from a security perspective. How the app is doing data storage, in particular key management, right? That's usually pretty poor on mobile apps. You know, storing the keys right next to the lock is not really great, and people don't do a good job. Security checks and controls: things like jailbreak detection or prevention, anti-debugging, and other more advanced runtime security measures. You often need to deal with these simply because, if you're doing it in the broader context of, say, a pen test, often these can stop you from completing that and doing other tasks that you might want to do. And so you'll need to look for that and potentially break it, which we'll go into in the presentation. How it handles transport security: does it implement cert pinning? How does it interact with the backend APIs? If there's no sort of cert pinning issue and it's using HTTP or HTTPS, it's not too bad. But often you'll find mobile apps use different kinds of protocols, or custom protocols, or unusual protocols that aren't easily intercepted. So you might
want to have a look at if it's got some kind of custom network stack and it's implementing it. You might want to hook those functions and see what's going on there. And you can also see what frameworks and third-party libraries are in use. So if there's anything that has known vulnerabilities, you can get an idea for that as well. So once you've identified the interesting functionality and you have a broad understanding of how the application is architected, you can start to look for potential security issues. So some of the key things and really the three sort of broad categories that you would look for from a runtime security perspective is simple application logic that can
be exploited, so bypassing security checks, access control and auth bypass, and we'll go through a few of those in the demo; sensitive information that you can extract from memory, so things like auth keys, passwords, encryption keys, whatever; and then exploiting the way data at rest and in transit is secured, right? So bypassing cert pinning and validation, and, you know, how they're doing encryption of any client-side data storage. Cool. So we'll go into a demo, which I've pre-recorded because I'm a good boy. This is just a really simple Objective-C app. It's got a couple of jailbreak checks. That's terrible. But that's cool. It's just checking. They're both failing and saying that you're jailbroken. This is a password, like a login field. It's just saying the password that I typed in is
incorrect. Pretty basic. I'll make these available as well online if you want to see them. The projector is not so great. Let's run through this demo. That is terrible. I can't even see it on my screen. That's great. Basically what I'm doing here is running class-dump-z, showing that it doesn't work. I had the screenshot in there. Now I'm getting class-dump. And so these are just some of the options, which you can't make out, obviously. They're mostly formatting options. I like to sort it by inheritance, and also the methods alphabetically. And so, yeah, and then I'm just actually running the command on the binary of that application that I showed you. And I'm
just, as I did in the presentation, I'm outputting it to a file on the desktop. And then I'm opening that in a text editor, which is written in Electron because I'm a hipster or whatever. This is not going to work. But yeah, so we've got our class dump. You have to trust me on this. So I just did the syntax highlighting. So that shows how close the formatting is to a standard Objective-C header file. Because it's sorted by inheritance, you've got all the protocols first. You've got your app delegate and a couple of classes that we've got in there. Pretty small app, pretty simple. What I would typically do is just start searching. It's just grep or command
F. Where I usually start is the app delegate. That's essentially main for iOS apps. It's really the point where the developer gets control. And usually, this is just a small app, so it's got really nothing in it, but usually there's a bunch of interesting stuff that the developers kind of just chucked in the app delegate because that's where it goes. You can see here, so you've got the class name, you've got the superclass, and in the angled brackets, I didn't go through that in syntax, but that's just the protocols that it conforms to. Then you've got the different class and instance methods. Then I start looking for interesting stuff, like I start searching jailbreak, I start searching password and things like that. You can see there's some
interesting looking classes here. You've got this jailbreak manager class, which seems like it would probably be handling those jailbreak checks, right? You've got this one, you can't read it, but it says totally interesting information here. So it's a bit obvious, right? And then you've got a couple of variables, which are password and username, and then there's a couple of methods. So you have a class method called getEncryptionKey, and then two instance methods that get the password and get the username. Then we're looking at the view controller. So when you think about it from our rudimentary understanding of MVC, right, this is where the logic is. So you can see, you know, like the login button pressed, and, you know, jailbreak check one and two. There is an interesting one there called userIsAuthenticated. You'll have to take my word for it. But yeah, it's definitely there. Cool. So let's move on to actually manipulating the runtime of the app. So once you have an idea of what you want to target, the next step is to actually then manipulate the runtime and exploit the issue to achieve your objective. So commonly, it falls into a few simple buckets: reading the values of variables out of memory or modifying them, calling methods directly, typically to exploit poor logic in the application flow, and then rewriting the implementation of a particular function to change the way the app
functions. And we'll go through all of those. There are a number of tools and techniques you can use to complete these tasks. Frameworks and tools such as Cycript and Frida, using a debugger like LLDB, writing your own dynamic libraries and linking them in, or even just patching the binary can often achieve some of the same objectives. I'm glad the photo of Christian came out. That's great. So, Cycript. Cycript is kind of an old tool written by the same guy who does Cydia. It has a ridiculous name, because it is pronounced "script"; it is not written like that, at least in my mind. But it has an even more ridiculous premise, which is a programming language designed to blend the barrier between Objective-C and JavaScript. I don't know about you guys, but that just seems ridiculous to me. But it is a really great tool for interrogating and manipulating the runtime of the app. Christian is a hipster. So, using Cycript. You can use it to load scripts (which is why I hate that name), or you can use it interactively, which is usually how you use it. So most of the time you want to hook into the running app to use it interactively. It's just the -p, and then you provide the application name or the process ID. There are a lot of people moving to Frida these days for a lot of the same tasks. But I use Cycript, and it's
still a very handy tool. It's kind of got a different focus to Frida, and there was a bit of a spat between the developers of both of those around what the purpose of each one is. Cycript was really more designed for tweak developers to play around and see how things are working, whereas Frida is definitely more of a security research tool. But we'll go into both. Here are some little tips and tricks for things you might commonly want to do. So you might want to get the bundle ID: just NSBundle, mainBundle, bundleIdentifier. Dumping instance variables: just a little asterisk in front of the object you want to dump the variables out of. Getting all
the objects of a class is also something that's useful. I'll explain why in my demos that you probably can't see. But it's got this really cool function called choose, which basically takes a parameter of a class name and then goes through and tries to find all the instances of that. For Swift apps, the syntax kind of breaks. So instead of just putting in Module.ClassName, you have to use this other method called objc_getClass, right? Because it just breaks the JavaScript stuff, and you get things are undefined and whatever. And then to replace the implementation of an existing method, it's just the name of the class, dot prototype, dot the function that you want to replace, and then you're just basically replacing it with a JavaScript function that does whatever you want. Usually if it's simple logic, the new implementation's not that complex, right? But yeah, we'll go into that. So you can load up scripts. This little script here just prints the methods, or attempts to print the methods, of a particular class. So you could type that into the Cycript REPL, I suppose. Or you can just create a .cy script and load it in when you inject Cycript into the process, so you don't have to keep typing it out, right? So, yeah, if you do use that: printMethods, the class, it gets all the instance methods, and if you add the second parameter as true, it also
gets the class methods. All right, let's get into the demo that you probably won't be able to see. All right. So, all I'm doing here, on the right hand side, I'm just setting up a tunnel over USB to my device so I can connect to it, which is now what I'm doing here with SSH. So I'm logging in. So now I'm connected to my iOS device, which is running the app on the other side there. And so I'm just now loading up Cycript and injecting into that Objective-C app that's running on the side. And you can see that I'm just getting the bundle identifier so you can see that it is that app.
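For reference, the kind of Cycript one-liners used throughout these demos look roughly like the following. This is a hedged sketch typed at the Cycript REPL after injecting with something like `cycript -p AppName`; the class names are stand-ins for the demo app, and the `printMethods` helper is a commonly circulated version reconstructed from memory, not taken from the slides:

```
cy# [[NSBundle mainBundle] bundleIdentifier]   // bundle ID of the injected app
cy# UIApp                                      // shortcut for [UIApplication sharedApplication]
cy# UIApp.delegate                             // the app delegate instance
cy# *UIApp.delegate                            // leading asterisk dumps instance variables
cy# choose(JailbreakManager)                   // array of all live instances of a class
cy# objc_getClass("MyModule.SwiftClass")       // needed when Swift names break the JS syntax
cy# SomeClass.prototype.someMethod = function () { return false; }  // swap an implementation

// A helper in the spirit of the talk's printMethods script; pass true as the
// second argument to list class methods instead of instance methods.
function printMethods(className, isa) {
    var count = new new Type("I");
    var cls = isa ? objc_getClass(className).constructor : objc_getClass(className);
    var methods = class_copyMethodList(cls, count);
    var result = [];
    for (var i = 0; i < *count; i++) {
        result.push(new Selector(method_getName(methods[i])).toString());
    }
    free(methods);
    return result;
}
```

Saved as, say, printMethods.cy, the helper can be loaded when you inject so you don't have to keep retyping it.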
There's a few cool features. Here I'm just calling a method in the exact same Objective-C syntax that you would normally use. It definitely does bridge Objective-C and JavaScript. You can see I'm just getting the application instance there. Then you can get the delegate, which gives you the app delegate instance. But then Cycript also has a bunch of shortcuts. So for the application instance you can use UIApp, and then you can do things like UIApp.delegate and you get those same instances. Trust me, those things are the same. There's also history and tab completion as well, which is what I was just demonstrating there. Which is kind of handy and kind of nice. So, move on.
This is just demonstrating dumping the instance variables using the asterisk shortcut. That's for the app delegate. Now we're going back to the class dump. You'll start to see why this is a useful document. So here we're going into this totally interesting information here class, and we're using the class dump as a map to sort of help us navigate through the runtime. And so here I'm highlighting this getEncryptionKey method, which seems like something we might want to see what it returns. And because it's a class method, you can just call it using the class name. So I've got the open square brackets, you put in the totally interesting information here class, and then the method name. And this comes back with a string saying this is an encryption key. And
this is common. You'll see this all the time in iOS apps. But I'm trying now to call the getPassword instance method, and it throws an error because I'm calling it on the class. So going back to those OO principles: you can't call it on the class, you have to call it on the instance, like the actual object. One of the ways that I find out where these instances are, or references to them, is I search for the class name. This is a very small app, so there's this instance here. In the view controller, there's an instance of that particular class. Now what I want to do, and you can see why we call this a
bit of a map, now what I want to do is go to that view controller instance and then read that variable to get an instance. For iOS, the app has a keyWindow property which always has a rootViewController. So in this particular case, it's just the view controller, because it's a very simple app. So what I'm typing there is UIApp.keyWindow.rootViewController, and it's just returning the instance of that view controller class. So one of the things you can do here, so going back now,
I want to read this instance out to get the instance of that totally interesting information here class. Should have made that a shorter name. And that's returning an instance of that class. So it's read that variable, and now we can use that. Now, we could type out all of that string, right? You know, this dot this dot this. Or you can create a reference. So this is just a JavaScript variable, I'm calling it info, and I'm just using the Instance function in Cycript and giving it the pointer to that instance. And now I've got a reference to that that I can use without having to type it all out. So if you just see, I typed in info and it came back with that instance.
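Condensed, the navigation sequence just described looks like this at the Cycript REPL. The pointer value and class/method names here are hypothetical stand-ins for what was on screen:

```
cy# UIApp.keyWindow.rootViewController     // instance of the root view controller
cy# var vc = UIApp.keyWindow.rootViewController
cy# *vc                                    // dump its ivars, spot the interesting object
cy# var info = new Instance(0x15ee4b000)   // hypothetical pointer copied from that dump
cy# [info getPassword]                     // call its instance methods...
cy# info.password                          // ...or read its variables directly
```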
So now that we've got that, we go back to the class dump and we say, OK, well, let's try and call these instance methods now. So we can just reference it using the info variable that we created. And then we type in getUsername, which comes back with the username, and getPassword, which comes back with the password. And then we can also read the variables as well with just the reference that we created: info.password, which is what we're doing. All right, so yeah, password, there you go. I made that different to the actual password for the login just so we do some different things. So now what we're going to do is have a look at this jailbreak check. So the
first jailbreak check, we hit it and it says jailbreak check failed, because obviously we are on a jailbroken device. So again, let's go back. I mean, normally we'd just search around for jailbreak, but it's right there because it's a tiny app. And you can see here we've got an instance method called checkJailbroken that returns a Boolean value. So that's likely what's doing the check; it's returning true or false depending on whether it's jailbroken or not. We can see here, trying to do what we did with the last one, where we go through and we're like, okay, here are the different instances of that, we're not finding it, right? What we see here is this class method called sharedJailbreakManager. That's a common pattern in iOS apps; it's called the singleton pattern. Basically, that returns the instance of that class, typically. So when you're going through a much larger, more complex app, you can do a search for shared and see all that kind of stuff. So here I'm just calling that class method, and it's returning an instance of JailbreakManager, which I can then use. I'm creating a reference called jbm to that particular instance. And then I'm now going to call that instance method called checkJailbroken, using that reference that we created, and that's going to return true, right? Because it is jailbroken, right? So what we want to do now is change that to return false and see if that defeats the check, right? So this is where you look for things like simple logic that can be exploited, you know, like flags that can be switched over and whatever. So here we're using the syntax that I spoke about before, where you've got the class name dot prototype dot the method name, and then just equals, and we're changing it with an anonymous function that simply returns false. So that'll return false all the time, and now you can see jailbreak check passed, right? So we've now defeated that logic and we're good, moving on.
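The prototype swap that defeats the check is ordinary JavaScript semantics. A runnable sketch outside Cycript, with a plain-JS stand-in for the demo's JailbreakManager class (in Cycript the same assignment targets the real Objective-C class):

```javascript
// Plain-JavaScript stand-in for the demo's JailbreakManager.
function JailbreakManager() {}
JailbreakManager.prototype.checkJailbroken = function () {
  return true; // the device is jailbroken, so the genuine check returns true
};

var jbm = new JailbreakManager();
console.log(jbm.checkJailbroken()); // true  -> "jailbreak check failed"

// The Cycript move: ClassName.prototype.methodName = function () { ... };
// Existing instances pick up the new implementation immediately.
JailbreakManager.prototype.checkJailbroken = function () {
  return false;
};

console.log(jbm.checkJailbroken()); // false -> "jailbreak check passed"
```

Because the lookup goes through the prototype, the already-created instance is affected too, which is exactly why the check flips without restarting anything.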
So yeah, this is just demonstrating the choose functionality, right, and why it's useful. So you saw how we could kind of navigate throughout the app to get those instances, or choose just returns all the instances of a particular class as an array that you can then just use, which is kind of cool. All right. So we have more on this. So now we're going to look at the authentication, right? And see if we can bypass that because it's just local auth, right? So here, you know, just to show what it does again, you know, you're typing in a password which is incorrect. You hit login, it says password incorrect, right? So we want to see what
we can do. So again, going back to the class dump. So again, simple understanding of MVC, right? Where would the logic for this be? Right, it'd be in the controller. So we've got the view controller here. You can see here you've got some buttons and some different actions, so that's likely to be where this is. So you can see, like, login button pressed, so that's obviously doing something when you press the login button. But then there's also this other interesting one here called userIsAuthenticated. So what we're going to see is what happens if we call userIsAuthenticated directly. So again, we need to get the instance of the view controller, because it's an instance method, and we're just using the same technique that we used before, UIApp.keyWindow.rootViewController. See, there's the instance there. And then we're just using that and calling it. I'm just going to minimize that just so-- yeah, I was just checking what it's called. Although you've got tab completion in Cycript, so it doesn't really matter. Just minimize that so you can see that it works. And it says password correct. Now, this is actually quite common in iOS apps, where you'll find this kind of process where developers will abstract everything away. So what they'll do is: the button was pressed, then they'll call maybe a check function to check the creds or whatever, then after that it will call, now go and display
this view controller. What you can do with that, particularly with local auth, is basically short-circuit it and jump around the check. Either you could change the check to return whatever you want it to be (often that won't work because it'll require a password input that you don't know, but you could maybe brute force it, writing a little script), or you just short-circuit it and go to, hey, present this view controller. And if it's local auth, and it's sort of been authenticated before, and there's data populated in it, it'll work. For apps that use a backend API to populate it and use typical session authentication, that kind
of technique doesn't really work. Even if it is possible to do that in the app, you'll jump to that view controller, but there'll be no data, because it's not able to actually get it down from the API. But, you know, there are a lot of interesting apps that do use local auth. The one that I didn't do here, but I usually demo, is Evernote, like the PIN code controller. You know, you can easily bypass that, and other things like that. So, next. Oh yeah, all I'm showing here is that that change we made to the jailbreak check, that's at runtime, so when we close that app and fire it up again, it's not persistent, right? So it's saying now
it's failed again, right? So there are ways to make it persistent, and I'll go through some of them, but yeah, that's all I'll show you now. So let's move on to Frida. So Frida's kind of the new hotness when it comes to messing with mobile apps. It is, from their website, a dynamic instrumentation toolkit for developers, reverse engineers, and security researchers. It essentially injects Google's V8 engine into a process so you can execute JavaScript in the context of that process, access memory, and all that kind of stuff. Frida can be used in many ways, and it's a really great sort of framework and toolkit, and I recommend looking into it. It has bindings to all
different kinds of languages that you're familiar with. So if you don't like JavaScript, there's Python and whatever. It's mainly used to write scripts and tools, but it also comes bundled with a bunch of tools that you can use to get an idea of what it's capable of, help you with your scripts and stuff like that. Those new tools that I mentioned earlier at the start of the presentation utilize Frida and rely on it often quite heavily. So it's good to understand how Frida works. So Frida comes bundled with some tools that you can use right off the bat. So forgetting about writing scripts, you've got Frida CLI, Frida PS, Frida Trace, et cetera, et cetera.
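As a sketch, the bundled CLI tools are typically driven like this. None of it does anything useful without a USB-attached device running frida-server, and the app and class names here are placeholders:

```shell
frida-ps -Ua                                                  # list running apps on the USB device
frida -U "Vulnerable App"                                     # interactive Frida REPL inside that process
frida-trace -U -m "-[JailbreakManager *]" "Vulnerable App"    # trace matching Objective-C methods
```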
Frida CLI and Frida Trace are probably the ones that are most immediately useful for pen testing. We'll go into that in a second. So now we're looking at the second jailbreak check, right? You won't be able to read it, but it says jailbreak check failed, look harder for the check, right? So the text is kind of different, and I'll explain why that is in a second. So, okay, following the process that we've become used to, we go back to the class dump and we start looking for jailbreak, right? We come up with the jailbreak manager, but that was for the last one, right? It's obviously not controlling this other one. So, you know, and then it's just going through, like, the buttons and the other references to the jailbreak check buttons and all that.
So you might start looking for, like, root, or check, or whatever, and it's not really working, because it's not there. This is common; developers will try to hide these sorts of checks. Ones I've come across, for a banking app, it's like "get store location" or whatever, and they try to hide it. I had one where they just made all the security stuff random strings, which just made it stand out in the class dump. I'm like, "Well, I'm going to look at those." It must have been really difficult to maintain. A little trick I do in those sorts of instances is I look at the text. Usually there will be a
pop-up. You can see that the text is different. Well, you can't see it, but I'll vouch for it. It's different. It says, "Look harder for the check." A cool technique to use is you fire up a disassembler, and you look for that string and see where it's being used. Just firing up IDA here, and basically going and doing a search for that string. I've used this multiple times on tests to see where these hidden jailbreak checks are, or other kinds of sensitive stuff. You can see here, well, you can't: it's in the view controller, this jailbreak check two button is kind of being referenced here. You can see there's sort of two branches. On that right hand side where my mouse is, that's the positive branch, where it says you're not jailbroken. And this is the one that has the text that says look harder for the check. So this is obviously what's happening. So you scroll up to before that branch. You can't see the text here, but what I was trying to point out is you don't even need to really understand any of this assembly to kind of understand what's going on. That's saying shared application, the next string down is delegate, the next string down is a function name. So basically what that's saying is that there's this function being called that's named that, which you can't read, and it's in the app delegate. So let's go back
to our class dump and go back to the app delegate and have a look at that. So have a look at the app delegate. There's that method. And it looks very similar to the other methods, but if you've done a lot of iOS testing, you'll know all those other methods are boilerplate methods for state-transition stuff that automatically get generated for you. And this one was designed to be a bit sneaky and blend in with that. It's called applicationTerminatesAfterBackground and it returns a Boolean value. So I've basically written a script in Frida to modify that, right? So the first variable there is just the class name, then the function that we're hooking, so that was what we found.
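The Frida script itself isn't visible in the recording, so here's a minimal sketch of the hook he's describing. The class and method names are the ones from the talk; everything else (the helper, the exact structure) is an assumption about how such a hook is typically written, not the speaker's actual script:

```javascript
// Hypothetical reconstruction of the hook described in the talk: force the
// hidden jailbreak check in the app delegate to always return false.
// Run inside the Frida runtime: frida -U -l hook.js <app>

// Build a "-[Class method]" label for logging (pure helper).
function hookTarget(className, funcName) {
  return "-[" + className + " " + funcName + "]";
}

var className = "AppDelegate";                         // from the class dump
var funcName = "applicationTerminatesAfterBackground"; // the sneaky check

// Guard so the sketch can also be loaded outside Frida without crashing.
if (typeof ObjC !== "undefined") {
  var method = ObjC.classes[className]["- " + funcName];
  Interceptor.attach(method.implementation, {
    onLeave: function (retval) {
      console.log("[*] " + hookTarget(className, funcName) +
                  " returned " + retval);
      retval.replace(0); // 1 (true, "jailbroken") -> 0 (false)
    }
  });
}
```

With that loaded, every press of the jailbreak check button runs through the hook and the return value gets flipped, which is what makes the check pass in the demo.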
Then the next is just building a hook string, and then this is really what it's doing. So this is the interceptor: it's basically attaching using the hook, saying change the implementation, and then it's calling this JavaScript function here. The rest is just console output, but all that's doing is taking the return value and changing it from one to zero, so changing it from true to false, so it'll always return zero, which will be false. So if we go back to Frida, all I'm doing here is frida -U, which is connecting to my USB device, then referencing the script, injecting into
that. So just checking that it was still failing at that point, just so you could see that. And now Frida's running, and I hit that button, and you can see now it says jailbreak check passed, right? So every time you press it, it's running that function and changing the return value. So you've now bypassed that check. And that's often a good way to do it on a pen test if you want to make things a little bit more persistent. Let's race through Swift apps because I don't have a lot of time. But yeah, as I said, increasingly developers are using Swift to write iOS apps. And it's impacting
some of the techniques and tools that you would usually use for Objective-C applications that we've discussed. In a general mobile app security sense, testing Swift apps isn't actually all that different, except for some of the stuff that we'll be talking about. Most issues in iOS apps, like any other app, right, are due to poor design decisions, misconfigurations, or incorrect implementation of system frameworks, third-party frameworks, stuff like that. But yeah, what's really changed is how you reverse engineer the application. So Swift, everybody kind of knows about Swift, and I'm rushing through some of these less relevant bits. It's created by Apple. Eventually, the idea is that it replaces Objective-C. Here's the
basic syntax. It's a lot cleaner than Objective-C. You've got mutable and immutable values, var and let. Swift infers the type, but you can be explicit with your type as well, which is with the colon and the type. This is a class declaration. You've got your class, and then you've got your properties. You can have a property with a default value. Yeah, cool, 10 minutes. The property with the default value, then you've got your initializer, which basically initializes any of the properties that don't have a default, because obviously when it gets initialized, it just uses the default value. The class functions, class methods, are denoted by the class keyword, and then functions are denoted
by the func keyword. And so you've got class methods, instance methods, parameters. The only thing that's really interesting here is this one, where it's an instance method with an exported parameter name. So look here: you've just got the parameter name and the type. Here you've got an exported parameter name and then the parameter name and the type. And I'll explain why that's relevant. And then you've got the little dash and the return type, which is Boolean, and then you do your implementations. Cool. And then, so yeah, to initialize a class, it's just the class and then you pass it the various property values that you need
to initialize it. And then calling a class method, like Objective-C, is just called on the actual class. And then instance methods get called on the actual object, and with an exported parameter name you need to put in that exported name, whereas in the middle example you just need to put in the parameter name, right? Just put in the actual argument. All the usual types are there. I'll skip that. Objective-C compatibility and interop: it uses the same runtime environment. It still supports C and C++ in the same app, but you can't call C and C++ directly from Swift like you can with Objective-C. You have
to go through a bridge or just have it as self-contained code. It can allow for some dynamic features and runtime manipulation when you've got that interop, which is still most applications these days. Other Swift features, I've barely scratched the surface: Unicode, so that's valid Swift. Cool. So, reverse engineering Swift applications. There are some challenges in reversing Swift applications. It's less dynamic than Objective-C and less flexible, so it can make it harder to get some of the information that you get out of Objective-C, like the class dump and stuff like that. It's less of an issue when you've got a mixed application, but it's still harder. There's limited tooling; most of the tooling isn't being updated for Swift. We'll go into that in
more detail. As we went through, the most common and easiest way to retrieve class data from an Objective-C binary is the class-dump utility. It's one of the first things you do. You've seen how useful it is. This is what happens when we run class-dump-z up the top and the regular class-dump utility on our pure Swift app. You get nothing back, right? It doesn't work. Sad face. So, what's next? class-dump-z and class-dump don't work with Swift binaries. Now what? Let's start diving into the binary. So, what happens if we dump the symbol table? Well, we get some interesting information. That looks kind of interesting. Kind of looks like some class information there. So, what happens if we look at something that we already know
is in the app, like the app delegate? You can't see that because this projector is not great, but it's coming back with a bunch of symbols. And a lot of this stuff in the middle there is some of those, remember I was mentioning those boilerplate methods, right? A lot of those are there. So this looks promising, right? But it's really a far cry from the output of class-dump, and it's kind of hard to make out. The reason for that is that Swift stores metadata about a function in its symbols, and in the process it mangles the name. So this is a rough sort of translation of one of them, right? This is a class
and a function that's in the app. So the underscore underscore T denotes the Swift function. Then you've got the module name prefixed by its length. You've got the class name, also prefixed by its length, then the class method. The C denotes a class method. Then you've got the function name prefixed by its length. Then you've got the return type, which is the Sb, returning a Boolean. The Y and the F and the Z are things like string protocols and stuff. I don't know why that's on there for this particular function, but it is. Whatever. That link down there has a really detailed explanation on the open source Swift GitHub page around what all these things
represent. You can read that if you don't want to go drinking or something instead. So Apple includes a utility called swift-demangle that you can use to demangle the names. That's just showing that same mangled symbol, but with some of the different options. So by default, it spits the mangled symbol back out at you, and then the other one is a demangled version of that. And then you can simplify it, or you can make it more complex and get more information. So with that, you can basically create some kind of equivalent of class dump, which is what I've done, right? It's a simple little script to dump classes and function signatures from a
Swift binary. I put it together last night. I didn't really sleep. I'll put it up eventually when I fix it all up on labs.astronaut.io. It's pretty hacky, but it does the job. Eventually, I'll get around to adding some more features and stuff like that, but I'm in Vegas. I'm going to be partying, so it'll be next week sometime. Here's what it does, and you're not going to be able to see that. But basically, I'm just calling it and passing it the binary. And then it comes up. You can see here: you've got the classes down the side and then the associated functions on those classes. There's the app delegate, the view controller, and this jailbreak manager. Then you can see the function signatures,
which show the function name, the return type, any parameters. Why the exported parameter names became useful: they will actually show up in here, but if there are no exported parameter names, it won't show anything, so you kind of have to guess what those arguments are. Usually it's not too bad in iOS apps because of that design pattern I mentioned, that naming convention, where it kind of calls out what the first parameter is. So you can kind of get a feel for it. Other options: you can use Frida, or tools that are based on Frida, to get some of the way there. It's not all the way, at least with
my limited knowledge of Frida, but it's also a useful option, which I'll demonstrate now. Basically, this is just using the Frida CLI tool. I'm just injecting into the process of this Swift app. Then I'm using some of the inbuilt functions. This is just ObjC.classes, and you can see it's coming up there, or you can't see. That's just got SwiftDemo.JailbreakManager and it's basically showing all the classes. The syntax, if you're interested, is kind of different to what you'd use for Objective-C classes. If you want to reference that specifically, you have to basically do ObjC.classes, then put it in square brackets with the module name and the class name. One of the things that doesn't work with Frida, which I was demonstrating here, is you can't get the methods of
that class using the standard way that you get methods. You can get inherited methods, but not the methods that are implemented in that class. So things like the jailbreak check method that's in that class are not showing up in that list. So one of the things you can do, and I'm thinking about maybe doing that, is you can use the Module class and the enumerateSymbols function and basically get a list of symbols, and you can do what we were doing before with demangling the symbol name. So you could build in that demangling logic and have that in. So basically all this is doing is getting all the symbols for that
module. Cool, so I'll skip that. Other options: you can use a disassembler. There's a link to some plugins that automatically demangle the symbols for you. Function hooking: I'm going to smash through this because I've only got a few minutes. It's still possible. It's much easier with mixed Swift and Objective-C binaries. You can still write tweaks for Mobile Substrate. This is a super simple class. And basically what we're going to do is we just want to change that variable. It's got a variable, an int, and it's got the initializer, which sets that. So you can hook the getter method, and that works. So with the getter method, you're changing it to return 10. You can
hook the setter method and it kind of works. So you can hook the setter to set it to 10. But certain functions in Swift are inlined, and the class constructor, the initializer, is one of them. And that's what's setting the instance variable in this case. So the setter's only called again by the top-level code. So if you call it from there, it works. That's why I say it kind of works. And then changing the instance variable directly works, but it's probably not a good idea because you can mess up how the app functions. The end. I did it. Cool. I don't know if I have time for questions. There's some. Do I have time? Okay. There's one down in the front here.
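The length-prefixed mangling scheme covered earlier is simple enough to sketch. This toy extractor only pulls out the length-prefixed identifiers; it is not a real demangler (Apple's swift-demangle handles the full grammar), and the example symbol is made up to resemble the SwiftDemo app from the demo:

```javascript
// Toy illustration of Swift's length-prefixed name mangling, as described
// in the talk: each identifier is preceded by its character count.
// NOT a real demangler; use Apple's swift-demangle for that.
function extractIdentifiers(mangled) {
  var ids = [];
  var i = 0;
  while (i < mangled.length) {
    if (mangled[i] >= "0" && mangled[i] <= "9") {
      var j = i;
      while (j < mangled.length && mangled[j] >= "0" && mangled[j] <= "9") j++;
      var n = parseInt(mangled.slice(i, j), 10);   // the length prefix
      if (n > 0) ids.push(mangled.slice(j, j + n)); // the identifier itself
      i = j + n;
    } else {
      i++; // skip operator characters (C = class method, F = function, ...)
    }
  }
  return ids;
}

// Hypothetical mangled symbol: module SwiftDemo, class JailbreakManager,
// class method jailCheck returning a Bool (Sb).
var sym = "__T9SwiftDemo16JailbreakManagerC9jailCheckSbyF";
console.log(extractIdentifiers(sym));
// -> [ 'SwiftDemo', 'JailbreakManager', 'jailCheck' ]
```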
I was just wondering, for iOS, I have done some work on the Android side, but I haven't really looked into iOS. But I was just wondering if you see much obfuscation of method names or function names? Not in iOS, not like you do commonly in Android. No, it's definitely not common. I have seen it after we've done pen tests, where we've exploited some functions, and then they just change them to random strings. But it's the same functions. But no, it's definitely nowhere near as common as it is on Android. Cool, thank you. - Yeah, there's another question down here. - Hey, great talk. - Thanks, man. - I just wanted to ask you
what's your approach when trying to search dynamically for keys or passwords or all this information that you can get dynamically? - So it's not that interesting. It's literally, I'll go through the class dump and I will just, you know, grep, search, whatever it is, and look for interesting stuff, right? And then sort of just scroll around from there. So look for interesting classes; I'll search "pin", "password", "key", whatever, and just go through the whole thing. So nothing crazy, nothing super sophisticated. For the rest of the questions, please go outside, and one more round of applause for Michael. Thank you. Thank you. Thank you.
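That keyword sweep over a class dump is easy to script; here's a minimal sketch, with an invented dump and keyword list:

```javascript
// Sketch of the "grep the class dump for interesting names" approach from
// the Q&A. The dump contents and keyword list are illustrative, not from
// any real app.
function findInteresting(classDump, keywords) {
  var re = new RegExp(keywords.join("|"), "i"); // case-insensitive match
  return classDump.split("\n").filter(function (line) {
    return re.test(line);
  });
}

var dump = [
  "- (void)viewDidLoad;",
  "- (NSString *)encryptionKeyForUser:(id)user;",
  "- (BOOL)validatePin:(NSString *)pin;",
  "- (void)didReceiveMemoryWarning;"
].join("\n");

console.log(findInteresting(dump, ["pin", "password", "key", "secret"]));
// -> the encryptionKeyForUser and validatePin lines
```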
We're working up to this. There are multiple networks, multiple hosts. And what you don't see here is this is one team. We actually scale this by as many teams as compete. Last year, we had 16 teams in the regionals, and then we had 10 teams at the national competition. - 23 in the regionals. - Okay, 23 in the regionals and then 10 for the nationals. So it was this infrastructure times 10. So you can see we have a really big problem: we have to set up all these hosts and then manage them. You also don't see in this diagram all the jump boxes and all the teams' infrastructure. So this is just the competition network. There's actually about double this when you take into account the
jump boxes. Every member of the team has a Windows and a Kali Linux instance running in the cloud as well. - Yeah, the VDIs are how they hit their environment. It's all segmented and compartmentalized too, so they can't leak out of their environment and hack the school or whatever. - Last year's regional was 1,471 hosts. - And another thing, I don't know if you can see it in this diagram, is that we actually do it across providers. So we'll do it in multiple clouds and across different providers for resiliency. This is a lot of infra and it sucks. Pretty much this is a big problem, and we've had a lot of developers, and
all the developers approach it from their own perspective. So we needed a tool to kind of unify everybody and bring everybody together so we could write this stuff. The tool needs to be simple, it needs to be an easy language to write; this is the first language some people are going to be writing, so they can't be learning something new. It needs to scale, but it also needs to work across multiple providers. So we couldn't just use a traditional DevOps tool like Terraform; we had to come up with our own solution. And like I was saying, we want resiliency, so we don't want to just stick to one cloud provider.
We want to have multiple cloud providers in case one goes down. So, the solution: LaForge. - Yeah. You guys like this slide? Yeah. I ran into a nice little slew of Geordi La Forge fan art online, little anime characters, and so I just had to throw that in there. Yeah, Dan did a great job covering why we needed to go out and build our own tool, right? This isn't a talk about us hating on traditional DevOps tools like Terraform and Ansible. They're really good tools, right? The people that made Terraform are brilliant people, but they made it for a use case that just wasn't ours. We took a lot of inspiration from them, we took a
lot of inspiration from a lot of these other DevOps and operations companies. You'd be surprised, us being security people, how little we actually know about that space. It's a whole industry that really operates almost like ours does, right? There's DevOps conferences, there's people that go around and talk at conferences about this stuff; it's a science to them just like it is to us, and we really don't take the time to go out and learn it. So part of this on our side was going out and actually trying to understand the current space of DevOps, and realizing where we needed to be and what effort we needed to put in on it.
You know, we're going to build all this competition infrastructure. Well, you saw how big it was. Am I going to just have a hard drive with OVAs on it? That's not going to work in today's age. We need to have our infrastructure as code. We need to have it reproducible. We need to be able to say, I'm going to go build these five hosts, and then I'm going to hand them to Dan, and Dan's going to press a button, and he's going to get the same five hosts that I just built. And then I can add one thing, ship them back, and we can merge, redeploy, and test them. - Exactly, and that was
something we saw as security people: we're not developers, we're not DevOps experts. A DevOps person might find Terraform really easy; we wouldn't, right? They're just not tools we're used to. So for us to be able to have a group of people building this infrastructure, we really had to go out of our way to make it easy for them. - Yeah, most of our team is volunteers. They're doing this on a weekend. They have a month to help us put together this infrastructure. - Right, and so let's talk about the history of LaForge here for a bit. So the very first version of it
was this really embarrassingly large Ruby script that I wrote that just, I was writing Terraform configs for CPTC and I got really sick of writing the same 25 lines over and over and over again. And I was like, oh, I can just template this out with Ruby and make it happen. And then as we scaled the competition out, all of a sudden, to generate my Terraform templates, it was taking literally 10 minutes. So we went out and said okay, how can we solve this problem? It was about the same time for those of you, I see a lot of CCDC people in the room. I love you guys, this is not a talk where I'm
going to talk about how I hack you. But we were getting into Go for those reasons, and we decided, let's see, maybe we can rewrite LaForge in Go. So we took a stab at it. It went really well, I would say. It's kind of supported us, that V2 version, for the last two years. - It was really fast. We could iterate on it, we could ship a single binary, anybody could use this single binary. So it was nice; it was, you know, redeployable. - It really was. The biggest problem with LaForge V2, honestly... 10 minutes, thank you. The biggest problem with V2 LaForge in my mind was that I just didn't
understand my users enough. I needed time to put something in front of them so they could say, "This sucks. Can you fix this?" So after another year of us just watching how people used V2, this past spring we decided, "You know what? Let's throw that one out. Let's do it all over again." Some of the big problems we had, and we'll kind of talk about it. Yeah, they don't really talk about it on that, but it's fine. Some of our problems were really around how we decided to make the structure, right? So when you used LaForge last year, you had to clone a repo, you had to set up some dotfiles in your home
directory, you had to set some environment variables, and you had to know where everything was in this folder structure. And if you didn't know where something was, you might have messed something up and it would just break something for a lot of people, and then you'd be like, oh, what have I done, right? - Git pull? - Yeah, and that really was a difficult experience for us. It showed me that a lot of the problems with LaForge V2 that we needed to solve in V3 were really around usability. It's really fast at generating these configs, it has a lot of extensible uses, but people just aren't finding
it fun and easy to use. So what we did this year is I sat down and I said, okay, what are the tools people really do like using that they use every day to collaborate with other people on, right? And the two tools I landed on were Git. How many people in here do a Git command at some point in their week? Yeah. Awesome. Does everybody find that just easy? One of the things I really loved about Git is you could be anywhere in a directory and Git knows that you're still in the Git repository even if you're five folders deep in that. That's the kind of just implicit awareness that I wanted this software
to have. The other tool that I took a lot of inspiration from on this was Docker. I'm not a Docker expert by any means, but as I've started to play with it the last couple of years, I've noticed that they do this sort of layered approach to how they build these containers up, and I think that's a really interesting thing. You don't have to rebuild everything every time, right? Compilers have been doing this for years, caching objects and whatnot. So figuring out a way where we can take the best of both of those worlds, where the nature of it is just implicit. You don't have to go and explicitly say things. And the actual
tool itself doesn't have to constantly redo its whole state every time you build. So we got rid of YAML, because YAML, if anybody's ever used YAML in a distributed environment where somebody has Windows, another person has Mac, yeah, I see Doug over there just shaking his head, right? You get one person with Windows line endings and you're done, the whole thing won't parse. So we got rid of YAML and we were like, oh, that'll solve our problems, right? It didn't, because we still hadn't solved the underlying issue, which was we needed to make it easy for people to use. - Do you have an example of what that looks like?
- Yeah. This is kind of the structure of the old YAML here, where you had all these folders and all these YAML files. And again, is this easier than writing a Terraform config? Absolutely. Do you need to learn Terraform? No, you just need to read the comments in our YAML and you can probably just figure it out. But... set up some networks, set up some hosts, run some scripts. Exactly. So what we've done in V3 with LaForge, and we're running out of time so I'm going to be real fast with this: we basically made LaForge contextually aware. Just like Git is aware when you're somewhere deeper in a folder
structure, Git knows that you're there. LaForge knows implicitly where you are and, depending on where you are, knows how to load the dependency graph for that environment. Now what do I mean by that, right? We talked about those Docker layers. This is exactly that concept here, where every time you go to a different location within the folder structure, LaForge really just knows where you are and can overlay the configs on top of the dependencies of the environment that you're in, right? So this is a little screenshot of LaForge now, where I can just run a command, laforge deps, in any of the directories inside of our infra repo, five minutes, thank you, and it'll show me,
okay, so my main is the build right there, the build context, and that's just inheriting down from those, right? And what's nice about this, here's another command, laforge status, which shows you at any point in your repo, just like git status, what your current context is. Here's kind of an example of why this is really powerful in this situation. So we have our own little config language called LaForge; the files are .laforge files. It looks a lot like HashiCorp configuration language. We use some of their parsers and ASTs to implement this. So you have a host config, which is just this block with an ID and some variables there, right? And that's in a global context. Over here in your environment context, which is yours, this is shared, this is yours,
you decide you want to just make a change to host ext1. So you make a change. Well, in new LaForge, this just inherits down and you just patch over the top of it. You didn't have to rewrite his config. You didn't have to go in and figure out what needed to be changed or what didn't. You just said that for my environment, host ext1 needs to be slightly different in this particular way. And that's what makes new LaForge so powerful: it's no longer you just having to drop hundreds of YAML scripts in a folder and hope everything works. Effectively, right there it shows you, now in my environment, that's
what ext1 equals. It's non-destructive, so if Dan is extending ext1 in his environment, he doesn't see any of those changes. And we can even say how they conflict and what happens when that happens. So another pragma that's inside of the new LaForge configuration language is an on-conflict block. You can actually describe, if you are going to collide with an object, how you want that to be handled. So in the previous example it was just going to do a soft merge between the two. In this configuration it's just not going to accept any of his configs, and it says no, the only thing that's valid is mine in that state, as bare as
it may be. So how does this happen? How does this know what it needs to be loading? That's a hard thing to cover in 90 seconds, but we'll give it a shot. Graph theory in 90 seconds, great. So I took a lot of this stuff in school, and I'll be honest, I'm pretty sure I got C's and D's in it, but I tried my damnedest. I think I learned something here. The whole concept of Dijkstra's algorithm and switching and cost of paths and stuff like that is fascinating to me, especially when you start looking at dependency graphs of software. When you have files that include other files, at what point do those things get loaded in? That's exactly the magic behind new LaForge. We have A, imagine that's
like our environment-context configuration: that's gonna load B and C, and C's gonna load D and E, and D is gonna also load F, right? So if you think, like six degrees of Kevin Bacon, right, there's a stage where you can count incrementally how far away from the center you're getting with these. And that's exactly what LaForge does: anytime you're in your repo, when you do a laforge status or a laforge deps, it walks your entire dependency chain contextualized to the directory that you're in, starts at the first layer and, just like Docker, starts building up the layers
of the config, one on top of the other, applying the differences, using the on-conflict pragmas to make sure that everything is keeping sane, right? This is actually what you end up with, right? Layer one gets layered on by layer two, layer three, and that kind of stuff. So your environment no longer has to be a bunch of duplicate code of Dan's. I can actually just extend Dan's environment in particular ways and have everything that Dan just did, and now I can just work on it at my leisure. So there's a new preview of it. It's live. Be gentle. It's an alpha preview, V0.0, pre-beta, whatever you want to call it. It's still in flux. You
can help us. It does work. It's there. If we want to step outside afterwards, I'll give people a little demo of it. It's really... you know what? Oh, I'm on your laptop. I can't give the demo. It's fine. Yeah, we're done. Yeah, so you can go check it out. The code's there. It's in Go. If anybody here is a gopher, please hop into it. It's kind of fun and wacky. TLDR of it: we can build competitions super fast. I think CPTC last year probably had more infrastructure than almost any collegiate competition I've seen in a long time, and we did it with a very small staff of people. And it's not because we're really good, we just automated ourselves into looking so good. - Yeah, and it scales per team,
so if you stand up one environment, you get 10 copies; it's great for competitions. - That is the original thing with LaForge: I didn't want to have to write the same config 10 times for 10 teams. I just wanted something to build me 10 configs. And now LaForge does that with a bunch of bells and whistles that make me not look like a failure in math class, I guess. And coming in on time. On time. Thank you. Thank you. A big round of applause. If you have any further questions, since there's no one here, you're welcome to stay here and ask questions. But the streaming will be ending in a few minutes. In five minutes. Great. Okay, cool.
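The patch-over-the-top and on-conflict behavior described for the host configs could be sketched roughly like this. The merge semantics, strategy names, and host fields here are guesses for illustration, not LaForge's actual implementation:

```javascript
// Rough sketch of the layered-config idea from the talk: an environment
// layer patches over a shared base host definition, and an on-conflict
// strategy decides how collisions are handled. This is a guess at the
// semantics, not LaForge's real merge logic.
function overlay(base, patch, onConflict) {
  if (onConflict === "replace") {
    return Object.assign({}, patch); // reject the base config entirely
  }
  // default: soft merge -- patch keys win, untouched base keys survive
  return Object.assign({}, base, patch);
}

// Shared (global) definition of host ext1...
var base = { id: "ext1", os: "ubuntu16.04", ram: 4, scripts: ["harden.sh"] };
// ...and my environment's patch: only the bit I want different.
var patch = { ram: 8 };

console.log(overlay(base, patch, "merge"));
// -> { id: 'ext1', os: 'ubuntu16.04', ram: 8, scripts: [ 'harden.sh' ] }
console.log(overlay(base, patch, "replace"));
// -> { ram: 8 }
```

Either way the base stays untouched, which is the non-destructive property the talk emphasizes: Dan's environment never sees my patch.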
Any questions? Questions? Yes. So the question was, what are some of the things you can do in the config? Can you give a real-world example of something that you can deploy through this and how the deploying works? Yes, I can. Just like a single host. Great question. I can zoom in on this. OK. So this was our dev environment for Gotham elections. There are hosts that have dependencies here. So for example, DC02 needs to DCPROMO before backups01 joins that domain. There are DNS records in the GitLab box because the GitLab box deploys code into the production environment. We don't build these competitions to just stand up some Metasploitable box. We build real, real integrated applications. - Like
Gbug is gonna be catching logs from all these other services. - Exactly. I asked myself, I don't know how to use Salt or any of this stuff; how do I deploy this stuff? Well, I know how to write shell scripts. Everybody knows how to write shell scripts. So we probably could have done a better job covering that. With LaForge, to actually do the configuration, yeah, you do your little LaForge markup statements and whatnot, but there's a section there where you just define scripts. And those scripts are just shell scripts or PowerShell or whatever you want. You can define Python scripts, and if there's a Python interpreter it'll try and push it in there. So in the new version we can actually put dependencies on the scripts. So
we could say, "Stand this up as infrastructure, run these scripts, but then wait on these ones until those are done." But at the end of the day, configuring GitLab doesn't require me learning Salt. I just have to write a shell script that knows how to take Ubuntu 16.04 and build GitLab out. Which is great for volunteers because, again, everybody writes shell scripts. They give it to you, it just configs the thing. One of the powerful things about it that I saw last year was, for the first time, we were able to solicit volunteers to help contribute to this. So Kyle, actually, was a prime example. He came in three days before the competition and said, I want
to help, how can I help in any way? We're sweating. We're just like, don't talk to me right now. But then it hit me: he doesn't need to learn this. Kyle, go write me shell scripts that do cool and interesting security things. That's it. Yeah, and just make them small and give me like seven different ones so that I can just kind of scatter security things around this infrastructure. And then when it comes back to scoring, I do scoring at CPTC, I just go through the shell scripts, because right there I can be like, oh yeah, he put a vulnerability there, there, there; I just audit the config and boom, I
have my rubric for how we're gonna score that year's competition. It was really powerful; we had probably a half dozen people last year at CPTC contribute scripts without having any knowledge of the environment, how LaForge worked, or anything. - It also lets you design the environment and the networks and the configs and then push that downstream, and other people just stand up and config the boxes. - Exactly. Yes? So the question was, are there any pre-built sample environments, or environments you can just get up and running with real quick right now? So there would have been, had we just said, yeah, we'll just release V2. The V2 code is out there, but the V2 configs are not. As I'm going through, like, I was up last night writing the
readme on GitHub, I'm gonna be putting up examples and stuff like that on there. Part of my new development methodology with this is to actually just start writing an entire specification to test against, and that spec itself will be an example environment. There's also a subcommand if you want to know, how do I configure this? You just run laforge example script, or host, right, you saw the little host block. laforge example host will literally print a host block with all the parameters; then you can choose which ones to fill in, just copy that into a text file, and it's there. The one issue is you do need a provider, so you need
like AWS keys, things like that, right? - You don't need it to play with the model, though. You can build all of those relations and dependencies; you just can't spin them up without actually spinning up something in AWS, right? You can play with the language, you can play with the configs, you can play with inheritance and layering stuff on top of it. We are developing... we've developed, now we just have to develop the plugins themselves. The part of LaForge that compiles to Terraform or compiles to Vagrant, that's all now extended out into a plugin system. So I'm actually writing the Terraform one right now. We've got a null one, and I'm gonna do
a native one that just spits out a shell script. It's a lot easier, the old LaForge you wouldn't have been able to say, "Yeah, let me just write one for Vagrant." That's not gonna happen. But the new one totally will. If you don't like Terraform or Vagrant and you want to use something else, let us know what that is. And if it's easy to do, I'd love to make a little plug-in for it. Thank you, Alex and Dan. For any other questions, please go outside and have a wonderful lunch. Cool. Thank you, guys. I thought we would just, you know, I thought I'd just leave my mic on.
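To make the volunteer-script idea above concrete, here is a hypothetical sketch of the kind of small, self-contained script a contributor might hand over: it plants one auditable weakness that a scorer can later grade just by reading the script. The path, filenames, and planted artifact are all illustrative — nothing here is taken from LaForge or CPTC.

```shell
#!/usr/bin/env sh
# Hypothetical volunteer script: seed one small, auditable "security thing"
# into a competition box. Everything here is illustrative.
set -eu

# Illustrative target directory; a real script would use a system path.
TARGET_DIR="${TARGET_DIR:-/tmp/laforge-demo}"
mkdir -p "$TARGET_DIR"

# Plant a fake private key with an overly permissive mode (0644) -- the
# planted vulnerability. Scorers can audit this script to see exactly
# what was seeded and where.
printf '%s\n' '-----BEGIN FAKE PRIVATE KEY-----' 'not-a-real-key' \
    '-----END FAKE PRIVATE KEY-----' > "$TARGET_DIR/id_rsa"
chmod 644 "$TARGET_DIR/id_rsa"

echo "seeded: world-readable key at $TARGET_DIR/id_rsa"
```

Keeping each script this small and single-purpose is what makes the scoring pass easy: the rubric falls out of reading the scripts themselves.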
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
Testing, yeah.
Good afternoon. Welcome to Las Vegas for Las Vegas B-Sides. I would like to present Tala with his talk, Securing Robots at Scale. However, before we get started, we need to thank our sponsors. We definitely need to point out our Inner Circle sponsor, Rapid7, and other stellar sponsors: Amazon, Oath, Talos, and Semmel. We'll also include Endgame. There is a whole list of sponsors that we thank, but a few of them just get an occasional billing. It is their support, along with our other sponsors, donors, and volunteers, that makes this event possible. Now, a few notes. Cell phones: these talks are being recorded and streamed live. They will show up on YouTube later. If your cell phone goes off, you will forever be remembered as the person whose cell phone went off in this talk. So please don't be that person, for me. If you have a question for our presenter today at the end of his talk, you may raise your hand. I will come find you with this microphone. You can speak into it so that it is recorded by the technology over there; otherwise, people on YouTube later will have no idea what you asked, and it's confusing for everyone. Other than that, let's go ahead and get started. Tala, it's all you. Thank you. Thank you, everyone, for coming. Welcome to B-Sides. Very excited to talk about our journey to secure home robotics at scale. I'll talk about a few things today. Basically, you know, robotics
platforms — what makes up a robot, the different components and sensors in the home consumer robotics space — and some unique threats and attack surfaces for robotic systems. Then I'm going to talk about security and trust models for consumer robots, basically drawing some analogies to your IoT devices and your other hardware devices, and how some of the attack surfaces are very different from the consumer devices that you are accustomed to. Then I'll spend some time talking about some of the security foundation work that we've done in the robotics space, and then talk about some trade-offs and challenges between security, privacy, and safety — some of the hard decisions that we have to make, and every other company has to make, when shipping a consumer device. And then lastly, I'm also going to cover some interesting privacy work that we've done as it pertains to robotics as well as consumer electronic devices. Quick introduction: I've been doing security work for 15 years. I started my career with NCR. I did a lot of payment security work on terminals and ATMs and check processing machines and point-of-sale machines, both hacking and securing them. Then I spent some time at Microsoft doing security engineering for Azure, transitioned into a consulting role with PwC, and did a wide variety of security work all across the US. Then I went to join a startup called Financial Force, and now I lead security and privacy engineering at Anki. For those that
don't know what Anki is, we are a startup based in San Francisco. We primarily focus on consumer robotics and artificial intelligence — machine learning focused on vision and voice and other aspects that go into robots. We're a team of about 200 people, and we've shipped about 1.5 million robots as of today. Some of the robots are in the toys and gaming and entertainment business; they were the best-selling toy on Amazon for the last two years. A lot of the work that we have done we've also open sourced, and a lot of developers and robotics professionals are using our tech to create more interesting robotics applications. Before I delve into the attack surfaces, I just want to give a brief overview
of the space. There are different kinds of robots: industrial robots, consumer robots, big robots, small robots. Some are autonomous, some are semi-autonomous. Some require a high degree of assurance, some a lower degree. The way to think about robots is that the space is huge, and different types of robots have different security requirements and different threat vectors. Our focus today for this talk will be home and consumer robots. To give you a sense of where the robotics market is today: robots are everywhere. They're toys, they're companions for the elderly; there are applications for home assistance, healthcare assistance, companionship, manufacturing, and industrial automation. There are some stats about how fast that market is growing. Basically, we want to be at the cutting edge in terms of being the thought leader on doing security and privacy work in the robotics space. What is the Anki robotics platform? The way to think about the entire stack is that a robot doesn't do much on its own. There are a lot of different components and services — on device, off device, in the cloud, in the applications — that combine into a holistic solution. A lot of the robotics functionality comes from the cloud: things like analytics and voice recognition. The robot that we just launched today, you can talk to it just like you can talk to Alexa or Google Home. So there's voice
activity detection and some other interesting cloud functionality. From an interface standpoint, this is how you interact with the robot: it has touch sensors, and it can do perception, which basically means it scans the environment it is in and makes decisions based on what it sees and who it interacts with. We focus and invest a lot in the character and the emotional, EQ aspect of it. The reason for that is we want people to have a relationship with the robot; we want people to have trust in their robots. And that also intersects with a lot of privacy and trust aspects that I'm going to talk about. Then, moving on from character, there's a lot of SLAM — the mapping and localization when the robot moves from point A to point B in your home — all the vision algorithms, all the interesting AI tech that we've developed, and then scaling that to more character movement, manufacturing, and other aspects. We also abstract a lot of this functionality in our SDK — not all the different blocks — but for us the security models are a little bit different when you think about how the robot behaves in your home versus how a robot can be programmed by a developer, and I'm going to talk about that as well. This is an example of one of our home robots. It has different sensors. It can see, it can hear what you say, it can interact with people in your home. It has some brain of its own: a lot of the AI algorithms we've developed for how the robot behaves and interacts run locally on the robot, and some applications are pushed from the cloud. The way to think about the complexity is that this robot is made up of something like 700 different parts. There are different sensors and different systems: there's a microphone, there's a camera, there's a laser sensor, there are capacitive sensors, and there are other wireless protocols that it enables and uses for communications. So in terms of hardware complexity, even if the size is small, the attack surface both inside and outside the robot is pretty complex.
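The component inventory just described can be organized into a simple sensor-level threat checklist. The sketch below is purely illustrative — the components come from the talk, but the mapping and the example threats are mine, not Anki's actual threat model.

```python
# Illustrative sketch (not Anki's actual model): enumerating the
# sensor-level attack surface of a home robot as a reviewable checklist.
ATTACK_SURFACE = {
    "microphone": ["inaudible ultrasonic commands", "voice replay"],
    "camera": ["adversarial images fooling the vision model"],
    "ble": ["unauthenticated pairing", "beacon-based device discovery"],
    "wifi": ["traffic interception", "rogue access point"],
    "diagnostic_ports": ["JTAG/UART debug access", "USB firmware extraction"],
}

def threat_checklist(surface):
    """Flatten the component -> threats map into (component, threat) pairs."""
    return [(component, threat)
            for component, threats in sorted(surface.items())
            for threat in threats]

checklist = threat_checklist(ATTACK_SURFACE)
print(len(checklist))  # number of sensor-level threats to review
```

Even a toy table like this makes the point of the talk: each physical sensor is its own input channel, so the review list grows with the hardware, not just with the network-facing code.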
So I just wanted to give you guys an idea that when you start decoupling the attack surfaces, they are pretty different from your conventional applications. Talking a little bit about threat models: if you are protecting applications or cloud services, you mostly worry about application security, data protection, and confidentiality, integrity, and availability threats as they pertain to your product and service. When you have a physical product in your home, and it can roam around and interact with people, then you have to worry about a lot of other aspects. Safety is a good example: if the robot fails, if it's bigger in size, if it's providing some utility, it can cause a safety risk. It can have some environmental impact; it can create hazards. There are costs associated with different applications and functionalities, and how do you think about optimizing those for a mass consumer electronic device? Trust is a huge aspect for us. Robotics, especially in the consumer space, is very, very new. Nobody knows what it will look like five years or ten years from now. But people have to trust the tech — IoT devices today are notorious for their bad security, and we don't want our robots to be treated that way. We want people to trust our robots. We want people to have an association with robots. We want people to have assurance that if I'm buying a device, and the robot is supposed to behave in a certain way and provide a certain utility, I can have some assurance and trust that that's what it does — especially if it has sensors like cameras and microphones and you buy it and bring it into your home environment. So, plenty of things can go wrong. I brought in some interesting examples of what has gone wrong — just Google or YouTube for robot failures and you'll find dozens of examples. Again, not all of them are security failures, but, like I said, from a security, safety, and trust perspective, there are a number of things that you want to be able to give your customer assurance on. This is a funny example: if you go to the Department of Labor and just search for "robot," literally every result is somebody who died because of a robot. Understanding attack surfaces. The way to think about the attack surfaces is, let's say you have a physical device — just consider all the different ways it can capture input. It's not just digital packets coming in and out through the internet. You have physical sensors, you have firmware and OS and applications and cloud, and the robot interacts with all these different components. It receives different kinds of radio frequencies and signals, capacitive touch, the world it sees through its camera — the vision aspect — and voice. So there are a lot
of interesting attack surfaces using those sensors, and I'm going to talk in detail about what those are. Some of the other aspects: if you think about what signals it receives, even the charging ports and the diagnostic ports are big attack vectors. If you have ever done hardware hacking, you know about things like JTAG and USB and all those ports that people leave open, which you can use to either debug or hack a device. And then reverse the analogy: what signals does the robot emit that you need to worry about? If the robot is providing some utility — in the case of this robot, let's say somebody scans from outside your home to see what kinds of devices this home has, and sees, hey, here's a robot. It could be a BLE advertisement, it could be a Wi-Fi transmission, it could be some other sensor that beacons out, which somebody can scan and use to look for interesting attack vectors. So there are interesting trade-offs between usability and security: what do you enable, and for how long? I'll give you an example. When we do pairing of a smartphone with a robot, there's a BLE advertisement that goes out, but we only enable it if you physically interact with the robot. So it's not beaconing out, "Hey, I'm a robot, and I'm ready to pair," all the time. Hardware threat modeling — this goes back to, it's a little bit
different from the software threat modeling that most people are used to, in terms of how you build attack trees and how you do data flow diagrams. It's a similar analogy, but you start decomposing 700 parts: how does each part interact with every other part, what are your security boundaries, where do you start your root of trust, which sensor talks to which sensor, and which bus is transmitting what kind of data? So we do a lot of hardware threat modeling, down to the sensor level, to figure out the failure scenarios — what happens if one sensor fails and the robot can't operate how it's supposed to operate, and what kind of trigger or action we need to take. This is pretty interesting, but it also goes very low-level compared to how you typically do threat modeling of any product. So I'm going to spend a little bit of time on just the voice interface. This is the problem with voice-enabled devices: if you look at logical access controls in the digital realm of applications, people are conditioned to use authentication tokens or credentials, something that is tied to them. You go to a website, you enter your username and password; that's how you authenticate. You can have two-factor auth. But how do you authenticate over a voice interface? How do you deal with cases like a mischievous neighbor who can
also shout to your voice-activated device the same way you can, or your curious child who may just send some commands? There's a simple example of somebody who keeps ordering on Alexa — the parrot at their home learns how they order on Alexa, and now it's basically talking to the Alexa device. So it's a very different model of authenticating to and interacting with the robot. Should you treat the voice interface as an authentication signal? The problem with that is, yes, you could add some step of authentication — hey, if you are performing a sensitive transaction, I'll ask for a PIN. You could do biometric auth: I can do voice recognition of this person versus that person. But then this goes to privacy and usability concerns: the more granular the data you collect on your customers, the creepier you become as an entity, and the more risk you gather. For us, the decision we made is that we don't want to collect data that isn't used for the product. So yes, we could do biometric auth if we needed to, but that's not a place we want to be, at least for now. There are other interesting attack vectors: what if somebody just records your voice and starts replaying it? How does a robot detect replay? Some more interesting hardware attack vectors come up when you have different sensors, like the gyroscope. Your phone has a gyroscope as well,
accelerometers, and other hardware sensors. Gyroscopes are pretty sensitive: with a good enough sensor, you can basically measure acoustic signals, feed them into some advanced signal processing, apply some machine learning, and actually identify speech. These are still theoretical, academic, PhD attacks, but the more sophisticated the sensors become, the more plausible these attacks will be as well. Inaudible voice command injection — this is another interesting one. Human ears hear a certain frequency range of sound; microphones can hear even what you don't hear. So what if you can create voice signals that are ultrasonic and send them to the microphone, and the microphone treats them as a command and executes it? These are pretty prevalent now. You can see people have started attacking Siri and Alexa devices and Google Home devices, where you craft these voice waves, send them to the device, and trigger a command — in the iPhone case, dialing a number or sending a text. And the more interesting research in this area is when you start applying all these advanced neural nets and deep learning algorithms. This came out a couple of years ago; it's called Lyrebird. You can give it about a minute of your voice — anybody's voice; just grab a video file from YouTube and give it one minute of audio — it runs its classification and training algorithms, and then it can talk like you. It's not 100% accurate today,
but it gives you a sense of how creepy this tech can become, especially if you are using voice for things like voice-activated biometric auth or phone support. If someone can speak just like you — in the same tone, with the same linguistic manner — things start becoming really interesting. So, the robot makes its decisions based on the world it sees, and its primary interface for that is the camera. We have developed a number of training models and classification algorithms that we feed to the robot and store locally. And there are things like, if the robot needs to go from point A to point
B, how does it determine the best path when it's mapping the environment — where the objects are, where the collisions are, whether something is a person or an object? The problem is, the way AI works today is you build a model, you have some algorithms, you run those classifications over some data, you train on the data, and you make trade-offs. The problem with making those trade-offs is: is this a chihuahua or a cookie or a muffin? If things start looking similar, you can start defeating the model's classifications. You could add more accurate sensors, you could combine other algorithms, you can start tweaking parameters, but these failure scenarios still exist. The best example I can give you is in the autonomous car industry, where this is a big problem. They have the same issues around mapping and training their algorithms to recognize different stop signs. Once you train your model to recognize a stop sign, you're basically saying, "Hey, this is what a stop sign looks like. This is where the text will be." But if somebody puts some text above and below the line, now the model thinks it's a speed-limit sign instead of a stop sign. And then your robot — whether it's a car or a physical robot or a home consumer robot — you are making decisions based on the world you see and the
world the robot thinks it sees. There's a huge class of interesting research, which is getting huge traction, called GANs — generative adversarial networks, deep neural net architectures comprised of different neural networks basically attacking each other. This is another example of researchers fooling even Google's vision system into recognizing something that technically is not there. Adversarial examples are prevalent in deep learning systems. A lot of these algorithms are open source — libraries from Google, training data all over the internet — and that's what a lot of people are starting to use, without recognizing all the failure scenarios that come with them. Ian Goodfellow, who does AI research at Google Brain, has a number of interesting research papers on this topic for reference. So, manufacturing. This again is a pretty interesting attack vector. We are a 200-person company: we don't have our own factories and we don't manufacture our own devices. We partner with some of the leading companies in the world, who have all this tooling around scaling — a huge, complex supply chain of 700 parts — bringing them to a factory and manufacturing those devices in a scalable manner. The problem with that is, if you're manufacturing in hostile countries, or in factories which you don't have much control over, there are a number of things in and
out that ecosystem that worry us. Counterfeiting is the biggest one: your intellectual property gets stolen, and people start manufacturing clones of your product. How do you deal with that? The more premium your product is, the bigger the problem becomes. Supply chain compromise goes anywhere from ransomware hitting a factory to nation-states putting advanced malware in flash chips that you might bundle as storage in your robots. There's no easy way to solve these problems, but because your robot is composed of all these components taken from different supply chain sources, it becomes a big risk. Untrusted manufacturing lines — this is another interesting one. You have to trust the factory to perform some privileged operations. In our case, they get very low-level diagnostic tools with which they can burn the secrets, test the electronics, write firmware, and also disable or enable certain capabilities. And you basically have to give the people who are building those devices for you access to these privileged toolkits. End-of-life components is another interesting one. You have to make some cost decisions about what kind of Wi-Fi chip or BLE chip or processor you want to use. But a consumer device's life is long: people use these things anywhere from five to ten years, until the stuff dies. If those components are end-of-life, or those chip manufacturers are out of business, how do you patch or think about the security of those components? It's a pretty complex problem, and there's no easy way to solve it. And then the gray market: these are some examples of people putting up fake stores, selling stolen inventory, or even downright copying our exact designs and creating things that look like us. We are small — we can't go after every single one — but as a company it's of huge interest to us how people are stealing or abusing our intellectual property. So I'm going to transition to some trust model considerations. These are interesting problems, especially when you have a physical device, a robot. I give this analogy: if you had a web application or a mobile phone, you enter your credentials. That's
how you authenticate the user. How do you authenticate a user to a robot, especially if it doesn't have a screen and a keyboard? Do you enter credentials? Do you shout your password? Do you make it recognize you with some biometric fingerprint? If you also have some logical access control outside the physical realm of the robot — let's say a web interface to control it — you have a username and password that controls the robot, but the robot doesn't know that username and password. How do you marry these two worlds? Especially when you start thinking about the developer ecosystem and the applications and the SDKs, things start becoming very interesting: what is an optimized trust model for a robot? Which users do you trust? The way we think about our robots is that these are family robots. You, as an adult, buy a robot and bring it into your home, but your family and friends and everybody who's around it — the robot will interact with them. So how much do you trust them? They can give the robot some input. What kind of functionality should you enable for whom? Those are not easy problems. It's a matter of usability versus security versus privacy, and you have to optimize for that. Then, what kinds of signals and anomalies should trigger, let's say, adaptive auth? Let's say we detect something is not right — somebody is abusing the voice interface and sending ultrasonic waves that we don't recognize. What should the robot do? Should it shut down? Should it beacon a notification? Should it cry? Those are interesting decisions. Then this is another interesting one: our robot is autonomous. We want it to be autonomous in the sense that it's basically on its own; it only responds to commands or actions. But we also have this realm where people want to write apps, and there is an SDK. So how do you transition from an autonomous world to a semi-autonomous world, or a developer world? How does the security model change if somebody is actually controlling the robot through an app versus if it's autonomous on its own? And what defines an autonomous behavior? It could be a
signal on the robot, could be a signal in the cloud, could be a signal on a sensor — and those are not easy problems. I'm going to talk a little bit about some aspects I mentioned: how much do you trust, and where does your trust start in the chain? We design our own electronics, we design our own hardware, but we don't make our own chips. We use commodity processors that the mobile industry has been using. They have certain code running on their chipsets, and oftentimes it's kind of a black box; I'll talk a little bit about what those things are, too. Manufacturing facilities, primary users, secondary users — you have to give them certain control over how they access and interact with the robot. Talking a little bit about how we think, or thought, about securing our robots: we had some high-level goals around confidentiality, integrity, availability, cost, trust, and safety, and from those we stepped down and derived security requirements. A lot of the controls that we designed are optimized for the security goals that we have for our products. So, at a high level, we start with wanting people to have assurance about what the robot does. There are privacy indicators for different functionalities: if the robot is streaming to the cloud, if it's in listening mode, if it's taking a picture, the robot shows some
clear markers, like that green light that you see. There are different visual cues that guarantee to the user, hey, the robot is performing this action. Code authorization — this is something dear to my heart. There's a huge, complex stack of code that runs from the hardware ROM to the bootloaders to the operating system, the firmware, and the applications that run on the robot, and we sign, verify, and integrity-check the entire file system. We want to protect every piece of code that runs on the robot, and for every robot that we ship, whenever it's running and operating, we want the assurance that we can guarantee it's our robot, it's running our code, we signed it, and the robot verifies the boot chain and the code. So in terms of code authorization, it starts literally from when you power the robot on: the boot ROM verifies the kernel, and the kernel verifies the file system and basically the entire storage of the robot. So there are both confidentiality and integrity guarantees here. We also encrypt and sign every single update. A little bit about hardware security. Because of problems like counterfeiting, and because we want to authenticate the robots talking to us, we generate crypto certificates and burn them into the robot at factory time, in hardware key storage. That helps with many interesting cases like fraud and abuse and counterfeiting, et cetera. There are tamper-resistant IDs that we use for fingerprinting the device — again, to check that they're not clones and that they are robots we manufactured. There's a hardware-backed key store which is used for key derivation. So when we think about protecting data on the robot — whether it's photos or your Wi-Fi password or other data that the robot stores locally — we encrypt the whole data partition, but the encryption keys are derived locally from the hardware key and the TrustZone running on the robot. We try to keep ourselves and our cloud and our control out of this key derivation mechanism. That gives better privacy guarantees and is a better security story
as well. Then we have a huge hierarchy of keys for data protection. We also think a lot about how we harden the hardware itself, so when we release our production robots, we disable all the privileged interfaces that people commonly use to attack hardware products, like UART and JTAG and USB and SPI. The problem is that this is often a one-way street: if you disable them and burn fuses in the SoC, there's no easy way to reverse it. You've disabled all those things in production, so even we lose certain capabilities as a vendor. This is just a picture of the TrustZone I talked about. In the hardware, there are capabilities around what we call a trusted execution environment — similar to how some Android and iOS phones do hardware-backed key storage, key derivation, and privileged operations like password decryption, et cetera. So what about physical security? I mean, this is hard. You ship a product, people will take it apart, there'll be a blog post on iFixit, and they'll shame you: hey, they don't have any physical security. The problem is that physical security is hard, and there are different physical security requirements if the robot is performing your heart surgery versus if it's a toy robot. So it's a very hard trade-off: what is good-enough physical security? That question is not easy to answer. You have to balance the cost of a mass consumer electronic device against customer expectations and serviceability. You could make really secure hardware and add a ton of tamper resistance, but then you make it harder to service and harder to swap in different parts. There are also challenges around intellectual property protection: people hack your devices and want to get to all the interesting tech that you've built and run on the robots. And, like I mentioned, there are different requirements for a consumer device versus a commercial device. Oftentimes, the capabilities of the robot drive what kind of physical security you want in the product.
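The local key-derivation idea mentioned a moment ago — data-partition keys derived on the robot from a hardware-bound secret, with the vendor and the cloud kept out of the loop — can be sketched roughly as an HKDF derivation (RFC 5869). The hardware key, salt, and labels below are stand-ins for illustration, not Anki's actual scheme.

```python
# Illustrative sketch: derive the data-partition encryption key locally
# from a hardware-bound secret, so nothing outside the device holds it.
# Minimal HKDF (RFC 5869) built on the standard library's hmac/hashlib;
# HARDWARE_KEY is a stand-in constant, not a real fused secret.
import hashlib
import hmac

def hkdf_sha256(secret: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Extract followed by HKDF-Expand (RFC 5869) over SHA-256."""
    prk = hmac.new(salt, secret, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                               # expand step
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

HARDWARE_KEY = b"\x01" * 32          # stand-in for the fused device secret
partition_key = hkdf_sha256(HARDWARE_KEY,
                            salt=b"device-serial-0001",
                            info=b"data-partition-encryption",
                            length=32)
assert len(partition_key) == 32
```

Because the derivation is deterministic and keyed only by on-device material, the same robot always recovers the same partition key, while a different info label (say, for photo storage) yields an independent key — which is how a key hierarchy like the one described above can fan out from one hardware root.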
So another area where we did a lot of interesting work was application and device pairing. The issue there is that a lot of devices have these notions of pairing over Wi-Fi or pairing over BLE: you pick up a smartphone, you enter your credentials, and your application discovers the robot or the device that you have. A lot of manufacturers just rely on Bluetooth for this. The problem with using Bluetooth by itself is that that whole space is broken — there are tons of vulnerabilities in the stack, in implementations, in the chipsets. So we designed our own crypto mechanisms and protocols for how we do key exchange, how we do authenticated encryption, and how we do bonding. Some of those bonding mechanisms require the physical interaction that I spoke about: if you have to pair a new device, you have to physically go to the product and tap a button; only then does it open up its pairing mode. It also matches up with the user who is the controller of the robot, and that user authorizes, yes, I'm allowing a new device, or not. On the OS and firmware security side, I talked a little bit about the trusted execution environment. This is privileged code and operations executed inside a secure hardware processor, not in userspace or kernel space. We strip down and harden our operating system to remove all the debug interfaces like SSH and adbd and fastboot,
basically making attacking and debugging harder. We focus a lot on what are the unnecessary services we can remove and how do we reduce the attack surface. We also do some engineering on the OS and firmware side to run these applications and processes, especially the ones that talk to the internet or talk to a sensor, with least privilege, containerizing and isolating them for better security guarantees. Then limiting network exposure, again, not advertising the services that are not needed. We encrypt the data partition on the robot, so even if somebody steals your robot, you still can't get to the data without really going and breaking TrustZone. And then file system integrity verification: the way this works is you basically sign and verify every block of the storage, and even if somebody tampers with one block, verification fails. And then a lot of work on hardening the actual operating system and the kernel, stack canaries and recompiling your apps with certain flags which harden the binaries against different kinds of exploits which are pretty prevalent. There are a number of challenges and trade-offs. For us, the biggest issue is: should we go and use a mainstream OS like Linux, which is well supported, or for a high assurance robot, do you go and take a high assurance, more secure OS? The trade-offs you have to
make is, if you go for a high assurance OS, there's not a ton of support in terms of the tooling and libraries and all the other applications you want to write. So, you know, you basically have to work with some commodity off-the-shelf operating system. I talked about the longer life cycle of consumer devices. You ship a consumer electronic device like a robot, they are going to use it for 10 years. Your clock is ticking. Windows XP was secure when it was launched, and look at where it is today. So how long do you patch your products? How do you patch things that you don't have control over, especially the end-of-life chips and the drivers and, you know, things that are outside your control? There are other interesting attacks on foundational platform security aspects: what happens if your processor or your platform has issues, which are literally things like Spectre and Meltdown, or the KRACK vulnerabilities in Wi-Fi? Nothing you can control or do. It's basically some vulnerability in the protocol or some vulnerability in the hardware design itself. Those are pretty hard problems. And then the last one: for people who spend their time in the Linux world or Android world or embedded security world, just tracking CVEs and known exploits in different Linux variants and kernels is a huge nightmare. So cloud
security, because the robot talks to the cloud. Not only does the robot authenticate the cloud, we also authenticate the robot back: it's a real robot, it's running trusted code. So we do mutual TLS for all of our services. We use modern crypto algorithms. If you were using your laptop and a browser, you go on the internet, there's a key store in your browser and your browser has a way to verify the SSL cert chain. The way we think about the key store on our robot is it only trusts our cloud, so we basically remove pretty much all the major CAs and all the other certificates. So even if somebody hacks it, even if somebody tries to man-in-the-middle and puts in some certs and proxies, we basically want to make attackers' lives harder. The intent basically is the robot only trusts our cloud. It doesn't trust anybody else on the internet. Then authentication, code signing, and such. From the availability aspect, we think about which services are really critical: secure update, token management, account service, authorization service, really thinking about how do we scale these for millions of robots all across the world in different geographies. Then there's a lot of tooling and capabilities we had to build for detecting fraud and abuse. DDoS is just one example, but how do you detect anomalous behavior coming from robots, not users? We basically designed a lot of tooling for our own CA and PKI. There are some interesting challenges on the robotics side as well as embedded
hardware. The first one is clock. To verify SSL and TLS certs, you need a good reliable clock. If your phone's clock went back two years, suddenly everything would start failing. If the robot doesn't have a clock and you power it on, how does it trust the validity of that cert? It's not an easy problem. You can either rely on signature or you can rely on time, and we rely on signature, not time, because the device basically powers on from the Unix epoch time of 1970. It has no idea of the clock when it boots up. The second one is about what kind of capabilities you should build when the home DNS is insecure, when the time services are insecure (they use UDP), when the content delivery networks for software distribution are not in your control, and how you think about securing those aspects. We use some interesting protocols which are newer, like gRPC; it does HTTP/2, it can do both HTTP/2 and HTTP/1 in a single call, but the tooling and capabilities of the TLS libraries are not there yet. If anybody has ever managed and run an HSM in the cloud, the way you do key management and distribution and the kind of limitations HSMs give you, even from the major cloud providers, it's pretty amazing even in this day and age. And then the last one is a funny one. You do all this work around confidentiality and cryptography and security, and guess what? You manufacture in
China and the factory can't talk over those crypto channels back to you because something failed in the Great Firewall of China. So what kind of environment do you need to create in that other country to make your product work or be testable during manufacturing? Those are pretty interesting challenges. I'm going to talk a little bit about privacy. The hard privacy issues: if you start thinking about this notion of, yes, empower the user, give them better privacy, it also enables a ton of abuse if you can't track where the traffic is coming from, which user it is, what kind of things he's up to. The second challenge is how to think about a managed versus unmanaged robot. Yes, if this was a commercial device and you were selling a service to a customer, let's say a robot projector, you want to build capabilities to manage those things remotely. So, device management and remote capabilities: how much control should you as a vendor have directly on the device? Versus the privacy aspect: once you have that control, then totally some three-letter agency can come to your door and say, "Hey, why don't you put a back door in it? Why don't you tell me data about this customer versus that customer?" And we don't want to be in that space. So we have to make interesting decisions around how we enable privacy for the customers. The third bullet about privacy is, for AI to
be useful, you have to collect a lot of data. But the more data you collect, the more creepy you become. So how much data can you keep local? How much data do you stream up to the cloud? How do you anonymize it? And how do you use it to train and make your products better? Those are also pretty hard decisions. The fourth bullet talks about our legitimate interest in providing customer support. People call us and say, "Hey, my robot is not behaving. Why? I talk to it, I gave it these commands and, you know, it's not working." We need to give our own people the capability that, hey, if Talha calls, you should be able to go look at the failure and tell the customer this is why it failed. But if you build that capability, then you also have insider threat: somebody can abuse that thing. So those are very hard choices you have to make about what we collect even for our own legitimate support scenarios. And then conflicting privacy requirements: we sell in Europe, we sell in America, and if you've dealt with COPPA or GDPR, they have pretty conflicting requirements in terms of the right to be forgotten versus who can ask for deletion of data and who can provide parental consent. That's just a broken space. And the final bullet hits on what data do we keep local
versus what data do we keep in the cloud. As a company, we took a decision and we basically said we will not store any voice data in the cloud. All the commands that the user says to the robot, we basically take that voice stream, we understand what the user wants, we perform the action, and then we throw away the voice. We don't have any record of what anybody told or said to our robots. And we do this deliberately. It reduces our threat surface. It's also a better privacy story. If we get hacked, we don't have people's voices. We don't have anything to turn over to anybody. This is the privacy cue I was talking about: there's a very clear visual indicator and a privacy signal when the robot is listening and streaming up to the cloud. We do collect anonymized stats, which basically show how our customers are interacting with the robots and which aspects are more popular, and those are not tied to an individual user or robot. It's basically to improve our products; we're just curious about which functionality is more popular than another, or which things fail that people want that we haven't built yet. Customers can always opt out, even of the anonymous user analytics. And because we use cloud providers for various things, we only partner with entities that can comply with our legal and data retention requirements.
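The anonymized-stats idea described here can be sketched roughly as follows. This is an illustrative stand-in, not the company's actual telemetry code: events are reduced client-side to bare feature counts with no user or robot identifier attached, and an opt-out drops collection entirely.

```python
from collections import Counter


class AnonymousStats:
    """Aggregate feature-usage counts with no identity attached.

    A sketch of the approach described in the talk (all names here are
    hypothetical): events are stripped down to feature names before they
    ever leave the device, so the uploaded payload carries counts only.
    """

    def __init__(self, opted_out=False):
        self.opted_out = opted_out
        self.counts = Counter()

    def record(self, feature):
        # No user ID, serial number, or timestamp is kept with the event.
        if not self.opted_out:
            self.counts[feature] += 1

    def payload(self):
        # What would be uploaded: bare counts, nothing tying back to a user.
        return dict(self.counts)


stats = AnonymousStats()
stats.record("fist_bump")
stats.record("fist_bump")
stats.record("weather")
```

An opted-out instance records nothing at all, which matches the "opt out even of anonymous analytics" promise above.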
This is a really interesting one, which again goes to the usability versus privacy aspect: the keyword and the detection signals that you give to the robot, you know, you say "Hey Vector, do this." To make those trained classification algorithms more accurate, you need more data. We don't use our customers' data for training. We buy this data off the shelf, completely anonymized. It does have implications and an impact on how accurate your voice capability is, but that's a decision we made as a company: we will not use our own customer data for training our voice models, because we throw away the voice data. And then we encrypt all data in transit as well as on the robot. As for the vision stuff: for voice, we have to stream it to the cloud because the tech for doing that AI locally on the robot is not there yet, but on the vision side, that has matured a lot. So all the training models and all the vision stuff that the robot does is local on the robot. We don't stream images, we don't stream video to the cloud. And all the biometric data, even of the people it recognizes, those mathematical fingerprints are stored anonymized locally on the robot. That's it. The end user is in control. You can always just hard-press the robot and it will reset to the factory
state if you want to sell it off or return it back to us. That's all, here's my contact. We have about seven minutes, so happy to take any questions. Sir. - You mentioned there were challenges like, I think you called it the dolphin attack. It seems like a very simple low-tech way to get around something like that is to put in a bandpass filter that passes only the human range. So nothing can go up to your cloud for analysis that isn't human voice, at least in that range. - Yes, so we have the capability to detect that because we write our own microphone beamformers. A lot of devices basically use off-the-shelf tech which is already developed, and they don't have that level of granularity. So we can totally detect if it's not a voice frequency that looks or sounds human. But even then, the early voice detection, that's local. When you say a trigger word, and when the robot recognizes that trigger word, is only when it starts streaming to the cloud. In the Alexa world, it's "Hey, Alexa." In the Google world, it's "Hey, Google." For us, it's "Hey, Vector." So we only stream after we have recognized the trigger word. We don't stream anything before that anyway.
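The trigger-word gating described in this answer can be sketched roughly like this. `WakeWordGate` and its methods are illustrative names, not the robot's actual firmware, and the wake-word check is a stand-in substring match where a real device would run a local acoustic model.

```python
from collections import deque

WAKE_WORD = "hey vector"  # illustrative trigger phrase


class WakeWordGate:
    """Stream audio to the cloud only after a local wake-word match.

    Pre-trigger audio stays in a small local ring buffer and is never
    uploaded; it simply ages out. Only frames arriving after the trigger
    is recognized are streamed.
    """

    def __init__(self, buffer_frames=16):
        self.ring = deque(maxlen=buffer_frames)  # local-only, pre-trigger audio
        self.streaming = False
        self.uploaded = []  # frames that would actually go to the cloud

    def on_frame(self, frame, transcript_so_far=""):
        if not self.streaming:
            self.ring.append(frame)  # held locally, discarded as it ages out
            if WAKE_WORD in transcript_so_far.lower():
                self.streaming = True  # trigger recognized: stream from now on
        else:
            self.uploaded.append(frame)  # post-trigger audio is streamed


gate = WakeWordGate()
gate.on_frame(b"ambient-noise")                  # buffered, never uploaded
gate.on_frame(b"trigger-audio", "Hey Vector")    # wake word detected locally
gate.on_frame(b"command-audio")                  # this frame is streamed
```

Note that the frame carrying the wake word itself is still only buffered locally; streaming starts with the next frame, matching "we don't stream anything before that."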
- Hey, great talk, thanks. So on your hardware security slide, you mentioned you had some cost constraints and you had to balance security versus cost in some cases. What are some of the things you had to put out of scope because of cost considerations? And rephrasing the question, if there were no cost constraints, what would you have done differently? - Yeah, great question. So I think the way to think about this is, let's start with the SoC, for example. Your mainstream processor is the same, similar processor that you use in your mobile smartphones. It's made by Qualcomm, it's a 64-bit processor with a CPU and a GPU and eMMC, and it has some interesting TrustZone-type capabilities. It's a smartphone processor from like three years ago, right? It's now way cheaper than it was when it launched. The newer processors have more advanced capabilities, but then that drives up the price. For the functionality we have today, that processor is sufficient. That's one aspect. The second aspect is, you know, the BLE sensor. Would you buy a 5 cent BLE sensor or a 50 cent BLE sensor, when you know nothing about what is in that sensor? It's black-boxed firmware somebody wrote somewhere in Taiwan or China, and it's commoditized, so literally everybody else is using it. So yes, if you had no constraints, you could design your own BLE sensor, you could write your own firmware, but that basically, you know, raises the bar. There are tons of components that not just us, everybody buys and uses, where you kind of trust what comes from the supply chain. Then on the hardware side, again, because it's a toy, an entertainment toy, it has a plastic body. I mean, yeah, there are no tamper-resistance or tamper-evidence films. There are no seals. You could open it up and put another camera next to our camera if you wanted to bug the device. But yeah, we could make it $100 more expensive and add layers of physical security. Even that is not perfect, but it raises the bar. So you kind of make these decisions for a price point based on what the product needs to do and how premium it is. Any other questions? Okay, big round of applause. Thank you very much. Thank you. Thank you.
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
Best. Mess it up. And that's been shown time and time again. A couple of examples: Uber had the GitHub repo misconfiguration where one of their developers pushed some API keys, access keys, to their GitHub, where hackers were able to steal those and gain access to their environment that way. Tesla is another one. They had a public-facing unsecured IT admin console, and hackers were able to get into that and retrieve some keys they had stored there, which they then used to mine some cryptocurrency, which is a large theme in these kinds of hacks. That is definitely a big threat, but it's not all there is when it comes to these kinds of attacks. Accenture and
many, many others, as I'm sure everyone is aware: S3 bucket misconfigurations. Publicly allowing access to private data, company data, employee data, payment info, tons of different things. It seems like every other week something comes out about a new one. But the real threat is authenticated compromise. That's not just having someone find your S3 bucket that's facing the internet; it's when your keys get compromised. And that can happen in a variety of different ways, such as GitHub commits, as with Uber, or social engineering or phishing to get your password or secret keys. Password reuse is something I see a lot as a pen tester myself: I go online and find a public database dump with clear text passwords associated with someone's email, and then I go and try to log in on a different website with the same email and same password, and it logs me in. That's extremely common, more common than I would have ever thought. And then there are web app vulnerabilities. With server-side request forgery, you can contact the EC2 instance metadata API to retrieve temporary AWS credentials for an IAM role that's attached to the instance. So with just server-side request forgery, you could gain as much AWS access as that server has. Or local file read: through any method, RCE, LFI, any of that, you can discover keys that are stored locally on the system. As much of a bad practice as that may be, it's still common, and people aren't doing it correctly all the time. That's what I've found. As well as internal threats and rogue employees. You don't even need to worry about the compromise in that situation; they already have access to your environment. Are your permissions set up correctly so that they can't end you or cost you thousands of dollars? There are a lot of different things that can happen here. So, penetration testing on AWS. There's a lot missing from the penetration testing industry in regards to the cloud and AWS. There are a lot of configuration scanners out there, a lot of best practice checkers, configuration checkers. A couple of those: Scout2, CloudSploit, Prowler. There are a lot of others. Those are all really great tools, but there's not much for an attacker. How do you go and capitalize on a misconfiguration? How do you exploit a certain setting? How do you gain access to the environment, escalate your privileges, all that different stuff? There's not much out there. There are a couple: aws_pwn, Nimbostratus, or weirdAAL, the AWS Attack Library. All great tools, but those and other AWS attack tools have a common theme: they're kind of forgotten about or no longer worked on, where you see on GitHub that they haven't been touched in two years or six months. And that's not really,
that's not what the industry needs if we're going to advance AWS pen testing, because AWS is constantly updating their APIs, adding new services, deprecating services and APIs. The tools need to stay up to date with what you're attacking, obviously. It's also something that's hard to learn without practice. How are you supposed to get into AWS pen testing? If you don't know much about AWS, you probably have no idea. So you need to practice it. But how are you supposed to practice without an environment, some sort of lab environment? What you basically need: as a pen tester, you shouldn't have to worry about deploying the different AWS resources that you want to pen test against to simulate a customer's environment, or worry about misconfiguring those resources to put some vulnerabilities in there so you know there's something you can exploit. And you shouldn't have to worry about a high cost as you're going through these things. You shouldn't be charged even $50 for trying to learn pen testing on your own account, on your own accord. And if you're not familiar with AWS, it's hard to make something like that happen. So really, if only we had something like WebGoat for web applications, where you can host your own thing and go wild on it. And with that, this brings me to CloudGoat, which is a vulnerable-by-design AWS environment that
we recently released at Rhino Security Labs. It's built to enable AWS security education. We made it so that you can learn and get into AWS pen testing more easily, so anybody can do it, rather than only somebody who has access to some sort of environment or has enough knowledge to deploy those resources. Ultimately, it will help attackers and defenders in the sense that it will help them understand different misconfigurations, and what even just normal configurations can do to your environment if a certain user has a certain amount of access or something like that. It's deployed and destroyed on demand with Terraform. You can just run the start script; it launches all the resources into your account and logs the credentials so you can choose a starting point. And then you can destroy it at any time, and it's easy to do. We integrated nine commonly misconfigured AWS services, with plans to expand on that. That includes IAM, CloudTrail, GuardDuty, EC2, CodeBuild, Lambda, Lightsail, S3, and Glue. There's a variety of different resources within those services that we've created, so there are multiple starting points. There are a few different IAM users, and there are a couple of different instances that you could start in. So you choose your path and your simulation of an attack. Did you steal someone's password and you have console access to someone's account? Did you hack a web server and you got access keys or something like that?
Or did you get RCE on a server that doesn't have AWS keys, I guess RCE on a web application or something like that, so you're on the command line in the server, and that's where you're starting? A lot of different ways to do that. And that creates a lot of different attack vectors. There are currently 20-plus in the environment: different ways to escalate privileges, disrupting and evading logging and monitoring, data enumeration and exfiltration, persistence, and a lot of others. We try to incorporate as much as we can in there to give you a well-rounded idea of pen testing and misconfigurations in AWS. It's inexpensive, and this is another big worry that we encountered. Currently, if you just run the environment, it's about a dollar to three a day based on how much you're using it. If you're attacking it for 24 hours straight, obviously it'll be a little bit more than that, but this is off of the free tier, so it's really cheap. And based on how it starts up and gets destroyed, you can deploy it and destroy it within just a couple of minutes. So even though it's cheap, we always suggest you start it when you're ready to start pen testing, then tear it down when
you're done. And you don't have to worry about any hidden expenses that way. It's relatively safe. Well, obviously everything's misconfigured, so that's not safe, but all ingress traffic to any instances in your AWS account is whitelisted to an IP range that you supply when you're starting the environment up. So this restricts access to those misconfigured resources to people on your IP address, or people who have access to the credentials that you just created, which ideally is just you. If it's not, then you probably have other problems. Even given that, it's not recommended to launch it alongside other resources in your account, as obviously they're misconfigured, and if they're in the same account they can have access to certain things. So we'd never suggest deploying it next to your production environment or anything like that. We released it on GitHub a week or two ago with the BSD-3 license, so it's ready to get started with. I've got a video here that... Maybe, maybe not. There we go. Alright, so right here I'm starting the start script. I passed in localhost as my IP address, or my IP range. So it's going to whitelist everything to that IP, and only people with that IP can access the server. So then I run it. I've sped the video up quite a bit because it
takes a couple of minutes. So now Terraform's going through and creating all the resources in your account. It's doing everything for you. You can just wait for this to finish, and then it'll be ready to go. It doesn't take too long; I think two and a half minutes was what this ended up being before I sped it up. I don't think that's too long. Just waiting on it to finish. So it finishes, and you see the Terraform message: it's added 38 resources to your AWS account. And then the GPG message there is decrypting a couple of the passwords for the user accounts. It just outputs that into a credentials file where you can cat it out. And these are three of the different starting points for the CloudGoat environment. Feel free to try those credentials if you want; they won't work. Well, in this talk, I mean; in general, they'll work. So you could start as the administrator, you just access the administrator's account; or Bob, or Joe. They all have different permissions, different sets of access. So that means there are different attack vectors from each account, and you can learn a variety of things that way. And then to shut it down, it's just the kill script. It uses Terraform to take down everything in the environment, as we can see here. And that is it. If you don't know Terraform: it only kills those 38 resources we added, nothing else that exists in
the environment, although you shouldn't be running it against an environment with resources already going; keep it separate. So we have a vulnerable environment now, but we're still missing something. There's more to pen testing and security than just a vulnerable environment. How do you know what to do? How do you automate these things? How do you manage these attacks? What's really needed is a tool that can turn complex multi-step attacks into just a couple of commands, removing a lot of the work on your side. You don't have to rewrite a bunch of code every time you want to do a certain attack. You don't have to go find it on your operating system. You don't have to integrate someone else's code and try to fix some weird bug. It should just be able to handle those things. It needs to be future-proof, so that means it needs to be modular to allow scripts to work together. You write an enumeration script, enumerate some data; when you're later writing an exploitation script, whatever that may be, you don't want to rewrite the enumeration part. You should just be able to assume that data is there and something will get it for you. And it needs to be extensible, because AWS is constantly growing, so the tool needs to be able to constantly grow, and it shouldn't require framework updates every time there's a new AWS update
or something like that. And really, configuration scanning is great and very helpful and definitely needed, but as attackers we need something to exploit those misconfigurations and capitalize on them. Basically, those scanners are the Nessus of the cloud tools, but we need the Metasploit. So, the solution to this: Pacu, the offensive AWS exploitation framework. It's written fully in Python 3. It covers the attack chain, or the kill chain: reconnaissance, logging and monitoring, persistence, privilege escalation, data exfiltration, and just general exploitation. The modules are simple to develop. We've got a built-in template, and all the modules are structurally separated, so all you have to do is add a folder, copy the template in, and your module is in Pacu. And then obviously you've got to write some code, but we try to make that as easy as possible. Like I said before, you shouldn't have to worry about enumerating data if that's not the point of your module. You should just be able to say, I need this data, and the Pacu framework figures that out for you and gives it to your module. That's a lot of code you don't even have to think about. In general, that makes it a lot easier: it takes a lot of worry and even syntax away, you have fewer lines of code, and you don't have
to worry about errors coming up. It's extensible. We've got a built-in API that we exposed from the Pacu core framework to module developers to make things easier. I'll get into that a little bit more later on, but in general it takes annoying and repetitive code out of your hands and gives you simple one-liners instead. There's session data management. It handles sessions in a variety of ways, but basically you can create a session for a company that you're working on, and it keeps all the data and keys and anything that's gone on, logs, everything, separate from another company that you might be working on, another client. And that ties into data management through SQLite: we use a local SQLite database to manage all the data that gets enumerated as you're attacking these environments, and everything that you need saved. And there's global error handling, so Pacu's not going to crash and corrupt your database. It'll always catch those things, and you don't need to worry about that kind of stuff. And just like CloudGoat, Pacu is open sourced on GitHub as of an hour ago or something, so that's exciting. It's on the same-- whoops. Sorry about that. OK. So it's on GitHub. Pacu and CloudGoat were built with each other in mind, and many of the vulnerabilities in CloudGoat are tied in closely with exploits
and modules that are in Pacu. And the plan is to grow them together as well. As Pacu modules are built, we'll integrate those vulnerabilities and techniques into CloudGoat, so you can capitalize on them and test those modules, and the reverse as well. It's a really great way to learn Pacu, learn pen testing, and in general just get an idea of what's going on before you go and test this against a production environment or something like that. So I'm going to go through a Pacu demo now. Basically this is a post-compromise scenario: I've compromised someone's keys through some method, and that's all I know. I don't know anything else, I just have those keys. I'll go through using the Pacu console, enumerating permissions, settings, and different data, escalating my privileges to an administrator level, establishing persistence in the account, backdooring security groups to gain access to certain services, and then compromising EC2 instances through remote code execution. Well, it seems to be frozen.
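One step in that chain, backdooring a security group, boils down to injecting an attacker-controlled ingress rule with the compromised keys. A hedged sketch of the parameters such a module would hand to boto3's `authorize_security_group_ingress` call; the group ID and CIDR below are made up, and the function name is illustrative rather than an actual module:

```python
def backdoor_sg_params(group_id, attacker_cidr, port=22):
    """Build parameters that would open `port` of a security group to an
    attacker-controlled CIDR.

    Illustrative only: a real module would pass this dict straight to
    boto3, e.g. boto3.client("ec2").authorize_security_group_ingress(**params)
    """
    return {
        "GroupId": group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                # The quiet backdoor: only the attacker's address is allowed in.
                "IpRanges": [{"CidrIp": attacker_cidr}],
            }
        ],
    }


# Hypothetical values; with stolen keys this rule re-opens SSH to the attacker.
params = backdoor_sg_params("sg-0123456789abcdef0", "203.0.113.7/32")
```

Building the parameters separately from the API call also makes the attack easy to dry-run or log, which is the kind of repetitive plumbing a framework module wraps up for you.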
Well, all right, I guess I'll go with the video. I have a video of this; not sure why that's not working. OK, so what we'll go through here first is running Pacu. The splash screen comes up, and we get the Pacu ASCII logo. And underneath here, you can see it found existing sessions. I already created a session for BSides, but if there was no session, you'd need to create one. You give it a name, and that'll encompass your project. So, accept that, put in one, accept my choice. So then it comes up here. We have a lot of different help information, a lot of different stuff pops up here, not so important quite yet. Down in the bottom left, you can see "bsides Las Vegas: Spencer". In this case, bsides Las Vegas is your active session, so you always know what company you're working on. You don't have to worry about accidentally attacking a different company or anything like that. And then Spencer is the pair of keys that I compromised. I set the name Spencer on them because I don't know anything about them and my name is Spencer. It can be anything you want, so it makes it easy to know what you're doing. There's a lot of different help here. Among this help output, there's the ls
command that was just run, which lists all the different modules that exist. You can see the category they belong to as well. There's the search command that you can use to search for a string; it gives you the name of each matching module as well as a quick one-line description of that module. This can help you get a better idea of what you're trying to execute and what you want to do. There's also the set_regions command. This is a pretty important one that we built in. In this case, I just set the regions to us-west-2. And what this does is it ensures that any module that uses regions in one way or another--
you know, there are global AWS services, but a lot of them use regions. This ensures that a module will not be run against a service in a region where that service isn't supported, and that it won't run against regions you don't want to target. So when I run a module, it's going to check what regions are supported for that service, then it's going to check the session regions, and it will only execute against regions that are in both lists. So you always know exactly what you're targeting. You don't have to worry about any of that.
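The region filtering described here boils down to a set intersection. A minimal sketch of that logic (the function and argument names are illustrative, not Pacu's actual internals):

```python
def regions_to_target(module_regions, session_regions):
    """Return only the regions that both the module supports
    and the session is configured to target."""
    # Assumption for this sketch: "all" in the session means
    # no region restriction was set, so every supported region is fair game.
    if "all" in session_regions:
        return sorted(module_regions)
    return sorted(set(module_regions) & set(session_regions))
```

For example, a module supporting `us-east-1`, `us-west-2`, and `eu-west-1` in a session restricted to `us-west-2` would only ever be executed against `us-west-2`.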
Okay, so then there's the whoami command. This will give you a lot of different information. Currently there isn't much information, because all I've done is set these keys; I haven't done anything else. But it covers a lot of different stuff, including the keys and then the key alias, which is just what I named them. To gather the information that goes in those different places, there are modules that will contribute to it over time. What's going on here is that there's tab completion for every command and every module, so you can tab-complete any command and the modules that you're trying to run. I'm using the run command here to run the confirm_permissions module. There's also argument auto-completion, but in this
case I'm not using any arguments. I'm going to run confirm_permissions. That's going to run, and now we've got a module summary that pops up. confirm_permissions has completed, and it says it found confirmed permissions for "demo". What that means is it confirmed permissions for the username demo, which tells me the keys that I named Spencer actually belong to a user named demo. That's something I could change later on, but it's not all that important right now. So now if I run the whoami command, we'll see a lot more data that's populated. Now we can see the username, user ARN, account ID, user ID, roles, groups, policies that are attached to your user, the keys again, the alias, and then "permissions
confirmed". That's a pretty important one. Currently there's only one module that sets that flag, and it's the confirm_permissions module. What it means is that when these permissions were enumerated, it was 100% sure that that is the full list; it didn't just discover this through testing, it knows that this is the case. And so we can see I've got some IAM, Lambda, and RDS permissions, and I'm not explicitly denied any permissions there. Yeah, we don't need to worry about that. Me? From the video? Yeah, there's no sound. Well, as long as everyone knows it's not my video.
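Conceptually, what gets stored after enumeration is a record of allowed and denied actions plus a confidence flag. A hypothetical sketch of such a record and a lookup against it (the field names and wildcard matching here are illustrative, not Pacu's actual storage format):

```python
from fnmatch import fnmatch

# Illustrative session record after permission enumeration
session = {
    "key_alias": "Spencer",
    "username": "demo",
    "permissions_confirmed": True,  # enumerated from IAM, not guessed by probing
    "allow": ["iam:*", "lambda:*", "rds:*"],
    "deny": [],
}

def is_allowed(record, action):
    """Check an action like 'lambda:ListFunctions' against the record.
    An explicit deny wins over an allow, mirroring IAM evaluation."""
    if any(fnmatch(action, pattern) for pattern in record["deny"]):
        return False
    return any(fnmatch(action, pattern) for pattern in record["allow"])
```

With this record, `is_allowed(session, "lambda:ListFunctions")` is true, while an EC2 action would come back false, which is exactly the situation the demo runs into next.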
So what I'm doing here: we saw that there were IAM, Lambda, and RDS permissions. I'm going to run the EC2 enumeration module against instances and security groups. We know this won't work, but it's just to give you an example. It uses the dry-run feature of the AWS Python library, Boto3, and it confirms whether this account has those permissions before trying to actually execute the commands, which I don't. And so then it tells me there's nothing to enumerate. That's it. You can't do it right now; you don't have permissions. So what that means to me is that I need to escalate my privileges so that I can run this module.
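The dry-run check works because EC2 calls made with `DryRun=True` raise an error either way: `DryRunOperation` if the call would have been allowed, `UnauthorizedOperation` if not. A self-contained sketch of that pattern (the `FakeEC2` class is a stand-in for a real `boto3.client("ec2")`, whose errors are `botocore.exceptions.ClientError` with the same `response` shape):

```python
class FakeClientError(Exception):
    """Mimics the response structure of botocore's ClientError."""
    def __init__(self, code):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}

class FakeEC2:
    """Stand-in for boto3.client('ec2'), for illustration only."""
    def __init__(self, authorized):
        self.authorized = authorized

    def describe_instances(self, DryRun=False):
        if not self.authorized:
            raise FakeClientError("UnauthorizedOperation")
        if DryRun:
            # Permission exists; DryRun still raises instead of executing.
            raise FakeClientError("DryRunOperation")
        return {"Reservations": []}

def has_permission(client, api_call):
    """Probe a single permission without actually executing the call."""
    try:
        getattr(client, api_call)(DryRun=True)
    except Exception as err:
        code = getattr(err, "response", {}).get("Error", {}).get("Code", "")
        return code == "DryRunOperation"
    return True  # the call succeeded outright
```

Note that not every AWS API supports `DryRun`, which is one reason full permission enumeration (as in confirm_permissions) is more reliable than probing.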
And to do that -- a lot of different modules are obviously complicated, so there's a help command that can be run. I'm going to run it on the privesc_scan module, which is not so important right now, but basically it just gives you the arguments that you're allowed to use, as well as a long description of what the module is going to try to do. But