
When business critical systems have critical design errors

BSides Oslo · 2019 · 48:01 · 147 views · Published 2019-06
Category: Technical
Style: Talk
About this talk
Hans-Petter Fjeld & Bjarne Rasmussen: When business critical systems have critical design errors, the story of a disclosure. Laughing over the Twitter feed of Tabletop Scenarios (@badthingsdaily), we get interrupted by an engineer who wants to present a gloomy discovery and a possible entry for the Twitter account. Only this is a business-critical system we have installed in our production environment. Looking closely at the inner workings of the monitoring system, the engineer discovered that it was mostly based on a message bus. The secret key and passphrase needed to communicate on it were hard-coded into several publicly available components, and all of the servers we had in our monitoring system were vulnerable to unauthenticated command execution from all of the other servers in the monitoring system. This is the story of CVE-2018-13821, lessons learned during vulnerability disclosure, and the path to mitigating a central design error.

Hans-Petter Fjeld is an Information Security Engineer at Basefarm and an all-round blue-team member who also does penetration testing.

Bjarne Rasmussen is CTO EMEA Enterprise at Broadcom and was previously VP, EMEA Customer Success and Escalations Management at CA.
Show transcript [en]

Cool, welcome. I hope you had your apple a day, or a pineapple, whatever was out there, and are ready for the last two talks of the day. I'm very pleased to present a story from the trenches, with all of the messy details of vulnerability discovery and handling in real life, in production. This is a collaborative story from Hans-Petter from Basefarm and Bjarne Rasmussen from Broadcom. Yes, thank you.

Anyone want to tell me how it feels to make a mistake? Actually, it's a trick question, because when you're making the mistake it feels precisely like all the other times; otherwise we wouldn't have done it, right? So making mistakes, doing right and wrong, this is important to us. This is the agenda that's so important: it's mostly about finding it, then disclosing it, and when you've disclosed it you're not done, so there's stuff after that as well. We'll talk about that and some lessons learned. Basefarm is a Norwegian hosting company. We strive to be a bit better than the rest of them, and I feel we are doing pretty well on that. We're not the

run-of-the-mill WordPress hosting company. We care more about you and try to get you to success, and hopefully you'll pay us along the way, so we all win. We have about 19 years' experience now, so in the back room we have a pretty colorful gang of technicians that we've been able to keep in the company. We've expanded across Europe and now have nine data centers. That's not so important to us; what is important is monitoring, because we tell our customers that you can give us your critical business applications, you can trust us. We know that if your servers go down, money stops coming in to you, and money will eventually stop coming in to us.

This makes monitoring critical to our success. Before what this story is about, we had a monitoring solution called Mon. For fun, I reached out to its developers, its maintainers, and they could inform me that, as far as they saw, the worst thing about it was the name: no searchability online. It's Perl, so when they say "king of simplicity" you have to kind of account for that, right? It's basically by sysadmins, for sysadmins, and yeah, it works like a charm. One of the developers still has one installation running and it's doing fine. I don't believe the "four downloads this week"; sorry, SourceForge, I just don't. Well, it was extendable, with a web GUI,

but we had to find something better. This was around 2010; we looked for something better and we found Nimbus, by CA. There might be some confusion with Broadcom and the name being CA, but they got bought last year, so I'm going to explain that later. Is anyone here using Nimbus or CA UIM in their company today? Nobody? OK. It's good monitoring software. It also had its roots in Oslo; it became Nimsoft and went to Silicon Valley, got acquired by CA, and changed its name a little bit when CA got acquired. This is one of the main tools in this monitoring software, the Infrastructure Manager. I'm going to explain this with a

different graph too, but it has a hub, the hub has a robot, the robot has some probes, and it can give you alerts and quality-of-service data. This was chosen by us at least partly because it had a lot of different kinds of checks. When you're a big enterprise like us, you get a lot of strange storage devices and stuff that maybe isn't so easy to make checks for when you're writing and testing open software. But this was pretty good.

All right, this is more of a graph of how it works, and I'm going to get to the finding we had in a moment. In the top left there's an admin console; this is where you administer everything from. That would talk to a UIM server. The UIM server is basically a robot with different kinds of probes, as far as I see it. This is mostly for reporting and graphing, the UMP; that's not important for us now. It all communicates using something they call a message bus. Of course this is an abstraction, it's TCP connections, but this is the architecture anyway. And then under here, in this example, are two servers,

each running its robot, and then the robot has probes. Probes are basically processes, and they depend on what kind of server it is: if it's a database server you have SQL probes, if it's a web front end you have HTTP probes, checking the status of the service you're interested in. We had thousands of these servers in our hosting, so this got pretty big, but Nimbus scales. All right.
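The hub, robot and probe hierarchy described above can be pictured with a minimal sketch. This is only an illustration under stated assumptions: the class names, the probe names (`sql_check`, `http_check`) and the role mapping are hypothetical and are not taken from the product; the talk only says that each server runs one robot and that the probes depend on the kind of server.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from server role to probe names. The talk only says
# database servers get SQL probes and web front ends get HTTP probes.
PROBES_BY_ROLE = {
    "database": ["sql_check"],
    "web": ["http_check"],
}


@dataclass
class Robot:
    """One robot per monitored server; probes are the processes it runs."""
    hostname: str
    role: str
    probes: list = field(default_factory=list)

    def provision(self):
        # Choose probes according to what kind of server this is.
        self.probes = list(PROBES_BY_ROLE.get(self.role, []))
        return self.probes


@dataclass
class Hub:
    """A hub that robots attach to."""
    name: str
    robots: dict = field(default_factory=dict)

    def attach(self, robot):
        robot.provision()
        self.robots[robot.hostname] = robot


hub = Hub("hub-1")
hub.attach(Robot("db01", "database"))
hub.attach(Robot("web01", "web"))
for host, robot in hub.robots.items():
    print(host, robot.probes)
```

With thousands of servers, the same structure simply holds thousands of `Robot` entries per hub, which is why the way those robots talk to each other matters so much later in the story.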

So, simplifying a bit, removing some arrows for you: the admin console puts a message out on the UIM server, on the message bus; this gets picked up by the robot on the server, that's fine and dandy, and then the server replies back. Easy, right? The admin console can send commands out to the robots, as is usually done when a user or administrator is actually using the console, and a probe can also request configuration from the admin console. So it's a two-way thing; that's flexible and nice. All right, so this whole thing started when we had a support issue with CA support, and they gave our technician a compiled Java blob and said: execute this, and it

spits out some data; just send us the data and we'll figure it out. Well, I don't know about the rest of the companies, but we don't really just execute compiled software in production and just ship the output along. So our technician working on this started decompiling and reverse engineering that compiled blob.

That started this whole thing. He found first that this blob had a hard-coded passphrase that it could send in order to receive back the database connection string. This is symmetric encryption, so encrypting and decrypting use the same passphrase, and we were able to find this passphrase in the different probes that I mentioned. There's a probe called the probe provisioning probe, and it does what it says: it provisions different probes. But these probes are downloadable from the website for everyone; it's just the hubs themselves that are licensed and that you have to buy. So this hard-coded passphrase was not only hard-coded, it was also publicly

available for people to download. So he thought, if they hard-coded this, what else did they hard-code? He continued looking, and then we found this other thing: on the message bus the messages are encrypted with Twofish encryption, nice and dandy, but it's hard-coded. The encryption key is hard-coded; that's bad. But the real juicy part was the third one: when you put a message on the bus, it's not really authenticated anywhere. They don't check that the sender it states is the one actually requesting stuff. This means that you can have a totally legit probe on server A put a message on the bus, and this is open in the firewall because all

servers have to have this to communicate for legit reasons. It can tell the admin console, "Yeah, I want this new probe, and here's the configuration for that probe; it will run this command." The admin console says, "Sure. Server B, here's your configuration." Basically, this meant that anyone with access to any of our thousands of servers, at any of our hundreds of customers, could run commands on any of the other thousands of servers, and we didn't like that. So when we were done running around screaming about this in our office, we knew we had to disclose this, tell the vendor and do all that. That's the next part of this journey. How many here have disclosed

a vulnerability to a larger company before? A couple of hands, yeah, nice, thanks. It's not so easy. There are different types of disclosure, but in this crowd, is this new to anyone? No, you'll know this. There's full, where you just tweet it or put it on GitHub; coordinated, as Microsoft calls it, I won't read that line, but you work together; private, where you hopefully get money for it, or you just tell your closest friends; and then there's something called responsible disclosure. Google uses a hard limit of 90 days, and some use five days. It's not really set in stone what responsible disclosure is, and it really depends on your motivation and on the situation: is this

vulnerability being actively exploited? If so, waiting ninety days might not be responsible. In the end this is more or less ethics. It's a huge branch of philosophy, and it's not just another presentation on its own, it's a whole semester. I knew a guy who was a professor teaching ethics; he was one of my best beer buddies. There were nice discussions; you should really buy a beer for an ethics professor someday. Responsible disclosure basically means you first notify the vendor, give them time, and then you publish it, because others might also know about this and not have disclosed it. The important part is to

never let your sense of morals prevent you from doing what is right. Meaning that you might have a framework, you have thought out your morals, I hope we all have, but the situation is what counts. Maybe you didn't think about all the things in the situation you're currently in. So do what is right; it feels precisely like it should. And the part about notifying the public: if you're using this software and have bought it, you might have signed some papers, you might have made some agreements that you're not going to reverse engineer the software and you're not going to disclose intellectual property. So going to a big American

company and saying, like, "Ha, I'm going to get you": you know they have lawyers, and somebody has had the idea that they might be used. Luckily, I did not get that response. So my take on responsible disclosure in this instance was, firstly, to try to find a security page someplace that they had published on the website, like, "Here, tell us about all our bad stuff, please." But I did not find that page. I'm able to find it now, but, hindsight. I knew that if I just sent a support ticket to their support apparatus, they would go: well, it's not an incident, nothing is down, and yeah, this is

priority way down, and we'll look at it when we get time. Right? That's how you have to operate when you have thousands of tickets coming in, like these guys probably do. But I found a guy working at CA in Oslo through my contact network, and I was able to establish a connection with him. For him to do some work internally, he of course had to have a support ticket to reference, so I had to submit that anyway. I gave them 90 days; I thought that was reasonable. Then my nagging and poking started, and waiting. I had some vacation in this period as well. I did miss that CA is a CVE Numbering Authority, so

there's some contact information there as well, but again, hindsight. And then: silence. Then what? In my situation, I had contacts and people I knew in Norway who were using this product, and it didn't seem like I was able to get any reaction from CA, so I actually privately disclosed some of this to another customer of theirs. In hindsight, that was a breach of what I had promised them initially, so that's a breach of trust on my part and probably a mistake on my part. At the time it felt like the right thing to do. But then one day this jovial Dane called me. So, Bjarne. Thank you, Hans-Petter. So I'm the vendor,

and at the time this happened it was CA Technologies. CA Technologies got bought by Broadcom last year, in November, so now I'm part of Broadcom. If you flip a slide, or I can do it myself. Oh, really? So, do this, yeah. So if you wonder what Broadcom is: Broadcom is a big company out of San Jose in California that's been around for almost 45 years. They are doing around 20 billion a year, have 21 different product categories, and one of them, or actually a handful of them, are very close to you if you pick your mobile phone up. Broadcom has been doing chips, all sorts of communication chips, for many years, so

whether you have a set-top box or router or switch or anything that communicates, you'll probably find a Broadcom chip in it, and in your iPhone or Samsung or Sony or Huawei device you will probably find four or five chips from Broadcom. Broadcom decided a year ago to go into the software space and acquire CA, and CA in its own right was around a four-billion-dollar company, so now you can see the revenues going even further. But what's interesting with Broadcom, if I flip to the next slide, is that if you look at all these logos, that is what makes up Broadcom over 40, 50 years, and it's a lot of well-known, well-respected, I guess,

companies, and for all of them it's about research and engineering and building proper products. So the point I want to get through is that I don't think there's any vendor on the earth, and I'm probably naive and I'm probably wrong, but I cannot believe there are engineers and vendors who want to produce software, or hardware for that matter, that is buggy or has vulnerabilities or, you know, does wrong things. I cannot believe it. Again, I'm probably naive, but it was not the agenda at CA. If you take a product like Nimsoft, or Nimbus as it was called, it is actually very, very interesting. CA decided to buy Nimbus from Norway

and, say, now I think all of you Norwegians should be proud that a company came in and actually bought a Norwegian company. They did. Today the name has changed; it's called UIM, Unified Infrastructure Management, and still, with Broadcom, it is the flagship product we use for server monitoring. If you go around the world you'll find some of the largest banks are using a Norwegian product that was built here. Maybe some of you in the room were part of the development 10, 15, 20 years ago, in the original releases of Nimbus. That product is now being used by many, many companies and very large organizations, including big banks around the world, to

monitor hundreds of thousands of servers and all sorts of stuff on those servers. So now, going through what really happened, now that you've got the background: three issues were identified and raised as support issues to CA at that time, back in April a year ago, or around a year ago. As you said, Hans-Petter, CA support was informed that in 90 days these vulnerabilities would be reported to the public, which is, you know, a standard process. Whether it's 90 days or something else, it's a process. At that time I ran escalation management for Europe and I ran customer success globally, so I had a worldwide job, but one of the hats

I had was to run escalation. It's not like you have 25 escalations a day, luckily, but when you have an escalation it is usually something like all the ATMs in Sweden being down, and everyone jumps, because it is millions of dollars, or kroner, flowing out the window for every minute you don't have these ATMs running. So that is really when you have escalation management. What the job is about is that you get the customer and the engineering team, everyone, around a room, a virtual room, you know, a WebEx and a phone line, and then you discuss what to do and you get all the right people to fix it ASAP. What

happened in this case was that a presales person here from Norway called me saying, "Bjarne, there's a problem. There's an issue with Basefarm: three vulnerabilities. They haven't been dealt with for almost two months, and in a short period of time they will be public." And I was like, what the hell is this about? It was not like system down. Exactly as Hans-Petter said, usually a big crisis is when something is down, is not working, and people cannot get a service. But this was like, what's going on, and why would I deal with this? I started to look into it, and I actually slept on it a night, to be bluntly honest, and was

like, what's this about? Then I picked up the phone and called Hans-Petter, saying, "Hey, Hans-Petter, what is this about?" And I'm a Dane, you know, so I could speak Danish to him, and he was like, holy, who is this calling and talking Danish? We had a good chat, and I think I established a level of trust with Hans-Petter: actually, we have a problem and we need to get this fixed. So I didn't call to try to talk it down or talk it away or minimize it, because I had looked into it, thinking, if these vulnerabilities are really correct,

and if what Hans-Petter has found is really correct, we have really, really large customers running out there with the same issue, and, you know, if you can find it, other people can find it. So it was like, do we really have a crisis, and what do we do? In the capacity of running escalation management I have access to our executives, so I went to our executives saying, "We have a problem, we have a crisis, what do we do?" Internally we got engineering and engineering management, but also some of the heads of the business, on the phone, and it was like, we need to get this fixed. So I guess you

have read the slide, so going to the next one. So what did we do? Yeah, we established communication. You might be working with Americans in an American time zone, so a meeting at five o'clock on a Friday, it's fun. Exactly, I remember that one. That's the beauty of California time, where when you are at five o'clock on a Friday afternoon, they're at nine o'clock or eight o'clock or whatever. But that's just the complexity of the world we live in. The point is, we got together internally and said, "What is this about?" And of course there was this, you know: how on

earth could this sit for two months without anyone doing anything? So of course we went back to the support folks and said, "What have you been doing for two months?" And frankly, I never spent time on what really happened there. Someone sat on it for whatever reason, and there are all sorts of reasons and excuses; it's really not important. The important part was: how do we get it fixed? So we got engineering together and asked, "Do you agree we have these issues?" They looked at the code and said, "Yep, it's correct." "OK, fine. When can you fix it?" They said: we think it will be fixed in a year, in the next major

release. We said: are you nuts? This is going public in 30 days. This was sometime in June, as far as I remember, and there were 30 days left of the three months, or the 90 days. So in 30 days the biggest customers we have around the world could read publicly that there are three vulnerabilities, of which one, as you pointed out, was really, really severe, where you could take control of other systems in an environment, even, if you searched a while, other customers' systems. So that was a big, big issue. So then it went: you know, it cannot be in

a year, it cannot be. It has to be this month; we have 30 days. So engineering started to work. They had to delay all the other projects, bad luck, and they came back saying, "We cannot do it in 30 days, it's impossible." So I went to Hans-Petter and said, "We have a problem. If you disclose in 30 days, we're not ready, so you need to trust us that we are building this." One way I built that trust was actually to get Hans-Petter on a call with the engineering team, and we laid out what we were going to do, the cycle we would develop in. We invited Hans-Petter to get access to a beta release

of the fixes that would be ready, I actually think, within the 90 days. And you had real transparency with me, and I'm really grateful for that [unclear]. I hope everyone gets to experience a big company being that transparent. At that point my thinking was: what's my motivation here? What do I have to gain by publishing this in the middle of the summer, when I know that they won't be able to fix it, and when they do get to fix it, their customers will not be there to patch it anyway? My motivation was that, OK, we'll wait with that. So, long story

short: engineering regrouped their people and had them work long and hard, and I think my point to all of you here is: if you have an issue like this and you sit on the customer side, make sure you escalate. Don't necessarily wait and wait, because there could be people at a lower level or mid level who are sitting on it because there's a business rule, or it's not important to them, or they don't get the urgency. I think it's a shame for the relationship, and it's a shame for everyone, that it doesn't get into the spotlight. With any company you can always escalate. You know, you

can always call the country manager, or there is always a way to raise the flag and say something is broken, and it is those people's responsibility to escalate further in their organization, and shame on them if they don't. At our place it happened, and when you get to the senior management of the company, they understand the criticality of the issue. They understand the impact it would have had on customers if this had gone public without a fix. In the first presentation we saw this morning, you can say tick, tick, tick, tick on everything, from communication issues to revenue issues to your customer relationships, all the way down to your equity, or

your stock will go down. It will cost you a lot of money, and it will be hard to regain trust with your customers, so everything is important here. Of course, when you get to these people, it's not hard for them to say, "Hey, we prioritize 25 people to go code this," or "We find a million dollars to get a fix." You need to get to that level in the company to make these decisions, and every company can make that decision; it's just a question of getting it high enough. In our case we got a fix, and I think it came out in mid-August, which was like 30 days

late or 40 days late compared to the original 30 days. But it got out, all three of them got out, and on the day we released the code we also released the three CVEs and made them public, and we gave credit to Basefarm and Hans-Petter for finding these vulnerabilities. Actually, the expert in this setup, the technician, was [name unclear], and I just want to state that for the record. OK, yeah. If we flip to the next one: the next slide I don't want to spend a lot of time on; that is basically a summary of what we did. So the engineering

team basically went through their code, and you have to remember that this code was written years before. CA bought it from Norway, from Nimbus, and this is a learning, I think, for everyone who acquires something, whether you acquire a company or acquire code from someone else: do you sit down and read hundreds of thousands of lines of code to check if there are vulnerabilities, or do you spend your engineers' time building the next release with new features and functions? We all know the answer. So you all have a heritage of code from the past, and actually, at some point in time, it is not a

bad thing that something like this happens, because what engineering did was: OK, holy Moses, we need to do a complete stack upgrade. We went through all the libraries, all the dependencies, and said we need to upgrade this and this and this and this in order to get current, and beyond that had to go to the latest TLS, at that time at least, applying the stack upgrade, as you see over here. And that was one of the things that I focused on in my talks with them: I was making sure that they didn't just invalidate my proof of concept. We've all seen examples of that. You don't just, like, remove the

curl user-agent and be happy because the proof of concept doesn't work. They did a proper fix, so that was my main concern, and that would take time. So yeah, you do all this, and it was done in a very short period of time, which meant quite a lot of people were pulled in to get this done. What is important on this slide is actually that if you are on an old release, you have a problem. When engineering said we're going to fix this, we will fix it in this release, and it doesn't really matter what the release number is. The problem was that a number of customers were on old releases, and in order to get to

the fix, the issue was to get to this release. That means you have to go to this one, this one, this one, to get here. Now think about it: if you're a large bank anywhere in the world and you have a hundred thousand or a quarter of a million servers running this stuff, and you're on an old release, you don't do that upgrade just overnight. No one will do that. That takes time; it needs to be planned, needs to be tested, et cetera. So the learning for everyone here in the room is: if you are a consumer of software, you are not necessarily on the latest release, we all know that, but make sure you are on a release that is close to the latest release,

because that will, if there's an issue, make it easier for you to jump to the release where you get the patches. Of course you can point the finger and say, "The vendor, CA at that time, why not apply the fix to all these releases?" That's not feasible. In some cases that code has already been retired; it's been put in storage that you will only pull out in case something really weird happens. You don't even have the development environment to compile the code anymore. It's been retired. That's a reality in the software industry: in order to have a flow, you move and move and move, and then you

retire, retire, retire, so you cannot go back and patch even if you wanted to. And at Basefarm I have the role of vulnerability manager. In many cases it means me going around nagging engineers to upgrade, upgrade, upgrade, and in some cases they go, "Well, there's not that big a security issue in this upgrade." But be close to the latest one, because when stuff hits the fan it's nice to be able to just upgrade once and be happy with it. Exactly. So that's the learning from here for everyone: on whatever software you're using, make sure you are close to the latest of

what we call the GA release, not necessarily the latest, because then you often have some other issues. So here's more information around what was fixed; this links to the slide Hans-Petter showed before. There's now secure communication between all the components and whatnot, so I don't want to spend time on that, and I think the last slide is more or less about the same. It was an interesting learning, and I learned a lot very quickly, but what was important in my summary was to be transparent and honest and work this through, rather than starting to fight it and argue and be negative and try to be

in a battle you cannot win. That's the learning I have. Hans-Petter, I think you have a slide. Yeah, lessons learned from my side are more or less the same. Also, I did not really know how development was done in a large enterprise on a large product. I know development, programming, where you're only using the master branch in Git and stuff like that; that's my level. So you showed me that proper fixes take proper time. I learned that ethics is hard; there's no fixed answer to that. Do your best, it's what you've got. More talk, less writing: call people, talk to them, interact. That helps a lot. Many technicians I know like to write

messages, but talking together, really, pick up the phone. Yeah, the phone. It makes it so much easier; communication goes so much faster. Also, I've learned recently that OWASP has this cheat sheet for disclosures with some tips. Hats off to the OWASP guys, they are awesome. But yeah, if you want to take a picture of that. Questions? Anybody, questions? Hi, so this is to Broadcom. Assuming that some other company in some other country contacts you about a hypothetical bug of similar severity, would it today have to go through the same kind of out-of-the-normal escalations, and would it take sixty days of nothing happening, or have you tried to implement this too? So,

that's a good question. It should not be an escalation. I think the learning we had at that time was: how is it possible that we had three vulnerabilities of this severity sitting in support for sixty days without really proper escalation? That is not happening today. One change that happened in the merger with Broadcom, and actually before, was that support was integrated closer into engineering. At CA, at that period of time, support was one organization and engineering was another organization. So you have level one and level two: level one is support, usually people who don't have access to source code; level two is where you actually

come to people who have access to source code and can start to write fixes. That is now integrated under the same leadership, which means you don't have to go up and down in the organization; you can go right and left and find the right people, and that's the way it should be. What also happened, and it was an eye-opener for us at that time, is that, you know, system down is a critical issue, but stuff like this is just as critical and has to be dealt with in the same fashion. That was a learning for us. So yes, it has changed the behavior of support and engineering, but mostly in that they're now put together

under the same leadership, and there's one person who's responsible for the whole lot, whereas before there were basically two sets of organizations that had responsibility. Thanks, guys. Anybody else have questions? In the back here, on the side. Yes, exactly, in the back.

This is more of a business question, but I used to work for a company that has probably created problems for you guys at Basefarm. It's a Norwegian company that was bought by a huge multinational. But I mostly have a question regarding the change that you made, because in my experience executives of big companies realize the potential of a huge blowout, but that change in culture, where you're actually able to foresee the actual impact of a blowout, that is very difficult. So how did you do that? Very briefly, the way I'll answer it is: it's actually interesting to be coming into Broadcom. As I said before, Broadcom is

many, many years of heritage of building components for switches and all sorts of communication equipment, and the view of quality is different in that type of company than it is in software. I've been in software all my life, and I'm not the youngest in this room, as all of you can see, I'm probably the oldest, so I've worked in a number of different companies, almost my entire life in software, and I can come up with a lot of bizarre jokes around quality and software. I will not do that, especially not with a camera on. But when you come to a company that has its heritage in hardware, you understand the impact there is if you build a chip, any type of

chip, and release it to any OEM or ODM, who then builds one hundred thousand or a million or two million devices and ships them, and it doesn't work, and the impact that has on your name, your brand, your revenue, and the whole lot. So I've already been part of planning meetings at Broadcom that include the software engineers and the general managers of engineering, and the pressure there is on these folks around these products, and I mean the software parts: they have to work, and we don't want to see any flaw in the quality, because it impacts the brand. So our company, any company, should protect its brand. You know, we all work for a company, and you're interested in that your brand,

your name, your logo is not being damaged in any shape or form. That is on the senior leadership, of course, but everyone should be part of that. But the pressure that's coming in this case from executive management at Broadcom, that the quality has to be second to none, is coming down on software as well, to the point where if it doesn't meet certain criteria, you don't release. So if you had a roadmap and it said we will release in May with this release, and it doesn't meet your criteria, you hold it, period. It's not about releasing on time, it's about releasing on quality. So that is a change we got coming into

Broadcom and being part of a much bigger company that has its roots in hardware. I don't know if that answers your question, but that's at least what I've learned.

Yeah, I have a question for both of you. We're talking about vulnerability discovery, remediation, disclosure, that whole dance. What do both of you think about bug bounties, and is there any discussion in your companies about going that route, either a private bug bounty through a service like HackerOne or Bugcrowd, or something of that nature?

Yeah, I didn't really talk about that in the disclosure part. It's a part of the private disclosure industry, maybe. I think it has its place, but if you get

contacted by someone saying, "Hi, we have something, do you have a bug bounty system?", then it's maybe too late, and you might be wondering: am I getting blackmailed now? So take a stand before you get that question, and publish that stand. Yeah.

I'm flipping that a bit differently. CA acquired a company two years ago; the name was Veracode. I don't know if any of you have looked at Veracode. Now, Broadcom spun it off on November 5th, when CA was acquired into Broadcom, because it was considered to be too far from what Broadcom was to focus on. Broadcom wants to focus on communication and infrastructure, and this was really in the developer space. Veracode is basically a tool where you as a developer write code, and when you come to the point where you click compile, it will send the code into Veracode's checking engine and

will scan the code and say, well, yeah, you know, that code is broken there, there, and there. That's one way to do it. And then if the developer is not skilled enough, it will come up with a tutorial on, you know, this is the way to write proper code to avoid this issue or that issue. So, you know, my point is: go back to when you write code, have people write proper code, and don't allow bad code to be released. So put barriers up, where if you try to release that piece of code through to the next phase, you know, you block it before it becomes a piece of software or library code. And there are other tools out there on the market; I'm not

advocating for Veracode, I have no interest in the company. I just think it's very interesting where you can put barriers up, saying that code cannot be released if it is below that level of quality. And you can take the code you wrote five years ago and run it through and say, well, holy Moses, because we've learned so much over the five years, now we have all these issues. And that comes back to the hygiene of your product, where you would say that at a certain interval you scan your code and you clean it up and make it, you know, fit the expectations that would be in the market. And you

know, you don't necessarily talk about this. It's like how you brush your teeth every night and every morning; it's the same thing you would do with your code on a regular basis. And the tools to do it exist today, so why not use them?

Thank you very much, both of you, for a story about people finding vulnerabilities and the people fixing the vulnerabilities actually getting along. Look at them shaking hands.

[Applause]