
Getting over the finish line: Loom Security Journey

BSidesSF · 2024 · 30:57 · 666 views · Published 2024-07
Style: Talk
About this talk
Narayan Gowraj, Nishant Jain. Ever wonder what security functions are built within a rapidly growing startup and what matters during a merger and acquisition? Listen to me talk about being the first security engineer and what we built at Loom to facilitate a smooth acquisition by Atlassian. https://bsidessf2024.sched.com/event/72e5560952350534e88ca8c9464e9f3c
Transcript [en]

Next up we have a panel with Narayan and Nishant to cover "Getting over the finish line: Loom Security Journey". Please join me in welcoming our next panel. Thank

you. Okay, hey, how's it going, everybody? Thanks for being here on a rainy Saturday morning; we appreciate you spending the next 30 minutes listening to our talk. Quick introductions: my name is Narayan. I lead security at Loom, which was recently acquired by Atlassian. Previously I did a lot of product security and cloud security at Lyft and Adobe. And we have Nishant, who is also a security engineer at Loom; he primarily focuses on AppSec, bug bounty, and a lot of security research. The title of our talk is "Getting over the finish line: Loom Security Journey". So Loom is, or

Loom was a seven-plus-year-old company when it got acquired by Atlassian. As you can see, it was founded in 2015; I joined in 2021 as the founding security engineer, and Loom was acquired at the end of 2023, around December. So we're going to focus on the three years from 2021 to 2023. This talk is going to be a combination of scaling security, not just processes but also products, security tools, and automation, and how we collaborated across teams. We would also like to share some war stories, especially around one of the biggest security incidents that Loom faced, in 2022. There's a

public blog post, but we do want to share some nuances around it and some transparency; leading with transparency is something we follow a lot within Loom's engineering culture. For people who don't know what Loom is: Loom is an asynchronous messaging app, primarily used to improve communication and collaboration. "Record in a few clicks, share anywhere, collaborate better" is the marketing line that we use. This is how we're going to start: from where our extension was launched, which was

around 2016. In 2018 we had our desktop app. Then in H2 of 2020, before I joined, we did our first pentest, started our bug bounty program, and got our SOC 2 Type 1. Then I joined. When I joined, Loom had around 40 million users, a lot of high-profile customers, and a lot of intellectual property that needed to be protected. We then had our procurement strategy in place, and also SOC 2 Type 2, in H2 of 2021. So, as you can see, before we

dive into the security journey: what started the journey for me was that the interview process at Loom felt quite different. It wasn't just the regular interview process, with a security deep dive or talking with leadership; there was also a pentest walkthrough. Before I joined there was a pentest issue, a local file inclusion leading to remote code execution, and one of the rounds was working through it with the security champion who did security before I joined Loom. We went through the pentest walkthrough, spoke

with leadership as well, and also went through the roadmap and the product. In general, I think the execution part, meaning what security is to the company and how prioritized it is, shows in whether there's a 30-60-90-day plan, and also whether there's a starter project, where rather than being handed a solution you are given a problem that the company has and you decide on a solution. There's also a bit about firefighting versus problem solving: I tried to understand the current security

posture of the company; I didn't want to get into something where I'm just firefighting and not problem solving or thinking about long-term plans. Also, do your own research: I did a lot of spot checks with different folks who use Loom at different companies, and looked at business and growth metrics. And as you can see, this was in December, before I joined: one of the engineers getting kudos from our bug bounty program for how they resolved bug bounty reports even before there was a dedicated security engineer. So yeah, let's get started. I

think, starting with the starter project: again, it depends on the problems the company faces, which for us were a lot of application DoS and not having the right rate limits. The way we approached the starter project was to understand the business requirements: we wanted to make sure that it was cross-team, that it solved our business needs, that it was a top priority, and that it was time-boxed. We didn't want to come up with a six-month plan as a starter project. So one of

the things that we did was deploy AWS WAF as our starter project, in front of our infrastructure. One of the challenges and limitations we had was that Loom was purely on GraphQL, so there was just one API endpoint, and we had to add more conditions to narrow our restrictions down to GraphQL operations, queries, and mutations. The deliverables included dashboards. We wanted to make sure the WAF didn't just provide rate limits or DoS protection but also bot detection and protection against account creation fraud, and we wanted to make it very self-serve for support and other operations

folks. Some interesting scenarios came up once we deployed WAF. In one, we were DoSing ourselves. I think it was a Tuesday: on a Tuesday morning we made some changes to our Chrome extension, which is one of our most-used clients. One of the changes was that it started requesting a particular endpoint on loom.com to fetch feature flags for a user, and it did so every time the Chrome extension loaded.
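The idea of rate limiting a single GraphQL endpoint per operation, as described above, can be sketched in a few lines. This is only an in-process illustration, not how AWS WAF is configured (WAF is driven by rules rather than application code). The class name, the operation names, and the limits are all hypothetical.

```python
import time
from collections import defaultdict, deque


class OperationRateLimiter:
    """Sliding-window limiter keyed by (client, GraphQL operation name).

    Hypothetical sketch: with a single /graphql endpoint, limiting by URL
    is useless, so the limit is applied per operation instead.
    """

    def __init__(self, limits, default_limit, window_seconds=60.0):
        self.limits = limits                # e.g. {"GetFeatureFlags": 5}
        self.default_limit = default_limit  # used for unlisted operations
        self.window = window_seconds
        self._hits = defaultdict(deque)     # (client, operation) -> timestamps

    def allow(self, client_ip, operation_name, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[(client_ip, operation_name)]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        limit = self.limits.get(operation_name, self.default_limit)
        if len(hits) >= limit:
            return False  # over the limit: block or challenge this request
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = OperationRateLimiter({"GetFeatureFlags": 2}, default_limit=100)
    for t in (0.0, 1.0, 2.0):
        print(t, limiter.allow("203.0.113.7", "GetFeatureFlags", now=t))
```

A tight limit on one chatty operation leaves every other operation on the same endpoint unaffected, which is the property a URL-based limit cannot give on GraphQL.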

That was very aggressive, so we basically used WAF to short-circuit it while we were DoSing ourselves. That wasn't the problem we had built the WAF to solve, but it definitely helped in this case. Another learning was the other extreme, where we blocked ourselves: we made some changes to our desktop app and used an open source library called electron-fetch, which sent a default user agent, and for some reason AWS treated that as bot traffic and started blocking a lot of our traffic. So I think also knowing your

customer base matters, because a lot of our customers are educational users who come from a shared VPN or share user accounts. It's always better to understand your customer base and always test when you are deploying a WAF; we learned it that way. Continuing the journey: a lot of different factors went into continuing my journey at Loom for three years. The first is growth. I think the most important thing is to understand not just your revenue growth but your engineering team growth and your security team growth. Business validation is also very critical, I would say: as we

continued the journey in 2022, there were a lot of other big players like Slack, Microsoft, and Zoom coming up with their own versions of asynchronous video messaging. That gives you validation that the industry does believe in that particular space, and then it's all about finding product-market fit and delivering the right thing to your customers. Then cultural values: there are a lot of cultural values that we follow, and one I really looked for was optimism in a startup. It's funny today, but my first week at Loom was my hiring manager's last week, so it was kind of a

journey. So being optimistic, being transparent, talking with leadership, and making sure you are kept up to date about the company and the culture is super important. On engineering principles: fail forward. I love that at Loom they believe failures are the stepping stones of success. Don't make the same mistake twice, but always be ready to fail forward; do not hesitate on that. On the state of security, fast-forwarding: we started deploying incident management tooling, and the security and compliance team expanded. We were three people at the start of 2022, and if I include my manager it's four people.

And then endpoint security at the end of 2023. So, scaling security, around processes and policies. The first thing that we added was a vulnerability management policy, which was the backbone of our expectations from a security team. We made sure we had standardized scoring; we just went with very simple CVSS scoring. Add proper SLAs, and make sure that you build trust, relationships, and partnerships with teams. We also came up with a leadership review, which was basically ranking our teams based on

security debt. We also made sure that when people fixed security bugs there was definitely a shout-out, in terms of the priority of security bugs. Exceptions, for sure: driving with empathy means making sure you have some exceptions in place as well. One thing we did differently at Loom, which I didn't do at my previous job, was that the compensating controls were always owned by security: we would add monitoring, rate limiting, or anything that would limit the blast radius. And also shying away from escalations as much as possible.
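A vulnerability management policy built on CVSS scoring plus SLAs, as described above, reduces to a small lookup. In this minimal sketch the severity bands follow the standard CVSS v3.x qualitative scale, while the SLA durations and function names are assumptions for illustration; the talk does not state Loom's actual SLA values.

```python
from datetime import date, timedelta

# Hypothetical remediation SLAs in calendar days; not Loom's real numbers.
SLA_DAYS = {"critical": 7, "high": 30, "medium": 90, "low": 180}


def severity_from_cvss(score):
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


def remediation_due(score, reported_on):
    """Return the SLA due date for a finding, or None if it needs no fix."""
    severity = severity_from_cvss(score)
    if severity == "none":
        return None
    return reported_on + timedelta(days=SLA_DAYS[severity])


if __name__ == "__main__":
    print(severity_from_cvss(7.5), remediation_due(7.5, date(2022, 3, 7)))
```

An exception process would sit on top of this: an approved exception extends the due date, and security records the compensating control it owns in the meantime.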

Tools and automations as well: obviously we kept our security team lean, and that was only possible by procuring some security tools. We had a hybrid approach where we would buy but then build a lot of customizations on top, because security is all about context. We didn't have the typical OWASP Top 10 web application security issues; we had our own custom issues at Loom. We always asked the two important questions: why do we need this tool, and what if we don't buy it? And what alternatives were considered? Some learnings that we had

over the journey: never go with a multi-year deal, because security tooling changes quite a bit and a lot of new players come and go. And lack of customization was a no-no for our procurement strategy. Then, working with product. Working with product was always, I would say, challenging in a good way, because you need to understand what security needs, what the company needs, and what the product needs. We went with something simple but effective: we based it on impact and effort, and if

it's low and low, then those are thankless tasks; if the effort was high and the impact was low, we would just forget it; and then there are the quick wins and the strategize-and-plan items as well. I would say we also had a good balance between fixing vulnerabilities and product features. Again, this comes back to firefighting versus problem solving: we wanted to make sure that fixing vulnerabilities was given as much importance as product features. And again, I think this is where it helps engineers drive

productivity as well. We came up with trust-but-verify: we implemented break-glass scenarios so that people don't get blocked, but they understand that they're accessing something very sensitive and that those accesses are audited, and we made sure those guardrails are in place. We also came up with a lot of good guidelines and lint checks, and we built our own threat detection platform in addition to a commercial one we had. Again, these are part of the customizations we had as part of our security automation. Security incidents followed a very standard process, but there is something we did differently, which I'm going to

just focus on. For SEV0s: yes, drop everything and fix it. For SEV1: three business days. The reason we set SEV1 at three business days was that we factored in a lot of things, of which three are: the attack surface (we always looked at the potential attack surface); the reporting source (if it was external we definitely categorized it as SEV0, if it was internal, SEV1); and active exploitation and the monitoring in place, which was again owned by security. There was a lot of ownership that security had with security incidents, like writing API rule checks, SAST checks, and

extending our detection and response platform, basically our sources, and also fine-tuning alerts to make sure we have a good signal-to-noise ratio. Okay, so again, the state of security; I'm going to talk more about the incident now. The security and compliance team did expand in 2023 as well, and Loom was acquired. So this is a war story that I would like to share: one of the incidents, which happened on March 7, 2022. It was related to a CDN config change, and it led to wrong user sessions being sent back to the incorrect

user. So, the CDN config change: it's in the merge queue on Tuesday morning at 9:00 a.m. One thing I want to point out is that this had already been in staging for two weeks, so it looked all good; I'll explain in the next few slides why we didn't catch it in staging. Before this config change, we would strip out the cookies at the CDN. The CDN was basically for static assets like JavaScript and CSS, which get cached. The change was deployed to production at 10:21, and the config change that we made did not strip away the cookies. What

happened was that the cookie would be sent to the CDN and the CDN would actually cache the entire response. One thing we wanted to point out is that it was cached for one second. The reason it was cached for one second, even though we had the Cache-Control max-age header set to zero, was that we used a managed CDN cache policy called CachingOptimized. It is an AWS managed policy, and whatever max-age you set, the minimum TTL it applies by default is always one second. So it was cached for one second, and this was the reason why it wasn't caught.
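The one-second behaviour described above follows from clamping the origin's max-age into the cache policy's minimum and maximum TTL. A minimal sketch, assuming the CachingOptimized values of a 1-second minimum and a 31,536,000-second maximum; the `SharedCache` class is a hypothetical stand-in for the CDN, not CloudFront's implementation.

```python
def effective_ttl(max_age, min_ttl=1, max_ttl=31_536_000):
    """Clamp the origin's max-age into the cache policy's [min_ttl, max_ttl]."""
    return max(min_ttl, min(max_age, max_ttl))


class SharedCache:
    """Toy shared cache: one stored response per URL, served to any client."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._store = {}  # url -> (expires_at, response)

    def get(self, url, now, fetch_from_origin):
        entry = self._store.get(url)
        if entry is not None and now < entry[0]:
            return entry[1]  # cache hit: possibly someone else's response
        response = fetch_from_origin()
        if self.ttl > 0:
            self._store[url] = (now + self.ttl, response)
        return response


if __name__ == "__main__":
    cache = SharedCache(effective_ttl(max_age=0))  # max-age=0 still yields 1s
    # Client A's response, including its Set-Cookie, is cached at t=0.0.
    print(cache.get("/app.js", 0.0, lambda: "Set-Cookie: session=A"))
    # Client B arrives within the same second and receives A's cookie.
    print(cache.get("/app.js", 0.5, lambda: "Set-Cookie: session=B"))
    # A request after the TTL expires goes back to the origin.
    print(cache.get("/app.js", 5.0, lambda: "Set-Cookie: session=C"))
```

Two requests for the same asset have to land within the same second for the leak to show, which is rare on a low-traffic environment and constant in production.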

In staging, traffic was pretty low, since it was used internally by Loom folks. So that was the reason this change, where wrong user sessions could be returned to a user, wasn't caught in staging. As you can see, with the Set-Cookie we send the wrong, cached cookie to client B, so client B gets value A, which is client A's value. So at 11:03 the incident was declared, and at 11:10 the changes were reverted. But again, this is a caching issue, so we could

still see a lot of support tickets coming in, and posts on socials. So what we did was pull the plug: at 11:30 we again used WAF to block all traffic coming to loom.com and just returned a 503 Service Unavailable. Service was restored at 2:45. As you can see there is a phase one and a phase two. The reason I say phase one and phase two is that the incident duration was 4 hours and 21 minutes: phase one was around 69 minutes, and for the rest the service was down. We had around [inaudible]K Loom videos with wrong ownership, so we had to do a complete database roll

back and as well as cash roll back uh so we had to lose those Loom videos and we didn't have a better option there and uh we did a potential impact analysis using multiple um using users coming from different IPS uh geolocations and user agents within the short time frame and also using historical data so yeah we had like around like 3K users uh impacted users to which we had to like communicate about the incident um and uh as you can see so the learnings that we had uh one of the things that I wanted to point out is um the cookie right so the cookie uh we didn't the same side was set To None uh

SameSite was not the default, Lax. As you can see, even though the app is on www.loom.com and the CDN is on cdn.loom.com, it's the same site, because the scheme and the top-level domain plus one are the same. But, as you can see, if it were Lax, two conditions need to be met: one, it's a GET request; the other, it needs to be a top-level navigation. Here the request was a background request, sent to fetch static assets. So the default would have definitely helped us. Compromising security for short-term wins

didn't help us at all. We had set that for a reason, though: we set it for Loom embeds to work, because you could view a Loom not just on loom.com but in a Google Doc, or in other documentation and third-party sites, where you can engage with the Loom and that engagement gets captured on loom.com. On compromising security: we did an effort-and-time analysis, and it took us three days to fix this, changing SameSite to Lax, but it took almost two to three weeks of probably half the company's time in comms

and in other action items that had to be done after the incident. Other action items: defense in depth, for sure; low-scope cookies; device fingerprinting, which is something we have right now on loom.com, using the historical IPs and user agents of a user, and if we see something different we flag it; a code audit; and security started being a more active part of quarterly planning. We were part of it before, but we became more active after this incident. Post-incident response is where a lot of time was spent, and investigations. But also one

of the things that we did was give out a six-month free trial of our paid features, including audit logs and privacy settings, which are only available to enterprise users, to impacted users and also to people who thought they were impacted. And then, on the state of security: Nishant joins the team in 2023, after the incident. We did have the req open before the incident; it's not like we hired somebody because of the incident. Then Loom AI gets launched and Atlassian acquires Loom. So this is the state of

security until 2023, and I will let Nishant take it from here. Hey everyone. I graduated in 2023, so Loom was actually my first job, and it got acquired, yay. I handle a bunch of things as a new grad. I do write code, production code; I don't push it without review, don't worry. I help developers fix code issues. More than that, as someone with prior experience as a bug bounty hunter, I was tasked not only with triaging reports but with managing and building relations with the hackers. Third, like Narayan mentioned, we have our own Swiss Army knife, our own set of tools, so I contribute

to that by writing scripts, interacting with Loom's internal APIs and GraphQL queries, and things like that. This is our bug bounty. As you can see, 2021 is when the security team was created, so a lot of money was paid, but then it started declining. The reason is that as the program matures it gets difficult for researchers to find valuable, impactful vulnerabilities, because we are doing our job. But if you look at the total number of submissions and valid reports, except for 2021 the ratio has been roughly constant: even though reports declined, the quality of reports didn't. The bounty total is going down, and you have to take

into account the bounty changes, the number of hackers interacting with the program, and several other factors. Maybe they don't like how we triage a report, so they might leave, and others might join. So how did we do this? Before I joined Loom, Narayan and the team did a very good job of making it a habit to bump our bounty amounts every year. As a researcher this is music to my ears: hey, Loom is paying twice the critical amount they were paying last year, so I'm going to hunt on them. Next is promotions. We have short-term promotions, like

paying out 1.5x or twice the bounty amount for the mentioned severities, but we also do something known as a long-term promotion, where we incentivize hackers to look for the issues that we want them to hunt for. For example: show us some consistency, and here is your 555 bucks for submitting five valuable vulnerabilities on one asset. Similarly, when Loom launched AI, we were like: hey, if you find an AI security issue, just drop it in a report. Next is high efficiency. As a researcher, if I'm putting in my time and effort and producing good reports, I want to be rewarded in time.
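The two kinds of promotion described above come down to simple arithmetic: a multiplier on the base bounty, and a flat bonus for sustained valid reports on one asset. In this sketch the base amounts are made up, and paying the 555 bonus once per qualifying asset is an assumed reading of the rule; only the 1.5x and 2x multipliers and the 555-for-five figure come from the talk.

```python
# Hypothetical base bounty table in USD; the real amounts are not in the talk.
BASE_BOUNTY = {"low": 100, "medium": 500, "high": 1500, "critical": 4000}


def payout(severity, multiplier=1.0):
    """Bounty for one valid report under a short-term promotion multiplier."""
    return round(BASE_BOUNTY[severity] * multiplier)


def consistency_bonus(valid_reports_by_asset, threshold=5, bonus=555):
    """Long-term promotion: a flat bonus per asset with at least `threshold` valid reports."""
    return sum(bonus for count in valid_reports_by_asset.values() if count >= threshold)


if __name__ == "__main__":
    print(payout("critical", multiplier=1.5))
    print(consistency_bonus({"loom.com": 5, "desktop-app": 3}))
```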

We triage fast, but we pay faster; 96% of reports match this criteria, so that's nice. Innovate and learn: our long-term promotion was a direct rip-off of A.S. Watson Group's public bounty program; you can go there and look at their table. So we learned from the community; a lot of the changes we made were because we saw other programs doing them. We also do spot checks, where new features get tested by researchers before they get to the public, so that's also nice. Now some gotcha moments: why run a bug bounty in the first place? Loom was plagued with email verification bypasses, and we have had these reports

coming in every year, and that's a critical; that's a crown jewel. Why was that so? Because Loom believes in velocity: code that was written a year ago is now legacy code, and with a lot of developers developing fast and requirements changing, it was difficult for the security and engineering teams to catch these issues in time. So how much have we paid? We have paid more than $40,000, and I think that's good; it's better than paying millions of dollars in disaster management or paying fines because you got breached. These are our overall stats: we have paid more than $170,000 in bounties, we have triaged 250-plus reports, I have triaged

over 100, and our highest bounty was last month, almost $88,000, but we cannot mention the details. One big question we get asked is how we handle beg bounties. Everyone here has some experience with security, right? How do you handle "DMARC reject policy not enabled" reports, when your support staff is plagued with these emails? You get two emails, you get three, you get four, you get five, and it keeps piling up. How do we handle it at Loom? We have a support page with an embedded security form, like a bug bounty form. When the support team gets a message like: hey, I have found a security issue in your website, I want to

report it, we send them to this form, where they submit the report. It gets sent to our funnel, where we have triagers; Narayan, our manager, and I only get to look at it once it has been filtered, so we don't have to deal with these reports anymore. Once we are done with this: how do we do SAST? SAST is something Loom does differently. We focus more on custom rules. Let's say we have an incident: the security team and the developer team collaborate to identify what went wrong, and we keep an inventory of all these incidents or recurring

security issues, for which we create a postmortem. The postmortem is where the real code analysis happens: we use this inventory to make a good-looking doc with all these incidents, why each happened, and what the cause was. We don't mention who did it, but we do know who did it. What do we do after that? We usually write rules, and those don't hinder the development process; they are just inline PR comments where we say: hey, you're not supposed to do something like this, because it has introduced issues in the past, so you might want to be more mindful.
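The workflow above, turning an incident from the inventory into a custom rule that leaves a non-blocking inline PR comment, can be sketched as a small pattern matcher over the lines a pull request adds. The rule, its regex, and the function names are hypothetical and stand in for whatever SAST engine is actually used; posting the comments to the code host is left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    pattern: re.Pattern
    message: str


# Hypothetical rule derived from a past incident in the inventory.
RULES = [
    Rule(
        rule_id="cookie-samesite-none",
        pattern=re.compile(r"sameSite\s*:\s*['\"]none['\"]", re.IGNORECASE),
        message="SameSite=None has introduced issues in the past; "
                "please confirm this cookie really needs it.",
    ),
]


def review_added_lines(path, added_lines, rules=RULES):
    """Return (path, line_number, comment) for each added line matching a rule.

    `added_lines` is a list of (line_number, text) pairs taken from a PR diff.
    The result is advisory only: nothing here fails the build.
    """
    comments = []
    for line_number, text in added_lines:
        for rule in rules:
            if rule.pattern.search(text):
                comments.append((path, line_number, f"[{rule.rule_id}] {rule.message}"))
    return comments


if __name__ == "__main__":
    diff = [
        (10, "res.cookie('sid', token, { sameSite: 'none', secure: true })"),
        (11, "const x = 1"),
    ]
    for comment in review_added_lines("server/session.ts", diff):
        print(comment)
```

Keeping the output as comments rather than failures matches the intent stated in the talk: the rule reminds the author of a past incident without blocking the merge.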

This is pretty much how we do it. The last thing we want to mention is that Loom has been fortunate that developers also care about security; it's very rare, trust me. Dev teams and security teams come together and manage security. So that's pretty much how we do security at Loom, or did security at Loom. Cool, thank you. We have time for just a few questions. I'll remind you, if you have any, go to bsidessf.org/qna and I will gladly read them for you. So far we have one question for you; it says: how did you decide what order to address

certain priorities (before endpoint protection)? A couple of factors. We would definitely look at our past bugs and security issues and see which tool would fit best to shift left and prevent them from happening again. Compliance requirements also come in pretty near the top: we work out whether we have certain compliance requirements for SOC 2 Type 2, or whether we are trying to get a different certification, and what else needs to be done for that. And the third thing is that we work with support, and we have, again, an inventory like the one Nishant

mentioned: what are the requirements, or do customers ask for a particular thing again and again? We keep a note of that. For example, SIEM was definitely near the top, and that's when we decided to buy a commercial SIEM product within Loom. So the top three factors are business requirements, compliance, and what our customers need: is it a deal breaker, are we losing out on a business deal just because we don't have it? I think those are probably the three factors

that we take into consideration. Thank you. Our next question is: how's the infrastructure security at Loom, with misconfigurations, insecure infra, etc.? Oh yeah. So we are the security team, but we are on the broader infra team; our boss leads infra and security, so that's how we are structured. We're very close to infrastructure. We do use certain IaC tools: we tried out a commercial tool from Snyk, and we also have other open source tools that we have used, because we use Terraform as our infrastructure as code. I think the infrastructure team runs a particular linter check

which does a lot of security checks, and we also use a CSPM product within Loom, which identifies these misconfigurations. One other thing is that we use service control policies pretty extensively, to make sure we have the guardrails so that nobody, or not a lot of people, within the engineering team can create insecure resources in the first place. We only have time for one last question; if you have any others, I'd encourage you to meet with our panelists after the talk. The question is: do you use a bug bounty management provider, and if not, why did

you decide against it? So we purposefully left out which vendor we use, but we use HackerOne. The reason we wanted to use HackerOne, I think, was a decision made before I joined Loom, or even before Narayan joined Loom, so I think only our CTO can answer that. But why we have stuck with it: probably the better triaging experience, and I used to hunt extensively on HackerOne, so I know the system well and I know pretty much all of the researchers who are hunting on our program; not all of them, but the majority. Please join me in thanking our panelists.