
Hi there, welcome to BSides SF. Just to let you know, there will be no Slido today, so we'll have live Q&A at the end. If we don't have time during the presentation, you're welcome to meet him in the lobby. Take it away.

Hey everyone, thanks for showing up, I appreciate it. I hope you're having a good BSides; I know I am. So let's get started.

Today I'll be talking about how Twilio Segment proactively protects customer API tokens (secrets, keys, whatever you like to call them). I'm Sal Olivares, a senior software engineer at Twilio Segment on the security features team. Our team is responsible for MFA, SSO, really anything authentication related: login, logout. I've been there about a year and a half and have worked on both customer-facing and internal-facing features. One of my concerns earlier this year has been API token security: how do we ensure that our customers are informed and protected in the event that a Segment API token leaks?

I think we can all agree that nothing good ever comes from committing an API token to a Git repo. Who here has committed a token? We've all done it, right? Most of us. No matter how senior you get, it happens. It happened to me a couple of months ago: I was testing some stuff and actually committed my own Segment API tokens. That's bad if you don't notice in time, especially if it's a cloud provider key. Nowadays OpenAI API keys are also pretty popular, and those can get expensive quickly. Bad actors are using the built-in search tools in GitHub and GitLab, running regex patterns on Sourcegraph, and watching GitHub's public events API. And this isn't anything new: we've been seeing secret leaks for years, and if you've talked to the GitGuardian folks, they've built whole tools around this.

So how bad is it? GitGuardian published a report in 2023 on how leaky 2022 was: they found over 10 million secrets on GitHub alone, up from about 6 million in 2021, so it's trending upward. And we're always hearing about leaked credentials leading to data breaches. (Oh, it's very bright, sorry.) Take Toyota, for example. In 2017 a subcontractor uploaded a portion of some source code to a public repository, and that code contained a token that gave access to a database of customer information such as email addresses. That happened in 2017, and it stayed up until 2022. Toyota was unable to confirm whether the data was accessed or downloaded, so we don't know whether there was a breach or whether attackers were able to leverage that data in any way.

Back in 2019, a research team at North Carolina State University published a paper aptly named "How Bad Can It Git?", and they found, and I quote, that the "consequences of even a rapidly detected secret disclosure are severe and difficult to mitigate short of deleting a repository or reissuing credentials." Basically, once a secret hits a public repo, that API token is compromised. There's no choice but to rotate it or, if you want to take more drastic measures, delete the repo. I hope devs aren't just adding another commit that removes the config file or environment file they accidentally committed; that doesn't help. You can also attempt to rewrite history, but once you push to a Git hosting provider, people have cloned it and pulled it, and it's really easy to get the original history back. That's what we want to help our customers keep in check: we want to protect them in scenarios like this, and we don't want unauthorized people using Segment tokens.
Right, so I'm going to go back to white to spare your eyes.

On the agenda: we'll talk a little bit about Segment, to understand why we wouldn't want a Segment token out there; then how we address leaked API tokens, specifically on GitHub and GitLab, the two most popular Git hosting providers; and after that, a little bit about orphaned tokens. In some cases I can create a token in an organization, leave that organization, and the token is still around, still usable in some long-running script no one has touched in a while. I'll explain that more later in the slides.

So what is Segment? You might have seen a billboard coming into SF saying "Twilio Segment" and thought, "What is that? I have no idea." Twilio Segment is Twilio's customer data platform. If you attended Leif Dreizler's talk on tracking meaningful security product metrics, he talked about how our team, the security features team, uses Segment internally to track usage of the security features we build. For example, if someone creates a SCIM token, we want to know when it was created and by which customer, to get business metrics on whether our customers are using SCIM at all. We instrument that Segment call in a backend server, and that backend server is considered a source. We then send the data to a destination of our choice, such as Snowflake, where we can visualize the metrics. Snowflake is one of many destinations Segment offers. You can manage these sources and destinations in the UI, but we also offer a programmatic way to do it with an API token. So what if that API token fell into the wrong hands? Well, theoretically, you could add a new
destination and start siphoning off customer data. You could wreak some havoc: halt the data pipeline, delete sources, or add your own source and pollute the pipeline. So it wasn't hard to convince people that we should build something to revoke those API tokens.

So how do we do that? Luckily, we don't have to build a secret scanning solution from scratch; there's no need to scour public commits ourselves. GitHub and GitLab both offer this: GitHub calls it secret scanning, GitLab calls it secret detection, and it's more or less the same thing. They can alert users when a secret is committed, and if you pay GitHub a little extra, they give you a nice dashboard that says, "These are the commits we found, go do something about it." But they also offer partner programs that let service providers, SaaS apps like ourselves, hook into the platform and get notified when GitHub finds one of our tokens. All we have to do is two things: one, provide a regex pattern for our API tokens, and two, set up a public endpoint.

What does that look like? At a high level, a user commits and pushes a change, the Git provider runs its list of regex patterns, and it sends any matches to our endpoint. In the request, they send the token, what type of token it is, and where they found it, because they don't only look at commits; they also look at pull request descriptions, issues, really anywhere there's user-generated content. On the Segment side, we can do whatever we want with that. What we do is validate the token to make sure it's actually legit, revoke it, and then send an email to the workspace owners or admins.
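Here's a minimal sketch of that receive, validate, revoke, and notify flow in TypeScript, the language the service is written in. It assumes GitHub's documented alert payload, a JSON array of objects with token, type, url, and source fields. The RevocationDeps hooks (lookupWorkspace, autoRevokeEnabled, revoke, notifyOwners) are hypothetical stand-ins for internal Segment APIs, and the opt-out check models the per-workspace feature flag the talk describes.

```typescript
// One secret match from the partner program (assumed shape, per GitHub's docs).
interface ExposedTokenMatch {
  token: string;  // the raw secret that matched our pattern
  type: string;   // the token type name registered with the provider
  url: string;    // public location of the leak, shown to the customer
  source: string; // where it was found (commit, issue, pull request, ...)
}

// Hypothetical hooks standing in for internal APIs.
interface RevocationDeps {
  lookupWorkspace(token: string): Promise<string | null>;   // null => not a real token
  autoRevokeEnabled(workspaceId: string): Promise<boolean>; // per-workspace opt-out flag
  revoke(token: string): Promise<void>;
  notifyOwners(workspaceId: string, exposureUrl: string): Promise<void>;
}

type Outcome = "revoked" | "invalid" | "skipped";

// The body comes from the internet: check its shape before acting on it.
function parseMatches(body: unknown): ExposedTokenMatch[] {
  if (!Array.isArray(body)) throw new Error("expected a JSON array of matches");
  return body.map((item, i) => {
    const { token, type, url, source } = (item ?? {}) as Record<string, unknown>;
    if (typeof token !== "string" || typeof type !== "string" ||
        typeof url !== "string" || typeof source !== "string") {
      throw new Error(`match ${i} is malformed`);
    }
    return { token, type, url, source };
  });
}

async function handleMatches(matches: ExposedTokenMatch[], deps: RevocationDeps): Promise<Outcome[]> {
  const outcomes: Outcome[] = [];
  for (const match of matches) {
    // 1. Scanners send false positives and test tokens: only act on real tokens.
    const workspaceId = await deps.lookupWorkspace(match.token);
    if (workspaceId === null) {
      outcomes.push("invalid");
      continue;
    }
    // 2. Respect workspaces that have opted out of auto-revocation.
    if (!(await deps.autoRevokeEnabled(workspaceId))) {
      outcomes.push("skipped");
      continue;
    }
    // 3. Revoke first so the exposure window closes before the slower email step.
    await deps.revoke(match.token);
    await deps.notifyOwners(workspaceId, match.url);
    outcomes.push("revoked");
  }
  return outcomes;
}
```

In an Express app, the /revoke route would call parseMatches and then handleMatches after the request has been authenticated as coming from GitHub. Whether an opted-out workspace should still get an email is a policy choice this sketch leaves out.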
If you choose to implement this, this is where your implementation can differ: you can really do whatever you want when you receive an exposed token. One of the very first questions we asked ourselves was: do we warn, or do we revoke?

The very first option we came up with was to just warn the customer. It's really easy, and we leave it up to them to take action. But what if they don't see that email? What if it happens late at night, they see it in the morning, and it's too late? So do we revoke the token and then notify? Well, what if they have an important workflow that relies on that Segment API token? Do we disrupt it? The third option, which I don't recommend, is to warn them and revoke the token after 24 hours. That has the downsides of the first two: they might not see the email, the token is still usable for 24 hours, and we still end up revoking it. So we had to decide: would we rather our customers have a security incident, where the end-user data (our customers' customer data) is potentially at risk, or an incident where their script doesn't run because the API token was revoked and the fix is just to rotate it? You probably want the second one.

We also looked at what other secret scanning partners were doing. This is a small sample of who's on board with GitHub's secret scanning partnerships, and most of them just auto-revoke. Some of the fancier ones give you the option to either auto-revoke or just notify. So for our first cut, we went with auto-revoke, and after seeing what most partners do, we felt justified in that decision.

Probably the most important thing to get right before joining a partner program is the API token pattern. You want it to be identifiable: you want to be able to look at a token and say, "Yes, that's a Segment token" or "That's a Twilio token." And one of the very first
projects I took on when joining Twilio Segment was exactly this. At the top is what our tokens used to look like: a generic 64-character string. If you try to write a regex for that, you're going to get a lot of false positives, a lot of noise. We were inspired by GitHub, which has a great blog post describing how they changed their API token format (links are at the end of the slides). They added a prefix to their tokens and did something fancier too, adding a checksum so they can quickly check whether a string is actually a GitHub token. We opted for the easy route and slapped a prefix on ours: SGP, standing for Segment public API token. Now we can easily write a regex and provide it to GitHub and GitLab.

Takeaway one: an easily identifiable prefix makes it easy to leverage existing tools that detect hard-coded secrets, like GitGuardian, Gitleaks, and TruffleHog. The open-source tools usually keep a list of regex patterns, so once you have a pattern you can easily match on, you can contribute it to those projects. Then, if your customers use any of these tools, for example in pre-commit Git hooks, you prevent your tokens from making it into their code base to begin with.

Armed with a regex pattern, we went to GitHub and said, "We're excited to partner with you," and they said, "So are we, here's a bunch of contracts." After we got through those, we provided a launch date and proceeded with building the service that receives these exposed tokens. We call it the exposed public API token service. We chose GitHub first because it's the most popular, but we're currently onboarding to GitLab as well.
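To make the prefix idea concrete, here's a minimal sketch of generating prefixed tokens and the kind of regex a scanner could use to find them. The exact format is an assumption: a lowercase sgp_ prefix followed by 64 alphanumeric characters. The talk only says the prefix is SGP, so the separator, length, and alphabet here are illustrative.

```typescript
import { randomInt } from "crypto";

// Assumed token format: "sgp_" + 64 base62 characters.
const TOKEN_PREFIX = "sgp_";
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
const BODY_LENGTH = 64;

// Generate a new token with a cryptographically secure random source.
function generateToken(): string {
  let body = "";
  for (let i = 0; i < BODY_LENGTH; i++) {
    body += ALPHABET[randomInt(ALPHABET.length)];
  }
  return TOKEN_PREFIX + body;
}

// The pattern you would hand to a scanner. Word boundaries keep it from
// matching inside longer identifiers; without the prefix, a bare
// [A-Za-z0-9]{64} would also match hashes, IDs, and other random strings.
const TOKEN_PATTERN = /\bsgp_[A-Za-z0-9]{64}\b/g;

function findTokens(text: string): string[] {
  return text.match(TOKEN_PATTERN) ?? [];
}
```

A legacy-style bare 64-character string no longer matches, which is the point: the prefix is what keeps false positives down.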
I'm going to keep saying "this is in the docs" as I explain some code, because GitHub provides really excellent documentation for onboarding onto secret scanning. Since Segment is a Node shop, the exposed public API token service is just a simple TypeScript service using the Express.js framework, and it exposes a single endpoint: /revoke. Once the plan was finalized and approved, it took me about a month to get this out the door, and most of that was infrastructure-related work.

Up here is pseudocode that looks a lot like JavaScript. It's the bulk of the business logic for this service, and some of it is Segment-specific, but the two most important parts are the GitHub verification middleware and the token revocation. The middleware ensures we only act on requests actually made by GitHub. GitHub passes two things in the headers, a key identifier and a key signature, and (again, it's in the docs) we do some crypto to verify the signature. If it's valid, we move on to token handling: we parse the tokens and call our Segment-specific revoke. Yours will obviously look different, since it's very specific to your company; internally, we have our own token verification endpoint that we call. What our revoke does is first validate that the token is actually legit, and if it is, revoke it. We also have a feature flag in there, so if for whatever reason a company says, "Please stop revoking my tokens, I want to commit them publicly," we can disable it for them.
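Here's a minimal sketch of that verification step. It assumes what GitHub's partner documentation describes at the time of writing: the key identifier and signature arrive in the GITHUB-PUBLIC-KEY-IDENTIFIER and GITHUB-PUBLIC-KEY-SIGNATURE headers, the signature is a base64-encoded ECDSA signature with SHA-256 over the raw request body, and the matching PEM public keys are published at https://api.github.com/meta/public_keys/secret_scanning. Fetching and caching those keys is left out; the functions take them as a parameter, and the GitHubPublicKey field names are an assumption.

```typescript
import * as crypto from "crypto";

// One published GitHub secret scanning public key (assumed field names).
interface GitHubPublicKey {
  key_identifier: string;
  key: string; // PEM-encoded ECDSA public key
}

// Verify against the *raw* body; re-serialized JSON would not match the signature.
function verifyGitHubSignature(
  rawBody: string,
  keyId: string,
  signatureB64: string,
  keys: GitHubPublicKey[],
): boolean {
  const entry = keys.find((k) => k.key_identifier === keyId);
  if (!entry) return false; // unknown key id: reject
  try {
    return crypto.verify(
      "sha256",
      Buffer.from(rawBody, "utf8"),
      entry.key,
      Buffer.from(signatureB64, "base64"),
    );
  } catch {
    return false; // malformed key or signature
  }
}

// Middleware-style gate. Node lowercases incoming header names.
function isFromGitHub(
  headers: Record<string, string | string[] | undefined>,
  rawBody: string,
  keys: GitHubPublicKey[],
): boolean {
  const keyId = headers["github-public-key-identifier"];
  const signature = headers["github-public-key-signature"];
  if (typeof keyId !== "string" || typeof signature !== "string") return false;
  return verifyGitHubSignature(rawBody, keyId, signature, keys);
}
```

In an Express app this gate would run before anything parses or reshapes the JSON body, since the signature covers the bytes exactly as GitHub sent them.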
So now, if you ever accidentally (or purposely) commit a Segment token, you're going to get this in your inbox: "You're receiving this email because we revoked a public API token," along with some metadata. We send the exposure link, which GitHub or GitLab provides, so the customer can see where the token was found. We explain what impact the revocation has on their workspace and what they can do now that it's revoked, and we recommend checking the audit trail to make sure nothing fishy happened with that API token. The email also explains a bit more about why they're receiving it and links to our documentation.

I did some very unscientific tests: I committed production Segment API tokens, and from the moment I committed and pushed to the moment the system showed the token as revoked, it was under 10 seconds. It's really, really quick, and it took another 30 seconds or so for the email notification.

I told my team I'd be so happy if we revoked one token within six months. Within the first two weeks, we revoked our first token, and within the first month, we'd revoked three. People are actually committing Segment tokens; go figure. Here are some metrics, taken a month or two after release (I think we released in February). There's a lot of invalid stuff; some of it, I think, is GitHub sending test tokens, because they always fire around the same time and are identical. Some are revoked; those are the three or four real ones. And some valid ones came from me messing with that feature flag I was talking about.

Takeaway two: leverage existing secret scanning
partner programs to easily notify your users about leaked credentials and help prevent security incidents on your customers' behalf. Being able to just auto-revoke a token and prevent a security incident for your customer is a good feeling.

What about tokens that aren't necessarily leaked, but are known by users who are no longer part of your organization? At Segment, like I said, there are sources and destinations, and a workspace is a logical grouping of them. In some applications, API tokens are tied to the user: I create a token, the token performs actions on my behalf, and when I get offboarded, that token usually gets deleted too. At Segment it's a little different: tokens are tied to the workspace instead. That has the advantage of letting you assign roles and permissions to a token separately from the user. But it also means I can create an API token tied to a workspace, leave that workspace, and still technically know an API token that can perform actions in a workspace I'm no longer part of. That's an orphaned token.

So do we just auto-revoke those as well? Not quite. Here, we decided to alert workspace owners or admins instead and let them decide whether to take action. It goes back to that whole warn-versus-revoke question. A leaked token is public; it's on the internet, and people are not nice on the internet, so we have to revoke it. An orphaned token, on the other hand, might belong to an ex-employee who left on good terms, and it's probably powering a script you don't want to touch. So we just send a warning: "We recommend you rotate this token; its creator is no longer part of this workspace." Along similar lines, we also alert on unused tokens. They provide no value; they're just a liability.
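A minimal sketch of the orphaned-token and unused-token checks that a daily job could run for each workspace. The ApiToken shape, the set of current member IDs, and the 90-day cutoff (standing in for the talk's "over three months") are assumptions, as is measuring never-used tokens from their creation date so brand-new tokens aren't flagged.

```typescript
// Assumed token record; real storage will differ.
interface ApiToken {
  id: string;
  creatorId: string;
  createdAt: Date;
  lastUsedAt: Date | null; // null if the token was never used
}

interface TokenFinding {
  tokenId: string;
  reason: "orphaned" | "unused";
}

const UNUSED_AFTER_DAYS = 90; // approximation of "over three months"
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Tokens belong to the workspace, so a token can outlive its creator's membership.
function auditWorkspaceTokens(
  tokens: ApiToken[],
  currentMemberIds: Set<string>,
  now: Date,
): TokenFinding[] {
  const findings: TokenFinding[] = [];
  for (const token of tokens) {
    // Orphaned: the creator left the workspace but may still know the secret.
    if (!currentMemberIds.has(token.creatorId)) {
      findings.push({ tokenId: token.id, reason: "orphaned" });
    }
    // Unused: no activity inside the window (never-used tokens count from creation).
    const lastActivity = token.lastUsedAt ?? token.createdAt;
    if (now.getTime() - lastActivity.getTime() > UNUSED_AFTER_DAYS * MS_PER_DAY) {
      findings.push({ tokenId: token.id, reason: "unused" });
    }
  }
  return findings;
}
```

The findings would feed an email to workspace owners or admins rather than a revocation, matching the warn-not-revoke choice for orphaned tokens.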
They're hanging around, not being used, so if a token was last used over three months ago, we alert on that too. Just like with leaked API tokens, we send emails for orphaned tokens: "Just so you know, here's an orphaned token; consider rotating it." This is all powered by a cron job that runs once a day, and it's really simple: fetch all the tokens for a workspace, and if a token's creator is no longer part of that workspace, alert.

Here's a chart of all the tokens we've marked as orphaned; it happens more often than you'd think. I mentioned earlier that we didn't want to auto-delete orphaned tokens. That decision was made during project planning, and it was reinforced when the metrics rolled in: 25 percent of orphaned tokens were deleted after the first email notification, and 13 percent after the follow-up notification. I think the numbers have come down even more since I last checked. If we'd seen the opposite, that our customers were proactive about deleting orphaned tokens (say, 90 percent deleted after the first notification), maybe we'd consider auto-revoking. But for now, we feel justified in our decision.

Takeaway three: consider notifying users when a token's creator is no longer part of the workspace, organization, or company. The list of SaaS apps partnering with GitHub and GitLab is growing, but I haven't come across another company that notifies me about orphaned tokens, if they even have that concept, so that would be pretty cool to see.

At the end of the day, looking at the metrics, I know these two features are keeping our users' tokens safe and preventing some pretty devastating security incidents, while striking that balance between user
experience and security.