
API Scraping For Swim & Profit

BSides Bristol · 2025 · 22:31 · 31 views · Published 2024-01
Category: Technical
Style: Talk
About this talk
A case study in API security through the lens of a real-world problem: automating availability checks for a members-only swimming facility using Python, Lambda, and Chrome DevTools. Melissa Augustine Goldsmith walks through how an open API, combined with serverless automation and basic web scraping, can give an unfair advantage over manual users—and explores why organizations should care about securing APIs even when they seem low-stakes.
Transcript [en]

need of a microphone. So, welcome to my talk, Scraping APIs for Swim and Profit. The case study, it's more of a story, so come along with me on a story about APIs and why securing them evens the playing field. My name is Melissa Augustine Goldsmith. I am a lead threat hunter at NBCUniversal/Sky/Illumination/DreamWorks; we seem to own quite a few things. I've been doing forensics and IR officially since 2007, and I started doing threat hunting in 2020 during lockdown because there was nothing else to do. I used to teach things back in the old-school days when I was a consultant. I

taught malware forensics and incident response in a bunch of places around the world. I code badly. I am not a coder, I'm a bodger. I see a problem and I'm like, can I code my way out of this? If I can, excellent, and then I let the professionals deal with the rest. I've spoken at places before, but nothing to do with this. So let me set you up with a scenario. Has anyone ever been to the Bristol Lido? It's beautiful, right? Very lovely. But it is members only, and you are only allowed to go if there are spaces. So there's a system that you can try and book with, and it's

a bit difficult, and there's apparently a 2,000-person waiting list to become a member of the Bristol Lido. It's in a beautiful area of Clifton in Bristol, and I highly recommend you try and go. I've been wanting to go for a while. I have a thing about lidos. I don't know, it's a British thing; it's just fun. I like going in them, the colder the better. The one in Cornwall is actually quite nice as well. But anyway, it was my 40th recently and I was trying to figure out where to go with some of my girlfriends, and I said, let's go to Bristol. I haven't really been to

Bristol, and this is a good excuse for me to now try and go to the Lido. My friends were like, well, it's your birthday, sure, we'll go. But even two months before the date of this event there were not enough slots for five people. I tried to book the swim slots and it was like, oh sorry, there are only slots for two people. This is unacceptable, right? I don't like being told no, especially for my 40th birthday. So one of my friends who was coming with us, she is a database admin, noted that the API, or application programming interface, was actually completely wide open. You could interact with it however you see fit. So

this led me down the path of scraping the API to try and ensure we could be made aware of any cancellations and then jump on and book, because I really wanted to go to the Lido. So what is API scraping? People of a certain age will remember, when the web was Web 1.0 and static, something called web page scraping. You'd go to a website, you'd use curl or wget, or Python if you were feeling really fancy, and get the information from a web page, and then you could do with it whatever you wanted. You could put it into a database, you could aggregate it, and just modify it or massage it to output however you

wanted it to. The API is kind of the same way. It's all dynamic now, so you can't just scrape a web page anymore, but you can use the API to scrape that data and then deal with it as you see fit. So how did I go about doing that? Hopefully any pentesters or network engineers know all about developer mode in Chrome. It is this amazing feature. It is free, because Chrome is free. It's F12, if you're ever in Chrome and you're wondering what the heck I'm talking about. It shows you a bunch of information about your web browsing activity, including network

traffic, so when you go to a website you can see all the requests that the website is making on your behalf. So all I needed to do was go to the website for each of the services that I wanted, note the request, and then massage that URL to match my specifications. So if you see here, this is developer mode down here, and up above is the Bristol Lido website. You can't really see, but over there is, you know, what day do you want your booking to be, and then down here you can see the URL to go to to get that

information back. You can see here there's an offering ID, and that is the swim option, and then over there you can see the response, which is in everyone's favorite, JSON, so you can actually massage it. The payloads, the requests, you can see all of this in Chrome developer mode. It is amazing; I highly recommend it. So, okay, proof of concept time. Here's my bodge Python code to basically get this information. I use requests to get the URL, and you can see the URL parameter, and I set the date to a specific variable, the 14th of October. And then you can see here

parsing the JSON. Again, this could probably be done much nicer, but again, bodger, not coder. I print the time slots available and how many slots are available. I'm not only looking for the number of slots, I'm also looking for a specific time, because I wanted to go on the Clifton Suspension Bridge vaults tour, which is at a certain time, so I had to do this before that, because I'm 40 and I find bridges very interesting, and that's fine. So that was cool and that worked. I was like, hooray, this is great, I can now get this information. But why stop there?
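
The code on the slide is not part of the transcript. A minimal sketch of a script like the one described might look like the following. The endpoint, the query parameter names, the year in the date, and the JSON field names are all assumptions made up for illustration; the real ones would come from the request seen in the browser's Network tab. The talk used the requests library; this sketch swaps in the standard library's urllib so that it runs without extra packages.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and offering ID, for illustration only.
BASE_URL = "https://booking.example.com/api/availability"
OFFERING_ID = "swim"
WANTED_DATE = "2023-10-14"  # 14 October; the year is a placeholder


def build_url(date, offering_id=OFFERING_ID):
    """Return the availability URL for one date."""
    query = urllib.parse.urlencode({"offeringId": offering_id, "date": date})
    return f"{BASE_URL}?{query}"


def fetch_slots(date):
    """Fetch and decode the JSON response for one date (network call)."""
    with urllib.request.urlopen(build_url(date), timeout=10) as response:
        return json.load(response)


def open_slots(payload, min_spaces=1, latest_start=None):
    """Return (start_time, spaces) pairs that have enough free spaces.

    Assumes a payload shaped like
    {"slots": [{"time": "HH:MM", "available": 3}, ...]}.
    """
    matches = []
    for slot in payload.get("slots", []):
        if slot["available"] < min_spaces:
            continue
        # Zero-padded HH:MM strings sort correctly as plain strings.
        if latest_start is not None and slot["time"] > latest_start:
            continue
        matches.append((slot["time"], slot["available"]))
    return matches
```

Calling `open_slots(fetch_slots(WANTED_DATE), min_spaces=5, latest_start="12:00")` would then list only the slots that fit five people and start early enough to leave time for something later in the day.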

One of the things that I find interesting about computer security is that if you have a curious mindset you just keep going a couple of inches farther and farther to see what else you can do. Well, how about this? And how about that? It's always a fun field to be in, for sure. So the Lido offers something called swim and spa packages, and in a pinch we could buy one of those to grab a spot at the Lido. So I messaged my girlfriends and said, would everyone be okay to spend 90 quid to do a spa and swim package? They said, we are all parents, yes please, would

someone give me a massage, for the love of goodness. So once I knew that, I could again go to developer mode to see what the URL looks like for the swim and spa packages. You can see up here the beautiful curl command, and then the beautiful JSON that gets vomited back out at you. Obviously this is not pretty; humans don't read this very well. But luckily there are lots of JSON prettifiers, like jq, which is, are you ready for the animation? Here it comes. So suddenly this becomes much easier for us humans to look at and to parse, and then I'm like, okay, this is how I have to parse my JSON in my Python code.
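
jq is a command-line tool; the same pretty-printing step can also be done from Python with `json.dumps(..., indent=2)`, which makes the nesting visible before writing the code that walks it. The payload below is invented for illustration, and its field names (`packages`, `appointments`, `free`) are assumptions, not the real response.

```python
import json

# Invented stand-in for a compact API response; field names are hypothetical.
raw = (
    '{"packages":[{"name":"Swim and spa","appointments":'
    '[{"time":"10:00","free":true},{"time":"14:00","free":false}]}]}'
)


def pretty(raw_json):
    """Re-indent compact JSON so the nested structure is readable."""
    return json.dumps(json.loads(raw_json), indent=2)


def free_appointments(raw_json):
    """Walk packages -> appointments and return (package, time) pairs that are free."""
    data = json.loads(raw_json)
    return [
        (package["name"], appointment["time"])
        for package in data["packages"]
        for appointment in package["appointments"]
        if appointment["free"]
    ]
```

Printing `pretty(raw)` shows which keys sit inside which, and `free_appointments(raw)` is the kind of nested loop that structure calls for.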

So then I know it's nested within these structures here, so that's nice. So again I have my code working, do a little happy dance, have a cup of tea. But I'm like, this is all well and good, but I don't want to sit here and have to remember to do this every single day. Old-school people would like to set up a cron job. I don't want to do that. I don't actually leave my laptop or desktop online anymore; at the end of the day I just want to turn it off. So I thought, hey, I could use a

Lambda function, which I've heard so much about, and I wanted to learn and understand how it works, so this was a great opportunity to do so. So I figured, okay, can I leverage Lambda to run this code for me and then alert me via email or SMS or smoke signals, whichever is the easiest, if a slot becomes available? And this was actually very easy. Learning Lambda is not as hard as I thought it would be. Literally all you do is plop your code in that function, lambda_handler, and it works, very magically. I was all ready to learn lots of new stuff, and I'm like, oh, this is all you

have to do? Okay, cool, yay. So I had another cup of tea and I felt like I was winning, but I still needed to get it to run. The whole point of Lambda is serverless code, code running without a server. It basically runs your code when a trigger has occurred: somebody goes to your S3 bucket, or somebody makes requests to your website, and that can trigger a Lambda function. In the NBCUniversal world, somebody wants to go watch a certain video, so there'll be a Lambda function to potentially go and start pulling all the shards to then show you guys videos.
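
A handler of the kind described can be sketched as follows. `lambda_handler(event, context)` is the standard entry point for a Python function on AWS Lambda; `check_availability` is a hypothetical placeholder for the availability check, and the shape of the return value is an assumption chosen for illustration.

```python
import json


def check_availability():
    """Hypothetical placeholder: the real version would query the booking API."""
    return []


def lambda_handler(event, context):
    """Entry point that Lambda calls each time a trigger fires.

    `event` carries the details of whatever triggered the run and
    `context` carries runtime information; neither is needed here.
    """
    slots = check_availability()
    return {
        "statusCode": 200,
        "body": json.dumps({"open_slots": slots}),
    }
```

Because the handler is an ordinary function, it can be exercised locally with `lambda_handler({}, None)` before being pasted into the Lambda console.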

But I don't need that. I just need it to run like a cron job. So how did I do that? I used something called EventBridge, so I could set it to run at whatever interval I wanted. You can see here I set it up to run every hour, and this was all free; it didn't require me to have a super special AWS tier. So this whole thing cost me absolutely nothing, just my time. So okay, cool, I have a trigger to run every hour, I have my code, but now how is it going to tell me that this is now

available? Well, I created a burner Gmail account. I was going to do an SMS, but there was a lot of extra setup for it, and I decided ain't nobody got time for that, I'm just going to do it the old-fashioned way. So I set up an email account and then I used something called an app password. Basically it allowed my code to send emails through Gmail on behalf of that account, if that makes sense, so I could send an email from a Python script. You can see how to do that over at that lovely URL. So here's my, again, bodge Python code. It says, if there are

any results from that Python thing from before, then I want you to send me an email. And again, Lambda is very powerful; you could have it do all sorts of other things, like webhooks to a Slack channel and all that good jazz. But again, simple. So, the result: success, great success. October pool availability. I used the 13th as a test subject because that's actually when there was availability; unfortunately, for the day that I wanted, for a while there was no availability. But you can see here it does the job that I needed it to. It says, hey, we have the following open slots for the pool and we have the following

massage appointments that are available for you. So now all I had to do was set the right day and then just hope to goodness that it was actually going to send me an email. So the question is, is this a bad thing? I don't know. I didn't feel like it was, but when you think about it, it gives me a distinct advantage over somebody who doesn't know about APIs and doesn't understand that you can change the date in the query to be whatever you want and parse the results in JSON. So it doesn't seem very fair to somebody who has to set an email reminder, like,

every day: okay, I need to go check and see if this slot's available at 9:00. Because we're human, we forget these things, and you're like, oh God, I might have just missed that one slot because I forgot to check. But what if this was Glastonbury tickets or DEF CON tickets or Taylor Swift tickets? Suddenly it becomes much more like, hey, this isn't very fair. And that's why Ticketmaster and all those other companies have lots and lots of things to try and prevent this kind of thing from happening. But okay, so the question is, how do you fix this? There's no point in having a problem without presenting some solutions. So there's

session tokens or OAuth that you can use to try and authenticate everyone who's using the API, which for this doesn't really make much sense, because this is open to everybody. Everybody can go to the Bristol Lido website and interact with this; they just don't realize they're interacting with it. You can do request limits: if you see this IP making a request more than four times over x amount of time, then I want you to block it. But that will then block potentially valid users, because if you're trying to just find any available slot, you're clicking next day, next day, next day, and you might get blocked, which is

not a good user experience. You can do anomaly checking or user agent checking, but it's very trivial to change your user agent to be whatever you want it to be, so that doesn't really work; it will only protect you from bots that behave, really. The true way of doing this would be to have visitors hitting the API front end, and then that host goes to the back end, gets the information, and returns it to you, so you're not actually interacting with the API; it's that middleman proxy doing it. But again, it's the cost versus reward scenario here, and they're probably like, this doesn't make much sense to do, because

it's open, right? So again, is this such a bad thing? I don't know. But I could go on. If I find availability, I can in theory start doing more coding to go through the process of buying that slot automatically and then emailing me saying, hey Melissa, here are your five slots. Obviously that requires a lot more work, because you start dealing with CAPTCHAs and credit card information, which, I won't say is protected, but has a lot more things in place to try and prevent this kind of thing from happening. But anything is possible, correct? It could be swimming as a service: you tell me what you want and

when you're trying to get it, and I can have you get an email or SMS when it's available, for a fee. I'm probably not going to be retiring to the Maldives anytime soon off of this swimming as a service, but again, it's possible, right? And then you think, well, this is just Bristol Lido. Well, there are actually other systems that leverage this. This is on VirusTotal: about 29, 28, whatever, spas leverage this API. So in theory I could say, not only can I make sure that you get notified if there's a slot available for Bristol Lido, but how about the

Thames Lido or another lido nearby? So again, your swimming as a service suddenly starts looking a lot more lucrative. But in the end we were very successful, and we were able to go to the Lido, and that is four of us; unfortunately one of the people had to leave. So that was just a fun way of how I was able to finally go to the Bristol Lido. So does anybody have any questions? Oh yes, hello. s things for a lot of electric car and some of them are surprisingly this is open public for me yeah surprised how many big car is of Industry

standard best practice any have any

standards? I am pretty sure there are some standards, but unfortunately I don't know. I feel like NIST would definitely have a standard on securing APIs or locking down APIs, or SANS might have come out with some white papers on it, but from a UK perspective I'm not 100% sure, to be honest. But it would make sense, especially as I've seen talks where, it was the Tesla API, and someone was able to change the windshield wipers and the lights to turn on and off, and you're like, that's not good, is it? But yeah, it is a big problem. And there was a

question, so, I dry-ran this with my parents, and the first thing my mother asked was, is this legal? And I'm like, yes. But there's an aspect of that website that I'm going to double-check, because if it's as not locked down as this, I'd be a bit more nervous about it, so I might actually have to tell them about it. But it is a problem and people don't even realize it. It's kind of like the S3 buckets that people think are secure because Amazon has that, you know, shared security, like, oh, it's totally locked down, but it's actually not. So people

might not even realize their API is exposed, and bad things can

happen. Yeah, it's the amount of time that you're willing to just work around it. So, anybody else? Yes? what scanning

Yeah, well, I think it's anytime you're starting to actually try and change things. So if I could suddenly find other people's bookings and change them to my own somehow, I think that would, not even blur the lines, that would definitely be bad. Or starting to access other people's personally identifiable information: then I think we'd probably need to talk to them about having a potentially exposed thing that's actually exposing people's personal information. But it's a great question, and like I said, the one part of the website that I'm a bit worried about is the thing that I'd be afraid to start

scratching at, because I think it starts to blur whether that's legal or not, and so I'd have to actually start talking to somebody who has a degree in law. So, yes? Sorry, to go back to that question, the way I would probably say it is, at the point where you're faster than what a human could do, I would say that's when you're abusing the system. So for example, if you're scraping that data and then making an instant booking, that is then crossing the line. Yeah, because I was running this code, I think it was, once every hour. So in theory I could do that every hour if I just remembered, but I'm a human and I

can't remember anything. So yeah, thank you. Yes? And you said, did you tell the... I'm going to. So there's this one part of the API that I'm a bit nervous about, and if it's open like this, I'm going to tell them. But this, it's that question of, it's kind of working as the system intended, right? I'm just skipping the UI, the GUI, and going straight to the data. So I think it's kind of a gray area. I'm sure my mother would prefer me to tell them. I probably will, just because I don't want them to ban me, because I

thought it was a really lovely lido. Though you might get a free swim. I might get a free swim, which would be really nice. So, yes? It's like if you paid someone to check every hour for you, would... Exactly. No, it would. So this is it: you could pay people to queue in a line for you, so you can go and get your cup of coffee while you're waiting for whatever ridiculous thing that someone is selling. It's the same concept. So I don't think there's anything wrong with it, but it's that question of, is that how they intended the API to be used, and do they

care enough to spend the money to fix the problem, when in theory there's nothing personal behind it, there's nothing super sexy that they're going to be worried about, non crbs or anything. Yes? Sometimes you can actually tell people stuff like that and they either don't understand, or they understand but they don't care. Yeah. So you could tell them, and if the person or the business actually doesn't care, it might still be wrong, but it's kind of like... Well, I don't think I would go to the Lido specifically. I think I would just go to the try. be, these people, and say, hey, do you guys realize that this is possible? Because I was just

doing it once every hour, but what's to stop me from doing it once every minute? Could I potentially DoS their system? I hope not, but I don't know. I don't know what the resiliency is, and, you know, I like my job, I don't want to lose it. So yeah. Yes? More of a follow-up comment, because you're like, well, how can this be a service, but there is actually an app that you can currently sign up for, with a monthly fee, to do exactly this for driving test cancellations. And it's quite funny, that gray area that you were talking about, because it can immediately book you in for the test. Oh,

see, suddenly my swimming as a service, my SaaS platform, is sounding very lucrative. My island in the Seychelles is going to become a reality. So there are those websites, right, like stuck in format, when there were GPU shortages. Oh yeah. They do exactly the same thing. Exactly. And it's fine, right? You use the API; the API exists to allow people to find slots, and that's what you're using it for. If it said the slot was unavailable but you then still took that slot and tried sending it to the book endpoint, yeah, that becomes... Yeah, you start taking somebody

else's from them. So, oh yes? I guess it's an economic thing, right, because it's a small scale nobody yeah I know what you mean interested but I guess the Lido and all the sort of users or customers of this provider API service provider you about it and they then it with provider. More like, what happens if there was somebody visiting Bristol for that day only, and they really, really, really wanted to go to the Lido, and I took their slot, and then they were leaving and never coming back? Seems unfair. But yeah. Oh, thanks. Thank you very much.