
Patrick Colford - Scraping Pastebin for Obfuscated Malware - BSides Portland 2018

BSides PDX · 2018 · 26:21 · 168 views · Published 2019-02
Category: Technical
About this talk
Patrick Colford (@kaoticrequiem)

Started in 2002, pastebin.com has become the largest service of its kind in the world, serving 18 million visitors monthly and hosting 95 million pastes. Though used for lots of legitimate content, malicious actors have been using the site to distribute obfuscated malware and other malicious content for years. In this presentation, I’ll demonstrate FIERCECROISSANT, an open source tool for scraping Pastebin and decoding obfuscated malware. I’ll also talk about how to tailor FC to your needs, whether that’s to find data dumps, malicious pastes, or other potentially harmful content.

Patrick Colford is a Security Analyst with Cisco Umbrella (formerly OpenDNS). Formerly a Customer Service Representative with nearly 10 years of experience, he joined the analyst team in 2016 to help support Umbrella’s London office. He is passionate about security education and hopes to inspire people all over the world to learn more about whatever interests them.
Transcript [en]

Hey friends. As blessedly introduced by Mickey, my name's Patrick. We're gonna be talking about the bottom of the barrel, which is scraping Pastebin for obfuscated malware. This is an open-source program we're gonna be talking about. Its name is FIERCECROISSANT. You can check it out; the source code is gonna be released later. So, the plan. Introduction: who the hell am I? Pastebin: what the hell is that? FIERCECROISSANT: why would you do this? Findings: what sort of terrible things are out there on Pastebin? And stuff you can try with FIERCECROISSANT when you download it. So, me. What do you know [unclear]. I love video games, dancing, musicals, helping people. You can

see me dressed up for the Dickens Fair in San Francisco. Come check it out, pretend to be in Victorian London, because honestly it's kind of less scary than now. And the barrel that is Pastebin. So if you don't know, Pastebin is a paste repository service, a text-sharing service basically, and it was founded in 2002. It allows you to store snippets of text or, more commonly, code. The idea of a pastebin originated in the 90s, when programmers were like, I have this cool program, can you help me check it out? Sure, paste it in a bin and chat with me. And that's what happens. As of 2010 it reached 1 million pastes, but that was

eight years ago; it's got a lot more since then. And as of 2014 it has one and a half million active user accounts with about 3 billion paste views. Not each paste is viewed 3 billion times, but you know what I mean. One of the strengths of Pastebin is that you don't actually need a user account to paste anything, and you can delete stuff if you want: you can set it to delete in, like, 10 minutes, an hour, a couple of months from now. And like I said, you don't need a user account; you just post anonymously as a guest. And for that reason there's lots of benign stuff

that you may not want to attribute to yourself: bad slash fic, you know, your Pokémon evolution strategies, how to speedrun anything. There are tons of benign uses for Pastebin; it's a really popular service for that reason. But there's a lot of terrible stuff on Pastebin too. This is Deep Web Links: how to access the dark web and, you know, download Tor, get your drugs on Silk Road. Isn't that awesome? If you see up here, it has about three million views, and so a lot of people have seen this. There's definitely more nefarious stuff: disinformation campaigns. Clinton Underground Child Sex Scandal, the Pizzagate scandal that came out late 2016. This is posted on Pastebin; it has three

hundred fifty-seven thousand views. People totally thought there was an underground pedophile ring in a tiny pizza restaurant in New Jersey. OpKKK, KKK Hoods Off 2015: de-masking Klan members in, you know, the South. Although, you know, if it's a doxxing campaign, who knows if those are actual Klansmen or not. This was posted in 2015; it's probably got new views. And there's stuff like this. This is a .NET Framework assembly just running and evaluating this base64 string that starts like that. I'm sure that's totally benign. Like, I love to run obfuscated code all the time. More stuff: here's PHP running similar stuff, evaluate these base64 strings and then just do them. What the ... is happening? This is the beginnings of a

project I got into when I started becoming part of the security analyst team. I'd been [unclear] for a long while as part of the customer service team, and then in October, the winter of 2016, when I joined the team, we had a company-wide hackathon, and we read this blog by Sucuri saying Website Backdoors Leverage the Pastebin Service. In essence, what happens is compromised websites use something like that executable to go and look for raw paste content on Pastebin, execute it, and then do whatever the program is. Because, you know, like our keynote speaker was talking about, there are malicious executables everywhere; you can find them on Pastebin too. So this blog post sparked our interest. You know, what

kind of malware can we find? We're a malware protection company, this is our job, it's low-hanging fruit, let's go out and grab it. What is there, and what else is there besides malware that we can, you know, analyze? Because Pastebin is used for more than just strictly, you know, malicious stuff. It's used for questionably malicious stuff too: disinformation campaigns, as I talked about before, data dumps, security breaches. I mean, those are not questionably malicious, but it's hard for an automated intelligence, or an automated program at all, to just analyze it; you need some brains looking at it. So this is the start of FIERCECROISSANT, which is a scrappy little open-source

Python scraper. And there are three steps we use to block bad stuff on Pastebin. The first two are what FIERCECROISSANT will help you do; the third step is up to you. So, first step: Python. Super easy for a non-coding guy like me. My degree is in English literature; I wound up in this job because I like helping people. And we're looking for non-human patterns. Regex is really good for this. FIERCECROISSANT's a little greedy: it'll just grab 200 characters in a row, and that's the start of where it looks. So you grab a lot of stuff, like links to various phishing sites, or YouTube channels where you can view pay-per-view stuff, or, you know, slash fic where

people are screaming for, like, [unclear] hundred characters. Part two is decoding commonly used obfuscation techniques, to find what those actual malware bits are. Common obfuscation techniques: base64, hex, binary, ASCII. Those are pretty much the ones we find [unclear], but we run into wrappers a lot. And then step three is where you come in. You've got your deobfuscated malware; you can do whatever you want with it. At Cisco Umbrella we throw it into Threat Grid, because that's what Cisco has; it's proprietary stuff for us. We take out the DNS information. But you could totally use VirusTotal, Joe Sandbox, any sort of open-source or

proprietary stuff you have, and your own work, your own, you know, whatever you're developing on your own, to find out what this malware does. So, step one: scrape Pastebin. This is Python. It uses Requests for HTTP, so it's a cool library for accessing pretty much anything on the Internet. Go Python. And this is the core of what it scrapes for. We grab various things about the paste; check to see if it's untyped, because malware authors are lazy and won't tell you if they're writing in PHP or Java or C++, whatever; grab everything from Pastebin; and check out its size, like, how big is it? This is kind of one of the failings of FIERCECROISSANT, which I'll get into: we

look for something over a certain size. If it's under that, we'll just miss it. Need to change that, but you can too. Here's where, as I mentioned, we just look for 200 characters in a row. That's not a human string. You know, humans weren't meant to read characters or words that are 200 characters long, but computers are really good at that, so let's find out what they're finding out. Here's the start of the magic sauce. If a base64-encoded structure starts with TV and then oA, pB, pQ, qQ, qA, ro, or pA, what it will decode into is an MS-DOS executable. The start of base64, if it starts like that, will

always, always, always decode into an MS-DOS executable. And we reverse that search, because malware authors are lazy: sometimes they will take their program, literally just reverse it, and then put it up on Pastebin too. Binary matching: just looking for ones and zeros, 200 of them in a row. And then hex matching, because people will put things on Pastebin that are malicious in hex format, either straight hex or encoded as raw bytes. And then part of what we found led us to these next three things. PHP matching: looking for anything that starts with a PHP declaration. Image matching, because despite Pastebin only being for text, people have managed to put images on there through encoding. And

then ASCII matching: looking for anything that encodes with ASCII values. Step two: decoding it. All these decoders come with FIERCECROISSANT, and they use native libraries to Python because it's easy. This is hex, just 22 lines of code; ASCII, 23; binary, 24; base64, 25. Easy. And so we check out what we've got. We grab that base64, we throw it into a decoder. I can't read assembly, thank you. And we find out that this program cannot be run in DOS mode: it has to be run at the command line. So for our purposes we run the malware. This is Threat Grid. We're looking for declarations from Threat Grid to tell us

what this is. And like I said, you can use your own open-source intelligence. Joe Sandbox has something similar; VirusTotal will tell you, like, hey, 50 people flagged this as malware. In our case we found Bladabindi, which is a Trojan, being hosted on Pastebin. And importantly for our case, we check out the DNS information. It found a krypter [unclear] .org domain, which is a DDNS domain. Because, well, similar to why you would want to host on Pastebin to begin with, hosting malware on DDNS domains is easy. You typically don't have to really pay for infrastructure all that much, you can't be blocked by a list of records if your IP address changes all the time,

and if anyone finds it out, you're one of thousands, tens of thousands, hundreds of thousands of domains under the same structure. What is anyone going to do to you? No one's gonna call the Internet police on you. So, interesting findings. FC has found 57,000 pastes since May 20, 2017, and we found about 2,200 domains' worth out of that. A lot of the malware seems to be sandbox-aware, and so we'll get false responses, like it's just checking Internet connectivity, making sure it can, you know, get out to Google or Facebook or whatever. But almost half of the malicious domains that we've found are DDNS domains: [unclear] .org, [unclear] .org, hopto.org,

whatever. This allows actors to just host stuff and not care about it. What I've learned in my time as a security analyst is that a lot of malware authors are very lazy. Phishing is a low-cost, high-payoff kind of attack, because even if you only get two or three people to bite on the hook, you've made your money back, right? You can spend 100 bucks to send out a bunch of phishing emails to tens of thousands of people, you get banking information for one of them, and they have a hundred and one dollars in their account: great, you've turned a profit. Malware tends to come as either straight binary or base64, or wrapped in

some other language. I've seen Visual Basic, I've seen PHP, .NET, etc. People will also tend to go an extra distance when obfuscating their base64 for whatever reason; I'll talk about that. And there's a lot of stuff on Pastebin that surprised me: c99 shells, backdoor kits, defacement kits, phishing images maybe, hard to say, and a lot of, well, a lot of gibberish too. I pick up so many pay-per-view channels, I'm really sick of it, especially in Portuguese, which I don't understand. Oh, and Filipino. There's a large Indonesian community that uses Pastebin for various stuff. So, examples. Clear base64. I check that out; who wouldn't wanna run that? Wrapped base64: this is a

Visual Basic, just Dim this string and then run it. Here's that obfuscated base64 I was talking about: they just substituted capital A for this, I believe it's a Chinese character. Reversed base64, because malware authors are lazy. Binary: what do you want, it's ones and zeroes. Similar to base64, if it starts with 77 90 144 0 3 0, it'll decode into an MS-DOS executable. Now some more interesting findings. It's weird to me that malware authors would actually use their own accounts to host things, but they do. And you'll see, like, these untitled things that have been hit several thousand times, because this compromised website has been on for a long time. And more examples of, you know, all my base64,

all my binary, more stuff to decode and find out where it goes. c99 shells are hosted on Pastebin a lot, which I found very interesting. If you don't know, shells are an interface for malicious actors to use the server that they've got without having to use a command line. Stuff like this is very ugly, but they can have a shell like this, which gives them access to poke around the server. They can read all the files, they can grab files from the server, see what's on there, make files available, delete them, what have you. Other cool examples: phishing images. I mentioned that images are hosted on Pastebin, because browsers are so smart that they can take base64-encoded stuff

and translate it for us, because we love computers and we want to make them easier for us. So if you start with data:image and then a slash and then the image type, PNG, JPEG, whatever, and then tell it the encoding structure, in our case base64, your browser will turn that into Permata Bank, or at least this image. Permata Bank, if you don't know, is the eighth-largest bank in Indonesia. Transparent background; I'm really confident that's used in a phishing email somewhere, or a phishing website. Your compromised website can just reference back to the raw value of this paste, and suddenly you've got this image here. Other phishing examples: well, we've

got this image: Google. Cool, you just want to, you know, phish people's Gmail, and maybe some hidden information too. One of the coolest examples I came up with is this, which doesn't look like anything to human eyes, but similar to encoding structures, this might be steganography: hidden information. Computers can see a pattern in this if you run it through a stego library. A bonus that has recently been coming up, and is a talk tomorrow which I encourage you to go to: people put PowerShell executions on Pastebin. Check out that talk tomorrow, because there's a bunch of people there. So yeah, the last part of my talk that I want to talk about is

changes you can make to FIERCECROISSANT yourself. One of the earliest iterations of FIERCECROISSANT used keywords to search for, and if your company's in the financial sector, this is probably a good idea. You can search for SSID, data dumps, logins. You know, people post all sorts of stuff on Pastebin that they didn't get legitimately, and you probably want to know about it. You can change the alerting system. By default FIERCECROISSANT uses HipChat as its messaging system, because that's what we used at work at the time, but you could totally adapt it for anything else. Email: Python has pretty good email libraries natively. SMS through Twilio; they're out there, they probably want to

talk to you. Slack: everyone uses Slack. You can scrape for particular users. As I mentioned, you know, some malware authors will just keep posting stuff on their own account. In one notable case there is a user called, like, master blazer or something inane like that, who hosts a bunch of crypto-mining stuff on his Pastebin account and then deletes it. So I'm particularly interested in him, and one of my variants of this scrapes for just him. And you can change the size of the paste that you look for. Like I said, FC looks for anything above 40 KB, but there's a RAT that has been thrown up a lot on Pastebin recently that's like 31.12 KB, and it's

annoying because I haven't fixed FC to [unclear] yet. So that's about it, actually. Any questions? [unclear] 10 minutes, 11 minutes. Go. Yes?
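Editor's sketch: the matching and decoding steps described in the talk (200 characters in a row, base64 that starts with TV, the reversed search, binary, hex, and the 77 90 144 0 3 0 decimal form) could look roughly like the Python below. This is not FIERCECROISSANT's actual code. The names `find_encoded` and `decode`, the exact regular expressions, and the encoding-type labels are assumptions made for illustration.

```python
import base64
import re

MIN_RUN = 200  # the talk's threshold for a "non-human" string

# Base64 of data starting with the bytes "MZ" always begins "TV" + o, p, q or r.
B64_EXE = re.compile(r"TV[opqr][A-Za-z0-9+/]{%d,}={0,2}" % (MIN_RUN - 3))
BINARY = re.compile(r"[01]{%d,}" % MIN_RUN)
HEX = re.compile(r"[0-9A-Fa-f]{%d,}" % MIN_RUN)
# The executable header bytes 4D 5A 90 00 03 00 written as decimal values.
DECIMAL_EXE = re.compile(r"\b77\D+90\D+144\D+0\D+3\D+0\b(?:\D+\d+)*")


def find_encoded(text):
    """Return (encoding_type, matched_blob), or (None, None) if nothing matches."""
    checks = [
        ("ascii_exe", DECIMAL_EXE, text),
        ("binary", BINARY, text),
        ("base64_exe", B64_EXE, text),
        # Search the reversed text, so the blob comes back already un-reversed.
        ("base64_exe_reversed", B64_EXE, text[::-1]),
        ("hex", HEX, text),
    ]
    for name, pattern, haystack in checks:
        match = pattern.search(haystack)
        if match:
            return name, match.group(0)
    return None, None


def decode(encoding_type, blob):
    """Turn a matched blob back into raw bytes (assumes a clean, complete blob)."""
    if encoding_type.startswith("base64"):
        return base64.b64decode(blob)
    if encoding_type == "binary":
        usable = len(blob) - len(blob) % 8  # drop any trailing partial byte
        return bytes(int(blob[i:i + 8], 2) for i in range(0, usable, 8))
    if encoding_type == "hex":
        return bytes.fromhex(blob[:len(blob) - len(blob) % 2])
    if encoding_type == "ascii_exe":
        return bytes(int(n) for n in re.findall(r"\d+", blob))
    raise ValueError("unknown encoding type: %s" % encoding_type)
```

The order of the checks matters in this sketch: a run of ones and zeros is also valid hex, so the binary check runs before the hex check.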

So c99 shells are just, like, [unclear] shells that backdoor users who've compromised the server will have, so that people who don't want to rummage around in a shell system that they don't know, or rather a command-line interface they don't know, can still impact the server. Plenty of low-grade malicious actors are just clicking on interfaces like this to make changes without having to know how the UNIX file system works, like, how do I mkdir, how do I delete stuff, how do I rm. Yeah, script kiddies for sure. And this goes in line with, like, phishing kits and rootkits and botnets being low-cost: you can rent these services out to people. So, you know, I don't want to

learn how to rm -rf my entire directory; it'll teach someone how to do it through this shell. Stuff like that. One of the things I should mention that I want to improve on for FIERCECROISSANT second edition is the ability to report itself and to do uploads into specified sandboxes. So if you've got, say, a sandbox account, you can have FC just automatically upload it and then download the reports: VirusTotal, Threat Grid if you have access to it, that sort of thing. Good question. Other questions? Yes. FIERCECROISSANT natively grabs all of the metadata about a paste. So it will grab the size, the user, the contents of the paste, whether it has an expiration date, and then it includes,

based off of what we have seen, its own defined value called an encoding type, which is like: is this ASCII, hex, base64, is it an image, is it PHP, or is it some gibberish that a human mind has to look at later? So natively it will grab it, and it also uses [unclear] by default as its database, because it's easy and really doesn't care what you throw into it. It's just like, yeah, I'll take it, as long as it's the same structure. So yeah, we do grab all the metadata about a paste. And if you play around with it later and you grab a paste that has an expiry of zero, it means it's never

set to delete. But expiration dates are also set in epoch time. So one thing a user did when I was teaching this at an [unclear] conference, talking about scraping Pastebin for obfuscated stuff, is he took the expiry value minus the current time to check if pastes were set to delete, so that he could, like, save them in a special place to go over them later.
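Editor's sketch: the expiry trick described above (expiry value minus the current time, with zero meaning the paste never expires) might be written like this. The metadata field name `expire` and the helper names are assumptions for illustration, not confirmed details of FIERCECROISSANT.

```python
import time


def seconds_until_expiry(paste, now=None):
    """Seconds left before the paste is deleted, or None if it never expires."""
    expire = int(paste.get("expire", 0))  # epoch seconds; 0 means no expiry
    if expire == 0:
        return None
    if now is None:
        now = time.time()
    return expire - now


def expiring_soon(pastes, window=3600, now=None):
    """Pastes due to disappear within `window` seconds, worth saving first."""
    soon = []
    for paste in pastes:
        remaining = seconds_until_expiry(paste, now=now)
        if remaining is not None and remaining <= window:
            soon.append(paste)
    return soon
```

For example, `expiring_soon(batch, window=600)` would pick out the pastes set to delete within ten minutes so they can be stored before they disappear.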

Unfortunately, Pastebin's reporting function is on the web interface; you can't do it through an API or any call. It would be cool to alert Pastebin themselves about automated stuff, like, hey, this is a RAT that's just surfing, but they don't have really good mechanisms for that. Which is kind of why FC exists as a tool: to give individual users the option to do what they want with it. Either block the domain, in our case at Cisco Umbrella, or maybe proactively go out and report. I'm sure you could use Selenium to just, like, hey, I found this bad paste, I've got the key, I'm gonna go and click on the report button and get people in, you

know, get bad guys in trouble. As Captain Planet says, the power is yours, and do what you'd like to. I know that at least one banking company that saw this presentation was like, yeah, we're totally going to use this to report all the phishing images they see. Yeah?

Yes, Pastebin has a [unclear], and much to our consternation, one of the earliest versions of FIERCECROISSANT made requests of each individual paste and, like, very quickly choked Pastebin to death [unclear]. We've since tuned FIERCECROISSANT to be better about scraping, and Pastebin's FAQ on scraping is actually really good. They recommend scraping the latest hundred pastes once every minute, so FC does that by default now. One of the unfortunate things about Pastebin is that you have to pay for a paid account in order to scrape the service. It's not too expensive, it's 40 to 50 bucks, and it's a really good treasure trove of information, but I have contributed way

too much money to Pastebin in service of this project. So yeah, that's something I've learned. But rate limits were definitely a problem, and I, as someone who did not know what coding was, really, when I got started, was like, why does it keep timing out? I don't understand. And then I'd do a curl to the Pastebin API and they would tell me, stop it, like, just slow down. So yeah, rate limits were a problem. Note to do for later; that's a good idea. And I think one of the cool things that the keynote highlighted is that various perspectives will give you a lot more information, a lot of cool ideas. So I would totally use that, maybe, and suggest it to the rest of

my team. It's particularly good in our use case, because one of the things we want to try to do is automate phishing blocks. And so, like, if you're doing, you know, phishing images that are just hosted on Pastebin and we can find that, we'll be like, yeah, like that, immediately.
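Editor's sketch: finding the base64 images mentioned in the talk (a paste that starts with data:image, then the image type, then the base64 encoding marker) could be approached along these lines. The function name `extract_images` and the regular expression are illustrative assumptions, and the sketch only handles unwrapped, single-line base64.

```python
import base64
import binascii
import re

# data:image/<type>;base64,<payload>, the form browsers render as an image.
DATA_URI = re.compile(
    r"data:image/(?P<kind>[A-Za-z0-9.+-]+);base64,(?P<body>[A-Za-z0-9+/]+={0,2})"
)


def extract_images(text):
    """Return a list of (image_type, raw_bytes) for each decodable data URI."""
    images = []
    for match in DATA_URI.finditer(text):
        try:
            raw = base64.b64decode(match.group("body"), validate=True)
        except (binascii.Error, ValueError):
            continue  # truncated or malformed payload; skip it
        images.append((match.group("kind"), raw))
    return images
```

The decoded bytes could then be written to disk or handed to whatever image analysis or blocking workflow you already use.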

That's true, Pastebin does do Black Friday sales. They do deals all the time; you can stay on top of it. And I really recommend it, just as either a hobbyist or a professional. There's lots of stuff there; you'll never know what you'll find. [Music]
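Editor's sketch: the scraping cadence mentioned in the rate-limit answer (the latest hundred pastes, once every minute, on a paid account) might be structured like the loop below. The endpoint URL, the `key` metadata field, and all function names are assumptions for illustration; check Pastebin's current scraping documentation before relying on any of them.

```python
import json
import time
import urllib.request

# Assumed scraping endpoint for paid accounts; verify against Pastebin's docs.
SCRAPE_URL = "https://scrape.pastebin.com/api_scraping.php?limit=100"


def fetch_latest(url=SCRAPE_URL):
    """Fetch metadata for the most recent pastes as a list of dicts."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def new_pastes(batch, seen):
    """Return pastes whose key has not been seen yet, and remember them."""
    fresh = [paste for paste in batch if paste["key"] not in seen]
    seen.update(paste["key"] for paste in fresh)
    return fresh


def poll(handle, fetch=fetch_latest, interval=60, rounds=None, sleep=time.sleep):
    """Call handle(paste) once per new paste, fetching once per interval."""
    seen = set()
    done = 0
    while rounds is None or done < rounds:
        for paste in new_pastes(fetch(), seen):
            handle(paste)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)  # one request per minute keeps within the advice
```

Tracking seen keys means overlapping batches do not trigger duplicate work, which is one way to avoid the per-paste request flood described in the answer above.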

To be honest, I have not. A lot of the images seem to be just, like, people using Pastebin as a cheap version of Dropbox. Like, people will put images for, like, Facebook phishes, and then also their own personally used stuff. I've seen a lot of anime portfolios; it's been weird. But I also have seen, like, fewer images than I thought I would expect. You know, we've caught, like, 57,000 pastes, but I imagine the number of images we found through this structure is probably less than 200, to be honest. Yeah, absolutely. I mean, people could upload Excel spreadsheets that have macros, and, like, coming into all those problems.

So there's a wealth of stuff to find; I'm sure it has been there. One of the things about Pastebin being used for malware is it's been this way since at least 2012. People have used Pastebin in this way; there was a talk at RSA, like 2012 or 2013, saying, like, this is happening, be aware of it. And the last question? Oh yeah, open source, up on GitHub. I'll give you the [unclear]: github.com/kaoticrequiem, because I was an angsty 90s teen, /fiercecroissant. And feel free to add me on Twitter or email me and stuff like that. Last shout-out, actually, is a Twitter account I followed because of this project called

ScumBots, so @ScumBots, which does a lot of the same thing, but it uses Twitter to broadcast, like, hey, here's a malicious paste. And that's how I found out about the ASCII encoding and the PowerShell executables. Cool. So thank you so much for your time, and yeah, have a great conference.
