
Cross-platform Compatibility: Bringing InfoSec Skills into the World of Computational Biology

BSides Las Vegas · 31:27 · 100 views · Published 2016-08
About this talk
Cross-platform Compatibility: Bringing InfoSec Skills into the World of Computational Biology - Candice Schumann, Rock Stevens Common Ground BSidesLV 2016 - Tuscany Hotel - Aug 03, 2016

without further ado our speakers today are Candice Schumann and Rock Stevens who are grad students at the University of Maryland College Park so guys take it away thank you very much I sincerely want to thank everyone for coming out to the last talk of the day it means a lot I know there's a lot of drinking between now and the pool party and dinner and whatnot so I really appreciate it so the bottom line up front what we hope you all can take away is that everyone in the audience right now can use the skills knowledge and expertise that you currently have to make an impact on the world in

the news you see lots of things about bug bounty programs you know all the new zero-days that are coming out getting a million dollars for the latest iOS exploit but at the end of the day as a security practitioner what if you can use your skills to save someone's life and you know we're gonna give some novel examples of how you can use your skills right now but keep that in the back of your mind and maybe something you learned can inspire you to carry on and make a difference later on so all right so please ask questions at any point because I know biology can be a little

scary um but anyways so what exactly is computational biology well it all started when people started sequencing the genome right I'm sure some of you have heard about the Human Genome Project and at first you know it took quite a long time to sequence a single genome and now you can do it in the span of maybe one or two days or a week and that means that we can sequence a lot of different things like not just humans or animals but bacteria and plants and there's thousands and thousands of genomes out there that have been sequenced and that we want to do analysis on and you know it's all well and good to do lab testing in a wet lab

with biologists but that costs money and takes a lot of time whereas computers are relatively cheap and they're pretty fast so running biology experiments in silico on computers really really helps out the biologists so an example would be some researchers were looking at corn and there was an invasive species that came into corn crops and started killing off all of the corn and this was a major problem because corn is a big source of food so what they wanted to do was kill off the invasive species while still having the corn available and not kill the corn right so they did this using systems biology and basically what systems biology does is it looks at the

metabolism so again when I said that you started sequencing a genome so you have a genome and that translates into genes and then you also have transcriptomics which is the expression levels of genes so that basically means like if you have blue eyes you have that gene highly expressed and in metabolism the genes that we're looking at are genes that code for proteins which means they do physical chemical reactions inside of your cell and those reactions turn into metabolism which creates various pathways right they're kind of like logic gates and an extra added thing onto the logic gates is you have rate of flow right rate of flow over a reaction so what

these people did was they took the metabolic network of corn and the metabolic network of the invasive species and they tried to knock down genes so lower the gene expression of various genes to see what would kill the invasive species and not kill the corn so they basically created a genetically modified corn that was able to withstand the crop killer and that's why we still eat corn today for some of it so with that example that's hacking at a molecular level right you're exploiting something changing the way it is adapting its structure so that you can achieve an outcome that's hacking so what we're going to talk about are some

examples of things that you may have in your daily lives that sort of correlate to the research that we did together so this beautiful GIF right here comes from the National Institutes of Health and what you're looking at are the number of databases that are publicly available with computational biology information there's hundreds some of these are federally funded programs they're freely available and some of them are you know graduate students postdocs other researchers that just made the data publicly available for a lot of these instances the data is just publicly available and unmaintained and we'll come back to why this is an issue later but for us the immediate concern was let's say I have a pet name for

something if you don't know me very well you probably don't know why I reference this pet name frequently and you might have a different pet name for your dog it's vastly different but when we start talking about our dogs together I have no idea what you're talking about you have no idea what I'm talking about and this leads to disparate data sets where a lot of these data sets are talking about the exact same thing but don't use common terminology so that when you start to aggregate all this raw data available there's no bridge between them so for many of you that have ever dealt with intrusion response intrusion detection analysis this is a

common problem for you right you know you're looking at a piece of malware that's on a host you want to know you know all of the hard disk forensics information you want to port that to a centralized repository for analysis but then you start pulling the over-the-wire data what happened on the server logs how did this intrusion enter the network and you know start building a larger picture of you know the adversary's tactics techniques and procedures for you know what happened within your damn network we want to do the exact same thing except looking at a disease looking at a pathogen something that we want to target so with that all this freely available data you can

now ingest using these sort of hacker methodologies to make something that's easily queryable if you're familiar with ELK the Elastic stack of Elasticsearch Logstash and Kibana it just makes large datasets easily queryable and that's exciting for us because the old way that we had to do it was actually building our own custom web scrapers a lot of these databases you know you just have an XML file of semi-structured data or the actual entire database file that you can ingest but a lot of them are just a website and so there's data that we needed off this particular website and we were having to build a manual crawler for every single website to pull what we need using a methodology like this saved

us a lot of time so if you have bash scripting experience python scripting experience it's very similar when you want to actually go in and hack a cell so what we're going to do is do a real quick hands-on look at what I'm talking about but before then what we're going to use in this little hands-on is trimethoprim it's an antibiotic drug that's used to treat UTIs urinary tract infections that are caused by E. coli so we have a drug trimethoprim a pathogen E. coli and because we know this antibiotic kills E. coli in computational biology we expect it to say yes this thing is now dead when you treat the cell with the drug that's what we're gonna do right now so literally

this is all it is in Python we're leveraging a system COBRApy it's based off of openCOBRA it's a suite of machine learning and computational biology tools that's available from the University of California San Diego and importing that is two lines of code and then the third line you'll see on here is us actually loading the metabolic model into memory if you remember the extremely ugly diagram Candice talked to you about earlier on with a lot of things going on in the metabolic network this file is all of that represented in digitized format all of the reactions or the logic gates every gene and how they interact with one another is represented here in this file so we have

two drugs trimethoprim and halothane we know that trimethoprim should kill the model and halothane not really sure what's gonna happen that's just another drug so when you have a cell and you're focusing on its metabolic network inherently all cells want to thrive and reproduce spread the seed be fruitful and in this particular instance we're assuming that all cells want to have optimal biomass growth they want to achieve the largest spread of that seed so what we're doing in this metabolic network is targeting the biomass growth and we want to find what drugs kill it effectively denial-of-servicing the cell so we're going to run the script real quick and see what happens when we apply two different

drugs using knockdowns which are adding additional constraints to the model saying that you know given these targets for trimethoprim we have two different targets b2827 and b0048 in the absence of these genes being expressed within the model what happens so according to the results trimethoprim results in zero biomass growth which means it's dead but halothane takes it from the original level of 0.73 down to 0.138 when you have drugs that don't completely kill a pathogen it's important to note that you're actually breeding superbugs at this point because you're killing the weak ones and the sort of strong ones survive and they breed so the ones that

are laughing at your drug essentially are the ones that are now the superbugs that you cannot treat with the same drug again so that will come into play later on in our research that we talk about so that's how you use some of the methodologies that you may use in your day-to-day workplace for computational biology purposes but there's also a surprising amount of knowledge that you guys take for granted that just does not exist in the world of computational biology for example this is a screenshot of a website this particular website allows you to query various data in an underlying database but does not sanitize or tokenize the input so this was a website that's a bit older I'm

assuming has never been audited for security and why would something like this matter it's not like a high-value production system it's freely available data right well so a lot of this data is being used as the foundation of a lot of research so when you have you know a couple grants going towards fundamental research and you're unsure whether you know some adversary is hosting malware or is corrupting the files in there that's a problem moving forward right and just the fact that we actually found one website that allows you to run raw SQL queries through a form and let you do whatever you wanted without any repercussions that was a huge concern

for me as a security practitioner doing biology research so if any of you are curious about where to find the metabolic models for various pathogens species whatever this is the website BiGG you can go in there and download various things that you may be interested in but you'll also note that there's something missing from the options that you download things from there's no integrity check there's no checksum there so there's nothing to verify that this thing hasn't been tainted while hosted server side or corrupted while downloading so the foundation of your research has zero integrity coming off of the wire moving forward which brings all of the research you produce into question because you can't verify the validity of the

underlying data structures which is a huge problem that people just aren't considering right now so another problem is that a lot of biology stuff is uniquely identifiable right so my genome is uniquely identifiable from Rock's genome but also our genomes are pretty similar right you can just take out a little bit and it'll still be the human genome and you won't know whose it is but some research right now is looking at the microbiome which is all of the bacteria that live inside of your gut or your mouth or wherever on your body your skin this is extremely unique so my microbiome is completely different from any of yours and in the future with this move towards

precision medicine you're going to have your genome and your microbiome on file at your doctor's office and if anyone were to get ahold of this that is completely uniquely identifiable you can tell exactly who you're looking at so and as of right now a lot of this data is not being treated as sensitive because there aren't you know ways to weaponize or exploit this sort of data presently but because people aren't looking forward at the privacy and security considerations of this data resting unencrypted on someone's hard drive that's a problem that we're trying to shed light on so yeah yep our research looks at bacteria that have become resistant to drugs so superbugs right just a PSA this happens when you

do not finish your course of antibiotics so if you don't finish your course of antibiotics you're creating superbugs which is bad for everyone so finish your course of antibiotics is what I'm saying but some people don't and there are bacteria that have become resistant to a lot of antibiotics so pretty much the solution to this would be to create new drugs right but this is really really expensive and takes a really long time you know you have to go actually find the genes to target that would kill it which takes time then you have to test it in the lab and then you have to go through and test it on animals make sure it doesn't kill

anything and then finally you get to human testing and the FDA has to approve it and this just takes a really long time and it's really expensive and major drug companies just don't want to do this right because the problem with antibiotics is that bacteria are eventually going to become resistant to it so they put all of this money into it it lasts for a little bit it's not an expensive drug so they don't get much money for it and then it stops working right so big drug companies are not creating new antibiotics so our solution is to look at current drugs right there are hundreds of drugs out there that we

know the side effects of and we know their toxicity levels and they're available so what we wanted to do was to look at current drugs that target human genes and then find the orthologues in bacteria so orthologues are basically genes that have the same function in two different species and this happened at a speciation event when we're talking about evolution right so basically if it targets a gene in humans it should target the same functioning gene in the bacteria and what we can do is repurpose these drugs so say if we have a drug for a cold or you know some sort of arthritis drug and we say oh here we can use this as a new antibiotic right that

will take less time less money all right so what we did was we started pulling together a lot of those databases those disparate data sets and we started building bridges between them trying to find any way to stitch these data sets together to make them work for us so what we ended up with was an enormous list of existing and experimental drugs and we have the human genome we have the genome of E. coli and then we started building bridges towards understanding what drugs would work against E. coli for this particular example and what we ended up making was a fully modular set of code that would target any pathogen so tuberculosis strep staph infections any sort of model that you load into

there with the similar data you can now use our methodology to check what drugs would be viable against this target so the first thing that we did was we knew trimethoprim worked and so we looked for that in our data set and we found it which was cool and we also came up with several other viable candidates that could be used to treat resistant E. coli but as you start seeing a lot of these results pop up they're targeting the same genes b2827 you'll see as a common thread in which drugs are targeting which gene within E. coli so given that this is one of trimethoprim's targets and E. coli is now resistant to this particular drug

targeting the same exact gene is madness right because you're going to get the same exact effect so we ended up coming up with a list of 12 candidate drugs that would be viable alternatives to traditional antibiotics but we also wanted to know are there any other combinations of drugs that would work and so essentially we took the same exact list and started performing double knockdowns of their targets seeing what would actually work our research showed that there are zero viable alternatives for E. coli given this methodology of double drug knockdowns eliminating this list there were some drugs that had a dampening effect on the metabolic rate

which means yes it's not a viable alternative it's going to create superbugs and that's exactly what we didn't want so this is our candidate list and some of the things that these drugs were used for largely were treating cancer patients chemotherapy there were some other experimental drugs in there one for rheumatoid arthritis multiple sclerosis but if you think about this if you have a particular disease that your body is resistant to and it's now a matter of life and death you're going to be more willing to use these chemotherapy style drugs to save the patient and here's some life advice for you whatever you're doing you want to save that patient right so if you

have a particular patient on the deathbed and you have something with known toxicity levels you know the side effects you believe that you can save the patient with this alternative drug despite how immunosuppressive it is the impact on the patient if it can bring that person back to life spoiler alert it's probably something you don't want to take so moving forward we've talked about a couple ways that you can use skill sets you have right now to help find alternative drugs for resistant diseases but at the end of the day you need a fundamental understanding of biology to make this sort of thing happen one of these candidate drugs is actually a disinfectant so a traditional

way of you know treating internal issues with a disinfectant is drinking a bottle of disinfectant which in turn would probably kill the E. coli right and the patient breaking the previous life advice rule so while we were able to use our various methodologies in computational biology to find viable alternatives some of them simply will not work because of the mode of treatment so if any of you are interested in learning more about the fundamentals of biology there are some ways that you can do that if you're like me and you need structure in your life there are several extremely well polished free courses available MIT OpenCourseWare has the fundamentals of computational biology if any of you are

familiar with Coursera or edX they also have amazingly well polished products available for free that take you through an entire semester's course at your own pace if you like a little bit less structure and you like CTFs or coding competitions there's websites like Rosalind that will be very familiar if you know Project Euler or other websites like that that teach you mathematics this is another example like that where you need to solve computational biology problems it's not going to tell you how you go out and find it you build the scripts that do it and it's more of a hands-on approach to learning so in recap we've gone through some of the methodologies that I was able to use from my InfoSec background

to help our research in computational biology and I'd just also like to thank the people who made this trip possible for us and we're also open to any questions that you might have right now about our research please
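Two of the security gaps described in the talk, downloads with no integrity check and a web form that passes raw SQL to the database, have well-known fixes that can be sketched in Python, the language used in the demo. This is a minimal sketch and not the speakers' code: the `genes` table, its columns, and the function names are hypothetical, and it assumes the data provider publishes a SHA-256 digest alongside each model file, which the talk says was not the case at the time.

```python
import hashlib
import sqlite3


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """True only if the file's digest matches the digest the provider published."""
    return sha256_of_file(path) == expected_hex.strip().lower()


def find_genes(conn, organism):
    """Query a hypothetical `genes` table using a bound parameter.

    The user-supplied value is passed separately from the SQL text,
    so it is treated as data and never parsed as SQL.
    """
    rows = conn.execute(
        "SELECT gene_id FROM genes WHERE organism = ? ORDER BY gene_id",
        (organism,),
    )
    return [row[0] for row in rows]
```

A model file would only be loaded for analysis after `verify_download` returns True, and a lookup form would call something like `find_genes` instead of executing whatever text the visitor typed.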

sure

oh yeah I mean a lot of the funding comes from the federal government so there is a lot of regulation that goes on as far as some mad scientist weaponizing it no and with all the data freely available publicly there's nothing to say that you can't come up with it on your own and do your own private tests so yeah also you have to get a biologist to agree to do the actual experiments so because you know it's all well and good to do things in silico but it's not perfect you still need the wet lab tests when you've narrowed down your search yeah all the metabolic models and the data sets that we have they're all

imperfect it's a representation of our best guess at what's going on right I mean it's science right but it also is a best guess at what's going on in a particular moment given a reaction going on please

yep

what stop hmm

right so the in silico process of that is being done you know going through a metabolic model you can enumerate every single gene and you can perform single and double knockdowns of these genes to find single genes or pairs that result in lethality if you want to kill something and this synthetic lethality is a well-researched area and doing that in reverse to find new or to make new drugs that target that is being done but as we mentioned earlier it's a very lengthy process very expensive process and if you have something available that's already approved right now our pitch is why not use it yeah absolutely yep and then you know with big data or

with big pharma it's always the return on investment at that particular point so
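The enumeration described in this answer, trying every single gene and every pair of genes to see which knockouts are lethal, can be illustrated with a toy example. This is a sketch under strong assumptions and not the speakers' pipeline: the three-reaction network and the gene names `g1` to `g3` are made up, and growth is reduced to a yes/no reachability check (the "logic gates" view from earlier in the talk). The real analysis used a constraint-based metabolic model, which also captures partial growth reductions like the 0.73 to 0.138 drop mentioned for halothane; this sketch cannot.

```python
from itertools import combinations

# Hypothetical toy network: reaction -> (required genes, substrates, products).
# R1 and R2 are alternative routes from A to B; R3 turns B into biomass.
REACTIONS = {
    "R1": ({"g1"}, {"A"}, {"B"}),
    "R2": ({"g2"}, {"A"}, {"B"}),
    "R3": ({"g3"}, {"B"}, {"biomass"}),
}
NUTRIENTS = {"A"}


def can_grow(knocked_out):
    """Return True if 'biomass' is still producible with these genes removed."""
    available = set(NUTRIENTS)
    changed = True
    while changed:
        changed = False
        for genes, substrates, products in REACTIONS.values():
            active = not (genes & knocked_out)  # any missing gene disables it
            if active and substrates <= available and not products <= available:
                available |= products
                changed = True
    return "biomass" in available


def lethal_knockouts():
    """Enumerate lethal single genes and synthetic-lethal gene pairs."""
    genes = sorted(set().union(*(g for g, _, _ in REACTIONS.values())))
    single = [g for g in genes if not can_grow({g})]
    # A pair counts as synthetic lethal only if neither gene is lethal alone.
    double = [
        pair
        for pair in combinations(genes, 2)
        if not set(pair) & set(single) and not can_grow(set(pair))
    ]
    return single, double
```

In this toy network `g3` is lethal on its own, while `g1` and `g2` are only lethal together because each covers for the other, which is the pattern a double-knockdown screen is looking for.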

for me particularly no I've been doing InfoSec since I was like a little kid so it's something I'm passionate about but we actually took a class together on metabolic networks and that's what sort of kicked this research forward yep combined our skills I guess so yeah she works in the computational biology lab full time and I'm a security researcher at Maryland so yeah it was fun working together and I think we have some cool things and hopefully get a publication out of it as well so please

yeah I think my recommendation would be doing some of the fundamentals that we talked about the biology getting back into it I know you have extensive experience but in computational biology even the different libraries that are available to you are brand new largely so doing that and then for collaboration you know like with UCSD and Maryland there are a lot of huge labs that are always looking for collaborative partners and you know with all the freely available data you know if you set up your own experiments and you're like hey I think I might have this interesting finding and sharing it publishing it you know you could have other biological researchers pick that up right I

mean there's a lot of papers that are coming out right now that combine these two fields so I mean even reading them and then saying oh you know maybe I can change something like this there's tons of stuff out there there definitely is like really low hanging fruit that you can yeah so the particular example of corn we know the individual that was asked to help with it and he was offered several millions of dollars to help do that project and he didn't because he wanted to pursue other things but that project went forward and now they have you know crop resistant corn or weed killer resistant corn essentially you can guess what kind

of company would have invested in that but yeah there's money to be had in it with like she said low-hanging fruit

sure

yeah I mean so yeah I mean there's a big effort to create these communities all over the place so I know some people that are doing that right now or attempting to so like there are community spaces that are being created for this kind of collaboration yeah so if you want to contact us we can try to find more precise answers for you other than saying yes it's being done cool any other questions I thank you all for coming I really appreciate it
