
Thank you. Good afternoon, all. Small crowd, so I'm not at all under pressure. Okay. I'm here to talk about breaking DNA: how we can cook malware in the lab. I'm Ram.

So first, what is DNA? DNA is the building block of life, the smallest unit of life, or of data, that you can see in real life. Think of it as code that defines what you do, code that you run. DNA is a set of instructions that you execute: your hair color or teeth color or eye color, whatever it is. So just think of DNA as code. That's what I'm trying
to do here: to make you think of DNA not just as a bunch of chemicals, but also as zeros and ones and things like that. It's a ladder-like structure, a double helix. We all know that. I'm here to change that perspective for you. Moving on: all you need to know is that the ladder-like structure has rungs. The rungs are built from four chemicals, four bases basically: A, T, G, C. I don't want to go into what they are. All you need to know is that A and T bond together, G and C bond together, and they form the rungs of the ladder, which go on to
decide, through their sequence, your genetic characteristics.

Let me take a quick break, and let's all refresh on what data is. Data is just zeros and ones. Modern technology runs on that. What I'm here to do is give you a new perspective: as I said, try to see DNA as code. So if data is zeros and ones, why don't we think of DNA as base 4? It is made up of A, T, G and C, so let's think of them as 00, 01, 10 and so on. Now that I'm talking about bits, can I use DNA as storage? Yes, of course
you can. What are genes, after all, but a store of the genetic information that you're going to pass on to your progeny later? It's just long-term storage of genetic information for about your lifetime, basically 60 to 70 years.
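To make the idea of DNA as storage concrete, here is a minimal Python sketch of packing bytes into bases and reading them back. It assumes one possible two-bit mapping (A=00, C=01, G=10, T=11). The names BITS_TO_BASE, BASE_TO_BITS, encode and decode are made up for illustration, and real DNA storage schemes are more involved (they add error correction, for instance).

```python
# Assumed two-bit mapping, for illustration only: A=00, C=01, G=10, T=11.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


dna = encode(b"Hi")
print(dna)          # CAGACGGC
print(decode(dna))  # b'Hi'
```

At two bits per base, every byte becomes four bases, so the two bytes of "Hi" become eight bases.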
So once you can think of DNA as storage, just think of it as something I can code my own data into. That's basically it. I'm going to use a base 4 system with A, T, G and C again, and I'm going to assume A is 00, C is 01, G is 10 and T is 11. That's how I'm going to proceed.

Is somebody already doing this? Yes, Microsoft has been doing this since 2015. They have a DNA archival storage project that they've been running since 2015. As for present
uses, they are being used in nanotechnology to code nanobots, for effective vaccine delivery and, of course, in research.

So we know that we can encode data. Now how are we going to do it? You've got to encode your data into DNA. How do you encode? As I said, just assume each base stands for a binary value, for your own convenience. Now, once you finish your encoding, if you preserve the DNA sequence well, you just freeze it and you have it forever. Isn't that what sperm freezing and egg freezing are all about? So it's
very dense. It's very, very dense, and it can last basically forever. How do you do it? You use a synthesizer. It is basically the cooker where you cook your data into DNA. Encoding is nothing but placing the bases that you want in the order that you like. So that's encoding.

Now how are you going to read it? You go through a process called sequencing. What do you do? You feed a sample to a sequencer, and what that sequencer is going to do is
detect the bases in the sample and write them to a digital file. Each base has its own unique fluorescent light. Let's say A has a blue hue, for example. So it's going to detect a blue hue, say that it's an A, and then place it in the digital file. It just keeps going on and on. You put your sample in, it detects all the bases, and then it gives you an output file. It's going to be in the FASTQ or the FASTA format. So it's just a plain text file that is
just a sequence of the bases that it has detected. Now we have read the data.

Now I'm coming to the complication. What happens typically is that this FASTQ file is fed to analysis software. And what is the analysis going to do? We're going to compare our output with a known database to see if they match. Is this a DNA sample from a bacterium, is this a human DNA sample, or is this a viral sample? So we're going to compare it against a known database, and that is what generally happens in bioinformatics research. So you
have a sample, and you compare it against a known database.

Now, these bioinformatics programs are basically coded by scientists who may not be very security oriented. So they may use unsafe languages like C and C++, and may use outdated functions like strcpy, which have no parameter for checking the input length. So what I'm going to do is overwhelm the software: I'm going to overwhelm the function with an input longer than it can handle. The figure that you see here is typically what a buffer looks like. A buffer is a temporary
workspace where data is stored, and it has a finite size. So you're going to put your data into it. Let's say it can hold five bits, for example. So you put five bits in the buffer. What if I put in 10 bits? I'm going to overwhelm the buffer with more than it can handle. And if there is no proper input validation, what's going to happen? It's just going to overflow into the neighboring memory. So if you look here, I've overwhelmed the... oh, you don't see the [unclear]. So you overwhelm the buffer on the stack with more data than it can handle, and then it spills onto the
next regions, as you can see in the green zone and the very light blue zone. It just spills into them, where it's not meant to be, and that's a buffer overflow: excess data flows into places where it is not meant to be. And what if that excess data is malicious? What if I have a nefarious intention? What if I have shellcode there, for example, or a delete instruction? As execution flows from top to bottom, at some point this nefarious instruction is going to be executed. It could be to open a shell, or it could be to delete a file.
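The spill can be mimicked in a few lines of Python. Python is memory-safe, so this is only a toy model and not a real exploit: a bytearray stands in for memory, with a five-byte buffer followed by five bytes of neighboring data (bytes here, rather than the bits of the spoken example). The names fresh_memory, unchecked_copy and checked_copy are hypothetical. unchecked_copy imitates the way a function like strcpy keeps writing without looking at the buffer size.

```python
BUFFER_SIZE = 5  # capacity of the toy buffer, in bytes


def fresh_memory() -> bytearray:
    """Toy memory: a 5-byte buffer followed by 5 bytes of neighboring data."""
    return bytearray(b"....." + b"SAFE!")


def unchecked_copy(memory: bytearray, data: bytes) -> None:
    """Write every input byte starting at offset 0, ignoring BUFFER_SIZE."""
    for offset, byte in enumerate(data):
        memory[offset] = byte


def checked_copy(memory: bytearray, data: bytes) -> None:
    """Refuse any input that does not fit in the buffer."""
    if len(data) > BUFFER_SIZE:
        raise ValueError("input longer than buffer")
    unchecked_copy(memory, data)


memory = fresh_memory()
unchecked_copy(memory, b"AAAAAEVIL!")  # 10 bytes into a 5-byte buffer
print(bytes(memory[BUFFER_SIZE:]))     # b'EVIL!': the neighbor was overwritten

memory = fresh_memory()
try:
    checked_copy(memory, b"AAAAAEVIL!")
except ValueError as error:
    print("rejected:", error)
print(bytes(memory[BUFFER_SIZE:]))     # b'SAFE!': the neighbor is intact
```

The only difference between the two copies is the length check, which is the kind of input validation that stops the spill.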
What's going to happen then, I don't want to say. So this is how a buffer overflow works, and how I'm going to adapt it to this situation is to supply a FASTQ file that is very long, more than what the buffer can handle, and just overwhelm the software. I place my malicious code in a strategic position such that it executes the moment the actual data is read, and there you have it: you have control of the system. You can corrupt the database, you can delete the entire database, or you can open a remote shell session. It's up to you.

Would this work? I'm speaking theoretically now: would this work? Yes, it
may. Why? Because, again, as I said: legacy code, legacy functions, old unsafe programming languages like C and C++. And how would you verify whether a physical sample of DNA has malware in it? How do you know if this is a harmful input or a harmful batch of samples? You won't know unless you sequence it. You won't know what it has unless you read through it. So there's basically no input validation possible on the physical sample. And as a scientist, if a sample is coming to me, I'm going to assume that it's for research. I'm going to assume that I have to sequence this for some purpose, that I have to log it, and so
on. So I'm not going to assume that somebody's going to send me a polluted DNA sample in that sense.

So theoretically it will work. But then, it did work. Researchers at the University of Washington, in 2017, were able to implement the buffer overflow attack successfully. It was the first case of malicious synthetic DNA being used to get malware onto a system. The malicious sequences were hidden within the DNA, and physical detection was very hard. The malware is born inside the lab and manifests within the lab. You don't even know unless you sequence it. So that's the
biggest challenge.

What does this mean? As an attacker, I have the perfect opportunity here. I can be as stealthy as I like, because it bypasses all the known security measures. It can bypass firewalls, antivirus systems and intrusion detection systems because the threat is physical. It's not in the network. So now we have to shift our focus from securing just the network: we also have to look into securing the supply chain. How are we going to do that? We need new SOPs, new principles, new laws, and all of those have to be created. And even if you do manage to find a malicious DNA sample, who are you going
to attribute it to? How does one know that I was the one who created it and gave it to you? Attribution is really hard.

And it's not just buffer overflow. I can corrupt the entire database, and the research that is going on may completely collapse. Or I can just change the data, and misdiagnoses can happen. Corruption is possible. Trust in a given healthcare company, for example, is lost, and intellectual property is lost. Data is leaked, so proprietary information is lost too. So these are some of the possibilities.

What I found really interesting, and this is away from the cybersecurity space, is this: what if a piece of DNA that has biological meaning, let's say it codes
for a particular characteristic in us, let's say a gene that manifests as eye color, in some way or form happens to be malicious to a computer? What are we going to do about it? We cannot control the gene that controls eye color. And what about vice versa? What if I create a DNA sample that was meant to be malware, but then it turns out to be a cure for cancer? It's going to break the software, but it's beneficial for humankind. So what are we going to do about it? Those were my
biggest questions, and I still don't have answers to them.

Some ideas: how can we handle this situation? Well, use safer languages: Rust, Go and so on. Then better coding practices: input validation, memory protection techniques. Then invest in talent: bring in people who have interdisciplinary knowledge, both in biology and in cybersecurity. New laws, new standards, new SOPs and new ethics standards have to be created for this situation. And hire more.

That's it for my talk. About myself: I recently graduated from the University of Birmingham, just a couple of days ago, thank you, with a master's in cybersecurity. I've
worked before in the DevOps sphere of things, with AWS, and I've also worked on some auditing tasks. My dissertation was about forensics, and I have an interest in forensics and cryptography. So I'm looking for opportunities now; feel free to contact me. Many thanks to my mentor, my friends, my family and BSides for organizing this. Thank you very much.