
So welcome to my talk. Thank you, everyone, for actually coming; I really appreciate it. Now, while you were all having coffee, having a croissant, living your best lives, I did a little bit of a social experiment on my side. You're probably wondering why I have all of this cutlery, and possibly in the next 10 years I could have broken the GDPR. Now that sounds really crazy, but I essentially have four people's DNA on the seat, and the idea of it is that maybe in the next 10 to 15 years this could be someone's key; this could be a password to someone's entire life. It's a scary thought, but this is where we're heading, and the question is: is this where we want to head in the future? So I'd like to thank the four people for their DNA. I'll be taking a bit back with me to South Africa. Thank you. No, I'm joking, I won't.

So let's look at DNA cryptography and what I'll be talking about. The talk will focus on two main areas. The first is using DNA cryptography for the purposes of data storage, so if you're working in a data center or you're really worried about the environment, this is where that comes in. The next part is basically trying to understand what DNA cryptography actually is, what this really cool word is that everyone keeps talking about in the cryptography industry, and then going into the idea of using
DNA cryptography for our own personal use: the idea that maybe one day I'll have my own DNA key that I'll be able to encrypt and use for anything and everything.

Right, so let's have a look firstly at why I even decided to do this talk. The first part is because I hate myself, and the second part is because I realized one thing, which is that computational power will evolve. The computer we have today will not be the computer we have 10 years from now, meaning the algorithms we have now are not going to be suitable in the next 10 or so years. As it is, we used to have MD5, and we thought MD5 was amazing, and now it's not. Sorry. So now, unfortunately, we have to rely on the math that we currently have in place, and unfortunately, as rule mentioned, we have what's called quantum computing: the idea that at some point computers are going to get to the point where we are not going to be able to beat them. And so we've come up with a few ways of possibly beating quantum computing. The first is elliptic curve cryptography, which is the art of cryptography by basically division, and the other is quantum cryptography, which is messy, and then DNA cryptography, which is why I'm here. Hi. So elliptic curve cryptography and quantum cryptography have issues. They're unfortunately prone
to person-in-the-middle attacks, and we've also seen cases where they're actually prone to DoS attacks, so it wouldn't be ideal. Obviously we can use them, but there will always be that fear there; there will always be that blur at the bottom, questioning whether or not we should have done this in the first place.

So cryptography: what actually is it? Just in basic terms for you, there is a huge difference between encryption, which is secure, and encoding, which is not. So think of Base64 versus AES-256. Base64 is not secure: you can go on to Google now and you can get your Base64 in clear text, and it'll be absolutely beautiful. AES is a little bit more complicated; it involves a specific algorithm. Now, cryptography is also, a bit like computers, based on our assumption of the computational hardness of how something actually is. And we also have to remember that cryptography isn't a new thing; it's been around for ages. The earliest existence of it was during the time of the Greeks. So say, for instance, NAA over here was attacking me and I needed to go to my comrade over here. Then I would have a piece of parchment with a specific substitution cipher or a Caesar cipher, and I would give it to my comrade over here, and my comrade would then be able to relay that information. This is how the Greeks did it. It's very cool, but
obviously it wouldn't work today. But just to give you an idea, cryptography has been in research for a really long time.

So DNA: how do we actually understand what DNA is? DNA is basically a long chain of molecules, and they're what's called base pairs and molecule types. So if you look at your DNA right now, it's all composed of adenine, thymine, guanine, and cytosine, or ATGC. Now, the way I like to think of it is that a base pair needs to sync up, so G will never pair with T, but A will always pair with T and G will always pair with C. So the rhyme: AT, GC. Now, what's really cool about this is that we figured out that if we use DNA cryptography, what we could actually do is take your typical DNA sequence and convert it into binary, so adenine would be equal to a binary of 00, thymine would be equal to a binary of 01, and so forth. Now, if you want to think of DNA and the structure of it, think of it a bit like a zip on your jumper. All of these things are meshing together, and that's where the AT and GC comes in: it's that mesh or that string that links up.

Now, the really, really, really cool thing about DNA: we're all composed of it, but what's even better about it is that we can store a shitload
of stuff in it. DNA is basically an information carrier. It has a large storage space because it basically holds all of the information about us in it: information about our race, information about our ancestors over a million years ago. It's all in that little DNA string. Now, what we realized is that memory cards and chips are simply not going to be sustainable for the next 5 years. So what if we could create a form of storage that wouldn't ever need to be updated? And if we think of these large data centers, like if you go to Iceland and you're lucky enough to see those data centers there, they take up a lot of space. They also take up a lot of power, and in terms of the environment, they take a lot of electricity and consumption and overhead to actually even deal with. So what if we could make this more environmentally friendly, and what if we could cut down on our storage? This is where DNA cryptography comes in, because what we realized is that in a single molecule of DNA we could roughly store over 108 terabytes of data, which is super cool. So what we would do is we'd create a synthetic DNA string, and from there we would then store data inside of it through the ATGC, and then from there we would convert it. So we would first have
it as a byte of 01, 02, etc., and then from there we would convert it into the ATGC that I showed you before. Now, it's really great, it's really cool, but the only problem is that it takes a really long time to do this kind of thing. So the reason why, for instance, we haven't looked at DNA cryptography in focus is because of the length of it. If you look at your cryptography methods like DES, Triple DES, Blowfish, it doesn't take that long to do the encryption and the decryption. But if we were to do DNA cryptography, it takes a hell of a lot of time, and the reason it takes so long is because we're thinking of it as bytes versus strings. So when you're doing encryption and decryption, we do it in terms of bytes, but when we're doing DNA cryptography, we're doing it in terms of DNA strings, and that's where it gets a little bit more complicated and where it takes a hell of a lot more time. This is where it can be a possible pitfall, because unfortunately we're human beings; we're not particularly patient people. I don't know if you've ever tried to drive in traffic; you would know this.

So how does DNA cryptography actually work, and how have the early researchers done it? The majority of the research is very new. There isn't a
lot on it. The earliest paper I've seen comes from 2017, so it's still a very new realm of research, and there are still a lot of questions around it. So unfortunately, in terms of research, we've kind of got to use what we've got. The University of India and the University of Mumbai were the first guys to kind of look at this and say, how can we do this? Now, the way they did it was through what's called the Java implementation with an SDK. This is obviously not a secure way of doing it, because it makes use of a substitution cipher, where you would have your ATGC, which you would then use as your substitution. So say, for instance, I have a PDF, or, in the explanation over here, you have a plain-text message saying "test"; then the raw DNA message would be something like TCACGC, and that would be the DNA. So that was how we did the substitution cipher with the Java implementation. What we used was a BioJava implementation with their APIs, and the way we did it was through a getBytes method, which would then return an 8-bit ASCII string for every character that we wanted to then encrypt. Very cool, but unfortunately, while this implementation works, it's not necessarily secure, because obviously if you know what the substitution alphabet
is, you can do that decryption. So it's not ideal, but it's cool.

Now let's think about DNA symmetric cryptography. They also did this through the Java implementation, because what's really cool is that the BioJava architecture already exists. It's used a lot in the biotechnological field: when they're doing all sorts of research, app development, anything, they'll use the BioJava framework along with their symbolic alphabetic API. So what they've done here is use a three-step process, where they'll create the key gen, then they'll encrypt, and then they'll decrypt, and they'll usually do this with either an OTP or a substitution cipher like we've seen before. Now, this is very early in the research stage. It took a very long time to do, and unfortunately, when we did this, we had to have access to public libraries to do it. Meaning, for instance, if you are involved in any kind of medical trial, so say, for instance, you got cancer and you wanted to be a part of a medical trial, you would then offer up information about yourself, and that information could get stored in a public DNA library. Now, what we do as researchers, or those in the biotechnological field, is we would at some point request access to those public DNA libraries. In South Africa, for instance, because of the Cradle of Humankind, we can actually access public
DNA records of that nature, which we then used to do the research. It was very hard to get access to that. There was a lot of planning, and because of POPIA and GDPR regulations it was very difficult to get access to those libraries, which is also what makes this kind of research particularly difficult to do, because unfortunately no one wants to offer up their DNA willingly. As you can see by the cups and the plates I have over here, I'm sure those four people aren't exactly impressed with what I've done, and I don't think you'd be particularly impressed if I took all of your DNA, in theory, to do my research.

So the way in which we did the DNA cryptography using the OTP is that one key was used for encryption and decryption, and we tested it with DES, RC2, and 3DES, where we took plain-text data, combined it with a public DNA library, and then from there added what's called a random key, or pad, through an XOR operation. It was very cool and it worked; it just took a hell of a lot of time, and many computers and hearts were broken in the process.

Now, we then have a more secure version, because what researchers wanted to do is remove interaction between the user and key
authorities, because what often happens with asymmetric cryptography is we have the impersonation of key authorities, and this is where attacks happen. So we looked at asymmetric cryptography using RSA, and what we found was that it worked, but it was very computationally intensive. It took us a long time to actually do this, and so we thought that it wouldn't be an option for us in the long term, just because of the time it took.

Now, what are the pitfalls for DNA storage and DNA cryptography as a whole? Well, there's a couple. DNA storage, firstly, is incredibly expensive. It's an incredibly expensive form of research to do; it costs anywhere in the millions to do in South Africa. And what's worse is that there is a high risk of DNA contamination. So earlier I was speaking to a colleague, and they asked me, "Well, Taylor, if we use a synthetic DNA string and we put it in a data center, what would happen?" I was like, "Well, it would be stored." He was like, "Okay, great, but what if Amazon had data in a data center and Google had data in a data center? How would we ensure that contamination doesn't happen?" Well, we don't have an answer for that. DNA contamination will be a huge issue. The other thing is that there would be possible data leakage, and so you would have GDPR
on steroids. It would be great. And the other issue is that there's a lack of research around it. I'm one of maybe a few people that are looking at this research, unfortunately, and that's also because you need people who are experienced not only in biotechnology but also in cryptography and computer science. So you need people that are able to blend three different industries together, and what's worse, and this is where it gets really hard, you need specialist skills and specialist laboratories for it. Now, thankfully, there are people that are doing it. For instance, the military in Tel Aviv are actually using DNA cryptography for all of their storage, and there are certain big tech companies that I will not name that are also using this and working on this for their internal storage. It's very cool. If you want to know who they are, you can ask me afterwards. And then at present we also know that the US military is using it to do a lot of their encryption for very secret, confidential files, which is awesome, or not, depending on how you feel about them.

Now, the biggest issue with DNA cryptography and research is that no one seems to be able to agree. So you have an issue where, unfortunately, you have SN over here and a men over here, and you want to use RSA and you want to use 3DES, and none of
us can kind of come to a solution. There's no streamlined agreement as to how DNA cryptography will work or how we should be going about doing it, and there just isn't enough money being put into it, which is why I decided to do this talk: mainly because if we can get more companies on board who are interested in doing this research, we can actually speed up the timeline and create DNA cryptography that can be used by normal, everyday users.

Now we get into my favorite part of the talk, which is also the more sci-fi part of the talk. So put on your Spock ears and be prepared to be teleported; Scotty is about to take you on a journey. So what happens if we, for instance, wanted to use DNA cryptography for personal use? What if I wanted to use my personal DNA to encrypt something on, say, I don't know, one of those little keys, and that would be my key for life? I would never have to use a password again. That would be amazing. Well, it's a bit hard; it's a bit difficult. What we found firstly is that, for instance, if you are having cancer treatment or if you are exposed to any kind of disease, your DNA will change. So theoretically what could happen is, say, for instance, we created a personalized DNA key for you on Tuesday. Great, Tuesday. Then on Friday Taylor gets HIV, and now her DNA
has changed, so in theory her key would change. So we actually have to figure out a way of creating a personalized DNA key that would use a part of the nucleus that goes unchanged regardless of disease or exposure. The other issue is if you have cancer, your nucleus will change, so you'd go from having a healthy nucleus to one that is mutated. So essentially, if we have personalized DNA encryption, we would then have to question whether or not we'd be leaving those who have diseases or are going through treatments at risk. Would they then have to go back to the old way of doing things, where they have to remember 50 passwords? I certainly don't want to have to do that.

And the other thing is identical twins. Thankfully I don't have a twin, because there's already enough of me in the world, but the thing is that identical twins have the same DNA makeup, because they come from the same father, same mother, especially if they came from the same egg, etc. So what if, for instance, we had something like an evil twin attack, but in the version of people, where there's Taylor and there's Tanya, and Tanya decides that Taylor is... like, they're going to be wrecking each other, right? So that's also something we're trying to understand: how do we create non-identical keys for twins using the same DNA? How do we do that? Now,
the answer is that some nucleuses in the DNA of twins can be different, but that's really labor intensive. And the other thing is that it's still really expensive to do this research and this kind of technology. So there are big IT companies that I won't name that are currently doing this research, because what they're trying to do is be at the precipice of DNA cryptography; they kind of want to be the first ones to say, "Hey, by the way, SN, you no longer have to use passwords ever again. You can now use your DNA." Or can you?

So then we get into the bigger problem, which is the ethics. Do we even want to do this? If I take you back to Cambridge Analytica, those days were great: everyone had their data stolen, and we all questioned our lives. Now, how would we feel about a data breach involving our own DNA, where our own DNA sequence could be stolen? Granted, DNA sequences are a lot harder to read than, like, your password and your email from your grandmother in the '70s, but my point remains the same. And the other point is that our DNA is everywhere. Look at the cups and the plates and the glasses that I put over here. Your DNA is everywhere. But the question is, would I be able to, say, take a piece of my hair (I'm going "ow"), or someone else's hair, and use it to decrypt their data? That's a huge question, and a part of research that we're still trying to understand and figure out, and hopefully we will in the future.

Okay, so there was a lot of reading involved in this talk, and a lot of questioning of life decisions, as I said before, but now I'm going to open the floor up to a few questions. And yeah, thank you for listening to my babble. I appreciate it.
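
The one-time-pad scheme described in the talk (plain text turned into bases, then combined with a random pad through XOR) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the researchers' Java/BioJava implementation and not the substitution table from the slides: the codes A=00 and T=01 follow the talk, while G=10 and C=11, all function names, and the use of Python's `secrets` module to draw the pad (instead of a public DNA library) are assumptions made for the example.

```python
import secrets

# 2-bit code per base. A=00 and T=01 follow the talk; G=10 and C=11 are assumed.
BASE_TO_BITS = {"A": 0b00, "T": 0b01, "G": 0b10, "C": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def text_to_dna(text: str) -> str:
    """Encode ASCII text as bases: each 8-bit byte becomes four 2-bit bases."""
    bases = []
    for byte in text.encode("ascii"):
        for shift in (6, 4, 2, 0):  # most significant bit pair first
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)


def dna_to_text(dna: str) -> str:
    """Decode a base string produced by text_to_dna back into ASCII text."""
    if len(dna) % 4 != 0:
        raise ValueError("expected four bases per byte")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return out.decode("ascii")


def random_pad(length: int) -> str:
    """Draw a random base string to act as the one-time pad (hypothetical helper)."""
    return "".join(secrets.choice("ATGC") for _ in range(length))


def xor_dna(dna: str, pad: str) -> str:
    """XOR two equal-length base strings, base by base, on their 2-bit codes."""
    if len(dna) != len(pad):
        raise ValueError("pad must be exactly as long as the message")
    return "".join(
        BITS_TO_BASE[BASE_TO_BITS[d] ^ BASE_TO_BITS[p]] for d, p in zip(dna, pad)
    )


if __name__ == "__main__":
    plain = text_to_dna("test")
    pad = random_pad(len(plain))
    cipher = xor_dna(plain, pad)           # encrypt
    recovered = xor_dna(cipher, pad)       # decrypt with the same pad
    print(plain, cipher, dna_to_text(recovered))
```

Because XOR is its own inverse, the same pad both encrypts and decrypts, which matches the "one key was used for encryption and decryption" description. As with any one-time pad, the sketch is only meaningful if the pad is truly random, as long as the message, and never reused.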
Uh-oh, here we go.

Hey there, thank you for the talk, it was awesome. So I have a question. You've obviously showcased how easy it is to get someone's DNA. Now, to solve this problem, could you potentially use the DNA in combination with some other biological values to build some form of biological passport that wouldn't be as easy to steal?

So that's a really great question. What we did in the early phases of research is we looked at DNA from a public library, and we took one DNA string, and then we would add an algorithm to it on top, or we would add, like, a salt to it as well, and that kind of helped in terms of creating something that looked a bit like a biological passport. Unfortunately we're still in the very early stages of this research, so I can't give you a yes or no answer to your question, but we're hoping that in the next 3 to 5 years we'll have answers like that for you. For now, what we've realized is that if we use something like an algorithm that we have now and add it to the current encryption that we've done, it could be a solution.

Thank you.

You're welcome.

Hello, thank you for your talk, it was very interesting. I'm grateful, I'm thankful. I have remarks and
questions, but I'll try to... So first, the thing with having your DNA as a password is a bit problematic, because every 10 years, because telomeres get cut off, you don't have the same DNA, so I'm not sure this would really work. Also, you could forget any witness protection program, because you can't fake your DNA unless you can totally rewrite it. But my actual question is: are there actual symmetric or asymmetric cryptographic algorithms in DNA, or do you just map our cryptography and encode it in DNA? Because this looked a bit like that at the moment.

Yeah, so like I said before, it's still in the very early phases of research; we're still kind of beginning to understand it. So what I will say is, at the moment what we've done is we take, say, for instance... sorry, what's your name? Dennis? Dennis, cool. So say, for instance, I take Dennis's DNA. I'm just going to use it as an example, if that's okay. Okay, cool. Thanks, Dennis. So I'm going to take Dennis's DNA, and then from there I'm going to add a public DNA library, which could house probably over 500,000 people, on top of it, and that's why the encryption and decryption takes so long. And then from there we add what's called an algorithm to it, so we'll use
like an AES or a DES on it. At the moment, yes, it is very rudimentary; it's not exactly a super cool and smart way of doing it, but we're getting there. The point is that if we don't try, we won't know. And in terms of your earlier remarks about witness protection and that sort of thing, this is probably also why I decided to do this talk: if we're aware of these issues now, and we're aware of these pitfalls and challenges and social questions, then we possibly have an answer; we'll be able to react to those issues as they happen. It's no use saying, "Oh well, this is going to be a problem," and walking away from it. If we actually talk about it now, we can come up with the solutions in the future, if that makes sense.

Thanks.

Alternatively, sorry, what we could do, because we already have public DNA libraries, and this is just me spitballing here, is assign someone in a witness protection program a synthetic DNA key for the purposes of their growing life, but something that has no match to an actual human, because what we did with DNA storage is we created synthetic DNA strings based off of the current strings in process. I'll talk to you, Dennis, don't worry. That's what coffee is
for.

Thank you for this talk: brilliant topic, and a very informal presentation, and very well exposed. Thank you for that.

You're...

I have a few questions, but I will keep it very short. You mentioned that we need to think in strings, and the example that you reported uses three pairs in order to encode a string. Three pairs with four characters actually make 64. I wonder, if you used four pairs, you would get 256 variations, which is exactly a byte. Why not go there?

So I don't have that answer for you, because I'm still thinking, but what I can tell you... let me just go back one. Please hold, your call is important to us. We will assign you to your next relevant agent. There we go. Right, you are now one in the queue. Congratulations, you've made this pit stop. So as you can see here, what happened here is the substitution cipher.

Mhm.

Right. So we knew, regardless of the depth or breadth of that algorithm, that it just wouldn't be secure. So whether we add more or less, it doesn't make a difference, because as it stands, a substitution cipher, I mean, it's a table.

Oh, this is fine. I fully understand that for the substitution part. I'm saying that you said in the slides before that we have to think in strings, and
I think maybe two slides before. I don't remember. No, sorry. There we go. So we have to think in strings, which makes sense because it's a sequence, and I fully understand. I'm saying that four of those pairs could encode a byte one to one, so in principle a data set could actually be mapped onto any sequence of four. So that's why I'm not sure, because in the data we also have byte sequences, right? And in principle those could be mapped by using four pairs.

Yes, so we could definitely use the four pairs. What I quite like about DNA is that it won't be your typical T, A, G, C; no DNA sequence is the same, and that's what kind of makes us unique. So whether we encode it with two pairs or four pairs, that doesn't really factor into it. It's based on our different DNA strings as they stand. So for instance, my sequence will look a lot different from your sequence. So when we add extra base pairs to it, we could do that, and that's what we've done with the public DNA library keys, where we actually added those libraries to the already existing DNA that we had on site.

Okay, so basically, if I understand correctly, you're saying that we cannot add an arbitrary sequence to what we have?
Well, maybe. Maybe during lunch or something I'll talk to you about it.

Okay, sorry. Thanks.

I have a question related to the storage aspect. What are the error rates, or in general what are the durability statistics, of DNA-based storage?

So as I said before, we've had many issues with doing DNA storage in terms of DNA contamination, DNA leakage, and then just overflow, because obviously when we did that initial research we didn't have the computational power, so we were doing it all through crack machines when we actually did this specific synopsis. And what we realized is that we didn't have the computational power to actually do it retroactively, so it would take us probably a few years to get to the point where we'd be able to do it, and a lot of time and research and money. But we already saw issues in terms of DNA contamination and leakage in terms of, like, normal data. I'm happy to talk to you in more detail if you'd like. If I didn't properly answer your question, I do apologize. Okay, am I done now?

I guess then the rest of the questions will come to you in your coffee breaks. And thank you very much.
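
The arithmetic behind the audience question about three versus four bases per symbol can be checked with a short sketch. With an alphabet of four bases, a codeword of length n has 4^n possible values, so three bases give 64 codewords and four bases give 256, exactly the number of values in one byte. The base ordering `"ATGC"` (A=0, T=1, G=2, C=3) and the function names are assumptions made for illustration; nothing here is taken from the implementation discussed in the talk.

```python
BASES = "ATGC"  # assumed digit order: A=0, T=1, G=2, C=3


def codeword_count(length: int) -> int:
    """Number of distinct base strings of the given length (4 ** length)."""
    return len(BASES) ** length


def value_to_codeword(value: int, length: int = 4) -> str:
    """Write an integer as a fixed-length base-4 number using the letters in BASES."""
    if not 0 <= value < codeword_count(length):
        raise ValueError(f"value {value} does not fit in {length} bases")
    digits = []
    for _ in range(length):
        value, digit = divmod(value, len(BASES))
        digits.append(BASES[digit])
    return "".join(reversed(digits))  # most significant base first


def codeword_to_value(codeword: str) -> int:
    """Inverse of value_to_codeword: read a base string back as an integer."""
    value = 0
    for base in codeword:
        value = value * len(BASES) + BASES.index(base)
    return value


if __name__ == "__main__":
    print(codeword_count(3), codeword_count(4))
    # Every byte value maps to its own four-base codeword and back again.
    assert all(codeword_to_value(value_to_codeword(v)) == v for v in range(256))
```

Three bases cannot cover a full byte (values above 63 are rejected by `value_to_codeword(value, length=3)`), whereas four bases give a one-to-one mapping. As the speaker points out in her answer, a fixed table like this is still only an encoding and adds no secrecy by itself.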