
I'm Thomas PR, and this talk is essentially all about open source intelligence. I will prefix this with: I'm not an expert in open source intelligence. I only started looking into this about a year ago, so I'm a novice in that regard, but this is my experience, what I found, and the top tips I tended to use. Everyone does a sort of "who am I" slide. Well, this is an open source intelligence talk, so you'll probably find out a lot of information about me as we go along, just from the sort of stuff that comes up, so I'm not going to bother filling this slide in. I'll let you
fill that in in your own time.

Disclaimer as usual: anything I present here is nothing to do with my employer, for the most part, and I have no affiliation with any of the tools or the things I'm showing. Use them at your own risk, the usual type of stuff. Folks at the back of the room, there's a fair few chairs kicking around. Please come in, don't hover at the back; it's nicer to sit down for this kind of thing rather than having to stand. There's quite a few at the front as
well.

So what is open source intelligence gathering? Generally it's any information that can be found that is legally obtainable, and it's very much passive. So you're looking on the internet, you're looking at social networks, you're looking at blogs, you're looking at generally anything you can find that just exists there on the public internet or in a public space; it doesn't necessarily have to be on the internet. I'm applying it to improve my online presence, but it's often used as part of reconnaissance during the attack stages; it's very much that information gathering. So you can apply it not just to an individual
such as myself; you can apply it to organizations and that kind of thing as well, and that's where it has a lot of value.

So the challenge I gave myself was: use OSINT, use whatever techniques you can find out or learn about, and use it against me. See, with just my name and a profile picture, what can I find out? And I guess the challenge there is to assume no prior knowledge about myself, which is quite a difficult one, but that's the challenge and how I approached it. And generally you may be thinking, well, why do this? As I said, anyone can use OSINT. It could be malicious
individuals, it could be essentially anyone that wants to understand the data that's out there about you, whether it's you that wants to understand that or someone else. It can be part of an attack, and this sort of information can be used in a variety of ways. Typically what you'll tend to think of is targeted phishing attacks: if you know more details about the user, you can make that phishing attack more believable. If you know what accounts they're using and what passwords they have, then you can start to do account takeover, if you're in that malicious mindset. And towards the more extreme end you have stalking, where, you know,
people get your address and they'll literally make your life hell, and swatting, which is possibly the worst thing that can really happen in this area that I know of. Well, there's probably worse things, but it's an awful thing to happen. For those that don't know what swatting is, that's where someone online will take offense at you for some random reason, and they will then find your address and use that to call in a bomb threat or a gun threat. The police will bang on your door, well, they'll knock the door down, and a full armed squad will come in and pin you to the ground. You
know, that is a really horrendous thing to happen to anyone, and the amount of effort it takes for a malicious individual to do that is quite low using these OSINT techniques. So understanding what is out there about yourself is really quite useful to try and reduce that down and stop that kind of thing ever being a possibility.

As you're going through this journey of exploring what's out there about yourself, note taking is probably one of the most fundamental things that you have to have a good technique for, because you're going to go
down lots of rabbit holes, you're going to go down lots of different avenues, and you want to make sure that when you dive down into one rabbit hole, one avenue, and you go "okay, there's nothing more to explore in this", you don't miss the other 12 different places you could have gone at that point. So I found Maltego, which is a tool that's often used in this area. I didn't use any of its advanced functions, like automatically querying data sets and stuff like that; I just used it purely as a graphing database to go along with my written notes, and that produced quite a nice graph at the end to
show my complete exposure across all the different data sources and things that I've scattered around the internet.

As with most things, you want to methodically explore every avenue. It's going to be a lot of manual work, a lot of manual browsing. There's no automatic tool you can just point that will do all the work. Tools will help you, but the best thing is literally to get in those web browsers, look for the information, and dive in. You're looking for alternative profiles, you're looking for alternative aliases, looking for email addresses, all that kind of good stuff,
because you can then use those details to pivot and find other things. Source code, hidden content, and metadata that's just in there are always really good as well.

So where do you start if you want to do this? Well, as with most things, search engines are probably where you want to start. That's where I started myself. The thing to know here is that every search engine will typically have a different database behind it, so it will produce different results. So search across everything: search-engine-wise, you're going to get different results in different places, and if you only search in Google you're going to miss some pages that might provide some critical data that
reveals a lot more about you or expands that graph. So search everything, see what links pop out, see what applies. I found myself fairly easily on a couple of search engines, others less so. Then, generally, with anything you do find, start diving into it and have a look. You'll find a personal blog if you look for me. It's quite outdated, but again there's lots of crap on there that you can then start to explore and look into.

As you start going down this journey, you'll often find more profile images and more images in general, and this is where reverse image search in the search engines can come in useful, because people will often reuse profile images
across multiple services. So if you've found it on one, you can plonk it in and it might come up with some other links or articles that didn't come up in your original searches. So it's another place you can pivot and find more information. In this case it popped up with some more articles around something I'd presented in the past for a previous job, so again another avenue to explore for some more details about myself.

As I said, a lot of this is going to be manual, but that doesn't mean you can't get tools in to help you out with some of it. So Photon is a tool that
is just a crawler that you can give a website, and that'll be quite useful alongside your manual work if you find something like a personal blog or something that is self-contained about that person. That will produce stuff like any intel. By intel it means that if it finds an email address, or something it thinks is a secret, it'll try and put that in a list for you. Also, importantly, it'll produce a list of all the files it finds, which will become useful later. It just gives you a load of lists that you can then use and move forward with.

Right, so you've
done a bit of browsing, you've found a few pages, you've found a few sites. Where do you go next? You've covered, I guess, the surface of some of that stuff. File metadata can be a really interesting avenue to explore, in particular image metadata. Now, a lot of services that you put images on might strip this metadata, but not everything does. It's very hit and miss; it's not a standard across every service. So if you go onto a site and you find some images, pull them down, store them, and have a look to see what metadata is in there. You'll see in here that we've got some GPS
coordinates, and so what can you do? You drop those straight into Google and it'll take you to wherever that photo was taken. If the photo is perhaps showing the inside of a building, maybe that's indicating it might be someone's address. And these can be very accurate. At this point I put it in, and the address of one of my old front doors popped up, of a place I used to live, and it's like, okay, that's in that image, I need to now scrub that image. You don't really think about this metadata a lot of the time when you upload; you just
chuck the images up there, but it can reveal very important and very useful details about yourself.

As an aside, just on the sheer power of metadata, to get people thinking: there's an open source tool called PhotoPrism that works a bit like an offline image viewer. It has some machine learning built into it. I chucked my entire catalog of images into it, and it produced a lovely map of everywhere I'd been from the GPS coordinates in all those images. It also produced a clustering of people, so here you'll see me at various ages. It had absolutely no issue linking me from my current state to my old state.
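As a rough illustration of the GPS metadata mentioned above: image metadata usually stores a position as degrees, minutes, and seconds plus a hemisphere letter, and a map search wants signed decimal degrees. The sketch below only does that arithmetic. It assumes the values have already been read out of the image by some metadata viewer, and the function names `dms_to_decimal` and `maps_query` are made up for this example.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter
    ("N", "S", "E" or "W") into signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in decimal notation.
    return -value if ref.upper() in ("S", "W") else value


def maps_query(lat, lon):
    """Format a latitude/longitude pair as a "lat,lon" string,
    the form you can paste into a map search box."""
    return f"{lat:.6f},{lon:.6f}"


if __name__ == "__main__":
    # Illustrative values only, not taken from any real photo.
    lat = dms_to_decimal(51, 30, 0, "N")
    lon = dms_to_decimal(0, 7, 30, "W")
    print(maps_query(lat, lon))
```

Six decimal places is roughly a tenth of a metre, which is why a single geotagged photo can be enough to land on a specific front door.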
It didn't know I was Thomas priest, but it knows all these images are of the same person, and that's scary in some respects, because we upload these things to Facebook, we upload them to social media, and yes, if they do strip the metadata out, most people can't use that, but that's still a lot of metadata you're giving Facebook. They've probably clustered all those images and gone, right, you live here, you work here, because it can see those images and it can see the GPS coordinates. So that's just a bit of an aside to get you thinking, on the
larger quantity of images, the larger scale, about how much metadata and extra information is there just under the hood.

But obviously it's not all about images. If you find PDFs, documents, executables, anything else, it all has metadata in it. So another tool I found quite useful, if you're on Windows, is this tool: you give it a domain, and it will go off and try and pull any files it can find from that domain and then run some metadata analysis on them to see if there's anything in those document files. Obviously there's other tools you can run for each individual type of file, but again it's a quick and easy
way of doing some of it.

That leads me on to the next stage. I was a developer for many years before I moved into security, and as part of that I used a lot of Git. For those that are not familiar, Git is essentially a version control system, a place you can put the source code for your software that tracks commits and stuff like that. In there, again, we have the idea of metadata. Whenever you make a change to the code within Git, it will form what is called a commit, and under that commit you'll have your email address. This email address might be completely different
from the email address you have attached to the online service, like GitHub. It's just something you manually enter on your machine, and it's prone to being wrong a lot of the time. What you can do is run automated tools over it: you basically point one at a profile, and it'll go through and pull all of your source code, look through all the commits, and pull all the authors and email addresses out. What I found is that as soon as I pulled that, I got alternative email addresses that I don't use anymore. I got an .ac.uk address for me, which I guess would imply that I either
worked at a university or that I've gone to a particular university. It also pulled out BBC email addresses, which would possibly imply I may have worked for the BBC at some point. So again, a lot of leakage, and from an initial point of just searching my name you've got to this point. So if you do have Git repos, again it's worth diving into them, because we tend to just shove all sorts of crap in there. You might find secrets, you might find all sorts of stuff in there, so Git's a great place to find this kind of thing. As I said, it's not
just about the metadata; there's the content itself. Maybe there's a file in there that I've put some data in that might link somewhere else as well. Another thing to be aware of, as we were saying about file metadata, is that it all still exists. If you upload an image into Git, you've uploaded that image; Git doesn't filter it, it doesn't change it. So again, does it have any GPS coordinates? Does it have any metadata that says I took it on a particular phone, a particular brand of phone? Is that going to tell you whether I prefer Apple or Android? Is that going to tell you whether I own that
particular device? And you may think, well, it doesn't matter if you know that I own a Pixel. Well, it might not, but you could potentially use that in phishing, or you could use it to further confirm that an account is related to me, because it wants to verify via a Pixel 6. Those are the kind of breadcrumbs you'll start building up.

Another area that's interesting, and slightly scary actually, is account recovery and enumeration. On a lot of services you go to "forgot password" and it will want you to put in some bit of data. That might be
a phone number, it might be your Gmail address for that account, but what it will do is say, well, I've got this other email address on record as a recovery email, or I've got this phone number as another recovery option. In the case of Google it will star out some of that stuff, but if you have done some other OSINT and you've found that this user could possibly be one of these five email addresses, you can start to use that account recovery to confirm that, yes, these two email addresses I've collected are definitely linked, this is definitely the same person, because the account recovery is showing me a partial
email that matches the data I've got. With some other places, like Facebook, if you put in a phone number it will give you the full email address of that user. Equally, if you put in an email address, it'll give you the full alternative email address of that user. So again, that could be great just to pull information out.

The other thing that is interesting is account enumeration. Usernames that you pick on a lot of services will often be public, and they also often can't be changed once you've set up that account. So you can use a tool such as WhatsMyName: put in a
username or an alias, and it will run off to a load of different social networks, websites, and services (I think there's just under 600 in there at the moment) and it will say whether that account exists on each service. That can give you a really quick way of looking over a load of accounts that may be related to this person. Obviously, if you're doing it for yourself, you're going to know whether that account is yours or not. If you're doing this against someone else, then you're going to have to do a lot of extra verification to definitely confirm that the account is
owned by the person you're investigating. But it brings to mind: why do we use consistent usernames a lot of the time? What benefit does that actually have, particularly considering that with passwords we're often sticking them in password managers, so we're using those to log in anyway? We might as well stick a random username in there as well. So if you want to reduce the linkage of your data across different accounts, random usernames can be very helpful.

This brings us on to social media as the next pivot that I tended to move into, and this is quite an interesting one. I don't tend to use that
much in the way of social media; if you do find my accounts, they tend to be long abandoned for the most part. These can be absolute gold mines of information in a lot of cases. So Facebook, Twitter, Instagram, Reddit: you can start pulling breadcrumbs, pieces of information, together and build up a bigger picture and reveal more interesting, more sensitive data from that. What I would say is your forgotten accounts are probably even more valuable. In my case I had a Facebook account. I don't use it; I use Messenger only, for the most part. When I stopped using the main part of Facebook many years ago, I thought I'd left it in
a fairly secure, fairly privacy-friendly state. I went back to it recently: my friends list was public, there was a load of posts that were public, that sort of stuff, and I think that's because the privacy settings change. So if you have an account you're not going to use for a long while, delete it; it's probably the best course. Equally, in the case of Facebook and that sort of thing, if your friends list is public, then someone can go on there, find you on Facebook, look at your friends list, and start to find your family and friends. And as I'm sure is the case with a lot of you, your
family and friends are possibly not as privacy conscious as you might be. So if you can find that person's family and friends, can you also find a load of information they've shared about you on there? I managed to do that: very easily, from a public profile, I found myself on Facebook, I then found my relatives and siblings, and from there I could gain a load more information, because that linkage existed in Facebook and I had allowed it to be public. So again, check your settings and think about that. Educate where you can, but obviously it's often an uphill struggle, so reduce as much damage
there as you can, really.

As I keep saying, it's about piecing together the breadcrumbs. It's quite an interesting thing in OSINT, because a lot of the time you may consider one particular piece of information you share as being quite unimportant, not that sensitive, not really that interesting, but if you share enough individual bits, someone may start to be able to bring them together. This is quite interesting from an image point of view. Here we have a picture of a front door, and on Facebook maybe they've said "we're in the area of Edale". Well, if I go on Google Maps and look at Edale, there's a few roads on there. How long is it going to
take me to go into Google Street View, go along those roads, and find a fairly unique-looking door like that? Granted, there's a number on there as well, so it makes it even easier, but if you're just sharing a photo outside your house, that could still be used to track down your exact address, because that data lives in Google Maps. So something that's fairly innocuous, "I live in this town", and something that's also fairly innocuous, "this is a picture of me in front of my house", tied together produce something very sensitive, like my exact address. And that's the approach you need to take with all of
this stuff, and it's why mapping everything is very useful, because you will come across a load of information that together will be a lot more sensitive than on its own.

Okay, the next thing that I tended to dive into was domains. As you've seen on the previous slides, I own a top-level, well, not top-level, I own my domain that basically I don't run and prep onto a lot of the time. As part of the registration for that domain you have to enter a name, address, email, and phone number, if you've not done it before. Typically that data is publicly viewable. There has
always been this sort of ability to have a privacy service where, if you opt in, your details will be replaced by "redacted for privacy" or that kind of thing. Typically nowadays you'll find that those services are usually on by default and free. They never used to be; I think GDPR changed that, or pushed it towards the better. So at the moment, if you look at my domain, you'll get that kind of redacted information. However, there's historical data: there are databases that keep hold of all of this old data. You'll often find a lot of sites you may have to pay for to access this data. I found one
called Wy that seems to work completely for free at this point, so that's possibly one to use if you're thinking of putting stuff in there. Again, you can put a domain name in there and it'll bring back details such as the name and address of the person when they registered it, the phone number, and the email address. You can again use that information: okay, that email address registered that domain, so let's put that email address back into the database. What else was registered via that email address? And you can then start to bring up more domains. Obviously this data only really applies to stuff prior to 2017, 2016, something like that. Most of
the stuff nowadays tends to be protected, but there's still a huge chunk of historical data for those of you that have domains that existed before then.

But it's not just those sorts of things that you can do with domains. Domains have a load of records under them. So name servers, which are often used to point to your hosting provider: in this case you can understand which hosting provider I'm on. You may think, well, big deal, right? It's a bit of information. But if you find that hosting provider, you might be able to phish them directly and say, okay, this is your hosting provider, you need to log in.
You can now give them a portal, steal the password, get on, and completely trash that website. So that's again a useful piece of information, even though it's just out there in the open.

TXT records are also really useful on domains, because these are often used for verifying that you own that domain in a different service. So if you look up Google here, you'll see that the TXT records imply that they're using Atlassian products, so they might be using Jira, they might be using Confluence, maybe Bitbucket. They also seem to be using some form of DocuSign, and Facebook, so they're obviously on some kind of Facebook console, that kind of thing. So
you can get a whole heap of information from a lot of domains just by asking what the TXT records are; you might be able to understand what services they use under the hood. Microsoft obviously is in there, so maybe they're using Microsoft 365. If you do it on my domain, you'll get Google and you'll also get Proton Mail. So maybe this person uses Proton Mail, maybe they don't, but they have a verification record there. So again, another piece of information that just exists.

Certificate transparency logs: for anyone that's not familiar with them, whenever you register an HTTPS certificate nowadays it has to go into a log that is publicly searchable. That's great for enumerating
subdomains. You can go on there and basically say "give me everything under Thomas pr.com", and it'll be like, here's all the certificates you've registered in the past year or something. And obviously, if you've minted them for subdomains, that might provide even more attack surface, more places to look under the main domain that may not be findable via the search engines. So again it's a great place, and if you want to automate this sort of thing, then Amass is a great tool. It searches not just certificate transparency logs; it also searches the Wayback Machine, historical archives, and a load of different APIs.
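To make the subdomain idea concrete, here is a minimal sketch of turning certificate transparency search results into a list of hostnames. It assumes the results have already been downloaded as a list of JSON objects with a `name_value` field holding newline-separated certificate names, which is the shape the public crt.sh search returns as far as I know; treat that field name as an assumption if you use a different log search. The function name `subdomains_from_ct` and the sample data are made up for the example, and no network access is involved.

```python
def subdomains_from_ct(entries, domain):
    """Collect unique hostnames at or under `domain` from CT log
    search results (a list of dicts with a "name_value" field)."""
    domain = domain.lower().rstrip(".")
    found = set()
    for entry in entries:
        # One certificate can list several names, one per line.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # treat a wildcard as its parent name
            # Keep only names at or under the target domain.
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical sample results for illustration only.
    sample = [
        {"name_value": "www.example.com\n*.dev.example.com"},
        {"name_value": "example.com"},
        {"name_value": "notexample.com"},
    ]
    for host in subdomains_from_ct(sample, "example.com"):
        print(host)
```

The suffix check uses `"." + domain` rather than a plain `endswith(domain)` so that a lookalike such as `notexample.com` is not mistaken for a subdomain.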
So it can really give you a huge list of domains that you can then potentially look at further.

That's the domains covered. There's a lot more you could dive into there, but the next thing I wanted to look into, which didn't necessarily apply directly to me but is useful for anyone that lives in the UK, is the UK-based data sources. Companies House is one that is essentially fully public. If you've ever registered a company, your details will be on there with the name of the company, and you'll be able to search for that person and the company. It includes date of birth, which again is quite
sensitive, and it'll also usually have the correspondence address, which may be a company address, but if it's a small company then it's probably just going to be the home address of that person. This is publicly searchable, it exists, and there's no way to wipe it for the most part. Similarly, the open electoral register: if you've not opted out of it, you will appear in directories like 192. You will need to correlate that with where you think they might be located to really narrow down, but you'll be able to dive into it and it'll give you the full address of that person. So again, more information that just exists.

Another one that's
novel or interesting: if you own a house and you put in a planning application for whatever, you'll need to fill in a form like this, and that will have your name on it, and it'll have the agent's name, along with addresses, phone numbers, and those kinds of details. Well, this information is public. If you want to look up a planning application, you can quite happily get that, and the UK planning portals, which are separated by council, tend to be fairly similar. If you know, say, the person lives in the London area, you'd go on the relevant planning
permission portal and search for the person's name. Here's the list of applications; you go on there and it's like, okay, this person lives at this address, because they've put in that application. They're really bad at redacting this information, so it's a novel little source of information if you definitely know the person owns a house and has done something like extensions or that kind of thing.

This leads us on to archive data, and this is another gold mine. So the Wayback Machine, for those who aren't familiar, is run by a nonprofit that has been archiving the internet since 1996, and it's a great thing to go on:
you put in any site and it will say "we have captures for this in 2012, 2015", et cetera, and you can scrub backwards and forwards and look through those captures. Why am I saying that? Well, if you put a site on the internet many years ago, and perhaps it was crap, or there was a load of extra details on there that you later realized, crap, I need to remove those, well, if you're unlucky and the Wayback Machine has captured that, it's still publicly available via this interface. So that's really useful to know about, because you can always
scrub back to older versions of sites and maybe find other details, extra details, stuff that's been removed since and is no longer on the live internet.

Next step: password breaches. I guess by the time you've got to this point, you've probably got a huge array of different email addresses the person used, you've probably got a huge array of addresses, maybe you've got some phone numbers, and you've got a load of information about some of the services they might be using. You can also get some more information about what other services they're using. So Have I Been Pwned, which hopefully a load of people will know: you can stick your email address in
there and it will say, okay, you've been in all these different data breaches. So this is great as a quick check of what has been breached for all these email addresses you've found about yourself. You can go one step further and actually pivot on the content of those data breaches. There's a few sites out there that will host essentially the databases in full; they won't redact the data. So I used one of those to look up my email address, and it would give me full passwords in some cases, sometimes it would give me hashed passwords, it would give me names, it would give me IP addresses. And with IP addresses, emails,
and that password, you can then pivot on that and say, okay, give me all the other stuff that is from the same IP address, and you can then again see more data breaches. It might bring you to different email addresses, it might bring you to different data, and if you're really lucky you might have something like this Deezer data breach, which for some reason has date of birth. It's a music streaming service; why does it want your date of birth? I don't know. Luckily I was sensible enough when I set this up that I gave a completely fake date of birth, because I was like, why does a music service need it? But you may not be as
lucky, right? So you might find this just as public data that exists.

Moving on from that, a malicious entity is not going to stop at just public, passive information. They're going to log into your accounts and see what extra they can get. If they log into your Amazon account, maybe they can dump your credit card details, that sort of thing. So I think the ultimate thing there is to try and map out exactly what sites and services you have used. In 30 years of using the internet I have scattered accounts across the far corners of the internet. I've created loads of accounts everywhere. I only, I guess,
recently became more security aware, and only recently started using password managers more. So I have a password manager with a load of stuff in it, but it's not going to cover a lot of the stuff I maybe created 20 years ago. So it's about thinking what other stuff is out there and trying to find that. And if you do have a password manager, have a look at the exposed passwords report in there; that is very useful, at least for clearing up those high-value items. If you look over that list and see Amazon has a breached password in there, change it. If you see a forum that you gave no details to, that you
used maybe once, maybe that's less important on that list. But equally, don't use weak passwords, and if you've used the same password for 20 years, because that's what a lot of people did in the early days, then you need to fix that.

So those are most of the areas I found that revealed quite a lot of information about myself, surprisingly so. It came up with a lot of information. I'm not sharing most of it here for public reasons, and I've tried to clean a bit of it up, but we'll see why that's problematic. If you want to do
it yourself, then I would highly recommend this YouTube video, the five-hour one by The Cyber Mentor. That's what I started watching, and it gave an excellent rundown and start into OSINT. But there's other stuff on YouTube, and there's other blogs and articles that are well worth looking into. Obviously the stuff that I'm on is probably going to be different from the stuff other people are on, so it might be that you want to dive more into Instagram, for instance, whereas I have absolutely no interest in that, so it wouldn't make sense for me to investigate it.

Overall, my tip is basically that it's time, really. The main barrier I tend
to find is just putting in the time to manually scrub over these sites and look over the details. As I said earlier, notes are vital. If you don't have good notes, and you come back to your investigation half a year later and don't remember what you investigated, you're going to have to start again. So making sure those notes are there, and making sure you've got that graph of all your data, is very useful as well to try and help clean up stuff later. And as I said, expect many, many rabbit holes. You're going to go down certain avenues that just don't lead anywhere, as you would expect. And I've
tended to find that very few things required me to pay money. For the most part I managed to find something free. It is public data, ultimately, so for the most part you should be able to find it, and hopefully through a free service, but there's maybe one or two you might have to pay for. So this brings us on to: okay, you've found out that you've got tons of data everywhere and it's a massive problem. What do you
do? Well, this is where it gets tricky, right? Because yes, you can delete those accounts, you can delete those old posts, and that's definitely a good first step. You can share less, but that won't help your historical data. As we said before with that web archive stuff, if it's in the web archive, if it's been archived by a third party (not necessarily the Wayback Machine, just if that data has been archived somewhere), it can be incredibly difficult to get rid of it. So really what you want to try and do is work towards removing links to that content. If someone can't go from one of your aliases to a different one of your aliases because
the links no longer exist, they're not going to think to look that up in the Wayback Machine. So the best you can do is not remove the content but just make it harder to find. And I guess I promised you 30 years of data leakage, and I've probably not shown you 30 years of data leakage. Well, that's because I'm quite lucky in that when I grew up, stuff like the Wayback Machine wasn't archiving as much. The forums I contributed to no longer exist; they're long since in the dust, and I can't find any of that data still present
from what I can see. I also deployed on free web services which had very little search engine optimization and were deemed very much not important, so most of them are unarchived, which I'm very happy about. Most of my data seems to start from the 2012 period onwards, which is not great, but still, it's less than 30 years. In most cases, if I was talking about preserving our culture through preserving the internet, I would probably be saying, oh my God, link rot, that's a massive problem. Link rot is just the natural tendency for the internet to disappear over time: for pages to disappear, for links to
disappear. However, in our case link rot is very useful, because we don't want that information on the internet. So stuff starts to disappear, and you would hope that over time you stand a better chance of the information that you want removed being removed. As we get further on with web archiving, there's more effort going into this nowadays, so it becomes a lot harder; a lot more stuff is copied and captured. But it's interesting to understand that something like the Wayback Machine archives to a very shallow depth, so if you've got stuff that's in a forum post
within a forum within a forum, it's probably not going to be deemed important enough to capture, so you stand a good chance. Whereas if you have your own site that you've done search engine optimization on, that may be considered more important, so it may actually be in those web archives. But it's interesting to look at it from that point of view: you may be lucky that your stuff from when you were younger wasn't classed as important enough to be captured and archived. But with all that said, I guess the take-home message here is: apply these techniques to yourself and see what you can find. It may shock you. You
may find accounts that have been dormant for 20 years that surprisingly are still there and still exist. You might find a blog that still exists that you haven't posted to for years but has something sensitive in it. And you can use that knowledge, use that graph, and go: well, there's a link between here and here, how can I sever that link? How can I stop that from being a problem? Generally, just using that knowledge to sever those links will help make that information harder to find. And ultimately, as you've seen, web archiving is becoming more and more common, a lot more
effort is put into it, and a lot more stuff is duplicated nowadays because, let's face it, hard drives and storage are fairly cheap compared to what they used to be. So I think we just have to be more aware of what we share nowadays than we used to be. Going forward is probably your easiest thing, of like: okay, I'm just going to draw a line here, I'm going to be better at this going forward. But obviously that's a bit of a crap point, right? But hopefully this will at least allow you to investigate, have a look at what's out there about you, and it might shock you enough to actually
put the effort into maybe being more aware of what you share in the future. With that, that is all I've got for you today. I guess if we've got any questions, we'll have to wait for the mic. Got a
mic?
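The "sever the link" idea from the talk can be sketched as a small graph problem. This is only an illustrative sketch, not a tool used in the talk: the node names, the `LINKS` list, and both function names are hypothetical. It treats every account or alias in your notes as a node and every discoverable connection as an undirected link, then lists the single links whose removal would stop someone reaching a sensitive account from your real name.

```python
from collections import deque


def reachable(links, start):
    """Return every node reachable from `start` over undirected links."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def links_worth_severing(links, start, sensitive):
    """List single links whose removal makes `sensitive` unreachable from `start`."""
    result = []
    for link in links:
        remaining = [other for other in links if other != link]
        if sensitive not in reachable(remaining, start):
            result.append(link)
    return result


# Hypothetical notes from an investigation (all names are made up).
LINKS = [
    ("real name", "current blog"),
    ("current blog", "old username"),
    ("old username", "dormant forum account"),
    ("real name", "work profile"),
]

if __name__ == "__main__":
    for link in links_worth_severing(LINKS, "real name", "dormant forum account"):
        print("sever:", link)
```

If there are several independent paths to the same account, no single link will appear in the output, which is itself a useful finding: more than one link has to go.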
Hello. You can remove stuff from the Wayback Machine with robots.txt. If you had a robots.txt then the Wayback Machine won't include your stuff.
Yeah, so I guess in the context of a website you can discourage people from archiving that site, you can discourage people from browsing that site, but they don't have to follow that, right? It's not mandatory, and that site is still publicly available. That doesn't help in the case of stuff like data sources around domain registration, or other data sources like companies data and other stuff like that. But yeah, that is definitely a good technique. However, you've got to balance that: if
you disable crawling for your site then you're likely going to have an issue with search engine optimization, because the search engines are not going to have any content. So you're trying to balance how much you want to get found. Maybe you want to be found, you just don't want all your data to be found. So it can be a tricky one. I've not personally disabled that on some of my sites, because I generally want to be found in some ways, but not all of that data, if that makes sense. Cool, I've got a question over here.
Hi, can you hear me okay?
You'll have to pipe up a bit.
Yeah, I
just wanted to ask, for example, when it comes to deleted accounts and things like that, do those still pop up? Is that how the Wayback Machine would work?
Yeah, so the way the Wayback Machine works is you input a URL, a full URL. You can't search it and go, give me any pages that reference Thomas priest; it doesn't work like that. You have to give it a full URL. So if you delete that account in the first place, it's got to rely on the person knowing that that specific URL, that specific account, has been archived at some point. They then put that into the Wayback Machine, and if
the Wayback Machine has archived it, they will find it. You've not removed that content from the Wayback Machine, you've not removed it from existing, but you've made it significantly harder to find. And I think that's the strategy I would say is probably the best here, the more realistic one: if you delete it, that will remove a lot of links to it and it will make it harder to find. So yeah, it's still there, but it's harder to find.
Hey buddy, I really appreciate the talk, thank you for sharing today. I know you're a very technical kind of guy, with mathematics degrees and whatnot. What triggered you to find an
interest in OSINT?
I think it was just an area I'd not really looked into, and it's one of those that I think has a really low barrier to entry, right? The majority of the stuff I've shown is: get a web browser out and look on the search engines. So it was just something where it's like, hey, this is interesting, I have 10 minutes spare here, I have 10 minutes spare tomorrow, and I just kind of carried on, kept having a look, having a bit of a research, seeing what was available. I think that's what triggered it. And then as I got more into it I was like, oh crap, I didn't
realize you could find that; oh my God, this exists somewhere. It sort of spiraled and I put more energy into it. But I think it's just because it's a low barrier to entry, and generally I find it very interesting. We'll go second line first and then come to you.
Hello. So there seems to be a growing number of companies who offer a service where they'll get rid of your data online for some monthly fee. I think these tend to be more targeted at companies that you've given your permission to hold data about you, and so they're going to go around and try and find all
those companies and then send them deletion requests and things. Is this something you've looked into at all? I've never looked into it that much. I'm wondering if they're effective at all, or something worth paying for.
Yeah, it's not something I've looked into. I think a lot of the way they do it is just spamming takedown requests at a load of companies, and a lot of companies go, you know what, we can't be bothered with the legal ramifications and cost of fighting this, so we'll just remove it. Sometimes that data might exist, and you might issue a takedown request, and it might embolden the company to actually go, you know what, stop being an
idiot, this data is public, we're going to keep it there. And if you send a takedown request at the wrong site, the wrong service, it might mean that you end up with more focus, more attention on that, because you've kicked up a fuss rather than just letting it quietly die in a corner of the internet.
You're probably giving your data to another company as well, to then collect it in one place, which might not be a great idea.
Yeah, sure. I guess you've got to trust that company to not also store it in a database and then sell it onwards, that kind of
thing.
Hi, thanks for your talk. I was wondering if you thought that there is value in scrambling some of the data rather than removing it completely, just because there are so many companies that are trying to index stuff on you. If they find that things are missing, they'll just keep what they have; if they find something and it has new data there, maybe they'll overwrite it in the database, and that will have a better net effect.
That's really interesting, I've not really thought about that. I guess my worry would be how you're scrambling it. For instance, as I said before, a lot of usernames tend to be static, constant, and unable to be changed, so if
your data still exists but it's scrambled under that username, I can still find that username fairly easily, and maybe then I stick it in an archive that brings back the old version. So yeah, it's very interesting, I've not really considered that, but I think in certain circumstances it might still make it findable.
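The "stick it in an archive" step, and the earlier point that the Wayback Machine needs a full URL rather than a search term, can be sketched in a few lines. This is a hedged sketch rather than anything shown in the talk: it assumes the Internet Archive's public availability endpoint at `https://archive.org/wayback/available` and a JSON response with an `archived_snapshots.closest` entry, and the function names and example URL are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's Wayback availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp=None):
    """Build the lookup URL for one exact page URL (no keyword search)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # e.g. "20120101" for "closest to 2012"
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived copy's URL from a response body, or None if absent."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


def lookup(page_url):
    """Perform the network request; needs internet access."""
    with urlopen(availability_query(page_url), timeout=10) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical old profile URL; replace with one from your own notes.
    print(lookup("example.com/user/old-alias"))
```

The design mirrors the point made in the answer: the lookup only works if you already know the exact URL, so removing the links that reveal that URL is what makes the archived copy hard to find.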
Hello. Have you looked into face reverse image searching, where a website will use AI to identify features in a face and then follow that back through its database of images?
Yeah, so I did try. I don't think it was quite AI; I think it was more just slightly more advanced (not that advanced is the word): it had captured more images and been more focused on that kind of search around faces only. I didn't pop up as important enough, I guess, to find my face in there. I think the reverse search, from what I've seen, is still pretty good. What you may have to do, if you're looking particularly
at, say, an image of me, and it's a picture where maybe I'm on holiday, you might have to do some processing to remove the background, because the search engine might get confused and say, oh, you want to see pictures of beaches, not pictures of me. So I think there's definitely some processing that may be required to get the best out of the reverse search, but I've not played specifically with AI-based ones.
Thank you. For things like the government information at Companies House, you are legally required, if you're a director of a company, to have your information available. Is there any kind of legislation with which we could force
the government to change this sort of situation?
Yeah, it's a tricky one, right? Because I think there's definitely public interest there in knowing where the company is. If you're dealing with that particular company and they can hide behind complete privacy, like, we don't have an address, it doesn't exist... It's that balance between privacy and other things coming in. I don't know of anything in particular that can force them. I know in the UK we generally have it better: if you compare the data sources we'll typically use in the UK around this stuff versus the data sources that might be in the US or
whatnot, you'll tend to find that finding information about an individual is a lot easier in the US, because those data services exist a bit more and that data is more available. So I think we're still in an okay position in the UK with generally finding people. But yeah, I must admit the planning portal stuff was a bit of a shock, and the Companies House stuff is obviously a fairly big problem. If you want to not tie it to yourself, it probably is best if you have a company premises, so at least that's one step away from your personal address. But yeah, I don't
know of any. I'm not really a legislation person, so I don't know on that one.
All right, awesome. Well, I'm around, so if anyone has any more questions feel free to come and ask me. But cheers, all, for your time.