
Please welcome home. [applause] Thank you so much. I am from Spain. Sorry for my English. It's my first time. I am going to present Indexing the Chaos. To be clear, I am not a ransomware operator. I am one of the good guys who only wants to help people protect themselves and their data. OK, first, the evolution of the ransomware problem since 2022. Before 2022, classic ransomware only encrypted files and demanded payment. But since 2022 they use double extortion: they not only encrypt the files, they also steal them and demand a ransom for both, which means public humiliation plus financial damage. They publish internal emails, customer leads, financial records, internal databases, documents and confidential documents. Project overview: to begin with, I designed a crawler.
We need to be able to check when a ransomware leak becomes available for download, and to download it when it is. But the crawler does not only crawl ransomware leaks; it also covers traditional breaches, infostealers and leaks. With the tool, haveibeenransom.com, you can check whether your PII has been leaked by a ransomware gang. You can search by email, name, phone, country or domain, and by driver's license, passport and more, across ransomware-specific leaks, with privacy-aware results. Since 2022, ransomware has evolved beyond encryption. Nobody else was doing this, so I did. Why is it different? As you may know, Have I Been Pwned focuses on website breaches: SQL database dumps, JSON, structured databases. When a hacker hacks a website, they dump structured data.
On the other hand, infostealer tools like SpyCloud and Hudson Rock focus on malware theft. When you download a cracked version of Photoshop or of a game, you can also download malware that steals passwords, browser data, credit card info and wallet credentials. But my tool focuses on ransomware leaks, where the gang publishes everything, a deep mess: not only PDFs but databases, email databases, all of it. This is where my tool comes in. So, the system architecture. As I said before, we have a crawler that crawls the ransomware leak sites. When a leak is available, it is downloaded in parallel over the Tor network and passed to extraction.
Files are classified as normal documents or advanced documents. An advanced document is something like a passport, an ID card, a driver's license and so on. For advanced documents we use OCR plus an LLM. We extract the info from passports, images and CVs. OK, the technical details. First, we design a crawler. The crawler does not only crawl ransomware leaks; it also covers dark web and Telegram ransomware leak channels, infostealers and traditional breaches, all in one crawler. The first issue is captchas, because almost all ransomware websites have a captcha. About 90% have basic captchas that you can bypass with scripts from GitHub, but the other 10% use advanced captchas.
Those can be solved with a service like CapMonster Cloud, but that is expensive. Alternatively, you can solve them manually, and if the connection does not load, you can keep retrying until it does [unclear]. Don't worry. As you may know, Tor is slow by design, [unclear] MB per second. So you raise the number of Tor circuits as far as you can. For example, if you have 600 megabits per second, you need between 50 and 100 Tor circuits to use the full download bandwidth that you have. We also need to handle errors during the download. For example, the ransomware website is not always available.
Sometimes it goes down. If you are downloading something like 4 terabytes and you have to start over every time the website goes down, it is not worth it. You need a tool that handles errors while downloading. This is why I use wget: it gives me error control, built-in retry logic and resuming of interrupted downloads. OK, simple extraction. First we organize the files, second we delete errors, and third we go into the PDFs or databases to extract names, emails, phone numbers or something like that. As you can see, in a PDF we can extract the email and the email context. The email alone is not worth much; you need the email context.
The context lets you know more. The same goes for the other PDFs and for databases like JSON, SQL, SQLite 3 or something like that, plus email databases like mbox or similar, which are worth it too. OK, in advanced extraction we use OCR plus AI: Tesseract, with YOLO to detect driver's licenses, passports, SSNs and faces in PDFs, images and so on, and MiniCPM to extract the information and organize it with minimal errors. With MiniCPM, as you can see, we do not care about the language, or whether it is a driver's license, a passport or whatever you want: it extracts and orders the information in the document's own language. As you can see with this Chinese ID, the fields are extracted in Chinese.
It gets the name, the birth date and the ID number. It also does well on bad photos without much resolution. OK, the last issue: indexing into a database. We have problems with different languages, different encodings and duplication, and you need fast search over massive data. Elasticsearch is like my girlfriend: it handles languages, controls encoding, removes duplicates and searches massive data. The final result is a clean, structured, normalized JSON per identity. As you can see on the website, you can look up, for example, example.com and view the identities leaked, by email. For example, you can see 1,000 identities leaked when Medusa hacked Arandel Corp from the USA. Like an enterprise Have I Been Pwned, I have the whole database indexed.
A real case study: Honeywell, by the ransomware group Clop. The original size is 200 GB, with 1,000 original PDFs and 2,000 original databases. After the script runs, the part with PII is only 4 GB: 600 PDFs and 1,000 databases. The results are around 100,000 full names, 183,000 emails and 1,000 ID cards, and you get where each one came from. As you can see, Honeywell here is an Indian company, and it corresponds: you can see the company really was hacked, with massive data. Next, Philippine Health Insurance, by Medusa. The original size is 4.4 TB, with 7,000 files and 10,000 database files. After the script runs, the part with PII is 53 GB: 3,000 PDFs and 3,000 databases. The result: 53 million full names,
300,000 emails and 4,000 ID cards. As you can see, this was a real hack of a real Philippine insurance company, with identities. Oh, sorry. Now the live demo. Live demo: Tesla and NSA.
As you can see, NSA has 300 identities leaked from when Televerde was hacked. You can see 200 identities from when cryptod [unclear] was hacked by dito secret [unclear], and so on. But if you want more data you have to pay me [laughter], you have to pay me and use the dashboard. Sorry. Give me a second.
For example, Tesla.com.
You can view the email, the domain, the email context and which breach the information comes from. This email appears in this breach, and it was found in this file inside the leak. In this file, Tesla order, Tesla JSON, there is this email, as you can see, and so on. To check your email with nsa.gov, as I said before, you can see the email, the email context and the leak, which you can download for free here. This file has this email and this information, if you want to download it. OK, on the other hand, the ID cards.
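The records shown in the demo (email, domain, context, source file) suggest an extraction step along the following lines. This is only a minimal sketch under assumptions: the function name `extract_emails_with_context`, the 40-character context window, the simplified email pattern and the sample text are all hypothetical and are not taken from the speaker's tool.

```python
import re

# Simplified email pattern for illustration; real addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails_with_context(text, source, window=40):
    """Return one record per email found, with surrounding text and source file."""
    records = []
    for match in EMAIL_RE.finditer(text):
        # Keep up to `window` characters on each side of the address.
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        email = match.group().lower()
        records.append({
            "email": email,
            "domain": email.split("@", 1)[1],
            # Collapse whitespace so the context fits on one line.
            "context": " ".join(text[start:end].split()),
            "source": source,
        })
    return records


# Made-up sample text and file name, for illustration only.
sample = "Order 1042 shipped. Contact: Jane.Doe@Example.com, phone on file."
records = extract_emails_with_context(sample, "orders/example_order.json")
```

Storing the context and the source path next to each address is what makes it possible to say where in a leak an email was found, rather than only that it appears somewhere.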
For example, in China the ID number is very long because there are so many Chinese people. Let me check that it is correct.
So many digits.
You can enter an ID number and search: the name, the ID number, the authority, the gender, which is female. You cannot see the face, but it is female, and the name, as you can see, is similar, and you can see which breach it was found in. But not only Chinese. Sorry. We can also handle low resolution. As you can see, this is a bad photo without good resolution, but MiniCPM extracts the info correctly and in order. You can see S H 26 798, and you can see the name, the country and the document, California, the authority and the date. You can see it is the same. Besides... oh, sorry.
With another language, like Arabic: 4 93 F 18. You see the name, which is not in typical Latin script, the country, the date of birth and the authority. It's amazing. And also which breach it was leaked in. Okay, continuing with the presentation.
The future. Oh, privacy: the final boss. I am not going to be a doxing engine. I only want to help people, not to dox or harm them. I only show metadata if you don't pay. If you pay, we can show more info and we can do victim support, but it will never become a doxing engine. The future: darky.eo. Since the crawler captures infostealer leaks, ransomware leaks and traditional breaches, I want to unify them all in one. You can search the ransomware leaks, and you can check whether your email has been leaked in an infostealer, in traditional leaks or in ransomware leaks, all in one unified investigation platform. As you can see: the name, email,
username and domain. In infostealer logs there are two very important folders. One is the tdata folder, which is where the Telegram data is. Nobody has a decryptor to get the Telegram account ID and phone number that are in that folder. And it is not only the account ID and phone number: all the chats and all the IDs of the chats are there too. I made a decryptor for the tdata folder, so you can take, for example, an account ID and get the name or email from the infostealer log. Besides that, there is another folder: the crypto wallet. To decrypt the crypto wallet, I made a script for that too.
It takes the folder and decrypts all the crypto wallets from the infostealer log: Bitcoin, Ethereum, all of them. It's amazing. For the future, I want a final tool: the dark monitor. But what is the dark monitor? Dark monitor is the art of pivoting. For example, you can start from a Telegram ID or a wallet address, pivot to an email with infostealer data, pivot to a name with a traditional breach, pivot to an ID card, passport or something else with a ransomware leak, and then, with the face, monitor all public cameras to find where the person is on the planet. Isn't it amazing that from one Ethereum wallet or Telegram ID you can find where a person is on the planet?
That is because you can pivot through infostealers, ransomware leaks or something else to get the face and monitor all public cameras. This is my final tool. Finally, if you are an investor or have an oversized crypto wallet, I am not saying no. But if not, don't worry. It is so much work, I am alone, and it is so much work. If you like it and want to help, I am open. OK, sorry. All right, finished.
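The pivoting idea described in the talk, moving from one identifier to another through records that share a value, can be sketched as a breadth-first search. This is an illustrative toy under assumptions: the record layout, field names and sample values are made up, and nothing here reflects how the speaker's planned tool is built.

```python
from collections import defaultdict, deque


def build_index(records):
    """Map each (field, value) identifier to the indices of records containing it."""
    index = defaultdict(set)
    for i, record in enumerate(records):
        for item in record.items():
            index[item].add(i)
    return index


def pivot(records, start):
    """Breadth-first search: collect every identifier reachable from `start`."""
    index = build_index(records)
    seen = {start}
    queue = deque([start])
    while queue:
        identifier = queue.popleft()
        # Visit every record that contains this identifier...
        for i in index.get(identifier, ()):
            # ...and queue the other identifiers found in the same record.
            for item in records[i].items():
                if item not in seen:
                    seen.add(item)
                    queue.append(item)
    return seen


# Made-up records standing in for three kinds of sources.
records = [
    {"telegram_id": "1001", "email": "a@example.com"},   # infostealer-style record
    {"email": "a@example.com", "name": "Alex Example"},  # breach-style record
    {"name": "Alex Example", "document": "X0000000"},    # leak-style record
    {"email": "b@example.com", "name": "Someone Else"},  # unrelated record
]
linked = pivot(records, ("telegram_id", "1001"))
```

Joining on exact values is the weak point of any such approach: common names and reused addresses link unrelated people, so a real system would need much stricter matching than this sketch.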
All right. Do we have any questions?
So, about the process of extracting PII from PDFs, from the whole body of corporate documents: are you familiar with the idea of e-discovery in the legal sense? In court cases, a lot of times you need to go find where all this information is in this chunk. And then if a breach happens and you know data was leaked, it might be a body of emails which you don't have to do anything with, but if there was PII in them you need to disclose or do notification. So have you considered the benefit of taking the parsing pipeline that you've created and currently have pointed at leaks and making it available to companies to point internally, either for e-discovery or for data parsing after a breach, to find PII that has to be disclosed?
This is public data. You can see it and download it from here. I only [unclear]. In this link, with the Tor network, you can download it for free.
So you have a tool that you're pointing at public data. Have you considered what would happen if you let companies point it at their internal data?
Yeah, but I only show you metadata, because all people have the right to know if their data has been leaked. You have a legitimate interest in knowing whether your data has been leaked. I only show you where the data is. I do not publish the data.
Any more?
Hi. Great talk. Thank you for this. Similar to the first question, this reminds me a lot of what's in the book Hacks, Leaks, and Revelations, and I'm wondering if this is the same sort of technology that they describe in that book. And similar to what the first person was talking about, have you thought about making your tools available for people who are looking at leaked data sets other than ransomware, or data sets that have been ransomed? Does that make sense?
Yeah, I know. But this is similar to Have I Been Pwned: you are on the line between being bad or good. I don't know. I only want to help. [unclear] But the internal data of the company is public; I only show it to you. For example, for this leak, sorry, this leak, [unclear], you can find it at this link, you can download it for free and you can view all the data. To your question: I don't show you everything. I only show you your personal information.
Right. But the thing that you're adding is that you're parsing through all that data and providing tools to search through it. And that's really pretty neat.
Yeah. Thank you. Thank you.
Any others? Any more? OK. All right. Yep.
Could you talk a little bit more about what the YOLO model and the MiniCPM tools are doing? How have you divided the analysis and the structuring between those tools?
I use YOLO to detect faces. I trained it to detect faces, passports and driving licenses, as a filter, and then it goes to MiniCPM to extract the info.
Yeah. To extract the info.
Yeah. And this is faster, because MiniCPM is very slow, but if you use YOLO first, it goes faster.
OK. And so for text, is that just MiniCPM?
Yeah, only MiniCPM.
Thanks.
I believe you requested a sacrificial goat and a Faraday cage.
Sorry. No, a blanket and a goat. [laughter] Thank you. Thank you so much.
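The split described in the answer about YOLO and MiniCPM, a fast detector used as a filter in front of a slow extractor, can be sketched as a two-stage cascade. The sketch below is an assumption-laden illustration: `two_stage_extract`, `fake_detect` and `fake_extract` are hypothetical names, and the stand-in functions replace the real models so the example runs without any model installed.

```python
def two_stage_extract(images, detect, extract, wanted=("passport", "driver_license")):
    """Run the cheap detector on every image; run the slow extractor only on hits."""
    results = []
    for name, image in images:
        labels = detect(image)  # fast stage (the talk uses YOLO here)
        hits = [label for label in labels if label in wanted]
        if hits:
            # Slow stage (the talk uses MiniCPM here), reached only for hits.
            results.append({"file": name, "labels": hits, "fields": extract(image)})
    return results


# Hypothetical stand-ins for the two models; each "image" is a plain dict.
extractor_calls = []


def fake_detect(image):
    return image.get("labels", [])


def fake_extract(image):
    extractor_calls.append(image)
    return {"name": image.get("name")}


images = [
    ("scan1.jpg", {"labels": ["passport"], "name": "EXAMPLE NAME"}),
    ("logo.png", {"labels": []}),
    ("photo.jpg", {"labels": ["face"]}),
]
results = two_stage_extract(images, fake_detect, fake_extract)
```

The point of the design is cost: the expensive call happens once per document of interest instead of once per file, which matters when a leak holds thousands of images.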