
Unmasking online commentators like a Noob

BSides TLV 2026 · 11:00 · 15 views · Published 2026-03
About this talk
Using only AI tools and no manual code, a researcher demonstrates how to extract deanonymizing information about pseudonymous blog commentators through linguistic analysis and open-source intelligence. The talk reveals both the surprising sophistication and dangerous limitations of ChatGPT and similar models in identifying individuals across platforms based on writing patterns, domain knowledge, and casual biographical details.
Original YouTube description
Various online forums allow users to post pseudonymously on a wide range of worldly matters, many tinged with deeply personal political or religious undertones. Although the posters may hide behind the veil of an obscure handle, the very elements of the discussions can provide deep insights about them, including personal characterizations that could be used to identify the posters in the real world. All this is known by corporate (and government) intelligence analysts, who may use sophisticated tools and researchers to collect and collate this information. However, today, thanks to widespread use of generative AI like ChatGPT, any noob can get in on the action. This talk will demonstrate this approach to reveal insights about various posters to popular, public blogs. In line with the premise of the talk, all complex code and analysis will be handled by AI: no human brain cells will be harmed ... or exercised.
Transcript [en]

Our next uh speaker came to us from abroad. Uh thank you for uh doing the honor of speaking. Ari is an overeducated hacker wannabe who is much better at breaking things than fixing them. He spends most of his time social engineering academia for profit. We usually say no profit, right? And occasional fun. Uh that being said, Ari is a professor of electrical and computer engineering at Boston University. Thank you. >> Thank you. Thank you. It's a pleasure to be here. And thank you to Israel Rubin for the pizza. Ah, I need a clicker. Yes. All right. So, uh this talk is about unmasking online commentators like a noob. And I guarantee you no human brain cells

were harmed or exercised. So, what's the motivation? Everything you write on social media uh leaves little bits of information about you. And if it doesn't, then you're probably wasting even more of your time than you thought. And here's an example. If you write something like, "I have known a lot of people with years-long practical gun experience, ranging from combat infantry to backcountry elk hunting guides." Simple, innocuous sentence. Says a whole bunch of things about you. Says you have long-term experience, social embeddedness, cultural range, domain knowledge, cognitive sophistication, value orientation, emotional restraint, high verbal skills, etc. So, the goal of this uh research was to identify non-trivial information about so-called anonymous bloggers. And I

looked at seven blogs, uh for five of which I was successfully able to get some information. But here's the key. Anyone can extract information from blogs. The job here was to do it without any human code, only using AI commands, and I was allowed to execute code that the AI told me to, but I was its slave. I also had system access to one box. Okay. So those are the ground rules. I cannot use my brain. I can only do what the AI tells me. All right. But the real goal was that I wanted to come here and see all of you and uh speak here at BSides. This is

this is uh quite a pleasure and I enjoy it. All right. So, what were the challenges? It's actually not so easy to get information from blogs, from websites. Uh many have non-standard comments. They have dynamic pages that interact with the user, require you to scroll, click, move things around. They have pagination. Sometimes they have rate limits to prevent exactly what I'm doing. They have so-called GPT safety. So, for example, this picture on the right: "Union tied to anti-ICE riots hired child molester [for] local chapter role." Because it has the words "child molester" in it, no GPT will ever give it to me because it's considered unsafe, even though it's a news headline. Uh, some have paywalls, some have terms of

service. Bottom line is it's actually pretty hard for AIs to scrape. Uh, if anyone knows that Barbie, that's the "math is hard" Barbie that was a catastrophic failure by Mattel. They had a teenage Barbie who said math is hard uh and got people very upset. So AI also has trouble with this. So let me just give you one case example. I don't have enough time to go through all of them, but uh I looked at the Volokh Conspiracy. It's a blog uh for legal scholars. Um okay. So first, try to collect some data. Okay. So I asked, uh, this was based on ChatGPT 4 because I originally made this in May but it also

I updated with ChatGPT 5 and Gemini and Claude and various others. So how would I scrape information from a blog? I asked and of course GPT gave me the standard response. Well, no, no, that's a bad thing to do. You shouldn't do that. And if you do do that, you know, you have to be careful about ethics and legality. Okay, give me Perl code that will do this. And it gives me some Perl code, but the Perl code has some dependencies. And I'm a noob. I don't know anything about dependencies. So, that doesn't work. All right. So, I give it more precise instructions. Give me Perl code. Make it clear what it needs to

do. Each comment needs to be marked. I want to know the author and the comment, replies, etc. And I even give it several examples of pages from the blog for it to use as an example. Again, it gives me some code and the code doesn't work. All right, let's skip this entirely after a few hours. Give up on that. Let's try something else. Try a different approach. So, it tries hybrid Node, Perl, etc. Let's try that. Uses Puppeteer. Still doesn't work. Okay. A whole bunch of steps later. You know, each time I would come back to the engine and say, "This doesn't work. Fix it. This doesn't work,

fix it." At no point did I ever engage my brain in this process. Okay, the idea was, can we do this as a noob? With enough effort, eventually I'm able to scrape some of these blogs and then feed them in and uh try to get some analysis. Okay. And uh it's able to provide some pretty astute analysis. I have blanked out the handles of the particular authors, but it can do political analysis: left, right, libertarian. It can do professional analysis. What kind of profession is this person likely to be in? And it gives you the text basis for that so that you can check it, and you can see there's

some text there that tells you how it came to that. Okay, but anyone who's used GPT knows, excuse me, that they make up stuff. So um we need a control. We need to test this against something we know the result of. Okay, so first of all I just asked for some general observations, and these are consistent with my own observations of the blog, so it got that pretty accurately. Okay. But then I did another thing. I took well-known personalities on the blog such as Eugene Volokh, who's one of the main contributors. I anonymized, or actually deidentified, him as user 31415 in all of the blogs. Again, only using instructions from the

AI, not using any of my own scripts. And it was able to give me a whole bunch of information about him. Uh formal American legal education, advanced degree, domain expertise, etc. But when I asked it to, you know, try to guess who this person is, that's verboten. No, can't do that. Okay, fine. Give me an example of the most deanonymizing text by this user. And that it can do; it gave me some text. And then I said, all right, so he wrote some work with Randy Kennedy. Who could have possibly written work with Randy Kennedy? And sure enough, it comes out. Okay, I tried a different person. That was a well-known personality, publicly established. Try a different person

whose name isn't quite as well-known, though I know it. And you can see the yellow stuff is all things that were correct. So, it's able to extract an awful lot of information from this. Okay. All it takes is one mention somewhere that, you know, "even my 11-year-old can walk while chewing gum." And now immediately you know this person has an 11-year-old child, right? That means the person is probably middle-aged. Okay, based on other evidence, the person is an expert on Darfur conflict dynamics, UAE foreign policy, etc. Okay, last control. I tried me. I'm not going to tell you what my handle is, but the yellow stuff is the stuff that it got right. It did a

pretty good job. Okay, then I took it to the next step. I tried to cross-correlate it with other accounts, and again initially it's hesitant, but if you ask it in the right way and pet it and say please, uh it does a full analysis for you, and the kind of things that it's able to extract are just amazing. Okay, so this particular person, for example, at one point on Twitter, I think it is, asked about a Honda S2000 spotted near Mount Juliet Road and Lebanon Road. Okay, I don't know where Mount Juliet Road is or Lebanon Road is, but it turns out there's one place where those two roads intersect in the US, and ChatGPT found it out. It's in

Tennessee. And so it therefore deduced that this person was from somewhere around that area. Okay. Profession, etc. Age, whole bunch of information. Okay, I tried the same thing with Red Hat Israel to see what you can get out of that. And you know, there are some interesting figures. There's this guy, Am Yisrael Chai, who posts every 30 to 90 minutes, 24 hours a day. Okay, so either this person doesn't sleep or it's probably a bot. There's a person who has 78 comments, 71 of which are identical. Okay, there's one person who always praises the same Israeli party. Okay, it's possible that that's a real individual, or more likely it's domestic astroturf. At any rate, these are all

kinds of these associative connections that ChatGPT does very, very well. And in fact, I don't even need to know the language. I tried it on a Persian-language uh blog. Again, I don't know any Farsi. And you know, you can get all kinds of interesting information: political leanings, anti-IRGC, pro-IRGC, pro-Israel, anti-Israel, war veteran, etc. Okay. So, conclusions. Some of the features of AI are: one, they're lazy. They infer from very few examples because of the way attention works. Okay? And it has a hard time with dynamic content. It's also sloppy. It makes mistakes. It hallucinates. It makes OSINT. The thing that I really want you to get

out of this talk is you are not anonymous. No matter how much you think you're anonymous, no matter what handles you have, okay? Anytime you share anything that's meaningful, it means that you're sharing something of yourself. Sharing is forever. Thank you very much.
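
The two posting-pattern signals mentioned in the talk (an account posting around the clock, and an account with 78 comments of which 71 are identical) can be sketched in a few lines of Python. This is only an illustrative sketch, not the speaker's method: in the talk the analysis was done by prompting ChatGPT. The function names and both thresholds below are hypothetical choices made for this example.

```python
from collections import Counter
from datetime import datetime


def duplicate_ratio(comments: list[str]) -> float:
    """Fraction of comments that share the single most common text.

    Text is lowercased and whitespace-collapsed before comparison.
    """
    if not comments:
        return 0.0
    normalized = [" ".join(c.lower().split()) for c in comments]
    most_common_count = Counter(normalized).most_common(1)[0][1]
    return most_common_count / len(normalized)


def active_hours(timestamps: list[datetime]) -> int:
    """Number of distinct hours of the day (0-23) with at least one post."""
    return len({t.hour for t in timestamps})


def looks_automated(
    comments: list[str],
    timestamps: list[datetime],
    dup_threshold: float = 0.8,   # assumed cutoff, not from the talk
    hour_threshold: int = 20,     # assumed cutoff, not from the talk
) -> bool:
    """Flag an account if it mostly repeats itself or never seems to sleep."""
    return (
        duplicate_ratio(comments) >= dup_threshold
        or active_hours(timestamps) >= hour_threshold
    )
```

For the account described in the talk, `duplicate_ratio` would be 71/78, roughly 0.91, which is above the assumed 0.8 cutoff. A flag from a heuristic like this is a prompt for closer inspection rather than proof that an account is a bot or astroturf.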