
A Socio-Technical Approach to Cybersecurity Systems

BSides CT · 2020 · 34:30 · 43 views · Published 2020-11
Category: Research
Style: Talk
About this talk
Socio-technical systems can be viewed as a linked structure of entities that provide and spread information. A set of linked Web pages, a social network of acquaintances, or other connections between individuals are common examples of such systems. In this talk, we will discuss how considering both the technical and the social components of a complex system can help improve proactive cyber threat intelligence applications that protect the privacy of Internet users. First, we talk about the history of socio-technical systems and recent work on their evaluation. Then, we discuss a wide range of applications where a socio-technical approach can improve the security of a complex system. Mahdieh is an assistant professor in the Department of Computer Science at Central Connecticut State University, CT, USA. She works at the intersection of cybersecurity and network science, with the aim of improving proactive cyber threat intelligence applications that protect the privacy of Internet users. Her research focuses on applied machine learning and on characterization, measurement, and analytics for complex cybersecurity and socio-technical systems. She received her PhD in Computer Science from Wright State University, Ohio, USA. She serves as a reviewer for journals such as Computers and Security, ACM Transactions on Intelligent Systems and Technology, and ACM Transactions on Internet Technology. She is also a reviewer and program committee member for conferences such as The Web Conference, Web Intelligence, and Intelligent Systems.
Transcript [en]

All right, hi everybody, and welcome back. This is the Blue Track for today, and we're going to get started with our first speaker in just a minute. Let me introduce her and her talk. She's going to be speaking on a socio-technical approach to cybersecurity systems. Socio-technical systems can be viewed as a linked structure of a set of entities that provide and spread information through that structure; a set of linked web pages, a social network of acquaintances, or other connections between individuals are common examples. In this talk, she will discuss how considering both the technical and social components of a complex system can help improve proactive cyber threat intelligence applications that protect the privacy of Internet users. First, she will talk about the history of socio-technical systems and the recent work on their evaluation. Then she will discuss the wide range of applications where a socio-technical approach can improve the security of a complex system. So welcome our first speaker, Mahdieh, and if I butchered your name, I'm very sorry; I guess I owe you a beer next time. So thank you, and welcome, Maddie.

Thank you very much, and good morning, everyone. Thank you again for introducing me. Yes, let me share my screen.

Okay, good. Thank you, everyone, and good morning. Thanks for having me today at the BSides conference. My name is Mahdieh, as introduced, and I'm an assistant professor of computer science at CCSU. I would like to talk about a socio-technical approach to cybersecurity systems. As an outline, we first give a short description of socio-technical systems, and of the dark web as an example of a socio-technical system. Then we review the research conducted on the dark web and present two studies, an information ecosystem analysis and an information leakage assessment. They show how changing our view and treating the dark web as a socio-technical system helps us gain a more holistic view of this network.

Socio-technical systems are viewed as a linked structure of a set of entities that provide and spread information through the network. The dark web, Facebook as a social network, and even the communication network between employees in an organization are examples of such systems. Structure and information are two critical aspects of any socio-technical system. Because they interact with each other, we need to consider both of them together when we evaluate a socio-technical system. For example, on Facebook, the predefined social structure of the network directly influences the information each user receives and the amount of value each user can gain by participating in the network. On the other hand, the type of information each user shares has a direct impact on the social network that user can build and on their connections with other users.

Let's begin with the definition of the dark web. As we know, the World Wide Web has three categories: the surface web, the deep web, and the dark web. The surface web is the regular part of the Internet that we use daily, navigating websites with search engines. The deep web contains information that cannot be indexed by search engines; this can be private organizational data, academic research in an institute, or the private records of patients in a hospital. A subset of the deep web is called the dark web, and accessing it requires unique application-layer protocols and authorization schemes.

Generally speaking, Tor is the most popular dark web today. The network contains over 6,000 volunteer relays, or routers, that connect to each other and provide anonymity for Tor users. There is also a set of Tor directory authorities that regularly monitor relays, authorize them if they are active, and publish the list of authorized relays to the network. To start communicating with a server, the client first incrementally establishes a path of at least three relays, encrypts the data, and sends it to the server. All of these communications use TCP. The first relay in the path is called the entry, or guard, and the last one is called the exit.

Tor hidden services are a feature added to the Tor network to provide anonymity for service providers, that is, the users who want to publish and provide services on the Internet. A computer used to spread information on the Tor network is called a domain, and the information provided by a domain is called a hidden service. Addresses in the Tor network have the suffix .onion, which indicates that the address belongs to the Tor network. The rest of the address contains 16 digits or letters that are meaningless; it is not a real name, as we see in addresses on the surface web.

As related research on the dark web indicates, the original purpose of designing the dark web was to circumvent Internet censorship around the world. It is a good tool for releasing information to the public and for buying and selling goods and services through marketplaces, and these novel uses make Tor an important socio-technical system in our lives today. You may also have heard about the negative, or dark, side of the Tor network: it is also a tool for drug dealing, human trafficking, terrorist activities, and other harmful applications.

Reviewing the related work on the Tor network, we can categorize these studies into two groups: Tor structure characterization, which was studied first, and Tor information characterization. The first group of research concentrates on features such as the sparsity, mirroring, and redundancy of Tor domains; whether the hyperlink structure of Tor domains follows a bow-tie structure; robustness and fragility against node removal; and how long Tor domains and their hyperlinks exist on the network. The second group characterizes the information hosted on Tor domains: crawlers extract and analyze topics such as drugs, human trafficking, and suicide methods. The current evaluation of Tor therefore concentrates narrowly and exclusively on either structure or information. If we consider the dark web as a socio-technical system, however, we need to study the interaction between structure and information. To fill this gap in the related work, we want to see how studying the dark web as a socio-technical system can give us a more holistic view of this network.

In our first study, the information ecosystem analysis, we perform a comprehensive crawl of the Tor network. Through topic modeling and network analysis, we try to answer research questions such as: How diverse are the information and services provided on Tor? We would like to see what the main application of the information and services on this network is, and whether there is any structurally core service on Tor. Also, how siloed are Tor information sources? Answering these questions can help us find the main applications of services on Tor and understand more about the hyperlink structure of Tor domains.

To do so, we first need to collect and preprocess our data. We gathered 20,000 initial seeds, some from resources on the surface web, some from the dark web, and some from related work. We fed them into four crawlers that ran for two months, crawling up to depth 4. We focused only on HTML files to avoid request limiters and crawler blocks, and we collected over 1 million HTML files. To separate the files belonging to the dark web, we filtered them by the suffix of their addresses, which left over 150,000 pages. Because the vast majority of content on Tor (over 80 percent) is in English, we ran language detection to isolate the English content. We found over 40,000 English pages belonging to 1,766 domains on Tor. Finally, to assign each domain a semantically meaningful topic label, we did topic modeling and graph-based topic labeling using DBpedia, based on the text content of each domain's pages.

Here you see our results: nine unique topics. Directory indicates services with huge lists of onion addresses and up-to-date information about Tor services. Bitcoin covers services for digital cryptocurrency and for transferring money to wallets. News is very similar to personal weblogs on the surface web, where the owner gives opinions on different topics and readers can post follow-up comments. Email indicates services for communication through the network and for purchasing VPNs. Multimedia covers resources such as videos, e-books, and music without any copyright restriction. Shopping indicates marketplaces on Tor. Forum indicates services very similar to the forums we have on the surface web. Gambling covers services for online games, gambling, and purchasing consultancy or advice on gambling. Finally, Dream Market is a very popular marketplace on Tor with a wide range of items, such as laptops, and as we see here, it has its own topic.

Analysis of the population of these topics shows that Directory, Dream Market, and Shopping have the largest populations. These domains release information to the public through directory services and support shopping on the Tor network, which may indicate the main application of English domains on Tor. The second group is Email, Forum, and News, which are used for the free exchange of ideas. We also found that 10 percent of the English domains are gambling services. This is somewhat surprising, because it suggests that people use Tor for online games as an entertainment tool. We found another surprising result: the small population of Bitcoin services. This contradicted our initial expectation. Given the large population of marketplaces on Tor, where users generally pay with Bitcoin, we expected Bitcoin services to be more popular. This result may indicate that Tor users do not trust Bitcoin services hosted on Tor.

To understand more about the hyperlink structure of Tor domains, we considered each domain as a node and each hyperlink between domains as an edge. As we see in the table, this network is sparse, with just over 5,500 edges for over 1,700 nodes and a density close to zero. We also found 25 weakly connected components, and the largest component contains almost 50 percent of the nodes. This contrasts with other socio-technical systems such as the surface web, where almost all nodes belong to one massive connected component. It suggests that Tor domains discourage linking to each other, which helps them remain isolated and difficult to observe.

We also analyzed the intra-connectivity of Tor domains to see whether they support each other, that is, whether they prefer to link to, and refer their own customers to, other domains with similar service types. As we see here, Dream Market domains are tightly connected, forming only four to five strongly connected components. Shopping, Gambling, Multimedia, News, and Bitcoin have more isolated, disconnected nodes. This reflects competition between these service owners: they do not like to refer their customers to other domains with the same service type. On the other hand, Directory, Forum, and Email each exhibit a single large connected component with fewer isolated nodes, which indicates some level of support or cooperation between these service owners.

We also computed modularity to see whether there is any relation between the intra-connectivity and the inter-connectivity of domains. By the definition of modularity, a service type has high modularity if its domains tend to link to other domains of the same type and have few links to domains of different types. Here we see that Dream Market has the maximum value. This indicates that Dream Market domains are largely siloed from all other domains, with many links to each other and few links to domains of other service types. The low modularity values for the other services indicate that the majority of Tor domains strongly prefer to link to other types of services.

We also evaluated importance using several network metrics on the hyperlink structure of Tor domains, to see whether we could find the interaction between information and structure on Tor. Here we have the CDF of betweenness centrality. This metric reflects the share of shortest paths between other pairs of nodes that pass through a node, so it helps us detect how much influence a node can have over the flow of information in the graph. The result shows that betweenness centrality is close to zero for the majority of domains, and only a few have values greater than 0.05. Our manual investigation of those domains shows that they are well-known Tor directories. Any attack on, removal of, or failure of such directories may therefore directly reduce the number of Tor domains reachable by casual browsing. This also shows the important role of Tor directories as entry points for our crawlers.

We also ran an eigenvector analysis. Eigenvector centrality describes the importance of each node in a graph as a function of the importance of its neighbors. Again, the density plot of eigenvector centrality shows values near zero for the majority of domains, and values greater than 0.2 for only a few. Our manual investigation shows that these domains belong to Dream Market services. Because the organization of Tor developed naturally, the high eigenvector centrality of Dream Market makes it the most structurally meaningful core service on Tor. So if we designed a search engine that ranks pages by eigenvector-style centrality, like PageRank, it would place Dream Market and pages closely tied to it at the top of its results.

To summarize the findings of the first study: over half of all English domains are directories and marketplaces, used for releasing information to the public and for shopping on Tor. We also found a surprisingly large percentage of gambling domains, which suggests that people use Tor for online games. We identified Dream Market as a structurally important core service on Tor. Finally, the patterns in the intra-connectivity structure indicate some level of cooperation and competition between service owners on Tor.

Okay. Given the tendency of Tor domains to stay isolated, we did another study, an information leakage assessment. Its goal is to see to what extent Tor hidden services have the potential for information leakage because they link to the surface web. Here we want to see how many Tor domains link to the surface web, which makes them vulnerable to information leakage. Answering this question can help us design pages on Tor that reduce potential information leakage. From the perspective of service owners, pages with less potential for information leakage help customers trust their services.

So let's first begin with the definition of information leakage. As we know, there are two ways to connect to the Tor network. The first is directly, using Tor Browser, which is the way recommended by the Tor Project. The second is indirectly, using Tor proxies like Tor2web, which the Tor Project does not recommend. In the second way, the proxy acts as the middleman between the Tor client and the dark web. When the client requests a page from the dark web, the proxy finds the page, retrieves it, rewrites its content, and delivers it to the Tor client. However, if the requested page contains any links to the surface web, the proxy does not rewrite those links. Following them takes the client's browser directly to the surface web, which can reveal the identity of the Tor client. This is the general mechanism behind the information leakage problem, and it happens because the user does not follow the recommended way to connect to the Tor network.

We crawled pages on the dark web again, and this time we kept pages from both the surface web and the dark web, because we want to see how many dark domains link to the surface web. We found nearly 2 million distinct pages, belonging to over 3,500 surface-web domains and 3,200 dark domains.

To see to what extent Tor domains have the potential for information leakage through links to the surface web, we counted the number of surface-web and dark neighbors of each Tor domain. As we see here, a plot on a logarithmic scale shows the number of surface-web neighbors versus the number of dark-web neighbors for each Tor domain. We found three different areas in this plot. The first area, the blue one, contains services with fewer than three surface neighbors. It forms 13 percent of dark services, and 8 percent of dark services link only to Tor domains. This means that less than 10 percent of dark services are immune to information leakage, because they have no links to the surface web. The second area, the green one, contains services with fewer than three dark neighbors. It forms 20 percent of dark services, and 14 percent link only to the surface web, so less than 20 percent of dark services link exclusively to surface websites. The rest, 78 percent of all points in our plot, are dark services with hyperlinks to both the surface web and the dark web. This population, together with the second group of Tor domains that link to the surface web, implies that more than 90 percent of Tor domains contain at least one link to the surface web. These domains therefore have the potential for information leakage.

We also analyzed the reference view of Tor services to see whether linking to the surface web changes the hyperlink structure of Tor domains. I refer you to our main paper for the other network metrics; today I just want to talk about edge betweenness centrality and stress centrality. The edge betweenness centrality of an edge indicates to what extent that edge bridges communication between network nodes. Again, we plotted the CCDF of edge betweenness centrality, and we found a few edges with very large betweenness centrality, more than 10,000. Manual investigation of the domains involved shows that these edges connect Tor directories to each other. The removal or failure of these directories may therefore disrupt communication between many pairs of domains on the Tor network.

The stress centrality of a node counts the shortest paths in the network that run through it. We wanted to see which Tor domains have high stress, since high stress indicates that a node plays an important role in disseminating information through the network. Again, a CCDF plot of stress centrality shows that only 6 percent of our data have a stress centrality greater than the average. A large portion of Tor services are therefore reluctant to contribute to information dissemination, which helps them remain isolated and difficult to discover. We also investigated the domains with high stress and found that more than half of them are well-known directories on Tor.

So, as a result of our findings, more than 90 percent of Tor domains have at least one link to the surface web and are vulnerable to information leakage. We also found that well-known Tor directories contribute significantly to communication and information dissemination in the dark-to-surface graph. As a quick conclusion, we showed how we can change our view of the dark web. By considering the dark web as a socio-technical system, we can learn more about the interaction between structure and information on Tor, which gives us a more holistic view of this network. For future work, we would like to study more novel analytics to evaluate the Tor ecosystem and to compare its hyperlink structure with other socio-technical systems. We also want to see whether we can model the underlying hyperlink network with other network models, such as ERGMs. Finally, we will consider the modus operandi of services in non-English languages to see how people around the world use the Tor network.

Okay, thank you very much for your attention. Please let me know if you have any questions, or contact me for more details about our studies. Thank you.

Okay, please let me know if you have any questions.

Yes, there is a question for me about how CAPTCHAs on dark web pages affect our crawling. Thank you for your question. Yes, we also found CAPTCHA codes on some dark web pages. Dream Market pages in particular use them to check who connects to and uses their pages. Our crawler ignored pages with CAPTCHA codes, but improving our crawling techniques to collect data from such pages is also a direction for future work. Yes, thank you for your question.

Sorry about that, Maddie. I didn't mean to interrupt there, but something happened. All right, thank you so much. We're working from home right here; say hi. Morning! Hi, how are you? Okay, thank you so much, Maddie. Thank you; it's obviously an important topic. We'll bring our next speaker on in just a few minutes. If you have any questions for Maddie, please post them in the BSides channel for this talk. Thank you, thank you.
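For data collection, the first study kept only pages whose addresses end in .onion and grouped those pages into domains. The Python below is a minimal sketch of such a filter under stated assumptions, not the authors' crawler code. The 16-character label check follows the address format described in the talk (v2 onion addresses are 16 base32 characters). Accepting 56-character v3 labels is our own addition, and onion_domain and classify_url are hypothetical helper names.

```python
import re
from urllib.parse import urlsplit

# Onion labels are base32 (a-z, 2-7): 16 chars for v2 addresses,
# 56 chars for v3 addresses (accepting v3 is an assumption of this sketch).
ONION_LABEL = re.compile(r"[a-z2-7]{16}|[a-z2-7]{56}")


def onion_domain(url):
    """Return the 'label.onion' domain of a URL, or None if it is not a valid onion address."""
    host = urlsplit(url).hostname or ""  # urlsplit lower-cases the hostname
    if not host.endswith(".onion"):
        return None
    # Keep only the label directly left of ".onion", dropping any subdomains.
    label = host[: -len(".onion")].rsplit(".", 1)[-1]
    return f"{label}.onion" if ONION_LABEL.fullmatch(label) else None


def classify_url(url):
    """Label a crawled URL as 'dark', 'surface', or 'invalid' (a malformed .onion host)."""
    host = urlsplit(url).hostname or ""
    if not host.endswith(".onion"):
        return "surface"
    return "dark" if onion_domain(url) else "invalid"


if __name__ == "__main__":
    for url in [
        "http://abcdefghijklmnop.onion/index.html",
        "http://www.abcdefghijklmnop.onion/forum",
        "https://example.com/about",
        "http://tooshort.onion/",
    ]:
        print(classify_url(url), onion_domain(url))
```

Run as a script, it labels the first two URLs as dark (both map to the same domain, abcdefghijklmnop.onion), the third as surface, and the fourth as invalid. Grouping pages by onion_domain gives the per-domain page sets that topic labeling would then work on.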
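The first study reports two structural statistics for the domain-level hyperlink graph: its density and its weakly connected components. Both can be computed with a few lines of standard-library Python. This is an illustrative sketch, not the study's tooling. It assumes a directed graph given as a node list plus distinct (source, target) domain pairs without self-loops; density and weakly_connected_components are hypothetical helper names. With the talk's rounded figures, about 5,500 edges over 1,766 domains give a directed density of roughly 5500 / (1766 × 1765) ≈ 0.0018, consistent with the value "close to zero" reported.

```python
from collections import defaultdict


def density(nodes, edges):
    """Directed density |E| / (n * (n - 1)); assumes no self-loops or duplicate edges."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def weakly_connected_components(nodes, edges):
    """Group nodes into components while ignoring edge direction (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups = defaultdict(set)
    for v in nodes:
        groups[find(v)].add(v)
    return sorted(groups.values(), key=len, reverse=True)  # largest component first


if __name__ == "__main__":
    nodes = ["dir1", "dir2", "shop1", "shop2", "forum1", "gamble1"]
    edges = {("dir1", "dir2"), ("dir2", "shop1"), ("forum1", "dir1"), ("shop2", "gamble1")}
    comps = weakly_connected_components(nodes, edges)
    print(f"density = {density(nodes, edges):.3f}")
    print(f"{len(comps)} components, largest holds {len(comps[0]) / len(nodes):.0%} of nodes")
```

On the toy graph, the four linked directory, shop, and forum domains form one component and the remaining two form another. That mirrors, at small scale, the fragmented structure the talk contrasts with the surface web's single giant component.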
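The talk reports that Dream Market domains form only four to five strongly connected components, while other service types break into many isolated nodes. One way to obtain such counts is to count strongly connected components inside each service type's own subgraph, keeping only links whose source and target share that type. The sketch below assumes that reading and uses Kosaraju's algorithm (two depth-first passes). scc_count_by_type is a hypothetical helper, not the authors' code.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm, written iteratively; returns a list of node sets."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: record nodes in the order their depth-first search finishes.
    visited, finished = set(), []
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            v, successors = stack[-1]
            nxt = next((w for w in successors if w not in visited), None)
            if nxt is None:
                stack.pop()
                finished.append(v)
            else:
                visited.add(nxt)
                stack.append((nxt, iter(succ[nxt])))

    # Pass 2: search the reversed graph in reverse finishing order.
    assigned, components = set(), []
    for root in reversed(finished):
        if root in assigned:
            continue
        assigned.add(root)
        comp, stack = {root}, [root]
        while stack:
            v = stack.pop()
            for w in pred[v]:
                if w not in assigned:
                    assigned.add(w)
                    comp.add(w)
                    stack.append(w)
        components.append(comp)
    return components


def scc_count_by_type(edges, service_type):
    """Count SCCs in each service type's induced subgraph (an isolated domain counts as one)."""
    members = defaultdict(list)
    for node, label in service_type.items():
        members[label].append(node)
    return {
        label: len(strongly_connected_components(
            nodes, [(u, v) for u, v in edges
                    if service_type[u] == label and service_type[v] == label]))
        for label, nodes in members.items()
    }


if __name__ == "__main__":
    service_type = {"m1": "market", "m2": "market", "m3": "market",
                    "s1": "shopping", "s2": "shopping", "s3": "shopping"}
    edges = [("m1", "m2"), ("m2", "m3"), ("m3", "m1"), ("s1", "s2"), ("s2", "m1")]
    print(scc_count_by_type(edges, service_type))
```

In this toy example, the three cyclically linked market domains collapse into a single component, while each shopping domain remains its own component. Fewer components relative to the number of domains means a type's domains link to one another more.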
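The talk compares service types by modularity but does not state the exact formula used. One common choice for directed graphs is Leicht-Newman modularity, which splits into one term per community; the sketch below assumes that definition. For a service type c, let m be the total number of edges, e_c the number of edges whose endpoints both have type c, and out_c and in_c the summed out-degrees and in-degrees of type-c domains. Then Q_c = e_c / m - (out_c * in_c) / m^2. A type whose domains mostly link among themselves, as reported for Dream Market, gets a large Q_c; types that mostly link outward get values near zero or below. modularity_by_type is a hypothetical helper.

```python
from collections import Counter


def modularity_by_type(edges, service_type):
    """Per-type term of directed (Leicht-Newman) modularity.

    edges: iterable of distinct (source, target) domain pairs.
    service_type: dict mapping every domain to its topic label.
    Returns {label: Q_label}; the values sum to the graph's total modularity.
    """
    edges = list(edges)
    m = len(edges)
    internal, out_deg, in_deg = Counter(), Counter(), Counter()
    for u, v in edges:
        out_deg[service_type[u]] += 1  # edge leaves a domain of this type
        in_deg[service_type[v]] += 1   # edge enters a domain of this type
        if service_type[u] == service_type[v]:
            internal[service_type[u]] += 1
    return {
        label: internal[label] / m - out_deg[label] * in_deg[label] / m**2
        for label in set(service_type.values())
    }


if __name__ == "__main__":
    service_type = {"m1": "market", "m2": "market", "m3": "market",
                    "d1": "directory", "d2": "directory", "s1": "shopping"}
    edges = [("m1", "m2"), ("m2", "m3"), ("m3", "m1"),  # market domains link to each other
             ("d1", "m1"), ("d1", "s1"), ("d2", "s1"), ("s1", "d2")]
    for label, q in sorted(modularity_by_type(edges, service_type).items()):
        print(f"{label:10s} {q:+.3f}")
```

In the toy graph, the market type scores 9/49 (about +0.18), while the directory and shopping types, whose links all cross type boundaries, come out slightly negative.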
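In the first study, eigenvector centrality scores a domain by the scores of its neighbors. A minimal power-iteration sketch follows. It assumes a directed graph in which a domain's score comes from the domains that link to it (its in-links). It iterates with A + I rather than A, a standard shift that leaves the eigenvectors unchanged and avoids oscillation on periodic structures such as a directed cycle. The function name, tolerance, and iteration cap are our own assumptions. Graphs without any cycle may not converge under this scheme, in which case the function raises an error.

```python
import math


def eigenvector_centrality(graph, max_iter=1000, tol=1e-10):
    """In-link eigenvector centrality of a directed graph via power iteration.

    graph: dict mapping every node to a set of successors (its outgoing links).
    Iterates x <- (A^T + I) x and rescales to unit Euclidean length each step.
    """
    x = dict.fromkeys(graph, 1.0 / len(graph))
    for _ in range(max_iter):
        new = dict(x)  # the "+ I" term: each node keeps its previous score
        for v, successors in graph.items():
            for w in successors:
                new[w] += x[v]  # v links to w, so w inherits v's importance
        norm = math.sqrt(sum(val * val for val in new.values())) or 1.0
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in graph) < tol * len(graph):
            return new
        x = new
    raise RuntimeError("power iteration did not converge")


if __name__ == "__main__":
    # Hypothetical toy graph: three market domains link in a cycle,
    # and a directory and a shop each link into the cycle.
    graph = {"m1": {"m2"}, "m2": {"m3"}, "m3": {"m1"}, "dir": {"m1"}, "shop": {"m1"}}
    for v, score in sorted(eigenvector_centrality(graph).items(), key=lambda kv: -kv[1]):
        print(f"{v:5s} {score:.3f}")
```

In the toy graph, the three market domains converge to equal scores of about 0.577 (1/√3), and the two feeder domains fall to about zero. This illustrates how a tightly interlinked group, like the Dream Market domains described in the talk, can absorb nearly all of the centrality.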
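Betweenness centrality (first study), edge betweenness, and stress centrality (second study) are all built from shortest-path counts. One breadth-first search per source, in the style of Brandes' algorithm, therefore yields all three on an unweighted directed graph. This is our own sketch, not the authors' implementation. It assumes the graph is a dict mapping each domain to the set of domains it links to, with every node present as a key, and it returns raw, unnormalized scores. The talk's node betweenness values (for example, above 0.05) appear to be normalized. Dividing by (n - 1)(n - 2) is the usual normalization for directed graphs, which we assume here.

```python
from collections import deque


def shortest_path_centralities(graph):
    """Return (node_betweenness, edge_betweenness, stress) for a directed, unweighted graph.

    graph: dict mapping every node to a set of successor nodes.
    Betweenness sums, over ordered pairs (s, t), the fraction of shortest s->t paths
    through a node or edge; stress counts those shortest paths outright.
    """
    node_bc = {v: 0.0 for v in graph}
    edge_bc = {(v, w): 0.0 for v in graph for w in graph[v]}
    stress = {v: 0 for v in graph}

    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma) to every node.
        order, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Accumulate dependencies from the farthest nodes back toward s.
        delta = dict.fromkeys(graph, 0.0)  # fractional dependency (betweenness)
        tail = dict.fromkeys(graph, 0)     # shortest-path continuations beyond a node (stress)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge_bc[(v, w)] += share
                delta[v] += share
                tail[v] += 1 + tail[w]
            if w != s:
                node_bc[w] += delta[w]
                stress[w] += sigma[w] * tail[w]  # prefixes from s times continuations
    return node_bc, edge_bc, stress


if __name__ == "__main__":
    # Hypothetical toy graph: two directories bridge otherwise separate domains.
    graph = {"shop": {"dir1"}, "dir1": {"dir2"}, "dir2": {"forum", "market"},
             "forum": set(), "market": set()}
    node_bc, edge_bc, stress = shortest_path_centralities(graph)
    n = len(graph)
    for v in sorted(graph, key=node_bc.get, reverse=True):
        print(v, round(node_bc[v] / ((n - 1) * (n - 2)), 3), stress[v])
```

Stress reuses the same search. For each source s, a node v on the shortest-path tree lies on sigma_sv × (number of tree paths leaving v) shortest paths, which is what the tail counter accumulates. In the toy graph, the two directory domains come out on top for both metrics, which is the pattern the talk reports for well-known Tor directories.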
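The second study counts, for each Tor domain, how many of its hyperlink neighbors are surface-web domains and how many are dark domains. It then asks what share of Tor domains has at least one surface link. A minimal sketch follows. It assumes that neighbors come from a domain's outgoing links only and that any host ending in .onion is a dark domain. Tor domains with no outgoing links to other domains are left out. is_dark, neighbor_counts, and leakage_share are hypothetical names.

```python
from collections import defaultdict


def is_dark(domain):
    """Treat any host ending in '.onion' as a Tor (dark) domain."""
    return domain.endswith(".onion")


def neighbor_counts(edges):
    """Map each Tor domain to (surface_neighbors, dark_neighbors) from its outgoing links.

    edges: iterable of (source_domain, target_domain) hyperlink pairs.
    """
    surface, dark = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        if not is_dark(src) or src == dst:
            continue  # only links *from* Tor domains to other domains matter here
        (dark if is_dark(dst) else surface)[src].add(dst)
    domains = set(surface) | set(dark)
    return {d: (len(surface[d]), len(dark[d])) for d in domains}


def leakage_share(counts):
    """Fraction of Tor domains with at least one surface-web neighbor."""
    if not counts:
        return 0.0
    return sum(1 for s, _ in counts.values() if s > 0) / len(counts)


if __name__ == "__main__":
    edges = [("shop.onion", "news.example"), ("shop.onion", "dir.onion"),
             ("dir.onion", "forum.onion"), ("forum.onion", "cdn.example"),
             ("forum.onion", "news.example"), ("news.example", "shop.onion")]
    counts = neighbor_counts(edges)
    for domain, (n_surface, n_dark) in sorted(counts.items()):
        print(domain, n_surface, n_dark)
    print(f"share with a surface link: {leakage_share(counts):.0%}")
```

Plotting the (surface, dark) pairs on log-scaled axes gives the kind of scatter plot described in the talk. In the toy data, two of the three Tor domains link to the surface web, so the share is 67 percent.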
