
GT - Is Data Visualization Still Necessary? - Edmond Rogers, Grace Rogers & John Stillwell

BSides Las Vegas · 35:55 · 107 views · Published 2017-08
About this talk
Ground Truth - BSidesLV 2017 - Tuscany Hotel - July 26, 2017
Transcript [en]

Please welcome Edmond Rogers and Grace Rogers.

Thank you. Oh, here we go. We're in Las Vegas again; I took this shot on the plane. I am not going to be using any bullet points in my entire presentation, and I apologize in advance. So I heard some stuff was getting into Vegas, and I was really excited. I bet a lot of people brought their weed money, and hopefully there's enough weed for everybody here, because they've been having some supply problems. I don't know any of these things personally, especially since this is being streamed live to John, who may be on the other end. But when the deadline was coming up for BSides talks, and I hadn't been rejected from Black Hat yet, I decided that I wanted to do a rant about visualization, because I've been working in visualization and making visualization tools for well over a decade now.

I saw this picture right around the day I decided to submit to the CFP. This is a Jay trace visualization, and it's a big load of, you know, s**t (I'm censoring because we're live), and I was like, this is exactly it. It typifies the problem I see whenever we try to put many objects on a screen in a visualization: where do you cut data, and are you cutting stuff that is actually relevant? And even the use of color can always be a matter of perspective, right? What color is that? Half of you think it's one color, and half of you think it's another color.

Then there's death by PowerPoint, and death by diagram is another thing. Here we have another nice visualization, of the integrated defense acquisition, technology, and logistics life cycle, that I stole off the internet somewhere. It's just the question: where's the information? That's a real challenge in research.

Now I'm going to show screenshots from a bunch of tools that I use; some of them I've developed, and some of them I just use. There's this tool, I don't know if anybody uses it, called GlassWire, and I kind of run it on my computer. Thanks to [unclear], I saw a link to this tool, and I actually bought it when it was on sale. It's pretty cool; it gives you some visualization of what's going on in your computer right now. This is the last five minutes when I took the screenshot. But when you look at a week, it's like, what the hell is this? I don't learn anything from looking at this visualization unless I drill down into it. And one of the themes I really saw while making screenshots for this talk was that when you drill down, you lose context with everything else in the visualization.

As developers making visualizations, it's always a trade-off between whether it looks nice and whether it actually has information we can use. In other pieces of research we end up making stick diagrams that are actually useful for the data but really don't look good, and if you're trying to make a tool that people want to use, that's a problem: it's not visually appealing, even though it has a lot of cool information. In our office where we do research, this is the CyPSA office, where we're working on CyPSA version 2.0. It wasn't ready for a BSides talk yet, but we're going to talk about some of the visualizations we're working on. We just stick figures on the wall, and we actually play with control system equipment down there at the bottom, because primarily I've done a lot of visualization work on power grid stuff. This is a visualization of the power grid. It's really big, right? But you can't really get any individual details about the power grid from a visualization that big unless you zoom in. And just like in what we released last year here at BSides, the CyPSA tool, we try to blend physical impact information in with cybersecurity
information. And I'm not going to drink every time I say "cyber" in this talk, because I wouldn't be able to walk out; there's a lot of cyber in this talk. So the whole idea behind CyPSA, if you remember, we released it last year, is: how do we map the network attack surface, and then look at the physical impact on the grid if a host were compromised? Does the system come back to a steady state, or does it cause a blackout, with everything going crazy in the visualization? You don't need to be an electrical engineer to understand that this is bad. It looks like an earthquake, so it's probably bad. Yeah, it's not a seismograph; well, that would be bad on a seismograph too.

I could stand up here and rant about this for a very long time, so if you have any questions, just shout them out, because I'm going to keep going; I'm fueled on Red Bull right now.

So here was our visualization challenge. We built a lab, and we made a nice video of it because it looks cool and neat. In the upper right-hand corner we have the visualization of the network attack surface, and down in the bottom right we have the physical manifestation, and we want to put it all together. In the tool release last year we were working with different graphical libraries (earlier I heard D3 mentioned), and we did some visualizations with a smaller bus model that were released in the tool. I showed this last year too: when we got to the 300-bus model, the tool just doesn't move. When you visualize 300 substations, moving forward one step takes something like 30 seconds to a minute, so it takes a really long time. Talk about text for ants; I'm pretty sure ants can't read that either.

Then we went to connecting things in a little graph. These are all just things we went through as we were iterating on the tool. But because the job I really want is already taken by Bill, I'm going to do a demo. We're going to try to do some demos, because I don't have any bullet points to talk to, so I'm going to have to just show you tools. I hope you don't mind. Now I've got to remember which browser this is in, so you don't see my porn.

Okay, so this is what we released last year. I just wanted to show one quick thing about it. We've got the visualization here where we went with the idea of a tree: you click on things in the tree and get more information about what might be going on in the substation. But again, as we zoom in, we lose sight of the larger system. I'm getting more specific information about a few things but not the overall picture. Coming back up here, we tried this view, with the substation inventory on the left and the assets on the right, plus some of the other stuff we covered in that talk. And what I'll show is D3: we used D3 in this visualization here. This is really interesting, because it's not as gloppy as the first screenshot I had. But when I saw this (we finished it right before we released the tool last year), I said, that looks great, but I can't really do anything with it, because I don't know where the substations are. If I zoom in to try to figure out where I might be, I have no idea where I am in the context of the entire diagram. It's the same thing as when you're in a map app and you're zooming in:
you have no idea where you are, just "this is the name of the street," and you're backing out trying to figure out where you might want to go. It's hard to zoom in on diagrams. One of the questions I really wanted to start asking myself is whether we're ever going to solve this problem in visualization. How do you provide enough information for a human being to actually ascertain something from the diagram and still maintain a sense of context? There's a trade-off between "where am I in Las Vegas" and "what street do I need to be on, and what businesses are there." This is just a visualization problem; I'm not trying to pick on Google, and I don't want to be sued by them. They do a fine job.

I think I was giving slides, so maybe I should go back to the slides here, because there's more stuff. Was it this button? Yeah, nice, she's helping me out here.

About 10 or 15 years ago I helped write a tool that takes firewall configurations and does visualization. It was funded by DOE, DHS, and NSF grants over the years, and it's been commercialized. Back when we were first starting out, this was a diagram of a SCADA system that controls the power grid at Ameren. I used to work at Ameren, and I released these diagrams for public consumption about 12 years ago, so I think it's okay to show them. If not, there'll be another lawsuit and we'll hear about it on Twitter, but I think this will be okay; it has been shown publicly. This is roughly how the control system looked at Ameren twelve or thirteen years ago. We did a lot of things with colors, because the dress wasn't out yet, and we had VPN tunnels. It took me a really long time to make a diagram that gave people some idea of what the network looked like and how we maintained the transmission system on the Ameren network.

Then we did other cool things with the visualization: we could show just where the EMS traffic was, and then we could show specific protocol traffic. As the project moved into DHS funding, we also started to say, well, let's look at the attack surface, and then this [unclear] view comes in. Now we have a visual depiction, an onion skin, of how many hosts can get to how many hosts in the network. And then you've got the same visualization problem: if I pull out and look at this from a certain perspective, I get certain information, but I really have to zoom in to a different level of context to get an idea of what the attack surface looks like. This host can talk to one host that can then talk to 28 hosts, and so on, but then I have no idea where I am in the network, because I've pivoted away from anything that makes sense, just like in that first diagram I showed you. If you look at the tool the way it stands today, it looks kind of like this when it comes up, and when I blow everything up, I can see again that I have a lot of stuff with no good frame of reference. I see a bunch of dots on the screen, and a human can only really grok a certain number of dots, right?

But I was supposed to be doing demos, and a lady's pulled back her chair to remind us to do another demo here, because I think I've got this tool up. This is the live version of the tool. You see, when I want to go in and look at something, I lose specific reference. It's going to move real slow, because it's bogging down a little with all these demos running simultaneously, but you get the idea. I want to move things around here, and it's just, okay, I'm moving this host around, and now I have no idea. It's a hard problem. We can do things in the tool like detach the graph, and then there's a whole bigger problem than I can look at here, and we're still stuck on the same thing: what's the answer? We were hoping to have a discussion about this. We're right up against the break, and I'd be happy to talk to people offline, because we are streaming. But the whole idea is: what's the answer? We've been developing tools like this for well over a decade, and
we're always running into the same problem. I looked at a couple of the other visualization talks earlier today, and I could see the same thing over and over again in a lot of visualization tools. Analysts like data, and management likes flashy graphics. Yeah. Well, let me get back and make sure I didn't put any bullet points in here. Okay, we're back.

So, as we were saying about CyPSA, we are working on a better front end for it; it's not ready yet. And Grace is interning with me right now. She's been working with Carl Reinhart at UIUC on a different project, and she's going to go through the visualization she's been working on. I'm going to let her take over for a little while, and any time you want to stop and ask questions, please do, or I'll be happy to keep talking.

So we have taken one of NIST's interagency reports, which looks at cybersecurity for the smart grid. This is their overview of all of their actors, the components in the system, everything from distribution to management, and it's kind of a mess. So they split it up into logical interface categories, but there are 22 of them, spread across more than 40 pages of a PDF, so they're really hard to access, to really comprehend, and to see how they relate to each other. So we created a tool, which we're going to demo, that combines all of those logical interfaces into one graphic in a 3D model, so that we can more easily compare what's going on. You can do flashy things like zoom in and out, spin it around, view it from different angles, and separate the layers. You might recognize this as what was on the previous slide: the first layer has everything, and the other layers deal with specific things. If you're looking for something specific, say we're interested in distribution SCADA, we can search for it and filter out everything that has nothing to do with it. Now all of the distribution SCADA pieces are highlighted, and we're only dealing with the things that relate to them, so you can ask, if something goes wrong with distribution SCADA, what's related to it? But even so, you still can't get a frame of reference, and it's clunky. You can try to separate things out to get a more comprehensive view, but in the end it's still a problem of scale: your brain can only process so much information at a time. We also want to ingest information in the future. I'm working on a system to take in other documents, and things like CyPSA, that also lay out cybersecurity protocols, and we think this tool is useful as a reference.
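The search-and-filter step described above amounts to keeping, layer by layer, only the interfaces that touch a matching actor. The Python below is a minimal sketch of that idea, not the tool's code: it assumes each logical interface category is stored as one layer holding (actor, actor) edges, and the category and actor names are hypothetical placeholders rather than entries from the NIST report.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One logical interface category, drawn as one plane of the 3D stack."""
    name: str
    edges: list[tuple[str, str]] = field(default_factory=list)


def filter_layers(layers: list[Layer], query: str) -> list[Layer]:
    """Keep only edges with an endpoint whose name contains `query` (case-insensitive).

    Layers left with no matching edges are dropped entirely, so the view shows
    only what is related to the search term.
    """
    q = query.lower()
    result = []
    for layer in layers:
        kept = [(a, b) for a, b in layer.edges if q in a.lower() or q in b.lower()]
        if kept:
            result.append(Layer(layer.name, kept))
    return result


# Hypothetical layers and actors, for illustration only.
layers = [
    Layer("Category A", [("Distribution SCADA", "Field device"), ("Customer EMS", "Meter")]),
    Layer("Category B", [("Transmission SCADA", "RTU")]),
    Layer("Category C", [("Distribution SCADA", "DMS")]),
]

for layer in filter_layers(layers, "distribution scada"):
    print(layer.name, layer.edges)
```

Separating the layers in 3D is then a rendering choice, for example giving each remaining layer its own z offset; the filtering itself is ordinary list logic.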

And so we're working on making it an easy reference. We already have a reference to some of the information that's in the document, and we want people to be able to save their own notes, like "in this category we have these devices," so that it becomes a more useful reference. We also think it's really useful for presenting things to less cyber-savvy clients and management, who may need a graphic to really understand what you're trying to get at.

So before we go back to the slides: I think this is really cool, because when I first saw an early version, when Carl was working on it, I imagined that if I could take the cyber-physical modeling we were trying to do in two dimensions and lay a different graph out this way, maybe I could get a wider view of the information, because you can just pivot around and look at the graphic. Again, this is still in development: Grace is developing the front end, and a couple of people are working on this project with us, and we want to be able to mix things together and slice through this. Timing is everything when you're working in visualization; we really wanted to show what we were working on and maybe get some feedback about whether we're headed in the right direction, because this is not ready just yet, but it should be soon.

Is the NIST piece going to be released by you and [unclear], or...?

Yeah, it's being sent out to some people in the industry who might want to test-run it and suggest new features to add. One of the big things we want to do is let you save your own notes, because we find that's probably the most useful feature, and in getting feedback, people say, yeah, that's what we want. But if you want to talk offline about things you'd like to see it do, we're more than happy to talk about that.

So there you go. Grace, you did really good. This is her first ever big demo in front of a group of people. Last time it was an underground talk, so there was no pressure: if you screwed up, nobody knew about it. This is streamed to the masses. Good job, Grace.

I'm a data analyst at heart, and I work in visualization too, but this is the kind of stuff I really get excited about when I'm looking at data. This is actual DNP3 traffic on a utility's network. It's about 250 megs a day, and I think this was several hours of data, maybe 16 hours. This is the kind of thing I look at to glean a lot of information. I had to hide the IP addresses, but these are all the different packets, and you can look at different things there. For example, if you go into Wireshark and look at the flow rate, here's the flow rate over time, and the graphic got cut off down here in the bottom right-hand corner. The axis runs to something like 80,000 seconds, so now I'm dividing by 60 and by 60 again to figure out how long it was. Then if you zoom in a little bit, you get this other view: here's a piece of the graph, but I'm losing all the context about where it is. So maybe we should look at this live. Let's see if I screwed up. Yeah, I did. There we go.
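An I/O-graph-style plot is essentially a histogram: packet timestamps are counted per fixed interval, which is how a capture with millions of packets becomes a few thousand plotted points. The Python below is a minimal sketch of that reduction, not Wireshark's implementation; it assumes the timestamps (seconds since the start of the capture) have already been extracted from the trace, and the sample values are made up. It also automates the dividing by 60 and by 60 again: an axis value of 80,000 seconds works out to 22 h 13 min 20 s.

```python
from collections import Counter


def io_graph(timestamps: list[float], interval: float) -> list[int]:
    """Count packets per `interval`-second bucket, like an I/O graph's y-axis.

    Buckets are measured from the first packet; empty buckets are kept as zeros
    so gaps in traffic stay visible.
    """
    if not timestamps:
        return []
    start = min(timestamps)
    counts = Counter(int((t - start) // interval) for t in timestamps)
    n_buckets = max(counts) + 1
    return [counts.get(i, 0) for i in range(n_buckets)]


def seconds_to_hms(seconds: float) -> str:
    """Turn an axis value in seconds into hours, minutes, and seconds."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Toy timestamps (seconds since capture start); not the utility's trace.
ts = [0.1, 0.4, 0.9, 1.2, 3.7, 3.9]
print(io_graph(ts, 1.0))       # one count per 1-second bucket
print(seconds_to_hms(80_000))  # the axis value from the screenshot
```

Choosing the interval is the level-of-detail knob: a larger interval gives fewer points and a faster plot, at the cost of hiding short bursts.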

So here's an I/O graph. If I zoom in to see what's going on here in the corner, it actually looks kind of better than it did in the screenshot, but as I pull in, I'm losing the context of the graph. The other thing I'm not going to show in a twenty-minute demo is that it took me something like two and a half minutes to load this graph, because there were millions of packets involved; it's a 256-meg trace. Well, it took three or four minutes to load, because I actually reloaded it in the speaker room after the graphic went away. Oh crap. So I'm glad it stayed up. You get the idea. In Wireshark, if the trace has routable IP addresses and you want to do the map, it just kind of stops and doesn't work, because the libraries begin to choke on themselves when you put a lot of data into them. So it's a very difficult problem in visualization.

So far so good with the demo gods; we made it this far without losing any lives. I was anticipating a bad demo. The third speaker on the talk, John Stillwell, couldn't make it, but he did provide some screenshots, because he wanted to talk about what they do with visualization nowadays, with my visualization tool. They use this tool here, but they put it on a big 4K screen at Ameren, and they can see the traffic moving around live. This runs on a server-class system with, I think, 16 or 24 processors, a ton of horsepower. But again, if you look here in the middle of the screen at all the little control systems, there's a lot going on, and you don't have any kind of context as to what is actually happening. So you have to do things like have multiple views, where you can zoom in and see just the control system and everything that might be going on in it live. I think the traces you see up there in two different colors are relative to packet size, and it's really crazy when it's going live.

Yeah, Grace got to do a tour and see it. We were doing the tour and watching it, and if you flip to the next slide: we saw some strange data going out to California, and we looked at the title and it was not an English word, so we were like, oh man, we found some anomalous data. Even with all of this horsepower, they found that they had to show only operational data; non-operational data got moved off to the side of the graphic, because it was just too busy with everything in it. Even with a big, giant six-foot 4K screen, at a company that had the resources to employ the state of the art, it's still not enough for a human to understand everything we can now visualize with technology. And I think the question really is: is data visualization necessary?
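When a live view chokes, whether in the rendering library or in the viewer's head, a common remedy is to draw one weighted edge per host pair instead of one mark per packet, and to route traffic that isn't operational to a separate view, much as the Ameren team did by moving non-operational data off to the side. The Python below is a minimal sketch under stated assumptions: packets arrive as already-parsed tuples, and "operational" means DNP3 (port 20000) or ICCP (port 102). Both the tuple format and that port rule are illustrative choices, not how the tool shown in the talk classifies traffic.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str
    src_port: int
    dst_port: int
    size: int  # bytes


# Assumption for this sketch: DNP3 (20000) and ICCP (102) count as "operational".
OPERATIONAL_PORTS = {20000, 102}


def aggregate(packets: list[Packet]) -> tuple[dict, dict]:
    """Collapse packets into one edge per (src, dst) pair holding [packet_count, byte_count].

    Returns (operational, other): the first feeds the main view, the second a
    side view, so the main graph stays readable.
    """
    operational: defaultdict = defaultdict(lambda: [0, 0])
    other: defaultdict = defaultdict(lambda: [0, 0])
    for p in packets:
        # Check both ports so replies from the outstation are classified too.
        is_ops = p.src_port in OPERATIONAL_PORTS or p.dst_port in OPERATIONAL_PORTS
        edge = (operational if is_ops else other)[(p.src, p.dst)]
        edge[0] += 1       # packet count
        edge[1] += p.size  # total bytes, e.g. to scale edge width
    return dict(operational), dict(other)


# Hypothetical hosts and traffic.
packets = [
    Packet("master", "rtu-1", 49152, 20000, 120),
    Packet("master", "rtu-1", 49152, 20000, 60),
    Packet("rtu-1", "master", 20000, 49152, 80),
    Packet("ems", "peer-ems", 49200, 102, 300),
    Packet("ws-7", "8.8.8.8", 50000, 53, 70),
]
ops, rest = aggregate(packets)
print(ops)   # edges for the main view
print(rest)  # edges moved "off to the side"
```

Keeping packet and byte totals on each edge lets the main view still encode volume, for example as line width, while drawing far fewer objects.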

Yeah. Wow. You know, I don't know that it's necessary, but if you want to build a tool and sell it, nobody's going to want to look at a tool that looks like this. Where are my slides? They're going crazy now. So we go and look at a tool that just gives me this: how are you going to sell that? But on the other hand, if I build a tool that visualizes something like this, what are you going to pay for it? What's in your budget? Some of these tools are very expensive, because they charge by device. So that's the question, and I think it's an ongoing struggle forever for visualization, because it's almost like an NP-complete problem.

So that's pretty much it. We're only doing 25 minutes, and I think it's about 25 minutes now. I don't know, what time is it? Thankfully they put us up against the break, so if you want to have side conversations, I think that's going to be okay, though the people watching online won't be able to participate. So, John, you going to change shorts pretty soon now? Do we have any questions from the audience? Yeah, the mic's coming; they want the people online to be able to hear you.

Hi, thank you for your talk. You mentioned visualization and selling a product, and yeah, it looked pretty. I'm paraphrasing slightly, but: that looks great, so how do you get actual intelligence from it, and how do you create rules that can defend a network?

Yeah. In this particular visualization, one of the things that came out was poorly configured DNS. It shows up in a visualization like this: you move to an abstract layer for DNS, and you can see machines using 8.8.8.8 for DNS and getting blocked at the firewall. So you have misconfigured machines trying to use Google DNS or something like that on port 53. Some of this stuff you can catch in visualization, but from an analyst's perspective it could just as easily be caught by looking at raw data. Still, I think visualization does serve a really good purpose in raising your general IQ about what's going on in your network, because with a nice visualization you get a lot of pattern recognition. But again, it all comes down to level of abstraction, because when I go back down here, I lose some views that might be relevant. So that's a really good question.

I think we both decided that visualization can be handy for analysis, but only if you can also access some form of the underlying data. It's useful for an overview that tells you what data you need to be looking at, but you still have to be able to look at that data. The other thing we decided was that visualization is really good for presentations, marketing, management, and demos. Yeah, visualizations are cool for demos; if we had just come up here with a bunch of statistics, we would have had to use bullet points.

Yeah, so you might have inadvertently answered my question, but given the relatively poor amount of information that can be gained from these diagrams, or workflows, whatever you want to call them, how exactly do you use them, and how have you convinced your employer to keep paying you to make them?

Well, look, when I was at Ameren, this tool came about because I'm lazy. I'm an information security guy, and there was a regulation called NERC CIP, and we had to document the attack surface of our firewalls for it. I was like, I don't really want to do all this work; I want a tool to go in and parse my firewalls and spit out spreadsheets, so the auditors can come and see what rules we allow at the firewall. At the same time, some people from San Diego came and did a vulnerability assessment with a tool called [unclear], which did a visualization kind of like this. We put the two tools together, and that's where this tool came from. Then we were able to go in and say: here's all of our traffic; everything is inside a VPN-tunneled network protected by firewalls; there's no connectivity to the corporate network. And then we stepped through each individual piece of traffic that made up the control network. Having a static visualization like that really helped to explain this to non-technical people. So yeah, there's a lot you can do with visualization, and then if
the fine is high enough, the price of the tool is worth it. Sorry, long answer, but visualization is really good for the "show your work" part of the equation.

Did you get to [unclear]? 30? Okay, go ahead.

So how do you show network outages versus mission data, if it's critical, like planning or anything that's required to be up? And when boxes go down, how do you visualize that? Are you just going with "that was SNMP data from 30 minutes ago"?

Wow, that's a heck of a question. Hopefully, for critical systems, and a lot of electric system things, we've modeled ahead of time what happens. There are network management tools that for two decades have been really good at aggregating alarms, but no, there's not any really satisfying answer I can give you standing up here, and I don't have a tool to sell you anyway. Give me some time, and how much money are you willing to spend on something like that? That comes down to the other bottom line: how much is it actually worth? Because that's what it takes to go through the research effort to answer questions like that, or to provide redundant network connections. Another thing we're really looking at in research is performing calculations at the edge and allowing devices to make their own decisions independently when there's a network outage. On the grid, in a substation, I know pretty much where I am on the network, and I've got enough computing power at the edge to make my own decisions about how to maintain stability, and then we work to get communications back up so we can reestablish situational awareness. That's pretty much what they did on the grid 30 or 40 years ago: when there was an outage, they rolled trucks, the guy went to the substation to see what was going on, got on a radio and said, we need to do XYZ, and then they turned the lights back on. So we need to take advantage of computing power at the edge and let the edge make decisions when there's no communication with the mothership. Maybe that's a bad answer, but it's what I can come up with with a spotlight in my face. That's a great question; I owe you a drink.

[unclear]

So I do a lot of work in user experience, which covers data viz to a fair degree, and I wonder how much of this is that you're used to dealing with spreadsheets, that this is sort of the
affordance you've been given, so this is new and confusing, versus your actual ability to manipulate the data, versus: if I could just explain to somebody how I want to look at this, they would solve half of my problems. Is this making sense?

Yeah, I feel like that's a problem everyone faces when they get new visualization tools. For example, John Stillwell was telling us that when he got the tool we showed today, the one with the map in the background, it took him a long time just to get it set up so that it worked properly on their system. I feel like that's a big problem in visualization, because everyone wants to use the tool in a different way, and you have to be able to customize it for your own needs; no two people are going to use the tool the same way.

Right, and it may actually create questions. If you go back to this one, I know this network; I worked there for years, and I knew what all these little jobs were for, so I can make a really cool visual. But if you're looking at something like this cold, you have no idea what part of the network you're in, and no idea what's going on here. I know what an EMS system is, I know what ICCP is, I know what DNS is; if I want to see DNS in this map, how do you find something in the interface to give yourself something to pivot off of, so you can learn from the visualization? It's a very difficult problem to solve, and maybe Grace can give a talk about it next year. They say a good UI is like a good joke: if you have to explain it, it's not good. I think visualization works the same way: if you have to explain how your visualization works and what it's supposed to be representing, then it's too complicated.

[...] something works down here, or something that I don't want to see, anything critical. So when I run an analysis, it says: this is all the traffic that goes into the control system. I can see visually that I don't have any traffic to places that might be bad, and it only goes out through the DMZ, so the historian traffic goes out only. So this traffic is actually traffic going into the control system.

Monitoring water treatment?

No, that would be more like plants. This is on the transmission network. In plants you have... yeah, you have dial-up modems going into substations too, but they generally connect to things like fault recorders that aren't used for control. When there's an outage at a substation, you call up on the modem and get a bunch of data from the protective relay; when the connection's down, they have a modem backup, and they pull in some information. You're not allowed to turn the circuit back on until you figure out what happened, so you need the data from the field, and generally it comes in over a POTS line or a cell modem now, or a vehicle goes out to download the information, and then you can turn the circuit back on. No, from this perspective, this only looks at firewall-connected IP. It's another case where it all depends on what you're using the visualization for. Maybe you need that data in the visualization; maybe you need to focus on something else. That's why I think the NIST stuff is really cool: you can spin it around and show different abstraction levels, so you can have your modem connectivity, or whatever you want, at its own level of abstraction, and then pivot back around and see where it sits from your own 2D perspective. So I really like this idea of 3D cubing in visualization and then putting a lot of data in it. What we're doing in research is seeing how big you can make it and still get information out of it.
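The attack-surface "onion skin" from earlier in the talk (one host can talk to one host that can then talk to 28 hosts) and the check in this last answer (does anything reach the control system without going through the DMZ?) both reduce to reachability over the connections the firewall rules allow. The Python below is a minimal breadth-first-search sketch, assuming the allowed connections have already been extracted from the firewall configurations as directed (source, destination) pairs. The host names and the `bypasses` helper are hypothetical, and this is not how the tool shown in the talk computes its results.

```python
from collections import deque


def hops_from(source: str, allowed: set[tuple[str, str]]) -> dict[str, int]:
    """Breadth-first search: fewest hops from `source` to every host it can reach."""
    neighbors: dict[str, list[str]] = {}
    for a, b in allowed:
        neighbors.setdefault(a, []).append(b)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        host = queue.popleft()
        for nxt in neighbors.get(host, []):
            if nxt not in dist:
                dist[nxt] = dist[host] + 1
                queue.append(nxt)
    return dist


def bypasses(source: str, target: str, allowed: set[tuple[str, str]], chokepoints: set[str]) -> bool:
    """True if `target` is reachable from `source` without traversing any chokepoint host."""
    pruned = {(a, b) for a, b in allowed if a not in chokepoints and b not in chokepoints}
    return target in hops_from(source, pruned)


# Hypothetical allowed connections, as (source, destination) pairs.
allowed = {
    ("corp-pc", "dmz-historian"),  # corporate users read the DMZ historian
    ("ems", "dmz-historian"),      # control system pushes data out only
    ("ems", "rtu-1"),
    ("ems", "rtu-2"),
    ("corp-jump", "ems"),          # a rule that skips the DMZ
}

print(hops_from("ems", allowed))                                 # hop rings around the EMS
print(bypasses("corp-pc", "ems", allowed, {"dmz-historian"}))    # expected: no bypass
print(bypasses("corp-jump", "ems", allowed, {"dmz-historian"}))  # expected: bypass found
```

The hop counts correspond to the rings of the onion skin, and the bypass check answers in one line the question an analyst would otherwise try to read off a crowded diagram.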