In this talk we will present some of the techniques we use on a day-to-day basis in our research, where we combine our internet-wide data scanning and acquisition platform with machine learning and data science techniques, allowing us to find things faster and extract results in a more automated way. We will focus on practical cases and examples that even our audience at home will be able to use if they want.

We will start with a very brief introduction to the data science world and talk about:

- Technologies
- Techniques
- How these relate to infosec
- Algorithms and how they can be used
- How people can get into the world of data and machine learning
- Data visualization techniques and the best choices for different types of data

Examples we will look at include how to classify images such as VNC or X11 screenshots, OCR, using machine learning to classify network scans, and using natural language processing to analyze CVEs. We will look at scoring and classification algorithms and how they can be used on IP addresses, and we will talk about the use of learning and how we are applying it in real life.

We will also talk a bit about a data analysis and classification pipeline architecture: the different technologies involved, what they do, and how they can be used.
Some specific examples of our research, which should give you an idea of what we will talk about, can be seen here:

https://blog.binaryedge.io/2015/11/10/ssh/
https://blog.binaryedge.io/2015/09/30/vnc-image-analysis-and-data-science/
https://blog.binaryedge.io/2015/08/10/data-technologies-and-security-part-1/

About the Speakers:

Tiago, Filipa, Ana and Florentino swim in data every single day. From looking at what people are downloading to how they are exposing themselves, we LOVE DATA!

Tiago (@Balgan) is the CEO and Data Necromancer at BinaryEdge. He gets to meddle in the intersection of data science and cybersecurity by providing his team with lovely problems that they solve on a daily basis.

Filipa (@filipacsr) is the Data Diva at BinaryEdge. She dances the macarena with numbers to get them to tell her all their dirty secrets.

Florentino (@fbexiga) is the Data MacGyver at BinaryEdge. On a daily basis he deploys the infrastructure used to analyse big and real-time data. When not doing that, he can be found creating models to analyse data: give me an orange, I’ll give you a Skynet. Why an orange, you ask? I’m hungry and like oranges, there!

Ana (@ana_barbosa90) is the Data Ferret at BinaryEdge. She is small and hides between the 110th and 111th characters of the ASCII code to see and show data from the unique perspective of someone who can’t reach the box of cookies stored on top of the capital ‘I’.