
Effective Monitoring for Operational Security

BSides Charm | 25:24 | 3 views | Published 2021-05
About this talk
Russell Mosley and Ryan St. Germain present a comprehensive operational security monitoring strategy that combines log analysis, active monitoring tools, and manual review to establish baselines and detect anomalies. Drawing from real-world examples across email, web traffic, physical access, endpoint activity, and network logs, the speakers demonstrate how to distinguish normal operations from suspicious behavior without overwhelming security teams with alert fatigue.
Original YouTube description

Effective Monitoring for Operational Security

As Infosec practitioners, how well do you really know and monitor your IT and business operations? Would you identify a data exfiltration event by a bandwidth increase without attendant malware alerts? Would you identify an employee staying late and attempting to gain physical access to a restricted area? Would you identify a successful VPN login from another country? We will present effective monitoring methods we utilize and the resulting outputs that teach us what normal operations look like in order to identify suspicious activity. By reviewing these types of reports or tickets on a daily basis you will know your IT and business operations well enough to identify anomalies that may evade detection by your security tools. We will show example reports and tickets from our organization covering a variety of these topics and discuss how we analyze them, as well as how we use the information to better tune our monitoring tools.

Presenters: Russell Mosley (@sm0kem) and Ryan St. Germain (@r_stgermain)

Russell is an IT Infrastructure & Security Director for a Silver Spring software and outsourced accounting services company. Russell has seventeen years' experience in IT operations and enterprise defense and is responsible for the organization's compliance with SOC and FISMA requirements. He holds degrees from UMBC, UMUC, and Towson University as well as CISSP and several vendor certifications.

Ryan is a Senior Information Security Engineer with ten years' experience, a Master's Degree, and CISSP certification.
Transcript [en]

If you didn't make it, try to make it next year; we all had a blast. So, without further ado, we're here today to talk about effective monitoring for operational security. I'm going to give a quick overview, and then we're going to get into those slides I was just telling you about with all the output data. By monitoring, we're talking about processes and tools that we use for tracking information and statistics, alerting based on them, and detecting trends and anomalies. For this presentation we're focusing on log data and also output from some active monitoring tools that we use, things like ping, SNMP, and WMI, as well as some custom scripts.

We work for a small IT company, so our department is actually responsible for both IT infrastructure and security. For that reason, monitoring is essential for IT operations, to ensure availability and performance and keep track of utilization, as well as for information security. We monitor access to resources, of course, and we're looking for malicious activity to start our incident response process. Briefly, why is monitoring important? Monitoring is important for situational awareness and for knowing your baseline statistics, i.e., what is normal. The more you monitor and review your IT operations, the better your situational awareness and knowledge of what's normal is going to be. So how can you detect anomalies or abnormal traffic or behavior without

baselines and knowing what normal looks like? Do you know what your normal bandwidth utilization looks like? Do you know where your users normally VPN in from? If you're not monitoring those things, you're not going to have a good, solid idea of what is normal. Second, monitoring is important because it improves your IT operations. Yes, availability and performance monitoring are generally not responsibilities of infosec teams, but what do your security teams do when they come across application errors or a misconfiguration that will impact performance or availability? You want to notify your system admins and your users, or fix those problems before your users see them.
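The baseline idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speakers' Splunk implementation: the metric name `outbound_gb`, the sample values, and the three-standard-deviation threshold are all hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(history, today, threshold=3.0):
    """Return metrics whose value today is far from the historical baseline.

    history: dict mapping metric name -> list of past daily values
    today:   dict mapping metric name -> today's value
    Returns a dict of metric name -> z-score for metrics beyond the threshold.
    """
    anomalies = {}
    for name, values in history.items():
        if name not in today or len(values) < 2:
            continue  # not enough history to form a baseline
        mu, sigma = mean(values), stdev(values)
        if sigma == 0:
            continue  # perfectly flat baseline; z-score is undefined
        z = (today[name] - mu) / sigma
        if abs(z) >= threshold:
            anomalies[name] = round(z, 1)
    return anomalies

# Hypothetical week of outbound traffic (GB per day) followed by a spike.
history = {"outbound_gb": [40, 42, 38, 41, 39, 43, 40]}
print(find_anomalies(history, {"outbound_gb": 120}))
```

A simple z-score is a crude baseline; it is used here only because it makes "know what normal looks like" concrete. A daily manual review, as the talk describes, covers the cases a fixed threshold would miss.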

Utilization spikes and application errors can also be indicators of compromise. Third, monitoring is important, actually critical, to incident detection and response. This is a slide from learnsecurity.org quoting a FireEye report that it takes 146 days to detect a breach. There are lots of reports that say anywhere from 99 to 500 days. Any way you look at it, it's too long; for the average company it takes way too long to detect a breach. Monitoring is critical for incident detection: you don't want to have to wait for your users, or Brian Krebs, or someone else's blog to notify you that you've had a breach. So daily manual log review of

exceptions and anomalies is a key to security operations and incident detection. All right, so we know monitoring is important; what do you monitor? These are some of the logs and data that we focus on: your server logs, event logs, and syslog; your security and network device logs from firewalls, routers, and other infrastructure; maybe even web proxy logs or DNS request logs; and logs from your applications. How do you decide what to log and what logs to analyze and review? We actually borrowed this slide, with permission, from yesterday's keynote speaker, Jessica Payne. This is from one of her blog posts, called Monitoring What Matters, and this is what Microsoft's incident response team often finds when they're

called in to do incident response: many organizations either seem to log way too much, without the proper context, or not enough, usually nothing other than the default logging capability. So you need to decide what to log based on your business needs and the resources that you have to analyze the logs. You also need to set up tools and processes to make sure you're not drinking from a fire hose, and you're not helping yourself by not keeping any logs. When determining your monitoring strategy, you need to determine what's critical to the business. This might be determined for you if you have compliance requirements like PCI, HIPAA, FISMA, FedRAMP, etc. It's also going

to vary depending on the type of business you are. If you're a retail company, you're going to have different needs than a government contractor or an information security company. And of course you should try to monitor appropriate to the risk level that's acceptable to your organization. All right, now we're going to tell you about our events management process. I have a laser pointer; I hope this helps and isn't a distraction. Over here on the left are all the logs we're talking about: your server logs, routers, network infrastructure, storage devices. We feed everything into Splunk. You can use any log analysis tool; we use Splunk, so that's what we're going to be talking about here.

On Splunk we have a lot of custom queries that we've developed over the years to analyze all that data. Some real-time searches, running every one to five minutes, generate alerts for high-severity issues that go to our support team, who resolve the issues and create a ticket. In addition to those, we have about 50 daily tickets that produce reports for things like successes and fails of backups, or system accesses and authentication, and also customer support tickets. All of these tickets get reviewed on a daily basis, and when it comes time for our audits, we're actually able to print out all of the tickets that our auditors want to see

from the various weeks that they pick, and just give them a stack of PDFs and reports. It really streamlines the audit process for us to be able to do that. They basically just say, here's what we need, we're going to look it all over and ask you any questions that we have. We tell them about anything new, of course, but the majority of what we have to provide to our auditors are these daily tickets that we review. In addition, when we see an opportunity for improvement, if there's a ticket that has way too much data

or not enough pieces of critical information, we'll go back and modify the search and output criteria to continually improve the process. So, monitoring authentication: what exactly do we monitor? Successes and fails of authentication to various systems like Active Directory, VPN logins, things like that; application authentications from the custom apps that our organization operates; and multi-factor authentication tools. We're going to show a bunch of tickets like this; this is what I was talking about. If you want to move forward to see it better, feel free. This is an example of our daily RDP logins, so this is Windows Remote Desktop servers, authentications to RDP, and this is the output we see in the daily ticket. On the left, we've redacted

information by basically copying whitespace, so you'd have the username there, the account name; then whether it was a success or fail, the source IP address, and the timestamp. With this ticket we might be looking for things like multiple failed logins, or people logging in at a very unusual time, to review the activity and look for things that are abnormal that we might need to further investigate. This is a particular system's daily authentication ticket, and here it's showing us some SSH logins. These are actually automated processes that are logging in, but it's a good example to see for authentications. We've got the process, it's sshd; you would see the user account, the

source IP address as well, like the previous ticket, and of course your timestamp on the left. For monitoring remote access, we're talking about VPN logins. We monitor success and fail, and we show them in our tickets. We're able to review all the successes every day, in addition to fails, because again we're a fairly small company. For larger companies this might not scale as well, but for us, we look at all the successful logins as well on a daily basis. One of the things that we found is actually really useful is that we sort of get a location piece of information in our daily tickets by doing a reverse lookup on the IP address that the user logged in from and putting

that in the ticket, so we see the FQDN, or fully qualified domain name, in the ticket. This is what that looks like. This is a daily VPN users ticket. On the left you've got your origin, your VPN endpoint or originating gateway, then the user account and the timestamp. On the right, I redacted the beginning of these because you actually have the reverse IP, but you can see our company is located in the DC area; most of our users are in Baltimore, DC, and Northern Virginia. You can see in the top one here, baltimoremaryland.fios.verizon.net, and down here you see washingtondc.fios.verizon.net. So if we see a login from Texas by someone that we know lives in Virginia,

we'll go to their supervisor and say, hey, are they on vacation? Should they be logging in from Texas? It's a great way we can look for unusual activity, by reviewing these every day, having a fairly small user base, and knowing where everyone resides. Folks have actually gotten used to this over time, and they'll generally come to us or send us an email like, hey, I'm going to be traveling next week, so don't go asking my boss where I am when you see my login. Monitoring email: we're looking for mass spam and phishing campaigns by monitoring email information, and also OWA logins. OWA is Outlook Web Access; it's the web interface for

Microsoft Exchange, if you're not familiar. With that we get the devices that are logging in, and locations by IP address again. We also monitor attachments, for a basic kind of data loss prevention (DLP). I'll show you the ticket where we can look for people possibly sending information out that they shouldn't be. This is a daily outbound attachments ticket. What we have here is the timestamp, of course, on the left; then you've got the file name, which is the file that they attached from within our organization. Obviously you're not going to see this for Gmail and the like, but for the organization's email, that's the file name of the attachment, who it was from, and who it was to.
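As a rough illustration of how a daily outbound attachments report like this could be pre-filtered before manual review, here is a minimal Python sketch. The record fields mirror the columns described in the ticket (timestamp, file name, sender, recipient), but the field names, the keyword list, and the `example.com` domain are assumptions made for the example, not details from the talk.

```python
import re

# Hypothetical keyword list; tune it to the documents your organization cares about.
SENSITIVE = re.compile(r"payroll|salary|procedure|confidential", re.IGNORECASE)

def flag_attachments(records, internal_domain="example.com"):
    """Return outbound attachment records worth a closer manual look.

    A record is flagged when it goes to an external recipient and its
    file name matches one of the sensitive keywords.
    """
    flagged = []
    suffix = "@" + internal_domain.lower()
    for rec in records:
        is_external = not rec["to"].lower().endswith(suffix)
        if is_external and SENSITIVE.search(rec["filename"]):
            flagged.append(rec)
    return flagged

records = [
    {"timestamp": "2021-05-03T09:14:00", "filename": "Q2_payroll.xlsx",
     "sender": "alice@example.com", "to": "someone@gmail.com"},
    {"timestamp": "2021-05-03T10:02:00", "filename": "lunch_menu.pdf",
     "sender": "bob@example.com", "to": "friend@gmail.com"},
    {"timestamp": "2021-05-03T11:30:00", "filename": "HR_procedure_manual.docx",
     "sender": "carol@example.com", "to": "dave@example.com"},
]
for rec in flag_attachments(records):
    print(rec["timestamp"], rec["sender"], "->", rec["to"], rec["filename"])
```

Keyword matching on file names is deliberately simple and will miss renamed files; it is meant to draw a reviewer's eye to a few rows, not to replace the daily read-through.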

So by reviewing this information we become accustomed to what's normal, and we look for things that are possibly abnormal: someone sending out HR information or procedure manuals, or someone sending an authorization for a file transfer. That kind of thing we've caught before, and gone to folks and said, hey, should you be doing this? Was it encrypted? Those sorts of questions. This is a daily Outlook Web Access connections ticket. With this one you see the IP address that their device connected from (this is your phone or laptop logging into the system), the user accounts, and the device ID. With this ticket you're looking for something like a crazy number of different IP addresses,

which might just indicate that they're driving and hitting a lot of cell towers, and also multiple device IDs. Mostly with this ticket I think we get to know what's normal and just catch things that are really abnormal. It's kind of a lot of information, but at times it has been useful to us. For web traffic, we send all of our proxy logs and policy violation logs into Splunk, and we have daily tickets where we go back and review top tens. These are things like the top 10 web proxy users by bandwidth, the top 10 sites that they're going to, the top 10 categories, and of course policy violators. And we

also send our DNS logs into Splunk, and we actually have real-time alerts and a daily ticket that are looking for some things we'll get into a little bit later. This is a proxy violations alert. This is one of the real-times, so this isn't a daily; this might be an hourly, I'm not sure, but it's either hourly or real time. Ryan says every 15 minutes. Here you get the user account and your timestamp, of course; you see the category, in this case an attempt to go to a sexually explicit website; and you actually get the URL. From here we can go back and check out

the policy violation from the device or the user, and also use this to tweak the categories, because they're obviously not always perfect, as anyone who's doing web proxy admin knows. This one is a potential C2 over DNS, and I need Ryan to explain this. So basically what we're looking for is the randomness of the domain that's being requested. In essence, we're looking at the entropy, because this can be an indicator of communication over DNS tunneling, and this is also great for trying to detect those randomly generated domains that are also used for command and control communications. For physical access, we monitor and review every day success and fail for all the key card swipes at our

headquarters building location. When you're going into various parts of the building you have to swipe your key; those swipes get logged and they go into a daily ticket. In addition to the daily ticket, we have a real-time alert for when someone swipes to go through a door that they don't have access rights to, and we'll get an alert on that immediately. By reviewing these every day, again, we're learning normal patterns so that we can spot anomalies. This is an example of one of those daily physical security access tickets. On the left you have your timestamp, then you have the account name, the result, whether it was access granted or access

denied, and on the right we have the location, so you can see things like rear entrance, lower level, double doors, those sorts of things. Again, this might not scale to a really large organization, but for us it works really well to keep track, and we can see, for example, that Larry was working late yesterday. All right, Ryan's going to take over now.
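The kind of review described for the key card ticket (denied swipes, plus people in the building at unusual hours) can be sketched as below. The field names follow the ticket columns mentioned in the talk; the sample rows and the 06:00 to 20:00 "normal hours" window are assumptions for illustration only.

```python
from datetime import datetime, time

def review_swipes(swipes, start=time(6, 0), end=time(20, 0)):
    """Return (account, location, reason) tuples for swipes worth a second look."""
    notes = []
    for s in swipes:
        ts = datetime.fromisoformat(s["timestamp"])
        if s["result"] == "denied":
            notes.append((s["account"], s["location"], "access denied"))
        elif not (start <= ts.time() <= end):
            # Granted, but outside the assumed normal working window.
            notes.append((s["account"], s["location"], "after hours"))
    return notes

# Hypothetical rows shaped like the daily physical access ticket.
swipes = [
    {"timestamp": "2021-05-03T08:55:00", "account": "larry",
     "result": "granted", "location": "Front Entrance"},
    {"timestamp": "2021-05-03T22:40:00", "account": "larry",
     "result": "granted", "location": "Rear Entrance"},
    {"timestamp": "2021-05-03T14:10:00", "account": "pat",
     "result": "denied", "location": "Lower Level Double Doors"},
]
for note in review_swipes(swipes):
    print(note)
```

An after-hours swipe is not an incident by itself; as in the talk, the point is to surface the rows a reviewer who knows the staff would want to ask about.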

So, unannounced file changes to your servers and several endpoints could be an indicator of compromise. What we do is monitor these file systems with tools such as Tripwire, and with Splunk we sort of accomplish the same thing using Windows event logs for our DFS shares. We're looking for unauthorized changes to files and unauthorized access to these files, so we're looking for success and fail for this.
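For readers unfamiliar with file integrity monitoring, the core idea is to compare a current snapshot of file hashes against a trusted earlier one. The Python sketch below illustrates that idea only; it is not how Tripwire itself is implemented or configured, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path

def snapshot(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def diff_snapshots(old, new):
    """Report files added, removed, or modified between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "modified": modified}

# Example with hand-written snapshots (digests shortened for readability).
yesterday = {"index.php": "aa11", "config.php": "bb22"}
today = {"index.php": "aa11", "config.php": "cc33", "shell.php": "dd44"}
print(diff_snapshots(yesterday, today))
```

In practice `snapshot()` would be run on a schedule and the result stored somewhere an attacker on the host cannot rewrite; the added, removed, and modified lists correspond to the three sections a reviewer reads each morning.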

There we go. Here's an example of one of our Tripwire tickets that we receive every day and review in the morning. This is an example of our Linux Tripwire ticket. We're looking for changes that we didn't expect to see. Since we're a small group, we usually tell each other if we change something on a system, so in the morning we know what to expect. As you can see, there are removed elements, there are added elements, and then there are modified elements. So we're looking for the modification of certain files and the addition of certain files that could potentially be malicious, such as a

PHP shell. And this is an example of our DFS ticket in the morning. Here you can see that we're monitoring the HR document share, and we know who should be accessing and modifying files in these shares. So it's one of those things where we're looking for anomalies, and also looking for users that are either illegitimately trying to modify a file or just trying to access the shares in general. Trying to detect physical assets leaving your building is pretty difficult, so we're trying to monitor this through things such as print jobs, optical media being written to disc, and USB usage. Globally, we block USB with group policy, but

we found a method, which I'll show you later on, to detect if someone actually attempted to plug one of these USB devices in. This is an example of us reviewing the daily print jobs of individuals in the organization. On the left side you can see that it shows the number of pages and the users, along with the printer that the job was sent to and the document that was printed. Here we're looking for an individual that we know shouldn't be printing a certain document, or shouldn't have access to a certain document, but somehow gained that access and is now trying to take the document outside the building. And here's an alert from Sysmon that we

have configured to notify us if an individual attempts to plug a USB device into their machine. It's a real-time search; it's looking for certain changes in the registry keys on the system, and it notifies us, telling us which computer the device was plugged into along with the device that was used. With a lot of these crypto miners and other resource-intensive malware payloads, utilizing performance monitors on systems is becoming even more important. We use them not only for billing and planning but for identifying malicious activity on a system. Because we view this data on a daily

basis, we know what's normal, so we're just looking for something that's abnormal to us. This is an example of one of those alerts, telling us that the disk space threshold was exceeded. The same thing goes for CPU utilization: with a crypto miner, if all of a sudden our system spikes and we don't expect it to, there's potentially something going on there. We have all of these security devices in our organizations, and taking advantage of the data that they have has become pretty difficult just because of the sheer number of devices that we have. So what we're doing is sourcing all the data from these devices and

sending it into our SIEM, and then writing useful alerts and reports to notify us, either in real time or on a daily or hourly basis, telling us what we actually want to know. Like Russell said, we're not drinking from that fire hose; we're just trying to take advantage of the data that it's providing us. Here's an example of that. We get an hourly email showing us the past hour of IPS alerts, and what we're doing is trying to identify abnormalities in this. We're looking for spikes in specific signatures, or for a large number of signatures that were hit from a single source, or where there's

a single destination; it all depends on your organization. Here's another example. This is an alert that hits as soon as something in our Splunk search is identified. In this example, an Angler exploit kit was detected, and it's giving us the resource that the signature actually detected on, along with the source machine and the destination. This way we can easily initiate our incident response procedures and go from there. For endpoints, we don't do anything really crazy. We use a regular commercial endpoint product, just basic protections, but then on top of that we put Sysmon for

more visibility, and we use the alerting capability of our endpoint product to notify us in real time of an alert or if an agent's out of date. For the alerts themselves, on a virus being detected, we receive a text message; this way it can speed up the process of contacting the individual that the alert was set off on. This is an example of that. Not only do we receive a text message, but we also receive an email with all the resource information that we need, and then can go from there. I talked about Sysmon earlier on. We used SwiftOnSecurity's

configuration, which is awesome, and then modified it for our environment. When I say modified, we basically removed things or added exceptions for things that were just noise and prevented us from identifying the actual malicious activity. To deploy the Sysmon implementation, we use group policy to send out Sysmon along with the configuration file, then send a Splunk forwarder to the endpoint, which is how we actually get the data to our SIEM. The configuration is updated on a regular basis with a scheduled task; that way, if we see that we need to add something during the day, we just push it in and the endpoint receives it. And then

we wrote alerts and reports that we deem useful for our environment, just to take advantage of the data. Finally, our big goal is to integrate the MITRE ATT&CK matrix with our Sysmon implementation in Splunk, hopefully by the end of the year. Here are a few examples of our Sysmon alerts; I'm going to go through a few of these real quick. The first is the illegal image activity alert. We know what to expect our users to run, and we know what they shouldn't be running, so we have a list of images that we're looking for and alerting on. In this example it's mshta executing an

HTA file on an endpoint. Here's an alert for an illegal or suspicious child process of an Office document. This could be PowerPoint or Word, but basically we took a baseline of what Sysmon sees when you open up one of these Office products and then excluded that from the search, so we're pretty much looking for any other child process of Office. This way we're not specifically looking for PowerShell or mshta or certutil; we're looking for something that we're not used to seeing. And this is for regsvr32; this is straight from the MITRE ATT&CK matrix. It's pretty much the same thing as the others: we're looking for activity that we don't expect.
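The "baseline, then exclude" approach described for Office child processes can be sketched in Python as follows. `Image` and `ParentImage` are field names from Sysmon process-creation events (Event ID 1); everything else, including the parent list, the baseline entry, and the sample events, is hypothetical and stands in for the speakers' Splunk search, which is not shown in the talk.

```python
# Office applications whose child processes we want to inspect (illustrative list).
OFFICE_PARENTS = {"winword.exe", "powerpnt.exe", "excel.exe"}

# Hypothetical baseline: children observed when Office is opened normally.
BASELINE_CHILDREN = {"splwow64.exe"}

def basename(path):
    """Return the lowercase executable name from a Windows-style path."""
    return path.replace("/", "\\").rsplit("\\", 1)[-1].lower()

def suspicious_children(events, baseline=BASELINE_CHILDREN):
    """Return process-creation events where Office spawned a non-baseline child."""
    hits = []
    for e in events:
        parent = basename(e["ParentImage"])
        child = basename(e["Image"])
        if parent in OFFICE_PARENTS and child not in baseline:
            hits.append(e)
    return hits

events = [
    {"ParentImage": r"C:\Program Files\Microsoft Office\WINWORD.EXE",
     "Image": r"C:\Windows\splwow64.exe"},
    {"ParentImage": r"C:\Program Files\Microsoft Office\WINWORD.EXE",
     "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"},
    {"ParentImage": r"C:\Windows\explorer.exe",
     "Image": r"C:\Windows\System32\cmd.exe"},
]
for e in suspicious_children(events):
    print(basename(e["ParentImage"]), "->", basename(e["Image"]))
```

Excluding a known-good baseline rather than listing known-bad binaries is the design choice the talk emphasizes: anything unfamiliar gets flagged, even if nobody thought to put it on a blocklist.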

And in the regsvr32 example you can see that we ran the Atomic Red Team test for scrobj.dll, and the alert identified it. The final example is the connection outbreak search that we have. We know the average number of machines that a certain endpoint would connect to on a daily basis, so what we're looking for is an abnormal number of network connections. This could be an outbreak of a virus, it could be someone using WMI, it could be anything; we're just looking for some sort of abnormal behavior of remote connections from one single endpoint. All right, to wrap up: in conclusion, you

need an effective monitoring strategy to understand what normal IT operations look like, maximize your detection capabilities, and obviously to be able to perform thorough incident response. Also, your auditors will like you if you execute an effective monitoring strategy, because it makes their job easier. So that's it, thank you, and we hope our presentation will encourage you to enhance your monitoring strategy. [Applause] I think we're out of time, but we're going to be hanging out all day if you want to come talk to us about any of this. Thank you.