
Hello, thank you for joining. Our lecture today is Google Workspace Forensics: Insights from Real-World Hunts and Incident Response. My name is Doron Karmi, and on my left is my friend Ariel Szarf. We are both senior cloud researchers at Mitiga, and we also have something in common. Today we are going to talk about Google Workspace forensics. We're going to share a little bit about what Google Workspace is, a little bit about the log structure, and some challenges that we found in the logs while performing forensic investigations. We are also going to talk about what we did to overcome those challenges and share some real cases
that we found during incident response and threat hunts. At the end, we are going to show a particular visibility gap that we found in Google Drive. So let's start.

First of all, what is Google Workspace? As you may know, Google Workspace is a cloud-based collection of tools designed to make collaboration between individuals and organizations much easier. It includes many services, such as Google Drive, Gmail, Google Calendar, Google Keep, and more. What you have to know for this lecture is that this is a very popular platform: there are more than six million paying businesses all over the world, which makes it a high-value target for threat actors to exploit
and steal data.

A little bit about the logs before we dive into them. Once you enable Google Workspace logs, they are divided by the services that you have enabled. They are collected in near real time, and the typical retention period is six months, with some exceptions. Today we are going to focus specifically on Google Drive, but everything that we are going to say applies to all the logs in Google Workspace. But before we start, we would like to share a story with you.

Cool. Hi. Before Doron dives into the log structure, I want to share a story with you. One of our customers saw that
internal data had been published publicly and wanted us to investigate. So we got to work. As part of our investigation, we ran basic anomaly detections and found suspicious activity: an external user from gmail.com performed approximately 15,000 download events at the same timestamp. That's a lot, and it was really interesting. We had a lot of questions about it. For example: what are the file paths? Who created the files? Who shared them externally? To answer these questions and more, we're now going to learn about the Google Workspace log structure with Doron.

So let's talk about the log structure. On your left-hand side, you can see a typical log record
from Google Drive. We can see many pieces of information that are relevant for a forensic investigation. For example, we can see the caller, which is the email of the entity that performed the action. We can see the IP address, the application name, and more. What we can also see is a list of dictionaries called events, which we expanded here on the right-hand side. For example, we can see an event of type upload and its parameters. The parameters are another list of dictionaries that represent the parameters of each call. So what we see here is actually two lists of dictionaries for each log entry. This is
quite challenging, as you may agree. But other than the log structure, we found other challenges in Google Drive and Google Workspace logs that we are going to show you right now.

The first one: the user agent field is missing. There is no user agent across all the logs of Google Workspace. We can see many types of information, such as the IP address, the event name, and more, but there is no user agent, and that can be challenging when you want to perform anomaly detection.

Another thing that we found is IP address inconsistencies. Here we can see three download events
coming from the same user (the user is blurred, but it is the same user) in Google Drive, all in the very same second but from three different IP addresses. This can mislead the investigator during the investigation.

Another thing that we found is that there is no path in file-related log entries. For example, in a download event, you cannot know where the file was downloaded from. You can see some information about the document, such as the document title, type, name, and ID, but you cannot know the path.

So we can all agree that this is quite challenging. But what can you do to make it much easier
to read, to investigate, and to be ready for an attack?

Let's talk again about the log format, and specifically about the events. What we see here is a list of dictionaries that represent the events of each entry. Why is there a list of events? Because of the way Google ties different events together: one event is marked as the primary event, and this is the action that was actually taken by the user. In this case, it is upload. But this event triggers other, related actions in the background. This is quite difficult to
understand, so to make it easier to investigate, we split each sub-event in the events field into a dedicated row. Here we can see three different events that were originally part of the same chain, and we split them into different rows. Think about the case where you would like to search for all the files that were public at some point in your organization. You don't care whether a file was created as public or was private and then changed to public; you just want to know that it was public at some point. With this technique, you just need to search for
the event name that represents public access, and you will find all the files that were public.

Another thing that we would like to highlight is the parameters. The parameters represent the parameters of the call that was made. For example, here again we see the upload event, and we can see a list of dictionaries that represent the parameters of the call. We can see that this is a primary event, we can see the document ID, and we can see whether the file is encrypted or not. What else we can see is that each dictionary has two types of keys: the first one is the name of the
parameter, and there are four other keys that represent the type of the parameter. For example, the first one, primary event, is Boolean; the second one, document ID, is a string, because the value key is populated; and the third one is also Boolean in this case. From our research, we found that whenever everything is null, it means a Boolean set to false. So again, this is quite challenging: when you investigate, you want to be fast, and you need to understand the logs right away. So what do we do? We restructure the data format: we omit all the type-related keys and leave only
the parameter name and its value. This is much easier to read and much more straightforward, and the investigation can be much quicker.

The third thing that we do is enrich the data. What do I mean? Under the parameters, on some occasions you may see the originating_app_id parameter (I'm not sure if you can see it here). This ID represents the application that took an action on behalf of the user. So in the log you will see that the email address, the actor that took the action, is someone in the organization, but the action was actually taken by an application, and sometimes this is important to
understand during an investigation. For example, in this case it is the Slack application. Remember that earlier we talked about IP address inconsistencies? This could be one of the reasons we see them: sometimes the IP address is that of the application's hosting provider, while the actor is the user, and this mismatch can be confusing. So understanding that an action is coming from an application can help the investigation. Now back to Ariel to tell us more about the exfiltration case.

Thanks, Doron. Now we are going to talk about data exfiltration from Google Drive. Let's start with the basics. There are six event names that may be related to data
exfiltration in Google Drive. The most obvious, of course, is download. Threat actors can also view files, send them by email as attachments, or print them. Note that they do not need to physically print files to exfiltrate a huge amount of data; they can print them to PDF files. Of course, they can also preview them. And the least intuitive: they can copy them to a more convenient location, for example to a public folder. When you suspect a user, you can search for the exfiltration-related events this user performed. In threat hunts, when you want to generate leads, you can search for anomalies in the appearances of these events. For example, this is an anomaly graph
based on these events. Each line is a user, and we can see how many exfiltration-related events each user performed over time. For example, the green user here performed approximately 20K exfiltration-related events on February 27th, which might be really interesting to investigate.

Now let's talk about sharing files in Google Drive. We're going to talk only about sharing files from a shared drive, not from the private drive. When you share a file or folder in Google Drive, this window pops up. In this window, under the General access section, you can choose a group: anyone with the link, your organization, or restricted. Restricted means that only users that
you explicitly mention in the upper section get permissions to this object. You can also choose the access scope: viewer, commenter, or editor. In our example, we changed the group from our organization to anyone with the link. This click actually generated four events: two change_document_visibility events and two change_document_access_scope events. In this table you can also see three parameters that we extracted from the parameters column: target domain, old value, and new value. At first glance it may be really confusing, but when you look again, you can see a pattern. In the first pair of events, the start state (people within domain with link, with a can view access scope) changed to the clean
state, private. After that, in the next pair of events, the clean state changed to the end state: people with link, with a can view access scope. Note that even though we didn't change the access scope, it still goes through the clean state, none.

Now let's talk about how sharing a file or folder with a concrete principal looks in the log. When you share a file with a concrete principal, it's straightforward: there is one event called change_user_access, and the actor of this event is the user that actually performed the action. Easy. But when you share a folder with a concrete principal, something interesting happens. For the main folder, there is one event called change_user_access, and the
actor of this event is the user that actually performed the action. But after that, for each file and folder recursively under the main folder, there is a special event called change_user_access_hierarchy_reconciled, and the actor of these events is the system. Note that all of these events are primary events and not part of a chain of events like the ones described earlier.

Now back to our story from the beginning. Just a reminder: we saw an external user from gmail.com that performed approximately 15,000 download events at the same timestamp, and that's a lot. One of the questions we asked ourselves was: what are the file paths? The straightforward solution, of
course, is using the API, but there are two problems with that. First, to use the API you need proper permissions, and when you are an external investigator you don't always have them. Second, when you use API calls, you get the current state of the organization, and when you investigate, you want the historical state. So we tried to think about what we could do to get the paths based on the log records only. In our research, we saw that for each file or folder creation there is a create event, of course, but there is also an add_to_folder event. In this event's
parameters, there are the document ID and title, and also the destination folder ID and title. Based on these add_to_folder events, we built this table, in which you can see the document ID, the destination folder title, and the destination folder ID. Now, if you think about it, if you also have the add_to_folder events for all the destination folders, you can search for the destination folder IDs in the left column, the doc ID, and build the paths recursively. That's what we actually did. This table is from our lab, don't worry. Here you can see the event names, the document title, and the calculated document path that we built with this cool
technique. Note that with this technique the paths might be partial, of course, depending on the log time frame: if you don't have the relevant add_to_folder events, you can't rebuild them.

Back to the story, just to close it. We searched the logs for the user who shared the files externally and saw that this user was an admin user. Long story short, this user was compromised by a phishing attack, and after we understood that, we investigated the logs in the relevant time frame.

Finally, I want to share with you a visibility gap that we found in Google Workspace logs two months ago. When we investigate, we assume consistency in the logs. What do I
mean? All of us already know that there is an event for downloading a file, so every time a user downloads a file, there is a log record about it, right? Well, it's not as simple as that. Let's talk a little bit about licenses in Google Workspace. Each user has the free license, Cloud Identity Free, and this license enables basic features. In addition, an admin can purchase other licenses to enable more features. In this example, you can see that this organization has a paid license called Google Workspace Enterprise Plus, but this license isn't assigned to this user. In our research, we found that if a user doesn't have any paid license, there
are no log records on their private drive at all: not for downloading files, copying files, creating files, and so on. It's crazy to think about: with just a free license, there are no log records on their private drive in an organizational Google Workspace. Based on this finding, we started to think about how a threat actor could exfiltrate not just the private drive but also the shared drive with minimal log records. Now we want to show you a use case of how a threat actor can do something like that. In this use case, the compromised user is an admin user, because an admin user has the permissions to revoke and assign licenses. So the threat actor can revoke the paid license from the
compromised user, copy all the files from the shared drive to the private drive, download all the files from the private drive, and finally reassign the paid license to the compromised user, to be as discreet as possible.

Now let's talk about the logs. For the revoke and the reassign, there are relevant log records under the admin audit log: User License Revoke and User License Assignment. The file copy is actually interesting. In general, for each file copy in Google Drive there are two log records: source_copy on the original file and copy on the destination file. These events are almost the same, so usually it's not interesting to monitor both. But in our special case there are no copy events at all, because
there are no log records on the private drive, so there are just source_copy events. And for the download of the files, there are no log records at all. Based on this research, we understood that in our investigations we should also search for a license revoke and assign within a short time, and for source_copy events without related copy events. Doron, over to you.

So let's talk about what we had today. In this talk, we talked a little bit about what Google Workspace is, how the logs are structured, and what the challenges in those logs are. We talked about the challenges in the structure itself, but also about some
pieces of information that aren't present in the logs, for example the user agent and the file path, as well as the IP address inconsistencies. Ariel shared with you a real, cool use case of data exfiltration and the visibility gap that we found in Google Drive. But now you might ask yourself: what now? What do I need to do now? First of all, we think that the first thing you need to do is to know the logs: to understand the limitations of Google Workspace logs, what they give you and what they don't, and to be able to
know it before an attack, before you actually need to perform an investigation. In one week's time, we recommend that you start improving the readability of the Google Workspace logs, exactly as we showed you: split the events into different rows, flatten the parameters, and make the logs ready for an investigation. Once that's ready, in one month's time, we recommend that you start proactively monitoring for data exfiltration from your organization, to understand whether someone was able to exfiltrate data out of the organization.

This was our talk. The QR code on the left links to our blog, where you can find more
information, and the QR code on the right is for the advisory that we shared with Google. Thank you very much. [Applause]

Are there any questions?
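
The row-splitting and parameter-flattening steps recommended in the talk could be sketched roughly as follows. This is a minimal illustration, not the speakers' implementation: the record layout (`id.time`, `actor.email`, `ipAddress`, `events`, `parameters`), the four typed value keys, and the sample values are all assumptions modeled on the description given in the talk.

```python
from typing import Any

# Assumption: besides "name", each parameter has up to four typed value keys,
# and at most one of them is populated.
VALUE_KEYS = ("value", "intValue", "boolValue", "multiValue")


def flatten_parameters(parameters: list[dict[str, Any]]) -> dict[str, Any]:
    """Collapse a list of parameter dictionaries into one name -> value dictionary."""
    flat: dict[str, Any] = {}
    for param in parameters:
        populated = [param[key] for key in VALUE_KEYS if param.get(key) is not None]
        # Per the talk: when every typed key is null, treat it as Boolean false.
        flat[param["name"]] = populated[0] if populated else False
    return flat


def split_events(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one flat row per sub-event of a single log record."""
    shared = {
        "time": record.get("id", {}).get("time"),
        "actor_email": record.get("actor", {}).get("email"),
        "ip_address": record.get("ipAddress"),
    }
    rows = []
    for event in record.get("events", []):
        row = dict(shared)
        row["event_type"] = event.get("type")
        row["event_name"] = event.get("name")
        row["parameters"] = flatten_parameters(event.get("parameters", []))
        rows.append(row)
    return rows


# Hypothetical record: an upload (primary) followed by a related sub-event.
sample_record = {
    "id": {"time": "2024-02-27T10:00:00Z"},
    "actor": {"email": "user@example.com"},
    "ipAddress": "203.0.113.7",
    "events": [
        {"type": "access", "name": "upload", "parameters": [
            {"name": "primary_event", "boolValue": True},
            {"name": "doc_id", "value": "doc-123"},
            {"name": "is_encrypted"},
        ]},
        {"type": "access", "name": "add_to_folder", "parameters": [
            {"name": "primary_event"},
            {"name": "doc_id", "value": "doc-123"},
        ]},
    ],
}

rows = split_events(sample_record)
for row in rows:
    print(row["event_name"], row["parameters"])
```

With rows shaped like this, a search such as "all files that were ever public" becomes a filter on `event_name` and a parameter value, instead of a walk through two nested lists.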
I think there's the mic.

Check, check. Oh, perfect. Well, thank you for the presentation, that was great. I'm curious: I've set up event logging where I can get a notification or an alert if stuff happens in GCP. They have a logging product that can show more or less the same thing, but for different GCP applications. Do you know if Google Workspace and GCP share the same back-end resources, so that I can query those logs from the logging platform on GCP?

Google usually shares a lot of resources on the back end; they share a lot of platforms. I'm
not completely sure. I think that you would need to store them in some other solution to be able to query those resources, but we need to check that. I'm not completely sure.
Yeah.
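
The path reconstruction described in the talk, which chains add_to_folder events from a document up through its destination folders, could look roughly like the sketch below. It assumes rows that were already flattened, and the key names (`doc_id`, `doc_title`, `destination_folder_id`, `destination_folder_title`) and sample data are hypothetical. As the talk notes, the resulting paths may be partial when the relevant events fall outside the log time frame.

```python
def build_paths(add_to_folder_rows: list[dict[str, str]]) -> dict[str, str]:
    """Rebuild document paths from add_to_folder events only.

    Simplification: if a document has several add_to_folder events (for
    example after a move), only the last one in the input is used.
    """
    parent: dict[str, tuple[str, str]] = {}  # doc_id -> (folder_id, folder_title)
    title: dict[str, str] = {}               # doc_id -> doc_title
    for row in add_to_folder_rows:
        title[row["doc_id"]] = row["doc_title"]
        parent[row["doc_id"]] = (
            row["destination_folder_id"],
            row["destination_folder_title"],
        )

    paths: dict[str, str] = {}
    for doc_id in parent:
        parts = [title[doc_id]]
        current, visited = doc_id, {doc_id}
        # Walk upward until a folder has no add_to_folder event of its own.
        while current in parent:
            folder_id, folder_title = parent[current]
            parts.append(folder_title)
            if folder_id in visited:  # guard against cyclic data
                break
            visited.add(folder_id)
            current = folder_id
        paths[doc_id] = "/".join(reversed(parts))
    return paths


# Hypothetical lab data: Shared Root > Finance > Reports > q1.xlsx
sample_rows = [
    {"doc_id": "f1", "doc_title": "Finance",
     "destination_folder_id": "r0", "destination_folder_title": "Shared Root"},
    {"doc_id": "f2", "doc_title": "Reports",
     "destination_folder_id": "f1", "destination_folder_title": "Finance"},
    {"doc_id": "d1", "doc_title": "q1.xlsx",
     "destination_folder_id": "f2", "destination_folder_title": "Reports"},
]

paths = build_paths(sample_rows)
for doc_id, path in paths.items():
    print(doc_id, path)
```

Because this works purely from log records, it reflects the folder layout at the time of the events rather than the current state returned by the API, which is the property the speakers wanted for investigations.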