
Syslog-ng: Getting Started, Parsing Messages, Storing In Elasticsearch

BSides Luxembourg · 2019 · 1:47:36 · 8.3K views · Published 2019-11 · Watch on YouTube ↗
Category: Technical
Difficulty: Intro
About this talk
Peter Czanik introduces syslog-ng, an enhanced logging daemon for high-performance central log collection. The workshop covers fundamental logging concepts, configuration basics, filtering and parsing techniques, and integration with Elasticsearch, suitable for both beginners and experienced users.
Original YouTube description
The syslog-ng application is an enhanced logging daemon with a focus on portability and high-performance central log collection. It is used mainly by IT security professionals, but also in Ops and DevOps environments and by embedded developers. The syslog-ng workshop helps you take the first steps with syslog-ng, and shows how you can quickly get more information out of your logs and have greater insight into what happens on your network. Ideal for beginners, but covers advanced possibilities for seasoned syslog-ng users as well.
Transcript [en]

So I'm coming from Hungary, I'm a syslog-ng evangelist and do syslog-ng packaging, support and advocacy. So let me give you a quick overview. First, in a few words, I will talk about what you need for this workshop. Then I will give you an introduction to syslog-ng and its four major roles. I will talk a bit about logging basics, then we will change to a bit more practical side and learn about configuring and testing syslog-ng. We will go on to networking topics, and then to the part which is more interesting from the security point of view, filters and parsers within syslog-ng. And finally, if we have time, we will also see how to

send logs to Elasticsearch. So what you need if you want to follow my workshop is a laptop, and I have a bunch of USB keys with the virtual machine image and also with my slides. I will hand them out, please copy the image to your machine, hand the key on, and at the end please give it back to me. I tested that this image works fine with VirtualBox and VMware as well. I use VMware, but it's up to you what you have. There are two users on it. Of course it has root, and it also has a user called workshop, and both have the password workshop. One note before diving into syslog-ng is that this workshop was

originally 3 hours long. Now we have 1 and a half hours. I cut out many slides and tried to speak as fast as possible. Still, I have some doubts if we can finish everything on time. So... What is logging? It's the recording of events. My favorite example is sshd. You see a log message here, just like you could find in your /var/log messages. So what is syslog-ng? It's a syslog implementation, an enhanced logging daemon with a strong focus on portability and high-performance central log collection. It was originally developed in C, but it can be extended in Python or Java as well. So why is central logging so important that I mentioned it in the syslog-ng

definition? First of all, it's ease of use. You have one place to check instead of many when you want to check your log messages. It's also availability: even if the sending machine is unreachable for some reason, you can check your log messages and even use them to figure out why the sender machine is down. And last but not least, it's also security. The first thing when a machine is compromised is removing the traces of the compromise. But if you have central logging and your log messages are pushed in real time to another location, then even if the logs are removed later on, you have a central location where you can check what happened to your compromised machine. Syslog-ng has four major

roles: it can collect log messages, it can process messages, it can filter messages, and finally store them somewhere, either locally or forward them somewhere. Let's talk about data collection. Syslog-ng can collect system and application logs together, which can provide quite useful contextual data for either side. There is a wide variety of platform-specific sources, like /dev/log, the journal, or sun-streams. As a central log collector, obviously it speaks all of the different syslog protocols, the legacy one and the new one, and understands them over UDP, TCP and encrypted connections. And there are many other log sources: files, sockets, pipes, even application output can be used as a log source. And there is also a new jolly joker, available for probably only a

year or less than a year: the Python source. So you can extend syslog-ng in Python and create an HTTP server receiving log messages like the HTTP Event Collector in Splunk. Or you can fetch messages from Amazon CloudWatch or Kafka or whatever you want, anything that is supported by Python. The next one is processing. You can classify, normalize and structure log messages with many different built-in parsers. You can also rewrite messages. And I don't mean falsifying here, but for example, if you are required to anonymize your log messages by compliance, then you can use rewriting for anonymization. You can also reformat log messages using templates, so if your destination needs a specific format, like an ISO date or

JSON formatting like Elasticsearch, then you can do that. And you can also enrich your log data, for example using GeoIP, or create additional fields based on the message content. And here again, Python is a jolly joker, as it can do any of the above, and you can also use Python to enrich log messages from a database, or even do filtering from Python, which brings us to our next topic: filtering. It has two main uses. The better known is discarding surplus log messages: except for some special cases, you don't want to store debug-level messages, as they can fill up your storage quite quickly. And it's also message routing, meaning for example making sure that anything related to authentication reaches

your SIEM system. There are many possibilities for filtering. It can be based on message content, different message parameters or macros. You can do comparisons, use regular expressions, many different filtering functions. And best of all, you can combine any of these using Boolean operators. Finally, you have to store your log messages somewhere. Traditionally, log messages were stored locally in flat files, or sent over the network using one of the syslog protocols and stored at the central location into flat files. Over the years, many different destinations were added to syslog-ng: first different SQL databases, then message queuing systems like AMQP, STOMP, Kafka, and different big data destinations like Hadoop, Elasticsearch, MongoDB. And here again we have some

jolly jokers like Python and Java. Actually, some of the big data destinations are implemented in Java. Let's talk a few words about log messages. If you take a look at your /var/log messages, most of the logs have a pretty free-form, but still structured format. It starts with a date, a hostname and some text at the end. This is yet another SSH login message. And you can see that the text part is practically a complete English sentence with some variable parts in it, which is pretty easy to read by a human. It was originally intended to be read by humans. On the other hand, if you have many servers and lots of users, I wonder when you will

read log messages arriving at 1000 messages per second. So you rather try to create alerts and reports from your log messages, but that's quite difficult from these free-form log messages, as all applications have different logs. There is a solution for this problem, it's called structured logging. In this case, instead of writing free-form text, events are represented as name-value pairs. For example, my favorite SSH login event can be described with an application name, a user name, a source IP. The good news is that syslog-ng has had name-value pairs inside right from the beginning, so it's quite well prepared for this task. Facility, priority and everything else were represented as name-value pairs within syslog-ng to make templates

possible. So we will talk about those later. And parsers within syslog-ng can turn unstructured data, and some of the structured data like CSV or JSON, into name-value pairs as well. Why is it important? Because in this case you can easily create a filter or alert or whatever based on the content of a name-value pair. So not just on the whole message as unstructured text, but on a field parsed out from this text. For example, alert if there is an SSH login by root.
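As a sketch of that idea, a JSON parser can turn a structured payload into name-value pairs, and a filter can then match on a single field. The `.json.` prefix, the `username` field, and the `s_net`/`d_alerts` identifiers below are illustrative assumptions, not taken from the talk:

```conf
# Sketch: parse JSON payloads into name-value pairs, then filter on one field.
parser p_json {
    json-parser(prefix(".json."));
};

# Alert on SSH logins by root, matching the parsed field, not the raw text.
filter f_root_login {
    match("root" value(".json.username"));
};

log {
    source(s_net);            # some network source defined elsewhere (assumed)
    parser(p_json);
    filter(f_root_login);
    destination(d_alerts);    # hypothetical alert destination
};
```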

So, before going on to more serious topics, a tricky question for you: which is the most used syslog-ng version? The project started 21 years ago. Red Hat EL, which is our most popular distro, has version 3.5, and the latest stable version is 3.24. What do you think, which is the most popular version? Well, not most popular, most used version. Well, it's... You were getting quite close to it, it's 1.6. There are well over 100 million Kindle e-book readers, and each and every one of them is running syslog-ng for logging. I don't think there are any servers, virtual machines, whatever, in this number, anywhere. So back to something more serious, it's configuring and testing syslog-ng. And here comes my initial

advice: don't panic. It's simple and logical, even if it doesn't look so at first sight, or even the second one. The syslog-ng configuration has a pipeline model, which means that there are many different building blocks: sources, destinations, filters, parsers, rewrites, and so on. And these building blocks are connected together using log statements into pipelines. Let's see what a source looks like. The generic definition is at the top of the screen. So it starts with a source statement, then an identifier, and then there comes a list of source drivers. At the bottom of the screen you can see a very simple example: a file source. So it's reading a file. And here is a more

complicated one. A single source in your configuration as well, but it has four different log sources listed: internal(), which stands for the internal messages of syslog-ng, two file sources, and a unix-stream source for /dev/log. There are also flags that can influence how your source works. Here I list only the most important ones, like "no-parse". By default all of the incoming messages are parsed as syslog by syslog-ng. But for example if you receive a pure JSON message, then that doesn't really work. So you can disable parsing with the no-parse flag. And there are many more flags listed in the documentation. Source drivers: I already mentioned internal, unix-stream and unix-dgram, so the

traditional /dev/log and many other socket sources. There is systemd-journal, so syslog-ng can read log messages from the journal and read all of the different name-value pairs from the journal. It can read files, and obviously there are network sources, either for the legacy or the new syslog protocol. And there is a common mistake when it comes to sources in syslog-ng, which is duplicating sources in the syslog-ng configuration. For example, entering the network source twice because you want to use it in two different scenarios. Don't duplicate sources in your configuration. Once you define a source, you can reuse it in your configuration as many times as you want. On the other hand, you cannot bind to the same source IP

and port twice.
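The source declarations described above can be sketched like this; the file paths, port number, and destination names are illustrative assumptions, and the point of the last two lines is that one source is reused by two log statements instead of being declared twice:

```conf
# Sketch of source declarations; paths and names are assumptions.
source s_local {
    internal();                      # syslog-ng's own messages
    file("/var/log/app1.log");
    file("/var/log/app2.log");
    unix-stream("/dev/log");         # traditional local syslog socket
};

# Raw JSON input: disable the default syslog parsing with a flag.
source s_json {
    network(transport("tcp") port(5555) flags(no-parse));
};

# Reuse the same source in two log statements; never bind it twice.
log { source(s_local); destination(d_files); };   # d_files: hypothetical
log { source(s_local); destination(d_siem); };    # d_siem: hypothetical
```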

There is a special source in syslog-ng, it's called the system source, which is important because it makes your life easier. There are not many systems without systemd anymore, but there was a long, long transition period, and if you used the system source, then you could use the same configuration no matter what your log sources were, as the system source automatically detects the platform-specific log sources. So it can detect if you use /dev/log or the journal, and you can even use the same configuration cross-platform: if you use the same configuration on a Solaris system, then it will use sun-streams. So it hides some platform-specific configuration from you, making your life easier. Next, the other

mandatory part of the configuration is the destination, as you need to store your log messages somewhere.

Yet again, the general declaration is the destination keyword, then an identifier for the destination, and then you can list as many destination drivers as you want. There is a very simplified destination at the bottom of the screen, writing log messages to /var/log/syslog. Obviously there are a lot more destination drivers: the already mentioned pipes, sockets, network using either the legacy or the new syslog protocol. You can even print log messages to a terminal, but that's not really recommended, as it's blocking and terminals are slow. So if you have tons of log messages, it can slow down your logging. The program destination makes it easy to write your own specialized destination: syslog-ng can spawn an application and write

to its standard input. And I mentioned SQL, Hadoop, and all the different big data destinations and so on. There are many other elements that can be used in the syslog-ng configuration, like options, which can modify how syslog-ng works. Many of these can be set globally, and when you need to override one, you can also do that at a specific destination. Macros are the results of message parsing. So if you parse your syslog message, the parsed values are stored into macros like date or process ID, and the same happens if you write your own message parsing; we will see many examples later on. And templates: using templates you can define

your own message formatting or even a file name. You can also use them for file naming. There are also filters, and we will talk about parsers and rewrites as well. Finally, the heart of the configuration is the log path, which connects all of these elements together for message routing. At the top of the screen you can see a very simple log path using just a single source and a destination, which is perfectly valid, but you can complicate it as much as you like, listing many different sources, filters, destinations, parsers, whatever you want. Here is a very simple syslog-ng configuration. It always starts with a version number, like 3.19. You can include external configurations from syslog-ng.conf, and one is included by default, it's called scl.conf.
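The destination and log path declarations described so far could be sketched like this; the identifiers and the helper script path are assumptions for illustration, with s_sys standing for a source like the system source described earlier:

```conf
# Sketch of a destination with two drivers; the script path is hypothetical.
destination d_local {
    file("/var/log/syslog");
    program("/usr/local/bin/alert-helper");   # spawned app, fed on stdin
};

# The log path connects a source to the destination by name.
log { source(s_sys); destination(d_local); };
```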

It stands for the syslog-ng configuration library, and we will learn about it on the next slide. Obviously you can comment your configuration file, and here "options" stands for the global options I mentioned on the previous slide. For example, you can make sure that all of your log messages are written to disk as soon as they arrive. But on a mail server where you have hundreds of messages arriving each second, you don't want to slow down your system by writing all of the messages immediately, so you set a global option that writes everything immediately, but override it at the destination, so that for that destination messages are written only when 100

messages are collected together. Next, you can see some typical building blocks of the syslog-ng configuration: a source with the system and internal source drivers, a file destination for /var/log/messages, and a filter, which is typical for /var/log/messages. And at the bottom of the screen you can see the log statement, which connects all of these different building blocks into a pipeline. As you see in the log statement, there is a source, and here we refer to the name of the source, then a filter, f_default, and a destination. So from the log statement we refer to the names of the different building blocks. I mentioned already the syslog-ng configuration library. It's a collection of configuration snippets.
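Put together, the building blocks just described might look like this sketch; the exact filter expression and the flush-lines() value are assumptions, not a verbatim copy of the slide:

```conf
@version: 3.21
@include "scl.conf"

# Global option: flush messages to disk immediately (assumed example).
options { flush-lines(0); };

source s_sys {
    system();
    internal();
};

destination d_mesg { file("/var/log/messages"); };

# A typical /var/log/messages-style filter (assumed expression).
filter f_default { level(info..emerg) and not facility(mail); };

# The log statement refers to the blocks above by name.
log { source(s_sys); filter(f_default); destination(d_mesg); };
```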

They are accessible from your configuration just like any other syslog-ng driver; you don't really see any difference. Some of these were turned into application adapters, which in syslog-ng terminology means automatic message parsing. We will use this in one of the sample configurations. There is a nice regular expression for credit card number matching, a regular expression spanning probably 5 lines. But if you are in an environment which is regulated by PCI DSS, this can come in quite handy. The Elasticsearch HTTP destination is also a configuration snippet in the syslog-ng configuration library. There is a generic HTTP destination within syslog-ng, and this is actually a configuration block which wraps around this generic HTTP destination driver. And

there are many more possibilities here, configurations for different logging-as-a-service providers like Loggly and so on. So, question: does anybody have the virtual machine image copied? Then you should also import it to your software. Soon we will try to use it as well. So, by default, once you have installed syslog-ng, it starts in the background. So the first step we will do is stopping syslog-ng. During the workshop we will start it from the command line and run it in the foreground. The virtual machine has CentOS 7, which means that it's using systemd and systemctl. You can use that, or you can also use a utility that comes with syslog-ng called syslog-ng-ctl. You can use that

to stop syslog-ng as well. It has many other features, like querying statistics from syslog-ng or changing debug levels and so on. syslog-ng has many options. The few options we will use are listed here. You can use "-s" for syntax checking. The "-F" stands for starting in the foreground. With --debug and --verbose you get a lot more output on screen, and "-f" is used to choose an alternate configuration. We will use these during the workshop, as we leave the regular syslog-ng configuration untouched and instead have many configurations prepared. As I mentioned, we will start syslog-ng in the foreground. This way it's easier to see any configuration problems, and also you can quickly stop it using Ctrl+C once you want to use another

config. And we will use two different tools for testing. I guess everyone knows logger. It's a simple syslog tool, installed by default on all Linux systems, which can send a single syslog message. And syslog-ng comes bundled with loggen. Originally it was used for benchmarking syslog-ng, so it can generate some patterns, and you can change at what speed it's emitting messages and so on. It can also send messages from files, and we will use that feature. So here comes our first practice session. We will test a very minimal configuration and also starting/stopping syslog-ng. Here is the configuration which we use. It's really very simple, just a version number, a source, a destination and a log statement which connects

the two together. Any questions until now? The root password is workshop and there is a workshop user, also with the password workshop. Originally I made this workshop in a way that people could write their own configurations, and on the other hand I put everything in a directory called /etc/syslog-ng/cheating for those who were typing more slowly and still wanted to run the test commands. I think right now we don't have time for typing. But if you change to this directory, you will see that there are quite a lot of different configurations. And on my slides you will see... Oh, not on this one. But most of the time you will see on the slides which configuration to use. For example, in this case

it's minimal. And so let's make sure that syslog-ng is not running. Can you see what's on my projector also? Okay. So let's stop syslog-ng, as it starts by default in your virtual machine. And now start it from the command line. So first of all there is a capital -F, so it starts in the foreground, and then a small -f and minimal.conf. And here is a problem immediately, as it writes that the configuration format is old, and to check if there are any incompatibilities with 3.21, which is installed here. So just stop it with Ctrl+C and edit it: vi minimal.conf, replace the version number from 3.19 to 3.21 and save it.
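A minimal configuration along the lines of the one just described, with the version number already bumped to match the installed syslog-ng, might look like this sketch:

```conf
@version: 3.21

# Collect local system logs plus syslog-ng's own internal messages.
source s_sys { system(); internal(); };

# Write everything to a single file.
destination d_mesg { file("/var/log/messages"); };

# Connect the two into a pipeline.
log { source(s_sys); destination(d_mesg); };
```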

Now I started it, and as you can see, no warning messages were printed on screen. You can also add some debugging, the -d and -v, debug and verbose, and you will see a lot more messages on screen, as it prints out every module that is installed on the system as these are loaded, the version number and everything. Now let's send some test messages. And you see... Here it is. And you can see the test message coming through the journal. So I stopped it. And as you can see, it's also in /var/log/messages. Okay. Everybody is with me still? Okay. Sorry. One moment. Let's talk a bit about networking and syslog-ng. As I mentioned,

syslog-ng supports both the legacy or BSD or RFC 3164 syslog protocol version and the new one. In my talk I will only talk about the legacy syslog protocol, as practically everyone is still using it. If one of my colleagues does an integration somewhere, then they use the new one, but I don't think that more than 1% of installations utilize the new syslog protocol. So I stick to the old one here. And we will also talk about client, relay and server modes and how syslog-ng works in each. So how does the legacy syslog message look? It has three major parts: the priority, some headers and the message. If you take a look at the log files in your /var/log directory, then you only

see the second and the third one. The priority itself encodes two things into a single number. One is the facility: where the message is originating from. There is a nice table, I don't have it here, describing what the different numbers mean: mail, cron, authorization and so on. And there is also the severity, which means the importance of the message: if it's a debug-level message, if it's regular or warning or emergency. There are eight different levels, I don't know them by heart.

The headers are the timestamp, hostname, process name and process ID. The latter two are not mandatory, but of course good to have. And finally there is the message part itself, which can be just about anything. So, modes of operation. When it comes to networking, syslog-ng can work in three different modes. One is client mode. And here client doesn't mean that it's a desktop machine: your servers are clients as well if they are sending log messages away. So a client is collecting log messages from a machine and sending them to a remote server, either directly or to a relay. The server is the central machine collecting log messages and storing them either locally or sending them off to somewhere like Elasticsearch or whatever.

And in between can be one or more relays, which don't store the log messages, but collect messages, possibly even process them, and then forward them to yet another relay or to your central server. Why are relays important? Obviously if you have just a local network and nothing more, then you don't need them. But if you have a larger network, and you have for example network devices logging over UDP, you don't want to lose logs, so you put a relay as close to your UDP sources as possible: collect messages using UDP, but forward them using TCP connections for more reliability. It's also scalability. If you do lots of message parsing, especially if you use Python for it, then you also need some processing power for

message processing. And you can distribute this processing to your relays and do only the storage on your central server. And it's also structuring your network, like you can put a relay at each site or at each department. And it's also security, as you want your log messages off your client machines as soon as possible: while messages are on clients, they can still be modified, but as soon as they have left the client machines, they are out of reach of an attacker. And if your clients cannot reach your central server, they can still send messages to your relays, and you can configure message buffering on your relays, where messages can be accumulated as long

as your central server is unreachable. Next, we will use logger to send a few test messages to a network source. We will use this command line to send a test message to a local server. -T is for connecting over TCP, -n stands for the host name, where we will use the localhost IP address, capital -P is for the port number, and then there is the actual log message. And here is the configuration we will use, netsource.conf. The beginning of the configuration is the same as we have seen previously, but at the bottom of the screen there are three more lines. There is a TCP source, listening on port 514. There is a destination file called

"fromnet", where log messages are stored. And finally there is a log statement, which connects the TCP source and the file destination together. So let's do a bit of practice. Here are the steps we will do. So... Yeah, oh, syntax check. I forgot to show that last time. So here we did a syntax check, and we have the same problem that the version number at the beginning is too old. Obviously for these simple configurations it's not a problem, but if you use some obscure features which changed along the way, then there might be some compatibility problems, like how the default message size was changed at some point, or the number of connections handled by default, and how different memory buffers are

calculated. But here it's not a problem. So let's start. So we started syslog-ng with netsource.conf here. And on the right-hand side you can see the command line for sending a test message over the network. We sent it, and you can see the result, the network test message, in the last line. Everyone is still with me? Okay. Then let's go on to a bit more interesting features of syslog-ng. This is where you will spend most of your time when it comes to configuring syslog-ng for your log messages: it's filtering. Many applications send a tremendous amount of data, but if you have a SIEM system or any other log analysis application, these are mostly licensed,

if they are commercial, they are mostly licensed on log volume. So you want to reduce the amount of log messages as much as possible, but not more. And it's also because you have a limited amount of storage, so it's not just licensing, it's also storage. And filtering is where you can drop log messages, and also make sure that any important messages reach the right destinations. There are many different filtering functions, and you can connect them using Boolean operators, as we will also see. It's a bit more advanced and not so straightforward at first, but if/else can actually make filtering a lot easier. These are embedded into the log statement. So at first they look like

a bit of an exception to the rules of the pipeline, but they can really, really simplify things. What was previously possible using multiple log statements and a long, long configuration, an if/else statement can shorten quite a bit. So what are macros? They are variables defined by syslog-ng, for example by parsing log messages. Any arriving message is parsed by default as a syslog message. So as you can see at the bottom of the screen, facility, priority, date and so on are all macros or name-value pairs created by syslog-ng. Actually, date-related macros have different versions. You can parse dates from log messages, which can be different from the date when the log message actually arrived. So it depends on your

intent, which value you use in your templates later on. Templates can be used to create new message formats, or you can also use templates in file names. Here is a very simple template. I don't recall anymore which log analysis application required it, but it needs a date which also has fractions of a second, not just round seconds. So it's almost the same as a regular syslog line in /var/log/messages, but with a slightly different date format. You can also use templates in file names. For example, if you have a central syslog-ng server, then you can make sure that messages coming from different hosts are stored into different directories. So here there is a host macro in the directory name, so

each host stores into a different directory. Or in the second case the host macro is part of the file name. You can also use this for log rotation. And here is what I mentioned, that you can use different date macros. The "R" in the name of the date macro means that it's the time when the message was received. So received year, received month, received day, host name. This way you can store log messages based on the day and have a cron job which deletes the log messages when they are no longer necessary. If you have a compliance requirement to keep them for 3 years, then you can have a cron job which compresses log messages the next day and deletes them a year or 3 years

later. The "create-dirs" option here means that if the directory where you would store the log message doesn't exist, then it's automatically created. If you don't have this and you don't have the directory, then those messages are lost. I know one use case where that was useful: if you have host-based logging, then you can make sure that for every host you want to log, you create a directory, and even if one of your colleagues points a high-volume log server to your syslog-ng server, those messages are dropped as there is no place to store them. But I don't know any other use case for this. Filters. Declaring filters is pretty much similar to what we have seen previously with sources and

destinations: the filter keyword, then a name for it, and some kind of filter function declared. So here we have an example: filter f_default. Oh, I didn't mention, and it's not mandatory, but many people use this naming convention when they name a source, destination, filter, whatever: they prefix the name with the first letter of the block type, like here, for a filter, it's f_. For a source it's s_, for a destination d_, and so on, so they don't mess up the naming. If they had a default source and a default filter with the same name, the names would clash. And here you can see two filter functions, one for level and one for facility, and they are connected using Boolean operators. There are many different

filtering functions. For traditional syslog messages it's level or facility. Actually, in the original syslog implementation this was the only filtering possibility. But you can also filter on hostname or program name. The match filter is for matching a regular expression within the message part or any name-value pairs. You can call another filter as well. And I guess this configuration should look familiar, as this is the very first example I showed you. So we have a source, a destination, a filter, and a log statement where we use this filter: source, filter and destination. And the order is important here, as you can have multiple filters and you can also have parsers here, and they are executed in the order in which

they appear in the configuration. There is a special filter which is very close to my heart, as this was my idea, coming after a security session somewhere else. It's filtering based on whitelisting and blacklisting. This filter compares a single field with a long list of values. So you can have a text file, and on each line you can have a value. And the original use case was a kind of poor man's SIEM system. There are many different IP address and hostname lists on the internet: for spammers, for malware command and control IP addresses, and many other lists. And you can match against such a list and create an alert if you see: "Well, this Windows machine

is communicating with a malware command and control server, then probably I should investigate what's going on." But more people use it, for example, to filter based on application names. So they have a filter which lists the application names which are important for them, so anything related to the SAP system is logged separately. Or they have a list of IP addresses for sales, another list of IP addresses for marketing, another list of IP addresses for development, and log these separately. So there are many different uses for the in-list filter. So let's go on to a bit more advanced topic: if/else. It's practically embedding filters in the log path, creating conditional expressions. It makes it a lot easier to use the results of filtering, as

you don't have to use a separate logpath for two different results of a filter, but you can do it in the same configuration. for example to use different parsers on different log messages. Sorry, I need some help. So the beginning of this configuration should look similar, it should look familiar. The magic starts somewhere here, but if you If you are using Cisco GanttG for a while, you will learn that the actual most important part of the configuration is the log path. That's where everything is connected. So we are also reading the log path and we'll jump to the other parts of the configuration as necessary. So what we see here that we use the very

same s_sys source as the other log path. So here we can see that you can reuse the different sources as many times as you want. So we use the same source and then we apply the sudo filter, which is this one. We name it sudo and it matches on the sudo application. And here comes a trick. As you see, we include the syslog-ng configuration library in our configuration, which also means that we have a couple of parsers enabled by default. One of these is a parser for sudo log messages. So the next line here, the if statement, is actually filtering on the username extracted from sudo messages. In our case it's workshop. The

sudo parser creates name-value pairs automatically, with a naming where all the name-value pairs start with ".sudo." and after that comes the actual name of the name-value pair. So here we have the username who is running sudo, and we are looking specifically for workshop here. And here comes another weird thing. Until now we had building blocks defined separately, but you can also embed those in the log path. If you know that you won't reuse it anywhere else, then you can create it inside your log path as well. So here we have an embedded file destination. So anything matching workshop is stored to the separate filtered sudo file, and finally everything is written to the sudo-all file, the file destination defined here.
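A rough sketch of such an if/else log path follows; the filter name, file paths and the `.sudo.SUBJECT` field name are assumptions based on my reading of the slides, not the exact workshop configuration:

```
# Sketch only: paths and the sudo field name are illustrative.
@version: 3.24
@include "scl.conf"   # enables the bundled parsers, including sudo

source s_sys { system(); internal(); };

filter f_sudo { program("sudo"); };

log {
    source(s_sys);
    filter(f_sudo);
    # the sudo parser from the configuration library has already run,
    # so the invoking user is available as a .sudo.* name-value pair
    if (match("workshop" value(".sudo.SUBJECT"))) {
        # embedded destination: only used inside this log path
        destination { file("/var/log/sudo_filtered"); };
    };
    destination { file("/var/log/sudo_all"); };
};
```

The embedded `destination { ... };` blocks show the in-line building blocks mentioned above: they are not reusable elsewhere, but keep a one-off destination next to the log path that uses it.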

You see that the name is sudo.json and we use a template function for writing messages. The format-json template function writes everything in JSON format; it's also used, for example, for Elasticsearch. And this is the name of the template function. Template functions are practically function calls where you can have some parameters, and they generate a message based on those parameters. So here we define some parameters: scope is for choosing which name-value pairs we want to include in this log message. There is a tradition that any name-value pairs generated by syslog-ng itself, I mean coming from the syslog-ng configuration library, have a name starting with a dot.
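As a hedged sketch, a file destination using the format-json template function with these scopes might look like this (the destination name and path are illustrative):

```
# Write each message as JSON, followed by an empty line for readability.
destination d_sudo_json {
    file("/var/log/sudo.json"
        template("$(format-json --scope rfc5424 --scope dot-nv-pairs --scope nv-pairs)\n\n"));
};
```

The `--scope` options select which groups of name-value pairs end up in the JSON output.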

So that's why we have the dot-nv-pairs scope here, and the nv-pairs scope is for the rest of the name-value pairs. And the rfc5424 scope stands for the regular syslog fields. And we have a few formatting instructions here as well: newlines. If you have a continuous stream of JSON, it's quite difficult for a human to read, even if it's easy for machines. So I put double line feeds here, so we have an empty line between two messages, and it's easier to find where a message starts and ends. We will see it working later on. Yeah, practice is coming. Okay, from this one we will try the first one, filter.conf, and the last

one, iftest2.conf.
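Before the demo, here is a rough sketch of what filter.conf-style filters can look like, together with the in-list() filter discussed earlier; all names and paths are illustrative, not the workshop files:

```
# Sketch: discard debug-level messages and the mail facility.
source s_sys { system(); internal(); };
destination d_mesg { file("/var/log/messages"); };

filter f_default { level(info..emerg) and not facility(mail); };

# in-list() compares one field against a newline-separated list of
# values in a text file, for example a blocklist of hostnames.
filter f_blocklist { in-list("/etc/syslog-ng/blocklist.txt", value("HOST")); };

log { source(s_sys); filter(f_default); destination(d_mesg); };
```

The level range `info..emerg` is how "everything except debug" is usually expressed in syslog-ng filters.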

Okay. It's filter.conf that we started on the left-hand side. And what you can see here is that debug-level messages are discarded and anything coming from the mail facility is discarded. So... we can see that our regular log message arrived here, and then I have to cheat a bit: -p stands for priority, and the format here is facility.severity. So we have the mail facility here and then the log message. And if you check our log messages, we do not see it here; it is discarded. Okay. And then let's go to our other example. This is the sudo example we have seen on the previous slide. You can see that it is storing all

of the sudo-related messages to /var/log/sudo.json, and any log messages from sudo which are coming from the workshop user are stored to the separate filtered file. So what I'm doing here is running sudo as root. It doesn't make much sense, but it's good enough for a test. As you can see, it's not here; this is from a previous run by the workshop user, but it's here. So you can see it's a nice JSON, and also the command: I had just run sudo ls. So, let's cheat a bit. And we can see here that the sudo message coming from the workshop user is saved here. So this if statement in our log path actually works. Any questions until now? [Audience question, for example about the journal...]

Oh, you cannot see the dots here, as... Oh, I'm not really good at creating JSON. Yeah. If I remember well... No. But I don't have it installed, so...

Check it later, OK? So, let's go to message parsing. What is it? It's structuring, classifying and normalizing log messages. Syslog-ng has a technology called PatternDB; it parses unstructured log messages. And we have parsers for JSON, XML, CSV and a few other structured message formats. Why is message parsing important? It allows much more precise filtering and thus alerting, and it also allows saving only the relevant data. And it's not just filtering. Last week I was at a conference in Phoenix, Arizona, and one of our syslog-ng users explained a use case that even made my jaw drop to the floor: they investigated what their SIEM system is actually using from the log messages they are pushing there. And then what they

did was figure out that they have probably 20 to 30 different fields parsed from log messages, but the SIEM is using only 4 of those for any kind of analysis. Which meant that they configured syslog-ng in a way that it parses the incoming message and drops everything except for the four fields needed by the SIEM. It meant for them that 90% of the log message could be thrown away, at least from the SIEM's point of view. They stored it locally for compliance, but to the SIEM, on the other side of the USA, they only had to forward one tenth of the original log message, which meant a huge drop in licensing, a huge drop in network traffic, and a huge

drop in storage. Of course it needs quite a bit of investigation, as you actually need to know what is used for log analysis, but it's worth the trouble; as far as I could understand it meant multiple terabytes per day in savings. So it's quite useful. And of course it doesn't work everywhere, and it's an extreme case, but you can save quite a lot with filtering, with parsing and with reformatting as well. So let's talk about parsers. One of the oldest parsers in syslog-ng is the PatternDB parser, which can extract information from unstructured messages into name-value pairs. It can also add extra fields based on the message text, and do message classification like logcheck in Debian, but there is

a drawback. PatternDB needs an XML database describing your log messages. So it cannot just look at a message, take a wild guess what it is and say something; you actually need a database describing your log messages. On the other hand, once you have this database, and we have a few examples on GitHub showing how to create one, it can come in quite handy. Coming back to my favorite SSH login message example: the values parsed from the log message are the application name, the user name and the source IP. As we have the log message in the XML database, we can create some additional name-value pairs based on the log message: that it's a login message, and a login failure

actually. And we can classify it at the end as a violation. The JSON parser. Anything in the message part of a log message is treated as plain text, a string, or whatever. So it's not parsed; from the syslog-ng point of view it's a black box. You need to use a parser on the log message in order to have name-value pairs created. So even if you look at the JSON message and you see that it includes name-value pairs, as long as you don't apply a parser to it within syslog-ng, it's just plain text from the syslog-ng point of view. And once you parse it, you can use the resulting name-value pairs for filtering, for

storage, or whatever you want.
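A minimal sketch of such a JSON parsing pipeline follows; the port, the `.json.` prefix and the hypothetical `.json.level` field in the incoming messages are assumptions for illustration:

```
# Sketch: parse incoming JSON, then filter on a parsed field.
source s_net { network(port(5555) transport("tcp")); };

# every JSON field becomes a name-value pair under the .json. prefix
parser p_json { json-parser(prefix(".json.")); };

# once parsed, fields can be used for filtering like any other NV pair
filter f_error { match("error" value(".json.level")); };

destination d_json { file("/var/log/json-errors.log"); };

log { source(s_net); parser(p_json); filter(f_error); destination(d_json); };
```

Without the parser() step, the same match() would see only the raw message text, exactly as described above.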

It works in some cases. Personally, I don't have the configuration here and I don't know how to do it offhand. I know that I've seen it working; it's a somewhat longer and more difficult configuration. It should work, but I don't have it here. There is a CSV parser, and it works not just with comma-separated values but with any kind of columnar data. For example, one of the typical use cases was the Apache access log. Here you can see the field names for the Apache access log: client, ident name, username, timestamp and so on, everything parsed from the log message. And here you can see that in the destination we use, in the file name template, the username parsed from the log message. So if there is a username included,

then the file is called /var/log/messages-<username>, or if there is no username included, then it's called messages-nouser. Another parser which is very popular is the key-value parser, as most of the different firewalls store log messages in a key=value format, like this log message.
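As a sketch, a csv-parser() for Apache access logs (the column list follows the commonly published example) and a kv-parser() with a prefix might look like this:

```
# Sketch: columnar and key=value parsing; names are illustrative.
parser p_apache {
    csv-parser(columns("APACHE.CLIENT_IP", "APACHE.IDENT_NAME",
                       "APACHE.USER_NAME", "APACHE.TIMESTAMP",
                       "APACHE.REQUEST_URL", "APACHE.REQUEST_STATUS",
                       "APACHE.CONTENT_LENGTH")
               flags(escape-double-char, strip-whitespace)
               delimiters(" ")
               quote-pairs('""[]'));
};

# store key=value pairs under the kv. prefix so they cannot collide
# with syslog-ng's built-in macros
parser p_kv { kv-parser(prefix("kv.")); };
```

The prefix() option is what keeps parsed firewall fields separate from the regular syslog fields.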

And there are some further parsers, like an XML parser, which can be used... I think the original use case was some logs from a Java application, but I might be wrong; I never used it. There is a parser for Linux audit logs. There is also a date parser: there are many different date formats, and you can have a template which describes the given date format in your log message and parse it. And the parser stores the date in the sender-date macros. There are also many different parsers in the syslog-ng configuration library. I already mentioned sudo. There is a dedicated parser for Apache access logs, which combines the CSV parser and the date

parser together. Also, Cisco log messages are quite, how to say, interesting, as they look like syslog messages, but they don't comply with anything related to syslog. So what we try to do is, with some heuristics, parse these syslog-looking messages from Cisco as if they were regular syslog messages. And we also create some additional fields: if there is, what is it called, a Cisco mnemonic or whatever, those can also be parsed automatically. Obviously not all of the Cisco devices are covered, as practically each and every one of them has a different date format and log format. And I already mentioned that Python is the joker here, because you can parse quite complex data formats using Python code, or even use external

data sources like SQL or other services to enrich your log messages. Obviously, anything written in Python is slower than C, but it has many advantages: it doesn't need any compilation, it doesn't need a development environment, and there are a lot more libraries available ready to use than for C. And here are some performance figures. On a machine which can process about half a million messages per second without any encryption or advanced message parsing, the Python parser can handle about one tenth of that. To put that into perspective, most syslog-ng users are collecting less than 100 messages per second, so even a Raspberry Pi

can handle that amount. I already mentioned application adapters. They are for parsing log messages automatically; like I mentioned, Cisco, sudo. There are a few more parsers that are shipped with syslog-ng, and every other release has something added here. It's enabled by default since 3.13. And there is a very nice name, the enterprise-wide message model, but in practice what it means is that you can forward name-value pairs between syslog-ng instances in JSON formatting. This is good, as you can preserve the original log message and the parsed name-value pairs, everything. Obviously, as it's message parsing, it has some performance limits, but it works. And this configuration should already look familiar to you. The magic is here in the

second line: include scl.conf. The syslog-ng configuration library includes the parsers I mentioned on the previous slide. So that's why, in the filtering example, we could use the results of the sudo parser even without having a sudo parser here in our configuration. Let's jump to "Enriching Log Messages". Sorry, I need this, as my flow is... What does "enriching" mean? It means that you can create additional name-value pairs based on the message content. I already mentioned PatternDB for this, but we have two more technologies: the GeoIP parser and contextual data. We will see what these are. Do you recall PatternDB? There I mentioned that you can create additional fields based on the message, like if we store an SSH

login failure into our XML database, then we can note next to it that it's a login-related message and also that it's a failure. So PatternDB is one of the possibilities for enriching log messages. GeoIP is another one; GeoIP can help you find the geolocation of an IP address. It uses external databases for that. Note that the syslog-ng component is still called geoip, but the library called GeoIP is actually phased out: the database is no longer maintained, and it has probably even been removed from the web. It's now called MaxMind DB, which contains a lot more information than just geolocation, I mean longitude and latitude. It also contains

the city name, if it's there, the country name, continent name, whatever. So, really a lot more information. What can you use it for? For example, to detect anomalies: if one of your users logs in from Brussels and 5 minutes later from St. Petersburg in Russia, then probably you have a problem. But it can also be used for some eye candy. Security guys love maps, so even if it has no practical use at all, you can show that you have attacks coming from China, from Russia, from the US, from Brazil. I remember one of the malware outbreaks coming from there, and it looked fantastic on my map. The other possibility is adding metadata from CSV files, for example a host role or a

contact person, which is useful: you can create much more accurate alerts and dashboards using this information. So you see that a log message is coming from marketing or from the developers, and you can alert the responsible sysadmin based on this information, or there are many other possibilities. Next, we will use loggen. That's the benchmarking and message-sending tool bundled with syslog-ng. It can generate log messages or replay the content of an existing log file. Here is a nice example; this is the one we will use. I prepared some sample data for you, and we will send it first to a file, and it doesn't look completely hopeless to send it to Elasticsearch at

the end. So here are the options we will use: -i stands for inet, -S for stream, that is, TCP connections. We also use the dont-parse option. I removed the message headers from the log messages, and without dont-parse the dates would be fixed to something ancient and it would be quite difficult to find these messages in Elasticsearch. So instead I used dont-parse, and the messages get the current date. And finally, where the logs are coming from and where we want to send them. And here are the example logs I used: they are coming from iptables, and from the logs we send I removed the header part. And here is the configuration we will use. Let's go.
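A hedged sketch of such a configuration follows, with an assumed GeoIP database path and an assumed kv.SRC field name for the iptables source IP:

```
# Sketch of the demo pipeline: receive iptables logs over TCP, parse
# key=value pairs, add GeoIP data for the source IP, write JSON to a file.
@version: 3.24

source s_tcp { network(port(514) transport("tcp")); };

parser p_kv { kv-parser(prefix("kv.")); };

parser p_geoip {
    geoip2("${kv.SRC}",
           prefix(".geoip2.")
           database("/usr/share/GeoIP/GeoLite2-City.mmdb"));
};

destination d_file {
    file("/var/log/fromnet"
        # rekey/shift strips the leading dot so Elasticsearch accepts
        # the field names; @timestamp is added for Elasticsearch
        template("$(format-json --scope rfc5424 --scope nv-pairs --scope dot-nv-pairs --rekey .* --shift 1 --key ISODATE @timestamp=${ISODATE})\n\n"));
};

log { source(s_tcp); parser(p_kv); parser(p_geoip); destination(d_file); };
```

Replaying the sample file would then be something like `loggen --inet --stream --dont-parse --read-file iptables.txt localhost 514`, using loggen's long option names; adjust to your loggen version.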

As usual, we start by looking at the log statement. We see that there is a TCP source; here it is, listening on port 514. Then we have a key-value parser, that's up here. If you take a look, you see that iptables logs are in a key=value format. And for any value parsed from the iptables logs we use the kv. prefix, so we don't interfere with the rest of the name-value pairs. Then the next one is a GeoIP parser. And you see here the source IP: we parse it with the key-value parser, and as we use the kv. prefix, it's stored in kv.SRC. For GeoIP we use this field and this database, and we prefix the results accordingly.

And finally we store the results into a file. And here we use the format-json template function again. The difference here is that we use a feature called rekey on anything starting with a dot, and remove the dot. We do this because Elasticsearch doesn't like names starting with a dot or an underscore; these have a special meaning within Elasticsearch, so such a field is not indexed, cannot be searched, and there are similar problems. So what we do here is remove the leading dot, and then anything we store here can later be stored to Elasticsearch. We also do some magic here and create a timestamp as required by Elasticsearch. So we

will use almost the very same template for Elasticsearch; the only difference will be that the two line feeds at the end are removed. But it's almost the same configuration as we will use for Elasticsearch; here we still store to a file. Here I guess we'd better skip the first two configurations and jump directly to the last one. But first... okay, sorry. And here is the command line for what I've shown you previously. It's loggen, the tool bundled with syslog-ng, and it sends the unparsed log messages to syslog-ng on port 514. That was all. And now if we take a look at the fromnet log file, as you can see here,

then we will see a nice long JSON-based log message. First we have the name-value pairs from the key-value parser: we have kv. and then, after that, the name parsed from the iptables message. And here we have the GeoIP part, up until here. So here you can see United States, even the time zone, the ISO code for the country and the actual geolocation. And then the regular syslog-related fields and even the original message. Yes? Oops, I ran out of time. Oh, sorry. No, no, the only thing missing here is the actual Elasticsearch stuff. But then let's go there quickly. I

knew that I would run out. Originally we implemented the Elasticsearch destination in Java, but it has the problem that it cannot be included in distributions, so there is now an HTTP destination in syslog-ng based on curl, and we use that for Elasticsearch: there is an elasticsearch-http destination. The Java-based destination is still better if you have an extreme load; otherwise the HTTP destination is much better. Here is how you configure it: index name, type, URL. And you recall the template from the previous example; it's almost exactly the same, except for the newlines at the end. For GeoIP, there is an extra thing we do: Elasticsearch doesn't like it if a part of the GeoIP information is missing, so

we make sure that we write GeoIP information only if we have it. To get GeoIP working on the Elasticsearch side, we need a mapping. And... okay, let's try it. If it doesn't work in two minutes, then... Okay. We sent it there and I don't even have... You see the docs I sent here, and here is all the GeoIP information, with a minimal lag... Oh, not even here, just... Oops, oh, I don't have an internet connection, as I didn't want to have any kind of pop-ups during my talk. But you should see a world map in the background, and here are the points from the GeoIP data. And that was all.
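For reference, a sketch of an elasticsearch-http destination as described above; the index name and URL are assumptions for a local single-node Elasticsearch:

```
# Sketch: the same JSON template as the file destination, but without
# the trailing newlines, sent to Elasticsearch's bulk API over HTTP.
destination d_elastic {
    elasticsearch-http(
        index("syslog-ng")
        type("")
        url("http://localhost:9200/_bulk")
        template("$(format-json --scope rfc5424 --scope nv-pairs --scope dot-nv-pairs --rekey .* --shift 1 --key ISODATE @timestamp=${ISODATE})"));
};
```

A geo_point mapping on the Elasticsearch side is still needed for the map visualization to work, as mentioned in the talk.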