Logs in the Unix World

venam | 16-02-2021, 04:21 PM | #4

I wrote a transcript for this. It's a very old episode, not very deep and badly researched, but still interesting.
Especially since it's the subject of the day on IRC, because members are working on related projects (freem, mort).

The transcript is clearer than the actual recording.

# Logs in the Unix World

What are logs, and what do we do with them?

Logs are like the pieces of bread left on the road in the Hansel and
Gretel story. They could lead you to the witch or help you save your
brother. Logs are used to record activities from your programs, and those
can be used for analysis, troubleshooting, debugging, checking for
inappropriate activities, etc.

What should we log, where should we store those logs, and how should we
store those logs? As in, what format should we store them in?

Let's start with software based logs, logs that are specific to a
single piece of software. The developers took the liberty of writing
their own logging system, storing the logs in a location of their choice,
and in a format they conceived.
We can't really discuss these types of logs as the subject is very broad
and the developers can do absolutely whatever they want: print on screen
or to files, use a binary or textual format, require a third party
program to access them, define different levels of debugging with
different meanings.
Anything goes. You're free to do whatever you want.

Let's discuss instead system based logs. They are more widely used and
more important than software based logs.
Unix is known for its flexible and powerful logging system. It enables
you to record almost anything and to manipulate it to retrieve the info
you require.

In a system based log you have a centralized system that gathers those
logs. It could mean that all the logs go to a single place and that they
are easy to search and analyze. However, as we'll see, that's not
entirely accurate.
It's true that everything goes in one direction, to one system, but not
that the logs themselves, as files, will end up in one location. We still
refer to this as system based logs.
The typical location the logs end up in is `/var/log` but, as we're going
to see later on, it could be any other place. The catch is that this is
only useful if all the logs are in text format, which makes them
searchable with simple text manipulation tools.
When they're in one single place you can `grep` them, or use `awk`,
`sed`, `cut`, etc.
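
To make the "text manipulation" point concrete, here is a minimal Python sketch of what a `grep` plus `awk` pipeline does on text logs. It assumes the traditional `timestamp host program[pid]: message` line layout; the sample lines, the host name, and the helper names `grep_logs` and `count_by_program` are made up for illustration.

```python
import re
from collections import Counter

# Traditional text log line: "<timestamp> <host> <program>[<pid>]: <message>".
# The [pid] part is optional (the kernel, for instance, has none).
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<prog>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s(?P<msg>.*)$"
)

def grep_logs(lines, pattern):
    """Yield parsed entries whose message matches `pattern` (like grep)."""
    wanted = re.compile(pattern)
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and wanted.search(m["msg"]):
            yield m.groupdict()

def count_by_program(entries):
    """Count entries per program (like sort | uniq -c on the program column)."""
    return Counter(e["prog"] for e in entries)

# Hypothetical sample lines, not taken from a real system.
SAMPLE = [
    "Feb 16 16:21:01 myhost sshd[812]: Failed password for root from 10.0.0.5",
    "Feb 16 16:21:07 myhost sshd[812]: Failed password for admin from 10.0.0.5",
    "Feb 16 16:22:00 myhost cron[901]: (root) CMD (run-parts /etc/cron.hourly)",
    "Feb 16 16:23:13 myhost kernel: eth0: link up",
]

if __name__ == "__main__":
    failed = list(grep_logs(SAMPLE, r"Failed password"))
    print(count_by_program(failed))
```

None of this would work on a binary log without first going through whatever tool knows how to decode it.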

If they aren't in a single place it's going to be harder to do these things.

Having logs centralized in a binary format makes them useless for
the manipulation part. Multiple binary log files in one location are
meaningless.

When the logs are centralized they can be kept on a remote disk as one
big chunk. Let's say a partition `/log` that lives on a separate
device. Thus, it's easier to replicate and back up, and we can avoid the
loss of information.
This decoupling also allows you to debug things if the main server goes
down, as the logs will still be accessible. This is useful in critical
situations.

In the Unix world the main implementation of the centralized log system
is called syslog. It provides general purpose logging. You send info to
syslog and it logs it.

There are two views of centralization. It could mean that a single system
takes all the messages and logs, or it could mean that all the log files
are in the same location.

syslog is the de facto standard, because everyone agrees to use it. For a
long time there was no formal specification, just an informational RFC by
the IETF describing existing behavior (RFC 3164), later followed by
RFC 5424, so different implementations exist.
Like anything in the open source world, everyone comes up with their own
brand. Note that it's the Internet Engineering Task Force that published
the RFCs for syslog.

rsyslog is the main implementation, a lightweight daemon installed on
most common Unix-like distributions. A few systems that have it: Fedora
(was using it, switched to journald), openSUSE, Debian (journald), Ubuntu
(journald), Red Hat (journald), Solaris, Gentoo, Arch Linux (switched
to journald).
Most have deprecated rsyslog in favor of journald.

The second most common implementation of syslog is syslog-ng. In theory,
anything can act as a syslogging system. Any queuing daemon that takes a
message, queues it, and pipes it on the other end to a file or system,
could be used for logging. There are implementations using Logstash
(with Elasticsearch), Fluentd, anything.

Programs send their log entries to the syslog daemon, which then consults
a configuration file, `/etc/syslog.conf`, `/etc/rsyslog.conf`, or
something else depending on the implementation you have. After consulting
this file it checks if the message sent matches something in the
configuration; if it does, it writes the log to the corresponding file.
Thus, from that you can imagine syslog as a routing system for logs. It
queues the message, looks for a key matching it, and redirects it to the
right file.

How does it write the message to the file and distribute those
logs? It achieves it by using "tags", which are defined as rules
in the configuration file (`/etc/syslog.conf` for the classic daemon).
Inside it you will find "terms", "priorities", and "selectors". The term,
more commonly called the facility, is the identifier describing the kind
of application or process that generated the log. Ex: auth, ftp, kern,
mail. The priority is the importance of the message, which is graded by
levels defined in the RFC. Ex: notice, warning, err. The selector is
used to filter the logs: it's a topic matcher, built from a term and a
priority, paired with an action, and it's what splits the logs by file.
The action that goes with the selector can be anything, like writing to
a file, sending an email, or pushing the log to another logging system.
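
As a rough illustration of how a term (facility) and a priority combine, here is a small Python sketch. The numeric codes are the ones listed in RFC 3164 and RFC 5424 (only the commonly named facilities are included); the function names `pri` and `selector_matches` are made up, and the matching follows the classic syslog.conf convention where `term.priority` means "this priority or anything more severe". Real daemons support more syntax (`none`, `=`, `!`, comma lists) that this sketch ignores.

```python
# Facility ("term") codes and severity levels from RFC 3164 / RFC 5424.
FACILITIES = {
    "kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4, "syslog": 5,
    "lpr": 6, "news": 7, "uucp": 8, "cron": 9, "authpriv": 10, "ftp": 11,
    "local0": 16, "local1": 17, "local2": 18, "local3": 19,
    "local4": 20, "local5": 21, "local6": 22, "local7": 23,
}
SEVERITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}

def pri(facility, severity):
    """PRI value sent between <...> at the start of a syslog message."""
    return FACILITIES[facility] * 8 + SEVERITIES[severity]

def selector_matches(selector, facility, severity):
    """Simplified classic matching: 'fac.sev' catches that severity and
    anything more severe (lower number); '*' is a wildcard."""
    sel_fac, sel_sev = selector.split(".")
    if sel_fac != "*" and sel_fac != facility:
        return False
    if sel_sev == "*":
        return True
    return SEVERITIES[severity] <= SEVERITIES[sel_sev]

if __name__ == "__main__":
    print(pri("local0", "notice"))
    print(selector_matches("mail.warning", "mail", "err"))
    print(selector_matches("mail.warning", "mail", "info"))
```

Note that a lower number means a more severe message, which is why `*.debug` ends up catching everything.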

How do we send messages to syslog from the command line?
The `logger` command nudges the syslog daemon and, consequently, triggers
the creation of log entries. We can use it to debug syslog and check that
the configuration is set right.
The format is as follows: `logger -p <priority> -t <tag> <message>`, or `logger -p <priority> -t <tag> -f <file>` to log the contents of a file instead of a message given on the command line.
Example: `logger -p local0.notice -t host -f somefile`.

It's going to add an entry to the default file if we don't have any
rules for local0. Here local0 is the term and notice is the priority;
they are joined with a dot ".".
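
The same thing can be done from a program. A minimal sketch using Python's standard `syslog` module (Unix only), roughly equivalent to `logger -p local0.notice -t host <message>`; the function name `log_notice` and the message text are made up for illustration.

```python
import syslog

def log_notice(message, tag="host"):
    """Send `message` to the local syslog daemon as local0.notice."""
    # ident plays the role of logger's -t, facility the left half of -p.
    syslog.openlog(ident=tag, facility=syslog.LOG_LOCAL0)
    # The level passed here is the right half of -p (the priority).
    syslog.syslog(syslog.LOG_NOTICE, message)
    syslog.closelog()

if __name__ == "__main__":
    log_notice("testing the local0 rules")
```

Where the entry ends up depends entirely on the rules the daemon has for local0, which is what makes this handy for checking a configuration.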

Where are they stored? We said they were centralized, however that
is ambiguous because it depends on the implementation and on the
configuration. Usually they go in `/var/log` but that's because the
system is configured that way by default. The routing rules in the config
could write the logs in any other place.
That's the catch: we said it's centralized, but that doesn't mean the log
files are centralized in a common place. That flexibility is also useful
if you want to write to subdirectories within a centralized directory.

What happens when the directory syslog routes to doesn't exist? Then
usually the syslog daemon will create the directory, however it needs
the right permissions to do so. If it doesn't have the right access
permission syslog will log an error about it in its own log.
Introspectively, the logging system has logs for itself.

Let's dig into the configuration. The format is flexible and all about
routing. For rsyslog we have `/etc/rsyslog.conf` and the
`/etc/rsyslog.d/` directory (depends on the implementation). A typical
syslog configuration file looks like a series of lines, and each of them
describes a kind of message received and how it will be handled. The
format is divided into columns separated by tabs.
On the far left it has the term.priority, aka the selector.
Additionally, there's a wildcard to catch every term at a certain
priority (ex: `*.debug`).
On the far right it has the location where it will push the log
to. This field is also called the action field, as the location is
specified as an action. These actions can be either: device (file),
user, pipe, another syslog host (@).
In between these two, there are other less used options such as the
template and output channel. The template is the format in which the log
is going to be saved (kind of like printf with some built-in variables
such as hostname, time, etc.).
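
To picture the two outer columns, here is a Python sketch that splits classic-style rule lines into the left field and the action, then guesses the kind of action from its first character. The sample configuration, including the host name `loghost.example.com` and the FIFO path, is hypothetical, and the parser is deliberately naive: it ignores templates, output channels, and everything rsyslog adds on top of the classic format.

```python
# Hypothetical classic-style rules: selector on the left, action on the right.
SAMPLE_CONF = """
# selector              action
auth,authpriv.*         /var/log/auth.log
mail.warning            -/var/log/mail.warn
*.emerg                 root
local0.notice           @loghost.example.com
kern.*                  |/var/run/kernel.fifo
"""

def parse_rule(line):
    """Return (selectors, action) for a rule line, or None for blanks/comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Columns are separated by tabs or spaces; the action is the last column.
    selector_field, action = line.split(None, 1)
    return selector_field.split(";"), action.strip()

def action_kind(action):
    """Classify the action by its leading character."""
    if action.startswith("@"):
        return "remote host"
    if action.startswith("|"):
        return "pipe"
    if action.startswith("/") or action.startswith("-/"):
        return "file"  # a leading '-' asks the daemon not to sync every write
    return "user"

if __name__ == "__main__":
    for line in SAMPLE_CONF.splitlines():
        rule = parse_rule(line)
        if rule:
            selectors, action = rule
            print(selectors, "->", action_kind(action), action)
```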

If you want extra flexibility points, you will be happy to hear that
some syslogd implementations listen on a network socket too. So you can
contact them over the network, which gives you remote logging. That means
you can have a centralized machine whose job is to log for all other
machines.
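
A minimal sketch of what remote logging looks like on the wire, assuming the plain UDP transport on port 514 and a bare BSD-style message: the PRI number in angle brackets, then a tag and the text. Timestamp and hostname are left out here on the assumption that the receiver fills them in. The function name `send_syslog_udp` is made up, and the demo uses a local UDP socket as a stand-in for the central log machine.

```python
import socket

def send_syslog_udp(message, host, port=514, facility=16, severity=5, tag="host"):
    """Send one syslog datagram; defaults correspond to local0.notice."""
    # PRI = facility * 8 + severity, e.g. 16 * 8 + 5 = 133 for local0.notice.
    packet = f"<{facility * 8 + severity}>{tag}: {message}".encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
    return packet

if __name__ == "__main__":
    # Stand-in for the central log host: a UDP socket on a free local port.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2)
        port = receiver.getsockname()[1]
        send_syslog_udp("disk almost full", "127.0.0.1", port)
        data, _ = receiver.recvfrom(1024)
        print(data.decode())
```

UDP gives no delivery guarantee, so a setup that cannot afford to lose entries would want a TCP or TLS transport if the daemon supports one.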

Another concept in the log world is what to do when the log files get too
big and we have to handle them, what we refer to as log rotation. Logs
grow very fast and very big and consume a large amount of disk space.
Many utilities will come to your help such as newsyslog and logrotate.
Those tools are usually called through a cronjob because you want to keep
repeating the rotation at a regular interval: tarballing the logs,
erasing old ones, etc.
This could obviously be implemented from scratch using `stat`.
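
For the curious, a from-scratch rotation could look something like this Python sketch. It is not how newsyslog or logrotate are implemented, just the general idea: check the size with `stat`, shift the old archives, compress the current file, and start over with an empty one. The function name `rotate`, the `.N.gz` naming scheme, and the default limits are all assumptions.

```python
import gzip
import os
import shutil
import tempfile

def rotate(path, max_bytes=1_000_000, keep=3):
    """Rotate `path` once it reaches `max_bytes`, keeping `keep` gzipped copies."""
    if not os.path.exists(path) or os.stat(path).st_size < max_bytes:
        return False
    # Erase the oldest archive, then shift the rest: .2.gz -> .3.gz, .1.gz -> .2.gz
    oldest = f"{path}.{keep}.gz"
    if os.path.exists(oldest):
        os.remove(oldest)
    for i in range(keep - 1, 0, -1):
        src = f"{path}.{i}.gz"
        if os.path.exists(src):
            os.rename(src, f"{path}.{i + 1}.gz")
    # Compress the current log into .1.gz, then truncate it in place so a
    # program that still has the file open keeps writing to the same file.
    with open(path, "rb") as src, gzip.open(f"{path}.1.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    open(path, "w").close()
    return True

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "app.log")
        with open(log, "w") as f:
            f.write("some log line\n" * 100)
        print(rotate(log, max_bytes=1000), sorted(os.listdir(tmp)))
```

Run from cron at a regular interval, this gives the basic behavior; lines written between the copy and the truncation can be lost, which is one reason the real tools offer other strategies.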

Let's get to systemd, the fun... not so fun... replacement for syslog. It
has its own logging system called the journal. Therefore, a syslog
daemon is no longer required to collect or read the logs. They are now
stored in a binary format and you need a special command called
`journalctl` to access them.
By default those logs are also stored in `/var/log` but inside a
subdirectory called `journal`. Unlike syslog it's not necessarily going
to recreate the directory if you erase it. If configured in a
non-persistent way, systemd stores them in `/run/log/journal` instead,
and if configured in a persistent way it will recreate the directory.

The journal stores the logs in a binary format, which is lighter, but you
can't read them using the usual tools. You are forced to use a dedicated
tool to read them.
The journal configuration file allows you to make the files either
persistent or not. Taking a look in `/var/log/journal` you'll see a
directory with a name that looks like an md5sum (it's the machine ID),
holding a single file or rotated files for all software. That means
logrotate becomes meaningless as the journal manages itself based on the
size limit set.
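
Since the usual text tools don't apply to the journal, one way to get back to something scriptable is to ask `journalctl` for JSON, one object per line. A minimal Python sketch, assuming a machine with systemd for the `read_journal` part; the helper names, the "?" placeholder, and the fallback priority of 6 (info) are assumptions of this example, while `MESSAGE`, `PRIORITY`, and `SYSLOG_IDENTIFIER` are regular journal fields.

```python
import json
import subprocess

def read_journal(lines=20):
    """Return the last `lines` journal entries as raw JSON lines (needs systemd)."""
    out = subprocess.run(
        ["journalctl", "-o", "json", "--no-pager", "-n", str(lines)],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()

def parse_entries(json_lines):
    """Reduce JSON-per-line journal output to small dicts."""
    entries = []
    for line in json_lines:
        if not line.strip():
            continue
        raw = json.loads(line)
        entries.append({
            "program": raw.get("SYSLOG_IDENTIFIER", "?"),
            # Field values arrive as strings; 6 (info) is this sketch's fallback.
            "priority": int(raw.get("PRIORITY", 6)),
            "message": raw.get("MESSAGE", ""),
        })
    return entries

if __name__ == "__main__":
    # Hypothetical sample line so the demo runs without systemd.
    sample = ['{"PRIORITY": "3", "SYSLOG_IDENTIFIER": "sshd", "MESSAGE": "bind failed"}']
    for entry in parse_entries(sample):
        print(entry)
```

On a systemd machine, `parse_entries(read_journal())` gives the same kind of list for real entries, which can then be filtered like any other data.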

This is configured in `/etc/systemd/journald.conf`. In it you'll find
settings related to compression, splitting, syncing interval (for when
it will actually write to disk), max use, maximum runtime, forwarding
to syslog, max storage, level of priority, etc.
Forwarding them to syslog should save you if you want to run syslog
alongside systemd. Depending on the systemd version and distribution this
is enabled by default, but if syslog isn't running the journal simply
keeps the logs itself.

Targeting and monitoring logs. If they're in text format you can use text
manipulation tools to do that. Otherwise, you have to use the dedicated
tool given to you.

For reporting, which is mostly missing in journald but not in syslog,
you can use `logwatch` to monitor system logs and email you in case there's
something weird happening. `awstats` can be used as a sort of log analyzer
for the Apache web server. Text files are very flexible!

I hope you learned a thing or two about the logging system.
