Logs in the Unix World

	venam Offline \| 24-06-2016, 02:47 PM \| #1

(This is part of the podcast discussion extension)

Logs

Link to the recording [ https://raw.githubusercontent.com/nixers...-06-24.mp3 ]

What are logs, where are they stored?

--( Show Notes )
https://en.wikipedia.org/wiki/Syslog
https://en.wikipedia.org/wiki/Rsyslog
https://en.wikipedia.org/wiki/Log_rotation
https://en.wikipedia.org/wiki/Journald
https://launchpad.net/logwatch
http://www.awstats.org/
https://en.wikipedia.org/wiki/Apache_Kafka
http://unix.stackexchange.com/questions/...g-in-linux

	josuah Offline \| 03-07-2016, 08:34 AM \| #2

I had everything to learn but it was well explained: A great introduction.

More flexibility lets the configuration make logs messy or, on the contrary, very clean and easy to browse... Up to the sysadmin (?). With systemd, maybe there is less risk of getting messy, but all the fun (text files) goes away as well ("knock, knock, knock!").

Very neat: put the logs on a different machine and still access them even if the server crashes (one reason I imagine logs can be useful).

Last thing: I wouldn't have imagined logs to be this big!

	venam Offline \| 16-08-2016, 05:55 AM \| #3

I realized I forgot to mention in the podcast that syslog listens for logs on /dev/log.
I also didn't talk about the kernel logs and boot logs.

If someone wants to add to those subjects it would make the subject complete.

	venam Offline \| 16-02-2021, 04:21 PM \| #4

I wrote a transcript for this. It's a very old episode, not very deep and badly researched, but still interesting.
Especially since it's the subject of the day on IRC because members are working on related projects (freem, mort).

The transcript is clearer than the actual recording.

# Logs in the Unix World

What are logs and what do we do with them?

Logs are like the pieces of bread left on the road in the Hansel and
Gretel story. They could lead you to the witch or help you save your brother.
Logs are used to record activities from your programs, and those can be
used for analysis, troubleshooting, debugging, checking for inappropriate
activities, etc.

What should we log, where should we store those logs, and how should we
store those logs? As in what format should we store them in.

Let's start with software based logs: logs that are specific to a
single piece of software. The developers took the leisure to do their own
logging system, to store the logs in a location of their choice, and in a
format they conceived.
We can't really discuss these types of logs as the subject is very broad
and the developers can do absolutely whatever they want: print to the screen
or to files, use a binary or textual format, require a third party
software to access them, or define different levels of debugging with
different meanings.
Anything goes. You're free to do whatever you want.

Let's discuss instead system based logs. They are more widely used and
more important than software based logs.
Unix is known for its flexible and powerful logging system. It enables
you to record almost anything and to manipulate it to retrieve the info
you require.

In a system based log you have a centralized system that gathers those
logs. It could mean that all the logs go to a single place and that they
are easy to search and analyze. However, as we'll see, that's not entirely
accurate.
It's true that everything goes in one direction, to one system, but not that
the logs themselves, as files, will end up in one location. We still refer
to this as system based logs.
The typical location the logs end up in is `/var/log` but as we're going
to see later on it could be any other place. The catch is that having them
in one place is only useful if all the logs are in text format, which makes
them searchable with simple text manipulation tools.
When they are in one single place you can `grep` them, or use `awk`, `sed`,
`cut`, etc.

If they aren't in a single place it's going to be harder to do these things.
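To give an idea of what "simple text manipulation" looks like outside of `grep`, here is a minimal Python sketch. It assumes the traditional `Mon DD HH:MM:SS host tag[pid]: message` layout, which is only one possible format, and the sample lines are made up for illustration.

```python
import re

# Assumed traditional syslog text layout: "Mon DD HH:MM:SS host tag[pid]: message".
# Real files may use another template, so treat this pattern as an illustration.
LINE_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<host>\S+)\s"
    r"(?P<tag>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<message>.*)$"
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None

def grep_tag(lines, tag):
    """Yield the message of every line emitted by the given program tag."""
    for line in lines:
        entry = parse_line(line)
        if entry and entry["tag"] == tag:
            yield entry["message"]

# Made-up sample lines; on a real machine you would iterate over an opened log file.
SAMPLE = [
    "Jun 24 14:47:01 myhost sshd[1234]: Failed password for root from 10.0.0.5",
    "Jun 24 14:47:02 myhost cron[88]: (root) CMD (run-parts /etc/cron.hourly)",
    "Jun 24 14:47:09 myhost sshd[1234]: Accepted publickey for venam",
]

if __name__ == "__main__":
    for message in grep_tag(SAMPLE, "sshd"):
        print(message)
```

The same loop works over any number of files as long as they share a text layout, which is the whole point of keeping them together.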

Having logs centralized in a binary format makes them useless for
the manipulation part. Multiple binary log files in one location are
meaningless.

When the logs are centralized they can be kept on a remote disk as one
big chunk. Let's say a partition `/log` that lives on a separate
device. Thus, it's easier to replicate and back up, and we can avoid the
loss of information.
This decoupling also lets us debug things if the main server goes down,
as the logs will still be accessible. This is useful in critical situations.

In the Unix world the main implementation of the centralized log system
is called syslog. It provides general purpose logging. You send info to
syslog and it logs it.

There are two views of centralization. It could mean that a single system
takes all the messages and logs, or it could mean that all the log files
are in the same location.

syslog is the de facto standard, because everyone agrees to use it. There
is no strict specification, just a basic RFC by the IETF (the Internet
Engineering Task Force), so different implementations exist.
Like anything in the open source world, everyone comes up with their own
brand.

rsyslog is the main implementation, a lightweight daemon installed on
most common Unix-like distributions. A few systems that have or had it:
Fedora (switched to journald), openSUSE, Debian (journald), Ubuntu
(journald), Red Hat (journald), Solaris, Gentoo, Arch Linux (switched
to journald).
Most have deprecated rsyslog in favor of journald.

The second most common implementation of syslog is syslog-ng. In theory,
anything can act as a syslogging system. Any queuing daemon that takes a
message, queues it, and pipes it on the other end to a file or system,
could be used for logging. There are implementations using Logstash
(with Elasticsearch), Fluentd, anything.

Programs send their log entries to the syslog daemon, which then consults
a configuration file: `/etc/syslog.conf`, `/etc/rsyslog.conf`, or anything
else depending on the implementation you have. It checks whether the message
matches something in the configuration, and if it does it writes the log to
the corresponding file.
From that you can imagine syslog as a routing system for logs. It
queues the message, looks for a key matching it, and redirects it to the
right file.
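The routing idea can be pictured with a few lines of Python. This is only a toy model under my own assumptions, not how any real syslog daemon is implemented: the routing table, the keys, and the file names below are all hypothetical.

```python
from collections import deque

# Hypothetical routing table: key -> destination file. These names are made up.
ROUTES = {
    "mail": "/var/log/mail.log",
    "auth": "/var/log/auth.log",
}
DEFAULT_ROUTE = "/var/log/messages"

def route(queue, routes=ROUTES, default=DEFAULT_ROUTE):
    """Drain the queue and group each message under its destination file."""
    delivered = {}
    while queue:
        key, message = queue.popleft()
        # Look for a matching key, fall back to the default file otherwise
        destination = routes.get(key, default)
        delivered.setdefault(destination, []).append(message)
    return delivered

if __name__ == "__main__":
    pending = deque([
        ("mail", "delivery failed"),
        ("auth", "session opened for user venam"),
        ("cron", "job finished"),
    ])
    for destination, messages in route(pending).items():
        print(destination, messages)
```

A real daemon writes to the files instead of returning a dict, but the queue, the key lookup, and the fallback are the three moving parts described above.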

How does it write the message to the file and distribute those
logs? It does so using "tags", which are defined as rules
in the configuration file (`/etc/syslog.conf`). Inside it you will find
"terms", "priorities", and "selectors". The term (what syslog itself calls
the facility) is the identifier describing the application, process, or
anything else that generated the log. Ex: passwd, ftp, kernel,
mail. The priority is the importance of the message, graded by
levels defined in the RFC. Ex: NOTICE, WARNING, ERROR. The selector is
used to filter the logs: it is a topic matcher that goes along
with an action and splits the logs by file. The action that goes with the
selector can be anything, like sending an email or pushing the log to
another logging system.

How do we send messages to syslog from the command line?
The `logger` command nudges the syslog daemon and, consequently, triggers the
creation of logs. We can use it to debug syslog and to check that the
configuration is set right.
The format is as follows: `logger -f <file> -p <priority> -t <tag> <message>`.
Example: `logger -p local0.notice -t host -f somefile`.

It's going to add an entry to the default file if we don't have any
rules for local0. Here local0 is the term and notice is the priority; they
are joined with a dot ".".
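Behind the scenes, a pair like `local0.notice` is sent as a single number. RFC 5424 defines it as PRI = facility * 8 + severity, where the facility is what this transcript calls the term. The sketch below works through that arithmetic; the tables only cover the commonly named codes, and the function names are mine.

```python
# Facility and severity codes as listed in RFC 5424 (codes 12 to 15 are left out).
FACILITIES = {
    "kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4, "syslog": 5,
    "lpr": 6, "news": 7, "uucp": 8, "cron": 9, "authpriv": 10, "ftp": 11,
    **{f"local{n}": 16 + n for n in range(8)},
}
SEVERITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}

def pri(pair):
    """Turn a "facility.severity" pair into the numeric PRI value."""
    facility, severity = pair.split(".", 1)
    return FACILITIES[facility] * 8 + SEVERITIES[severity]

def split_pri(value):
    """Reverse operation: recover the two names from a PRI value."""
    facility_code, severity_code = divmod(value, 8)
    facility = next(n for n, c in FACILITIES.items() if c == facility_code)
    severity = next(n for n, c in SEVERITIES.items() if c == severity_code)
    return facility, severity

if __name__ == "__main__":
    print(pri("local0.notice"))   # 16 * 8 + 5
    print(split_pri(133))
```

So the `logger` example above ends up tagged with 16 * 8 + 5 = 133.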

Where are they stored? We said they were centralized, however that
is ambiguous because it depends on the implementation and on the
configuration. Usually they go in `/var/log` but that's because the system
is configured that way by default. The routing rules in the config could
write the logs in any other place.
That's the catch: we said it's centralized but that doesn't mean the log
files are centralized in a common place. That flexibility is also useful if
you want to write to subdirectories within a centralized directory.

What happens when the directory syslog routes to doesn't exist? Usually
the syslog daemon will create the directory, but it needs
the right permissions to do so. If it doesn't have them, syslog will log
an error about itself: the logging system has logs for itself.

Let's dig into the configuration. The format is flexible and all about
routing. For rsyslog we have `/etc/rsyslog.d/` and `/etc/rsyslog.conf`
(depending on the implementation). A typical syslog configuration file looks
like a series of lines, and each of them describes a kind of message received
and how it will be handled. The format is divided into columns separated
by tabs.
On the far left is the term.priority pair, i.e. the selector.
Additionally, there's a catch-all for a certain priority
(ex: `*.debug`).
On the far right is the location the log will be pushed
to. This field is also called the rule field as the location is
specified as an action. These actions can be: a device (file), a
user, a pipe, or another syslog host (@).
In between these two, there are other less used options such as the
template and output channel. The template is the format in which the log
is going to be saved (kind of like printf with some built-in variables
such as hostname, time, etc.).
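As an illustration of that two-column layout, here is a small Python sketch that splits a rule into its selector and its action and guesses the kind of action from its first character. It only handles the simple classic form (selector, whitespace, action); templates, output channels, and comma-separated facility lists are deliberately ignored, and the function names are my own.

```python
def classify_action(action):
    """Guess the kind of destination from the first character of the action."""
    if action.startswith("/"):
        return "file"
    if action.startswith("|"):
        return "pipe"
    if action.startswith("@"):
        return "remote host"
    return "user"

def parse_rule(line):
    """Split one rule line into its selectors and its action, or return None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    selector, action = line.split(None, 1)
    # Several term.priority pairs can be chained with ";"
    pairs = [tuple(part.split(".", 1)) for part in selector.split(";")]
    # A leading "-" before a file path means "don't sync after every write"
    target = action[1:] if action.startswith("-") else action
    return {"selectors": pairs, "action": classify_action(target), "target": target}

if __name__ == "__main__":
    for rule in ("mail.*\t-/var/log/mail.log", "*.info;mail.none\t@loghost"):
        print(parse_rule(rule))
```

`loghost` is a placeholder host name; any real configuration will have its own destinations.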

If you want extra flexibility points, you will be happy to hear that
some syslogd implementations listen on a network socket too, so you can
contact them over the network: remote logging. That means you can
have a centralized machine whose job is to log for all other machines.
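To show how little is needed to talk to such a daemon, here is a hedged Python sketch that builds a very simplified datagram, `<PRI>tag: message`, and sends it over UDP. Real messages usually also carry a timestamp and a host name, the receiving daemon has to be configured to accept network input, and port 514 is the conventional syslog UDP port. The function names and the host in the comment are hypothetical.

```python
import socket

def build_packet(pri, tag, message):
    """Build a minimal BSD-style syslog datagram: "<PRI>tag: message"."""
    return f"<{pri}>{tag}: {message}".encode("utf-8")

def send_remote(packet, host, port=514):
    """Send one datagram to a syslog daemon listening on UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))

if __name__ == "__main__":
    packet = build_packet(133, "demo", "hello from another machine")
    print(packet)
    # To actually ship it, point this at your own log server, for example:
    # send_remote(packet, "loghost.example")
```

Python's standard library also ships `logging.handlers.SysLogHandler`, which does the same job with more care.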

Another concept in the log world is log rotation: what we do when the log
files get too big and we have to handle them. Logs grow very fast
and very big and consume a large amount of disk space. Many utilities will
come to your help, such as newsyslog and logrotate. Those tools are usually
called through a cron job because you want to keep repeating the rotation
at a regular interval: tarballing the logs, erasing old ones, etc.
This could obviously be implemented from scratch using `stat`.
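A from-scratch rotation could look like the Python sketch below, which uses `os.stat` to check the size. It is a simplification under my own assumptions: it neither compresses the old files nor tells the writing daemon to reopen its file, which the real tools take care of. The numbered suffix scheme (`.1`, `.2`, ...) is just the usual convention.

```python
import os

def rotate(path, max_bytes, keep=3):
    """Rotate path to path.1, path.2, ... once it grows past max_bytes.

    Returns True when a rotation happened, False otherwise.
    """
    if not os.path.exists(path) or os.stat(path).st_size <= max_bytes:
        return False
    # Drop the oldest copy, then shift the others up by one: .2 -> .3, .1 -> .2
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    for index in range(keep - 1, 0, -1):
        older = f"{path}.{index}"
        if os.path.exists(older):
            os.rename(older, f"{path}.{index + 1}")
    os.rename(path, f"{path}.1")
    # Recreate an empty file so the writer has something to append to
    open(path, "a").close()
    return True
```

Calling `rotate("/var/log/myapp.log", 10 * 1024 * 1024)` from a cron job would keep at most three old copies of a hypothetical `myapp.log` next to the live one.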

Let's get to systemd, the fun... not so fun... replacement for syslog. It
has its own logging system called the journal. Therefore, a syslog
daemon is no longer required for logging. The logs are now stored
in binary format and you need a special command called `journalctl`
to access them.
By default those logs are also stored in `/var/log` but inside a
subdirectory called journal. Unlike syslog it's not going to recreate
the directory if you erase it. Instead, if configured in
a non-persistent way, systemd stores them in `/run/log/journal`, and if
configured in a persistent way it will recreate the directory.

The journal stores the logs in a binary format that is lighter, but you
can't read them using the usual tools. You are forced to use a third
party software to read them.
The journal configuration file allows you to make the files either
persistent or not. Taking a look in `/var/log/journal` you'll see files
with an md5sum-like name, and there's a single file, or a set of rotated
files, for all software. That means logrotate becomes meaningless as the
system manages itself based on the size limit set.

This is configured in `/etc/systemd/journald.conf`. In it you'll find
settings related to compression, splitting, the syncing interval (for when
it will actually write to disk), max use, maximum runtime, forwarding
to syslog, max storage, level of priority, etc.
Forwarding to syslog should save you if you want to run syslog
alongside systemd. By default this is enabled, but if syslog isn't running
it goes back to the default behavior.
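One way to get the usual tooling back is to ask `journalctl` for JSON, one object per line, and process that as text. The Python sketch below assumes a machine with systemd for `read_journal`; the parsing part works on any list of lines. The field names `MESSAGE`, `PRIORITY`, and `_SYSTEMD_UNIT` are standard journal fields, while the fallback values are my own choice.

```python
import json
import subprocess

def parse_journal_json(lines):
    """Turn `journalctl -o json` output (one JSON object per line) into dicts."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        entries.append({
            # Fallbacks below are arbitrary picks for entries missing a field
            "unit": record.get("_SYSTEMD_UNIT", "?"),
            "priority": int(record.get("PRIORITY", 6)),
            "message": record.get("MESSAGE", ""),
        })
    return entries

def read_journal(count=20):
    """Ask journalctl for the last `count` entries; needs systemd on the machine."""
    output = subprocess.run(
        ["journalctl", "-o", "json", "-n", str(count), "--no-pager"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_journal_json(output.splitlines())
```

Once the entries are plain dicts, filtering by unit or priority is ordinary Python again.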

Targeting and monitoring logs: if they are in text format you can use text
manipulation tools to do that. Otherwise, you have to use the third
party software given to you.

For reporting, which is mostly missing in journald but not in syslog,
you can use `logwatch` to monitor system logs and email you in case there's
something weird happening. `awstats` can be used as a sort of log monitor
for the Apache web server. Text files are very flexible!
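To show the kind of processing such reporting tools do, here is a deliberately naive Python sketch that counts how many lines mention a few alarming words and formats the result as the body of a report. It is not how `logwatch` works internally, and the keyword list is a hypothetical starting point.

```python
from collections import Counter

# Hypothetical list of words worth alerting on; tune it to your own logs.
SUSPICIOUS = ("failed", "denied", "error", "segfault")

def summarize(lines, keywords=SUSPICIOUS):
    """Count, per keyword, how many lines mention it (case-insensitive)."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for keyword in keywords:
            if keyword in lowered:
                counts[keyword] += 1
    return counts

def report(counts):
    """Format the counts as a plain-text report, most frequent keyword first."""
    return "\n".join(f"{word}: {count}" for word, count in counts.most_common())

if __name__ == "__main__":
    lines = [
        "Failed password for root",
        "Permission denied",
        "FAILED su for venam",
        "all good",
    ]
    print(report(summarize(lines)))
```

Piping that report into a mail command from a cron job gets you a crude daily digest.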

I hope you learned a thing or two about the logging system.

	freem Offline \| 17-02-2021, 09:38 AM \| #5

(16-02-2021, 04:21 PM)venam Wrote: Especially since it's the subject of the day on IRC because members are working on related projects (freem, mort).

I have not read that thread yet, but so that people can know a bit more about the projects you're speaking about, both projects have reached at least alpha stage (mine, at least; for mort's, I don't know, but it seems to be usable, it may even be considered released?):

* mort is writing a log reader, a tool to help understand logs by highlighting some sequences in them.
* freem (me) is writing a monitoring tool, which parses daemons' logs to inform the human behind the screen of their status: down, up, or "waking up", in a visual fashion. The last one being "daemon manager started process, but process does not inform that it's up, so let's guess it is after some time".

...

A few hours later, the text is read, and I have to answer: as a systemd opponent myself, I must say that many things there are just... 1970s-style, from when local storage was costly and, most importantly, software reliability was a joke (manually restarting daemons? Really?).

As a *very important foreword*, let me remind that the words I criticize are from 2016, the author was younger, and I was, too, so maybe I'd have been more friendly to that text. Maybe.

For a start, I'll give some links towards someone else's blog, that I think interesting reads:

* https://asylum.madhouse-project.org/blog...-terrible/
* https://asylum.madhouse-project.org/blog...-terrible/

Regarding content:

> The developers took the leisure to do their own logging
> system, to store the logs in a location of their choice, and in a format they conceived.
> We can't really discuss these types of logs as the subject is very broad
> and the developers can do absolutely whatever they want.

I disagree that "developers can do absolutely whatever they want".
All those methods are normal logging, so it's possible to discuss them. Many, too many daemons in my Debian actually rely on one of those methods, and it's a pain.
I'd say, if you find a tool doing that, drop it as soon as possible.

Why?
Because logs are the pain information of your systems. They are critical. If you can't feel the pain, you can't dodge injuries.
Same for a computer: without monitored logs, nothing can prevent damages.

In practice, logs are done in mostly 3 fashions:

1) a mix of stderr/stdout prints
2) printing to a specific file
3) dump on a socket

Case 1 is easy: redirect where you want. That's the best.
Case 2 and 3 are annoying, you need to rely on some tricks to force daemons to put their shit in water-closets. So that you can process it correctly and extract meaningful info. When you can configure it, you can use /dev/stdout for example, but when you can't... just drop the tool, if you can, really. If you can't, place a socket before it starts, or other trick. Raw logs are not very useful to a non-dev.
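Case 1 can be sketched in a few lines of Python: read whatever the daemon prints, prepend a timestamp, and write it wherever you want. This is only an illustration in the spirit of redirecting stdout/stderr to a separate logging process; the function name and the numeric timestamp format are arbitrary choices of mine.

```python
import io
import time

def stamp(stream, sink, clock=time.time):
    """Copy every line from stream to sink, prefixed with a timestamp.

    Returns the number of lines copied.
    """
    count = 0
    for line in stream:
        text = line.rstrip("\n")
        sink.write(f"{clock():.3f} {text}\n")
        count += 1
    return count

if __name__ == "__main__":
    # Stand-in for a daemon's output; in practice the stream would be sys.stdin
    fake_output = io.StringIO("starting up\nlistening on port 8080\n")
    collected = io.StringIO()
    stamp(fake_output, collected)
    print(collected.getvalue(), end="")
```

With `sys.stdin` and `sys.stdout` as arguments, the same function can sit at the end of a shell pipe behind any daemon that logs to stderr/stdout.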

About the rest, I'll concentrate my arguments into those few words, because it's really, really late:

* a central system does not imply the absence of local, more detailed systems
* a single central system, even if local, implies Denial of Service attacks
* syslogd by default encourages DoS attacks and lack of resilience
* text is binary, when text is not american^Wlingua franca (wait, does franca mean French-derived? :p). Which means it's almost always binary, thus text logs always require 3rd party tools
* even pure 7-bit text, "ascii" encoding requires 3rd party tools to be usable
* binary formats allow checking integrity chunk by chunk
* "text formats" rely on underlying binary FS storage anyway, so they are binary, and unreliable if binary formats are.
* gzipped tarballs are readable by vim and less, are they text formats?
* syslog is a fucking standard. I hate that fact, but it's still true.
* syslog has no security, and can't prevent a malicious network user from rewriting logs
* syslog being based on UDP guarantees nothing, you'd better store logs locally
* it's easier to redirect a daemon's stderr/stdout to a file or process when it comes to log analysis than to grep syslogd or journalctl, and that technique is decades old: daemontools.
* a logging system should have as small an impact on the system as possible: not recreating a directory is a feature here, for reliability reasons.
* binary files can be more flexible than text. Ever tried to grep for all mail addresses?

	venam Offline \| 17-02-2021, 10:29 AM \| #6

(17-02-2021, 09:38 AM)freem Wrote: I disagree that "developers can do absolutely whatever they want".
All those methods are normal logging, so it's possible to discuss them. Many, too many daemons in my Debian actually rely on one of those methods, and it's a pain.

This wasn't meant as an encouragement to developers to do whatever they want... This definitely can be annoying, it would be better if everyone was using a standard method via the system logs. However, this part of the podcast was exactly to say that this type of local logging isn't respectful of any such standard, so not worth talking about other than "some people do local program logs that don't use syslog".

We can obviously dive into the best approach to log for debugging purposes, how to track sessions and context and whatnot, but this is another story.

(17-02-2021, 09:38 AM)freem Wrote: For a start, I'll give some links towards someone else's blog, that I think interesting reads:

I'm not sure I agree with what's discussed in these 2 articles, especially the "Binary logs need their tools!" section. The point is that text logs can have strong formatting, are interoperable, and selected pieces can be kept in cold storage for auditing.

Just like syslog can have specific fields but the message can be arbitrary, journal has the same thing where the message is arbitrary too (the fields are defined in systemd.journal-fields(7)). There's only so much structure that you can impose, the rest is still left to the programmer to define what is inside the message.

Moreover, taking a real world example, most of the things I do at work have triggers based on log events, and there is a whole infrastructure built around logs. Forcing a new format that can only be read by passing by a single software would require changing the whole infrastructure, which isn't practical. However, you can definitely use journald as a simple proxy towards filtering or other logs systems.

In the end, the logs are only as valuable as the processing you can do over them. You can keep them in textual format or binary format, or a DB, or big data storage, as long as you're able to perform the operations you want.

(17-02-2021, 09:38 AM)freem Wrote: About the rest, I'll concentrate my arguments into those few words, because it's really, really late:

Great points!

(17-02-2021, 09:38 AM)freem Wrote: As a *very important foreword*, let me remind that the words I criticize are from 2016, the author was younger, and I was, too, so maybe I'd have been more friendly to that text. Maybe.

Looking back at it, I don't think it was such a bad podcast. Even from today's point of view, there's not much to criticize. The only big change is that in the Linux world systemd's journal has taken over as the default logger on multiple distributions.

	jkl Offline \| 17-02-2021, 03:30 PM \| #7

Binary logs are broken logs. Change my mind.

--
<mort> choosing a terrible license just to be spiteful towards others is possibly the most tux0r thing I've ever seen
