Name

penlog - An abstract and machine readable logging format

Specification

The PENLog logging format is intended to be used as a generic and reusable data format for measurement data. The penlog format specifies an abstract data format consisting of various fields with data and metadata. The abstract penlog format can be mapped to multiple output formats, for instance json, or hr, … All available output formats are explained below.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Abstract Logging Format

The penlog structured logging format consists of the following fields. Unset fields which are considered optional MUST be absent.

component (string, OPTIONAL)

The component, e.g. a software module, which has issued the log message. If this field is absent, an implementation SHOULD read the content of the environment variable PENLOG_COMPONENT and MUST fall back to root if that variable is not set.

data (string, REQUIRED)

The log message as a UTF-8 string.

host (string, OPTIONAL)

The hostname of the machine that generated the message. This field is OPTIONAL, since it is missing in the human readable format. It is RECOMMENDED that implementations include this field, as it increases the reproducibility of logging data.

id (string, OPTIONAL)

A unique message identifier.

line (string, OPTIONAL)

Information about the file and line number where this log entry was generated. The information MUST be in the form filename:number. filename can be an absolute or relative path, or a filename.

priority (int, OPTIONAL)

This field can be used to optionally set the priority. For priorities, the syslog priorities are used as defined by RFC5424. Implementations can indicate priorities by e.g. a separate color.

stacktrace (string, OPTIONAL)

Implementations can optionally include a stacktrace. This can be useful for debugging when fatal errors occur. Stacktraces are very specific to the programming language in use, e.g. Python or Go. Thus, this field is just an unstructured string.

tags (list[string], OPTIONAL)

A custom list of tags can be applied to each log entry, for instance: ["autogenerated", "pre-test", "post-test", …]. Tags MAY be key-value pairs, separated by =.

timestamp (string, REQUIRED)

An ISO 8601 string of the current date and time.

type (string, REQUIRED)

The type field is a free field which can be used to assign a particular message type.
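
The conventions for the line and tags fields described above can be illustrated with a short Python sketch. The helper names split_line_field and split_tags are hypothetical and not part of penlog. Splitting at the last colon and at the first = are assumptions; the text above only fixes the form filename:number and the = separator.

```python
def split_line_field(value):
    """Split 'filename:number' at the LAST colon (assumption), so that
    paths which themselves contain a colon are kept intact."""
    filename, sep, number = value.rpartition(":")
    if not sep or not filename or not (number.isascii() and number.isdigit()):
        raise ValueError(f"not in filename:number form: {value!r}")
    return filename, int(number)


def split_tags(tags):
    """Separate plain tags from key=value tags (split at the FIRST '=')."""
    plain, pairs = [], {}
    for tag in tags:
        key, sep, value = tag.partition("=")
        if sep:
            pairs[key] = value
        else:
            plain.append(tag)
    return plain, pairs
```

For instance, split_tags(["autogenerated", "run=3"]) separates the plain tag autogenerated from the pair run=3.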

Custom fields can be added freely; in other words, additional custom fields are OPTIONAL. Post-processing of and tooling around these custom fields are up to the developer. Generic converters MUST ignore custom fields.
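
As a minimal sketch of the abstract format, the following Python function builds one log entry as a plain dict. The function name make_record and the default type "message" are assumptions made for illustration. Unset optional fields are left out entirely, and any extra keyword arguments become custom fields.

```python
import os
import socket
from datetime import datetime


def make_record(data, msg_type="message", component=None, **fields):
    """Build one log entry as a dict (hypothetical helper, not a penlog API)."""
    if not component:
        # Fallback chain: explicit value, then PENLOG_COMPONENT, then "root".
        component = os.environ.get("PENLOG_COMPONENT") or "root"
    record = {
        # Local time with UTC offset as an ISO 8601 string.
        "timestamp": datetime.now().astimezone().isoformat(),
        "component": component,
        "type": msg_type,
        "data": data,
        "host": socket.gethostname(),
    }
    # Optional and custom fields: unset values are omitted, never set to null.
    for key, value in fields.items():
        if value is not None:
            record[key] = value
    return record
```

A call such as make_record("Doing stuff", component="scanner", priority=6, tags=None) yields a dict with a priority key and without a tags key.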

JSON Format (json)

A penlog log file stored on disk is typically stored in the json output format. The tool hr(1) is intended to be used — similar to cat(1) — for viewing penlog data in the json output format. If encoding of a log message fails, the component MUST be set to JSON, the type to ERROR, and the error message MUST be included in data.

The json format consists of a verbatim sequence of the described JSON objects. Each JSON object MUST occupy exactly one line; objects are separated by \n (ASCII 0x0a). In order to keep decoding simple and line based, no JSON arrays or virtual, endless JSON structures are employed.
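
The line-based layout can be sketched in Python with the standard json module. The function names write_records and read_records are hypothetical. json.dumps without an indent argument never emits a raw newline, since newlines inside strings are escaped, so each object stays on one line.

```python
import json


def write_records(stream, records):
    """Write one JSON object per line, separated by a newline (0x0a)."""
    for record in records:
        stream.write(json.dumps(record) + "\n")


def read_records(stream):
    """Decode a stream line by line; blank lines are skipped (assumption)."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)
```

Because every line is a complete JSON object, a reader can start processing before the writer has finished, and line-oriented tools can filter the stream.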

JSON pretty (json-pretty)

The json format forces every JSON object to appear on a single line. The json-pretty format provides an indented, more readable JSON form for debugging purposes. The actual content of json and json-pretty is the same. It is advised to use json for data processing pipelines, since it has less overhead.
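
The difference between the two forms can be shown with Python's json module. The record below is made-up sample data, and the indentation width of 2 is an assumption, since no width is specified here.

```python
import json

# Made-up sample entry for illustration only.
record = {
    "timestamp": "2021-04-02T12:48:08.906+02:00",
    "component": "scanner",
    "type": "message",
    "data": "Doing stuff",
}

compact = json.dumps(record)           # json: one object per line
pretty = json.dumps(record, indent=2)  # json-pretty: indented, multi-line

# Both forms decode to the same content.
same_content = json.loads(compact) == json.loads(pretty)
```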

Human Readable Format (hr)

The syntax of the human readable format is shown below. Curly braces indicate a field from the JSON format. If a field is empty, it expands to a zero-length string; if id, line, tags, or stacktrace are not available, the whole line is omitted. A literal curly brace is written as two braces: {{ means {.

{timestamp} {{{component}}} [{type}]: {prio-prefix} {data}
   -> id  : {id}
   -> line: {line}
   -> tags: {tags}
   -> stacktrace:
   | {stacktrace}

timestamp

The RECOMMENDED timestamp format is Go’s StampMilli format, defined as Jan _2 15:04:05.000.

component
type

The component and type fields MUST be padded or truncated so that the colons, :, in every single line are perfectly aligned.

data

The actual log message. It MAY be truncated to fit in the current terminal size. When it is truncated, an ellipsis character (…) MUST be appended to indicate the truncation to the user.

id

The optional unique message identifier.

line

The optional filename and line number where this log entry originates.

stacktrace

The optional stacktrace showing where this log entry originates.

prio-prefix

An optional priority prefix. It is RECOMMENDED to indicate message priorities via colors. If colors are not available, it MAY be desirable to indicate the priority via a short prefix. The prefixes are enclosed in brackets [ and ]: E, A, C, e, w, n, i, d. These letters stand for: emergency, alert, critical, error, warning, notice, info, debug.

tags

The optional tags as comma separated values.
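
The hr layout described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of hr(1). The names render_hr, stamp_milli, and fit are hypothetical. The column width of 8, the use of U+2026 as the ellipsis character, the ", " tag separator, and the per-line "   | " prefix for multi-line stacktraces are all assumptions.

```python
import shutil
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
PRIO_PREFIXES = ["E", "A", "C", "e", "w", "n", "i", "d"]  # syslog severities 0..7
ELLIPSIS = "\u2026"  # assumed ellipsis character


def stamp_milli(dt):
    """Format like Go's StampMilli: space-padded day, truncated milliseconds."""
    return (f"{MONTHS[dt.month - 1]} {dt.day:2d} "
            f"{dt:%H:%M:%S}.{dt.microsecond // 1000:03d}")


def fit(text, width):
    """Pad or truncate a string to exactly `width` columns."""
    return f"{text:{width}.{width}}"


def render_hr(entry, dt, width=None, use_prefix=False):
    """Render one entry (a dict) and return the list of output lines."""
    if width is None:
        width = shutil.get_terminal_size().columns
    component = fit(entry.get("component", "root"), 8)
    msg_type = fit(entry["type"], 8)
    # Doubled braces produce the literal braces around the component.
    head = f"{stamp_milli(dt)} {{{component}}} [{msg_type}]: "
    prio = entry.get("priority")
    if use_prefix and prio is not None and 0 <= prio <= 7:
        head += f"[{PRIO_PREFIXES[prio]}] "
    data = entry["data"]
    room = max(width - len(head), 1)
    if len(data) > room:
        data = data[: room - 1] + ELLIPSIS  # mark the truncation
    lines = [head + data]
    # Detail lines are omitted entirely when the field is not available.
    if "id" in entry:
        lines.append(f"   -> id  : {entry['id']}")
    if "line" in entry:
        lines.append(f"   -> line: {entry['line']}")
    if entry.get("tags"):
        lines.append("   -> tags: " + ", ".join(entry["tags"]))
    if "stacktrace" in entry:
        lines.append("   -> stacktrace:")
        lines.extend("   | " + row for row in entry["stacktrace"].splitlines())
    return lines
```

With a fixed column width, the colon after the type field ends up at the same position on every line, which is what the alignment requirement asks for.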

Tiny Human Readable Format (hr-tiny)

The hr-tiny format is the same as hr except that component and type are omitted.

Apr  2 12:48:08.906: Starting tshark with
Apr  2 12:48:09.583: Doing stuff

Example

Apr  2 12:48:08.906 {scanner } [message]: Starting tshark with
Apr  2 12:48:09.583 {moncay  } [message]: Doing stuff

If a JSON line cannot be decoded, the faulty text MUST be included in messages of type ERROR and component JSON:

$ python -c "import foo" 2>&1 | hr
Jun 16 08:19:01.305 {JSON    } [ERROR   ]: Traceback (most recent call last):
Jun 16 08:19:01.305 {JSON    } [ERROR   ]:   File "<string>", line 1, in <module>
Jun 16 08:19:01.305 {JSON    } [ERROR   ]: ModuleNotFoundError: No module named 'foo'
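
A viewer could implement this fallback roughly as in the following Python sketch. The function name decode_line is hypothetical. Treating valid JSON that is not an object as a decoding failure, and stamping the entry with the current local time, are assumptions.

```python
import json
from datetime import datetime


def decode_line(line):
    """Decode one input line; wrap undecodable text in an ERROR entry."""
    try:
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError("not a JSON object")
        return record
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return {
            "timestamp": datetime.now().astimezone().isoformat(),
            "component": "JSON",
            "type": "ERROR",
            "data": line.rstrip("\n"),  # the faulty text, kept verbatim
        }
```

This way non-JSON output, such as a Python traceback written to the same pipe, still shows up in the log view instead of being dropped.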

Environment Variables

The following environment variables MAY be understood by penlog implementations. The supported datatypes are string and bool. A bool is a special string consisting of either t, T, true, TRUE, 1 or f, F, false, FALSE, 0.
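
A bool variable could be parsed as in this minimal Python sketch. The function name parse_bool is hypothetical, and raising ValueError for any other string is an assumption, since the behavior for invalid values is not specified here.

```python
TRUE_VALUES = {"t", "T", "true", "TRUE", "1"}
FALSE_VALUES = {"f", "F", "false", "FALSE", "0"}


def parse_bool(value):
    """Map one of the listed strings to a Python bool."""
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a valid bool string: {value!r}")
```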

PENLOG_COMPONENT (string)

If no component is set, the component field MAY be set via the PENLOG_COMPONENT variable at the scope of an operating system process.

PENLOG_CAPTURE_LINES (bool)

If this environment variable is set, implementations SHOULD emit filenames with line numbers via the line field.

PENLOG_CAPTURE_STACKTRACES (bool)

If this environment variable is set, implementations SHOULD provide stacktraces via the stacktrace field.

PENLOG_OUTPUT (string)

A switch for implementations to choose from several output forms. Available are: hr, hr-tiny, json, json-pretty, systemd.

PENLOG_LOGLEVEL (string)

In order to limit the emitted logging messages, loglevels MAY be supported. If a library supports filtering based on loglevels, it MUST check this environment variable. The supported values are critical, error, warning, notice, info, debug. The default MUST be debug. A message MUST be omitted if its priority field contains a value greater than the value of PENLOG_LOGLEVEL. A mapping between these strings and integer values is available in RFC5424.
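
The filtering rule can be sketched in Python as follows. The function name should_emit is hypothetical. The integer values follow the syslog severities of RFC5424. Emitting messages that carry no priority, and treating an unknown PENLOG_LOGLEVEL value like debug, are assumptions.

```python
import os

# Syslog severities from RFC5424 for the supported loglevel names.
LEVELS = {"critical": 2, "error": 3, "warning": 4,
          "notice": 5, "info": 6, "debug": 7}


def should_emit(priority, environ=None):
    """Return False if the message must be omitted under PENLOG_LOGLEVEL."""
    if environ is None:
        environ = os.environ
    name = environ.get("PENLOG_LOGLEVEL", "debug")
    level = LEVELS.get(name, LEVELS["debug"])
    if priority is None:
        return True  # assumption: entries without a priority are not filtered
    return priority <= level
```

With PENLOG_LOGLEVEL=info, a debug message (priority 7) is omitted, while an error message (priority 3) is emitted.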

See Also

hr(1), penlog-best-practice(7)

Bugs

This project is maintained on GitHub: https://github.com/Fraunhofer-AISEC/penlog.

Authors

Current maintainers are:

License

This document is published under the Apache-2.0 license. The license of the code can be obtained from the Git repository.