Playing With Promtail: Labelling Hostnames From File Names
The Problem
I’ve recently needed to move to a more robust logging solution than I had before. Previously, I had fluent running in a container collecting logs, but this was finicky, and if something went wrong with the container, I would never know about it.
I installed rsyslog on a dedicated host and started pushing logs to this new VM. This was working great: I was now handling hundreds of messages a second without issue. So I installed Promtail and began exporting the logs, and here is where the issues arose. I had no labels any more, and organization is paramount when dealing with logs. I decided to figure out whether I could extract data from the filenames generated by rsyslog and use that as a first step in the re-labelling process. Here is where I ran into some roadblocks, and what I did to fix them.
The Solution
Here’s the configuration that I started with. I was able to gather that we can use pipeline stages to add a label from a file name. Logs are stored in /var/log/remote and are named like host-name-1.log. With a little luck, I should be able to extract the actual hostname and insert that into a label.
I adapted the following configuration from one I found on GitHub:
server:
  disable: true
  # http_listen_port: 9080
  # grpc_listen_port: 0
positions:
  filename: /tmp/prom_positions.yaml
clients:
  - url: '${LOKIURL}'
    batchsize: 400
    batchwait: 5s
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: remote/system
          __path__: /var/log/remote/*.log
          hostname: "app-logging-1"
    pipeline_stages:
      - match:
          selector: '{job="remote/system"}'
          stages:
            - regex:
                source: filename
                expression: '(\/var\/log\/remote\/)(?P<loghost>.+)\.log$'
            - labels:
                loghost:
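This, however, did not work: Promtail gave me an error complaining about having multiple keys in the stages pipeline. (An error like this usually points to an indentation problem in the YAML, where a stage's options end up at the same level as the stage name.) With a little help from GPT, I was able to get a ‘working’ configuration by modifying the pipeline stages to read: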
...
    pipeline_stages:
      - regex:
          expression: '(\/var\/log\/remote\/)(?P<loghost>.+)\.log$'
This allowed Promtail to start, but after reviewing my logs in Grafana, I realized that the labels were not being applied. What gives?
Promtail’s official docs (Loki/Promtail) provide a little bit of insight:
# This stage is only going to run if the scraped target has a label
# of "name" with value "promtail".
- match:
    selector: '{name="promtail"}'
    stages:
      # The regex stage parses out a level, timestamp, and component. At the end
      # of the stage, the values for level, timestamp, and component are only
      # set internally for the pipeline. Future stages can use these values and
      # decide what to do with them.
      - regex:
          expression: '.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)'
      - labels:
          level:
          component:
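To make the docs comment concrete, here is a rough Python sketch of the two-step idea: a regex stage only fills an internal set of extracted values, and a separate labels stage decides which of those become labels. This is an illustration, not Promtail's implementation. The log line is made up, and the expression is a simplified stand-in for the one in the docs (Python's re module and the Go regex engine that Promtail uses share the (?P<name>...) syntax, but they are not identical).

```python
import re

# Simplified stand-in for the docs expression (an assumption, not the exact pattern).
EXPRESSION = (
    r"level=(?P<level>[a-zA-Z]+).*"
    r"ts=(?P<timestamp>\S+).*"
    r"component=(?P<component>[a-zA-Z]+)"
)

# Hypothetical log line, invented for this example.
line = "level=info ts=2024-01-01T00:00:00Z component=server msg=started"

# "regex" stage: named groups land in the extracted data only.
match = re.search(EXPRESSION, line)
extracted = match.groupdict() if match else {}

# "labels" stage: only the names listed here are promoted to labels.
labels = {"name": "promtail"}
for key in ("level", "component"):
    if key in extracted:
        labels[key] = extracted[key]

print("extracted:", extracted)
print("labels:   ", labels)
```

The timestamp is extracted but never becomes a label, because the labels step does not list it. Without any labels step at all, nothing extracted would reach the stream's labels.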
The example from the docs showed me what to change. I was missing the labels key, which is required to apply the label during the pipeline, so I changed the end of my configuration to read:
...
    pipeline_stages:
      - regex:
          expression: '(\/var\/log\/remote\/)(?P<loghost>.+)\.log$'
      - labels:
          loghost:
And… nothing. This didn’t work either: Promtail started, but the labels still weren’t being applied.
At this point, I was beginning to get frustrated. This should work, but it doesn’t. What gives? It was then that I realized I needed to borrow a few lines from the first example I had found on GitHub. My pipeline was being applied to the individual log lines; the filename was never checked at all. So, after adding a matcher back in, along with source: filename so that the regex checks the filename, I was left with:
...
    pipeline_stages:
      - match:
          # The selector must match the job label set under static_configs.
          selector: '{job="remote/system"}'
          stages:
            - regex:
                source: filename
                expression: '(\/var\/log\/remote\/)(?P<loghost>.+)\.log$'
            - labels:
                loghost:
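The difference that source: filename makes can be sketched in a few lines of Python. This is only a model of the behaviour described above, assuming that a regex stage without a source runs against the log line itself. The entry dictionary, the sample syslog line, and the extract_loghost helper are all made up for illustration.

```python
import re

EXPRESSION = r"/var/log/remote/(?P<loghost>.+)\.log$"

# Hypothetical log entry: the message text plus the filename label Promtail attaches.
entry = {
    "line": "Jan  1 00:00:00 host-name-1 sshd[123]: session opened",
    "filename": "/var/log/remote/host-name-1.log",
}

def extract_loghost(entry, source=None):
    """Apply the expression to entry[source], or to the log line when source is None."""
    text = entry["line"] if source is None else entry.get(source, "")
    match = re.search(EXPRESSION, text)
    return match.group("loghost") if match else None

print(extract_loghost(entry))                     # log line: no path in it, so no match
print(extract_loghost(entry, source="filename"))  # filename: yields the hostname
```

With no source, the pattern is looking for a file path inside the message text, which is why the earlier attempts started fine but never produced a label.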
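It was here that I realized the forward slashes in the regex did not need to be escaped at all. I simply updated the expression line to read "/var/log/remote/(?P<loghost>.+).log$" and voilà! The new label is showing up in Grafana. (Strictly speaking, the unescaped dot before log matches any character. Writing \.log$ and keeping the expression in single quotes would be stricter, since YAML treats a backslash inside double quotes as an escape character.)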
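As a quick sanity check on the escaping, the variants of the expression can be compared in Python. Treat this as an approximation: Promtail uses Go's regex engine rather than Python's re, though for patterns this simple the two should agree. The second path below is a made-up edge case.

```python
import re

# Original pattern with escaped slashes, the simplified one, and a stricter variant.
escaped = re.compile(r"(\/var\/log\/remote\/)(?P<loghost>.+)\.log$")
plain = re.compile(r"/var/log/remote/(?P<loghost>.+).log$")
strict = re.compile(r"/var/log/remote/(?P<loghost>.+)\.log$")

path = "/var/log/remote/host-name-1.log"
for pattern in (escaped, plain, strict):
    # All three extract the same hostname from a normal rsyslog path.
    print(pattern.search(path).group("loghost"))

# Hypothetical edge case: no literal dot before "log".
odd = "/var/log/remote/host-name-1xlog"
print(plain.search(odd) is not None)   # the bare "." accepts the "x"
print(strict.search(odd) is not None)  # "\." insists on a real dot
```

For files that always end in .log the variants behave the same, so the simplified expression is fine here. The stricter form only matters if other file names could appear in the directory.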
Conclusion
It can be daunting at first, dealing with the strange syntax that Loki and Promtail like to use for labeling, but with a little perseverance, you can make it work.
Thank you for reading and, as always, I hope to discuss some of this in the comments.