Log spam #1377

Jakob3xD · 2025-03-04T10:28:54Z

Hello,
we updated our blackbox exporter and are now getting a lot of log messages due to the current implementation of slog introduced with #1311 as it removes the default none logging option.

We have tens of thousands of probes running each minute. Some of them are expected to fail but will cause the logger to print an error message. We are now producing over 14k of logs per second.

I can see two possible solutions for this where I prefer the first one.

1. Adjust the log level for some message

Currently, a probe failure is logged with level ERROR. IMO this should be level DEBUG (or WARN). My interpretation of level ERROR is that blackbox exporter itself is having an issue. Like failed to read config file, failed to unmarshal config file, failed binding to port. Things that are real issues. A failing probe can have different reasons, but does not affect the blackbox exporter.

2. Add back the `none` logging option

Add the none option by writing logs to io.Discard like in NewNopLogger().
Instead, it would also be possible to add a flag to define where to write the logs. For example stderr, stdout, /dev/null.

The text was updated successfully, but these errors were encountered:

mateuszdrab · 2025-03-15T12:23:39Z

I just noticed that too, after discovering a shocking 160 million log lines generated (and ingested into Loki) in a mere week.

Somewhat surprised that there is no way to revert this behavior so have opted in for pinning the previous version of the chart for now.

I understand the motivation behind the logging library change but that should be a breaking change (or a massive warning in the release notes) if you ask me.

SuperQ · 2025-03-15T13:56:41Z

CC @tjhop care to look into this one?

Immediate fix for prometheus#1377. This fix will need to be refactored a bit when we upgrade prometheus/common to v0.63.0, since prometheus/common#754 contains a breaking change to promslog's level code. Explanation: When I ported the blackbox exporter to use slog in prometheus#1311, I removed the `none` option from the log prober flag, as it was non-standard to other prometheus projects, not a valid value in the original promlog package, and slog has no inherent corresponding concept. The log prober flag value is used to create the scrape logger (which is specifically used for scrape-related logging, not overall application logging). I also included some code that ensured a level config would be created for use with the scrape logger. In combination, this causes the blackbox exporter to always output scrape logs, which can be noisy. If we instead default the flag to an empty string and do some intelligent handling of an nil log prober value within the scrapeLogger's implementation of the slog.Handler interface, then we can restore the previous behavior. TL;DR: Leave `--log.prober` flag unset and it won't do scrape-level logging Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

tjhop · 2025-03-18T04:30:59Z

@Jakob3xD I agree with your interpretation of where error should be used. I'm hesitant to re-introduce the none option again since it was "non-standard" from the accepted level values/promlog package usage in other prometheus projects. I say that in reference to the --log.level flag mostly, since the --log.prober flag is unique to this exporter.

By defaulting the flag to an empty string and improving how things are handled when unset, I believe I was able to restore the previously expected behavior. As mentioned in the linked PR:

TL;DR:
Leave --log.prober flag unset and it won't do scrape-level logging

Opinions?

SuperQ · 2025-03-18T07:56:40Z

I agree, I don't think we want to re-introduce the none mode.

Leave --log.prober flag unset and it won't do scrape-level logging

I think this isn't working as intended. The default value for --log.prober is info. But even when I change --log.prober=debug, I'm still seeing it log probes as info level.

Also, the default value for --log.prober should be Debug().

tjhop · 2025-03-25T03:46:30Z

I think this isn't working as intended. The default value for --log.prober is info. But even when I change --log.prober=debug, I'm still seeing it log probes as info level.

Am I misunderstanding what the purpose of the --log.prober flag is supposed to do? If I checkout v0.25.0 to look at behavior for the last release and run with --log.level=debug --log.prober=debug, I still see the probe logs output at info level as well:

ts=2025-03-25T03:42:24.214Z caller=level.go:71 module=http_2xx target=https://prometheus.io level=info msg="Beginning probe" probe=http timeout_seconds=4.5
ts=2025-03-25T03:42:24.214Z caller=level.go:71 module=http_2xx target=https://prometheus.io level=info msg="Resolving target address" target=prometheus.io ip_protocol=ip4
ts=2025-03-25T03:42:24.215Z caller=level.go:71 module=http_2xx target=https://prometheus.io level=info msg="Resolved target address" target=prometheus.io ip=104.21.60.220
ts=2025-03-25T03:42:24.215Z caller=level.go:71 module=http_2xx target=https://prometheus.io level=info msg="Making HTTP request" url=https://104.21.60.220 host=prometheus.io
ts=2025-03-25T03:42:24.275Z caller=level.go:71 module=http_2xx target=https://prometheus.io level=info msg="Received HTTP response" status_code=200
ts=2025-03-25T03:42:24.275Z caller=level.go:71 module=http_2xx target=https://prometheus.io level=info msg="Response timings for roundtrip" roundtrip=0 start=2025-03-24T23:42:24.215637413-04:00 dnsDone=2025-03-24T23:42:24.215637413-04:00 connectDone=2025-03-24T23:42:24.227121089-04:00 gotConn=2025-03-24T23:42:24.24403057-04:00 responseStart=2025-03-24T23:42:24.275497629-04:00 tlsStart=2025-03-24T23:42:24.227146156-04:00 tlsDone=2025-03-24T23:42:24.243946663-04:00 end=2025-03-24T23:42:24.275811957-04:00
ts=2025-03-25T03:42:24.276Z caller=level.go:71 module=http_2xx target=https://prometheus.io level=info msg="Probe succeeded" duration_seconds=0.061779807

tjhop · 2025-03-25T03:48:33Z

Also, the default value for --log.prober should be Debug().

I can do that. Should the behavior of --log.prober be dependent on --log.level, then?

grimz-ly · 2025-04-02T23:01:00Z

I think this isn't working as intended. The default value for --log.prober is info. But even when I change --log.prober=debug, I'm still seeing it log probes as info level.

Also, the default value for --log.prober should be Debug().

isn't debug mode generally a 'catch all'? If I set this flag to 'warn' it stops the info logging for sure. this is unique to v0.26.0 release. I would think default should not be info.

brandonbirdj · 2025-04-09T21:17:18Z

I think you might be confused about how log levels work. You need to change to a higher level to not see these. Changing to debug still included info.

--log.prober=warn has fixed the issue for me.

prometheus/blackbox_exporter#1377

Jakob3xD · 2025-04-24T07:40:46Z

@SuperQ can you clarify for us what the expected behaviour of --log.prober is. As far as I understand, it is similar to --log.level as it defines what log level should be printed for probe logs.

From your comment, I would assume that you expect the --log.probe to set to which level the probe logs should be sent. Meaning --log.probe set to debug results in all probe logs on debug level which then should not be printed if the --log.level is set to info?

If this is correct, then this would IMO not be a good solution for the problem, as we lose the use of levels for probe logs. Specially because those levels are requested by other issues, see #1401.

From what I have read so far, I see two solutions.

Adjust the log levels of probe logs to not be higher as WARN if they are not real issues and keep the --log.prober argument so when setting it to level error we don't see those probe logs.
Similar to 1 but drop the --log.prober argument and only use the --log.level argument.

Option 1 allows a finer filtering of logs as the exporter logs and probe logs are separated where option 2 removes the confusing extra argument.

SuperQ · 2025-04-24T14:04:14Z

Yea, I don't know what the original intention of --log.prober was supposed to look like.

I feel like it was just hacked in based on how the old logging library worked.

Based on discussion in prometheus#1377, slack, etc, it seems unclear what the intended functionality/behavior of the probe scrape logger was supposed to be. This is an attempt to align behavior with current user expectations. The improved defaults and updated configs here should address the log spam as well. User facing changes: - the `--log.prober` flag now sets the _level at which prober scrape logs will be emitted at_. - the `--log.prober` flag now defaults to `debug` level. Because the default log level is info, this means that in order to see probe scrape logs, a user must either: - increase the level of the logger via `--log.level` to be >= that of `--log.prober` (ie, set both to `debug`) - decrease the level of the probe logger via `--log.prober` to be <= that of `--log.level` (ie, set both to `info`) Note: This approach (emitting all logs from the probe logger at the level specified) involves hijacking calls to the logger and overriding the level. This comes with the very obvious side effect that the developers working on code using the probe scrape logger no longer have control over the control over the level of the log that gets emitted. Regardless of whether they use `l.Info()` or `l.Debug()`, it'll still get logged at the configured log prober level. This approach likely would not be compatible with some of the other discussions around how probe scrape logs work, like prometheus#1401. Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

tjhop · 2025-05-08T04:02:19Z

After some more thought and discussion with @SuperQ on slack, I updated #1381 based on our current best understanding of how the --log.prober flag was intended to behave. It now sets the value that probe scrape logs will be emitted at. Probe logs default to debug level, so they will be not be seen unless A) the log level is increased or B) the probe log level is decreased.

More details in the PR, feedback welcome 👍

* fix: stop scrape logger spam, refactor scrape logger design Based on discussion in #1377, slack, etc, it seems unclear what the intended functionality/behavior of the probe scrape logger was supposed to be. This is an attempt to align behavior with current user expectations. The improved defaults and updated configs here should address the log spam as well. User facing changes: - the `--log.prober` flag now sets the _level at which prober scrape logs will be emitted at_. - the `--log.prober` flag now defaults to `debug` level. Because the default log level is info, this means that in order to see probe scrape logs, a user must either: - increase the level of the logger via `--log.level` to be >= that of `--log.prober` (ie, set both to `debug`) - decrease the level of the probe logger via `--log.prober` to be <= that of `--log.level` (ie, set both to `info`) Note: This approach (emitting all logs from the probe logger at the level specified) involves hijacking calls to the logger and overriding the level. This comes with the very obvious side effect that the developers working on code using the probe scrape logger no longer have control over the control over the level of the log that gets emitted. Regardless of whether they use `l.Info()` or `l.Debug()`, it'll still get logged at the configured log prober level. This approach likely would not be compatible with some of the other discussions around how probe scrape logs work, like #1401. Signed-off-by: TJ Hoplock <t.hoplock@gmail.com> * docs(README): add section on --log.prober behavior with sample output Address PR feedback Signed-off-by: TJ Hoplock <t.hoplock@gmail.com> --------- Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

tjhop mentioned this issue Mar 18, 2025

fix: stop scrape logger spam, refactor scrape logger design #1381

Merged

tjhop mentioned this issue Apr 10, 2025

Log level for unsuccessful probes #1401

Open

jsl6ul added a commit to jsl6ul/docker_apps that referenced this issue Apr 11, 2025

blackbox: add command to set '--log.prober=warn'

12e1ed4

prometheus/blackbox_exporter#1377

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Log spam #1377

Log spam #1377

Jakob3xD commented Mar 4, 2025

mateuszdrab commented Mar 15, 2025 •

edited

Loading

Uh oh!

SuperQ commented Mar 15, 2025

Uh oh!

tjhop commented Mar 18, 2025

Uh oh!

SuperQ commented Mar 18, 2025 •

edited

Loading

Uh oh!

tjhop commented Mar 25, 2025

Uh oh!

tjhop commented Mar 25, 2025

Uh oh!

grimz-ly commented Apr 2, 2025

Uh oh!

brandonbirdj commented Apr 9, 2025

Uh oh!

Jakob3xD commented Apr 24, 2025

Uh oh!

SuperQ commented Apr 24, 2025

Uh oh!

tjhop commented May 8, 2025

Uh oh!

Log spam #1377

Log spam #1377

Comments

Jakob3xD commented Mar 4, 2025

1. Adjust the log level for some message

2. Add back the none logging option

mateuszdrab commented Mar 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SuperQ commented Mar 15, 2025

Uh oh!

tjhop commented Mar 18, 2025

Uh oh!

SuperQ commented Mar 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tjhop commented Mar 25, 2025

Uh oh!

tjhop commented Mar 25, 2025

Uh oh!

grimz-ly commented Apr 2, 2025

Uh oh!

brandonbirdj commented Apr 9, 2025

Uh oh!

Jakob3xD commented Apr 24, 2025

Uh oh!

SuperQ commented Apr 24, 2025

Uh oh!

tjhop commented May 8, 2025

Uh oh!

2. Add back the `none` logging option

mateuszdrab commented Mar 15, 2025 •

edited

Loading

SuperQ commented Mar 18, 2025 •

edited

Loading