Skip to content

Log spam #1377

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Jakob3xD opened this issue Mar 4, 2025 · 11 comments
Open

Log spam #1377

Jakob3xD opened this issue Mar 4, 2025 · 11 comments

Comments

@Jakob3xD
Copy link

Jakob3xD commented Mar 4, 2025

Hello,
we updated our blackbox exporter and are now getting a lot of log messages due to the current implementation of slog introduced with #1311 as it removes the default none logging option.

We have tens of thousands of probes running each minute. Some of them are expected to fail but will cause the logger to print an error message. We are now producing over 14k of logs per second.

I can see two possible solutions for this where I prefer the first one.

1. Adjust the log level for some message

Currently, a probe failure is logged with level ERROR. IMO this should be level DEBUG (or WARN). My interpretation of level ERROR is that blackbox exporter itself is having an issue. Like failed to read config file, failed to unmarshal config file, failed binding to port. Things that are real issues. A failing probe can have different reasons, but does not affect the blackbox exporter.

2. Add back the none logging option

Add the none option by writing logs to io.Discard like in NewNopLogger().
Instead, it would also be possible to add a flag to define where to write the logs. For example stderr, stdout, /dev/null.

@mateuszdrab
Copy link

mateuszdrab commented Mar 15, 2025

I just noticed that too, after discovering a shocking 160 million log lines generated (and ingested into Loki) in a mere week.

Somewhat surprised that there is no way to revert this behavior so have opted in for pinning the previous version of the chart for now.

I understand the motivation behind the logging library change but that should be a breaking change (or a massive warning in the release notes) if you ask me.

@SuperQ
Copy link
Member

SuperQ commented Mar 15, 2025

CC @tjhop care to look into this one?

tjhop added a commit to tjhop/blackbox_exporter that referenced this issue Mar 18, 2025
Immediate fix for prometheus#1377. This fix will need to be refactored a bit when
we upgrade prometheus/common to v0.63.0, since prometheus/common#754
contains a breaking change to promslog's level code.

Explanation:
When I ported the blackbox exporter to use slog in prometheus#1311, I removed the
`none` option from the log prober flag, as it was non-standard to other
prometheus projects, not a valid value in the original promlog package,
and slog has no inherent corresponding concept. The log prober flag
value is used to create the scrape logger (which is specifically used
for scrape-related logging, not overall application logging). I also
included some code that ensured a level config would be created for use
with the scrape logger. In combination, this causes the blackbox
exporter to always output scrape logs, which can be noisy. If we instead
default the flag to an empty string and do some intelligent handling of
an nil log prober value within the scrapeLogger's implementation of the
slog.Handler interface, then we can restore the previous behavior.

TL;DR:
Leave `--log.prober` flag unset and it won't do scrape-level logging

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
tjhop added a commit to tjhop/blackbox_exporter that referenced this issue Mar 18, 2025
Immediate fix for prometheus#1377. This fix will need to be refactored a bit when
we upgrade prometheus/common to v0.63.0, since prometheus/common#754
contains a breaking change to promslog's level code.

Explanation:
When I ported the blackbox exporter to use slog in prometheus#1311, I removed the
`none` option from the log prober flag, as it was non-standard to other
prometheus projects, not a valid value in the original promlog package,
and slog has no inherent corresponding concept. The log prober flag
value is used to create the scrape logger (which is specifically used
for scrape-related logging, not overall application logging). I also
included some code that ensured a level config would be created for use
with the scrape logger. In combination, this causes the blackbox
exporter to always output scrape logs, which can be noisy. If we instead
default the flag to an empty string and do some intelligent handling of
an nil log prober value within the scrapeLogger's implementation of the
slog.Handler interface, then we can restore the previous behavior.

TL;DR:
Leave `--log.prober` flag unset and it won't do scrape-level logging

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
@tjhop
Copy link
Contributor

tjhop commented Mar 18, 2025

@Jakob3xD I agree with your interpretation of where error should be used. I'm hesitant to re-introduce the none option again since it was "non-standard" from the accepted level values/promlog package usage in other prometheus projects. I say that in reference to the --log.level flag mostly, since the --log.prober flag is unique to this exporter.

By defaulting the flag to an empty string and improving how things are handled when unset, I believe I was able to restore the previously expected behavior. As mentioned in the linked PR:

TL;DR:
Leave --log.prober flag unset and it won't do scrape-level logging

Opinions?

@SuperQ
Copy link
Member

SuperQ commented Mar 18, 2025

I agree, I don't think we want to re-introduce the none mode.

Leave --log.prober flag unset and it won't do scrape-level logging

I think this isn't working as intended. The default value for --log.prober is info. But even when I change --log.prober=debug, I'm still seeing it log probes as info level.

Also, the default value for --log.prober should be Debug().

@tjhop
Copy link
Contributor

tjhop commented Mar 25, 2025

I think this isn't working as intended. The default value for --log.prober is info. But even when I change --log.prober=debug, I'm still seeing it log probes as info level.

Am I misunderstanding what the purpose of the --log.prober flag is supposed to do? If I checkout v0.25.0 to look at behavior for the last release and run with --log.level=debug --log.prober=debug, I still see the probe logs output at info level as well:

ts=2025-03-25T03:42:24.214Z caller=level.go:71 module=http_2xx target=https://prometheus.io level=info msg="Beginning probe" probe=http timeout_seconds=4.5
ts=2025-03-25T03:42:24.214Z caller=level.go:71 module=http_2xx target=https://prometheus.io level=info msg="Resolving target address" target=prometheus.io ip_protocol=ip4
ts=2025-03-25T03:42:24.215Z caller=level.go:71 module=http_2xx target=https://prometheus.io level=info msg="Resolved target address" target=prometheus.io ip=104.21.60.220
ts=2025-03-25T03:42:24.215Z caller=level.go:71 module=http_2xx target=https://prometheus.io level=info msg="Making HTTP request" url=https://104.21.60.220 host=prometheus.io
ts=2025-03-25T03:42:24.275Z caller=level.go:71 module=http_2xx target=https://prometheus.io level=info msg="Received HTTP response" status_code=200
ts=2025-03-25T03:42:24.275Z caller=level.go:71 module=http_2xx target=https://prometheus.io level=info msg="Response timings for roundtrip" roundtrip=0 start=2025-03-24T23:42:24.215637413-04:00 dnsDone=2025-03-24T23:42:24.215637413-04:00 connectDone=2025-03-24T23:42:24.227121089-04:00 gotConn=2025-03-24T23:42:24.24403057-04:00 responseStart=2025-03-24T23:42:24.275497629-04:00 tlsStart=2025-03-24T23:42:24.227146156-04:00 tlsDone=2025-03-24T23:42:24.243946663-04:00 end=2025-03-24T23:42:24.275811957-04:00
ts=2025-03-25T03:42:24.276Z caller=level.go:71 module=http_2xx target=https://prometheus.io level=info msg="Probe succeeded" duration_seconds=0.061779807

@tjhop
Copy link
Contributor

tjhop commented Mar 25, 2025

Also, the default value for --log.prober should be Debug().

I can do that. Should the behavior of --log.prober be dependent on --log.level, then?

@grimz-ly
Copy link

grimz-ly commented Apr 2, 2025

I think this isn't working as intended. The default value for --log.prober is info. But even when I change --log.prober=debug, I'm still seeing it log probes as info level.

Also, the default value for --log.prober should be Debug().

isn't debug mode generally a 'catch all'? If I set this flag to 'warn' it stops the info logging for sure. this is unique to v0.26.0 release. I would think default should not be info.

@brandonbirdj
Copy link

I think you might be confused about how log levels work. You need to change to a higher level to not see these. Changing to debug still included info.

--log.prober=warn has fixed the issue for me.

@Jakob3xD
Copy link
Author

@SuperQ can you clarify for us what the expected behaviour of --log.prober is. As far as I understand, it is similar to --log.level as it defines what log level should be printed for probe logs.

From your comment, I would assume that you expect the --log.probe to set to which level the probe logs should be sent. Meaning --log.probe set to debug results in all probe logs on debug level which then should not be printed if the --log.level is set to info?

If this is correct, then this would IMO not be a good solution for the problem, as we lose the use of levels for probe logs. Specially because those levels are requested by other issues, see #1401.

From what I have read so far, I see two solutions.

  1. Adjust the log levels of probe logs to not be higher as WARN if they are not real issues and keep the --log.prober argument so when setting it to level error we don't see those probe logs.
  2. Similar to 1 but drop the --log.prober argument and only use the --log.level argument.

Option 1 allows a finer filtering of logs as the exporter logs and probe logs are separated where option 2 removes the confusing extra argument.

@SuperQ
Copy link
Member

SuperQ commented Apr 24, 2025

Yea, I don't know what the original intention of --log.prober was supposed to look like.

I feel like it was just hacked in based on how the old logging library worked.

tjhop added a commit to tjhop/blackbox_exporter that referenced this issue May 8, 2025
Based on discussion in prometheus#1377, slack, etc, it seems unclear what the
intended functionality/behavior of the probe scrape logger was supposed to be.
This is an attempt to align behavior with current user expectations. The
improved defaults and updated configs here should address the log spam
as well.

User facing changes:
- the `--log.prober` flag now sets the _level at which prober scrape
  logs will be emitted at_.
- the `--log.prober` flag now defaults to `debug` level. Because the
  default log level is info, this means that in order to see probe
scrape logs, a user must either:
    - increase the level of the logger via `--log.level` to be >= that
      of `--log.prober` (ie, set both to `debug`)
    - decrease the level of the probe logger via `--log.prober` to be <=
      that of `--log.level` (ie, set both to `info`)

Note:
This approach (emitting all logs from the probe logger at the level
specified) involves hijacking calls to the logger and overriding the
level. This comes with the very obvious side effect that the developers
working on code using the probe scrape logger no longer have control
over the control over the level of the log that gets emitted. Regardless
of whether they use `l.Info()` or `l.Debug()`, it'll still get logged at
the configured log prober level. This approach likely would not be
compatible with some of the other discussions around how probe scrape
logs work, like prometheus#1401.

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
@tjhop
Copy link
Contributor

tjhop commented May 8, 2025

After some more thought and discussion with @SuperQ on slack, I updated #1381 based on our current best understanding of how the --log.prober flag was intended to behave. It now sets the value that probe scrape logs will be emitted at. Probe logs default to debug level, so they will be not be seen unless A) the log level is increased or B) the probe log level is decreased.

More details in the PR, feedback welcome 👍

electron0zero pushed a commit that referenced this issue May 21, 2025
* fix: stop scrape logger spam, refactor scrape logger design

Based on discussion in #1377, slack, etc, it seems unclear what the
intended functionality/behavior of the probe scrape logger was supposed to be.
This is an attempt to align behavior with current user expectations. The
improved defaults and updated configs here should address the log spam
as well.

User facing changes:
- the `--log.prober` flag now sets the _level at which prober scrape
  logs will be emitted at_.
- the `--log.prober` flag now defaults to `debug` level. Because the
  default log level is info, this means that in order to see probe
scrape logs, a user must either:
    - increase the level of the logger via `--log.level` to be >= that
      of `--log.prober` (ie, set both to `debug`)
    - decrease the level of the probe logger via `--log.prober` to be <=
      that of `--log.level` (ie, set both to `info`)

Note:
This approach (emitting all logs from the probe logger at the level
specified) involves hijacking calls to the logger and overriding the
level. This comes with the very obvious side effect that the developers
working on code using the probe scrape logger no longer have control
over the control over the level of the log that gets emitted. Regardless
of whether they use `l.Info()` or `l.Debug()`, it'll still get logged at
the configured log prober level. This approach likely would not be
compatible with some of the other discussions around how probe scrape
logs work, like #1401.

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

* docs(README): add section on --log.prober behavior with sample output

Address PR feedback

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

---------

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants