Option to use stale cache after query failed with timeout or error response #15294

Open
pizchen opened this issue Mar 12, 2025 · 2 comments

@pizchen
Contributor

pizchen commented Mar 12, 2025

  • Program: dnsdist
  • Issue type: Feature request

Short description

Currently, stale cache entries are used only when no downstream server is available.
If any downstream server is still available, dnsdist forwards the query to the selected server.
If that query then fails for any reason (a timeout or an error response), there is currently no way to go back and use an existing stale cache entry.
This feature request is for an option to fall back to the stale cache entry (if one exists) when the forwarded query fails.

Usecase

The goal is to shorten the period during which queries fail from the client's point of view. When a cache entry has expired, it is reasonable to first forward the query to an available downstream server. However, if that query fails and the entry is still within the stale TTL, returning the cached result is likely the better outcome.

Description

The scenario would be as follows:

  1. The client sends a DNS query to dnsdist.
  2. dnsdist looks up the cache (if enabled) and finds an entry that has expired but is still within the stale TTL.
  3. dnsdist forwards the query to the selected downstream server as usual.
  4. If this particular query fails, dnsdist looks up the cache again (or reuses the result of the first lookup, saved with the query) and sends the cached result back to the client.

This can be an optional feature, so that users can configure whether the stale cache is used in this situation.
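
To make the requested flow concrete, here is a minimal Python sketch of the four steps above. It is only a toy model, not dnsdist code: `StaleAwareCache`, `UpstreamFailure`, `resolve` and the `stale_on_failure` flag are all hypothetical names invented for this illustration, and a "failure" is simplified to a single exception covering both timeouts and error responses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


class UpstreamFailure(Exception):
    """Hypothetical: the forwarded query timed out or got an error response."""


@dataclass
class CacheEntry:
    response: str
    expires_at: float  # end of the normal TTL, in seconds


class StaleAwareCache:
    """Toy cache that can also serve entries within a stale TTL window."""

    def __init__(self, stale_ttl: float) -> None:
        self.stale_ttl = stale_ttl
        self._entries: Dict[str, CacheEntry] = {}

    def insert(self, key: str, response: str, ttl: float, now: float) -> None:
        self._entries[key] = CacheEntry(response, now + ttl)

    def lookup(self, key: str, now: float, allow_stale: bool = False) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        if now < entry.expires_at:
            return entry.response  # still fresh
        if allow_stale and now < entry.expires_at + self.stale_ttl:
            return entry.response  # expired, but within the stale TTL
        return None


def resolve(
    key: str,
    cache: StaleAwareCache,
    forward: Callable[[str], Tuple[str, float]],
    now: float,
    stale_on_failure: bool,
) -> str:
    """Answer from a fresh entry, else forward, else (optionally) go stale."""
    fresh = cache.lookup(key, now)
    if fresh is not None:
        return fresh
    try:
        response, ttl = forward(key)  # step 3: forward to the selected server
    except UpstreamFailure:
        if stale_on_failure:  # step 4: the proposed optional fallback
            stale = cache.lookup(key, now, allow_stale=True)
            if stale is not None:
                return stale
        return "SERVFAIL"
    cache.insert(key, response, ttl, now)
    return response
```

With an entry inserted at `now=0` with a TTL of 30 seconds and a stale TTL of 60 seconds, a failing backend at `now=45` yields the cached answer when `stale_on_failure` is true and `SERVFAIL` when it is false. Looking the entry up a second time, as done here, is one of the two variants mentioned in step 4; saving the first lookup result with the query would avoid the second lookup.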

@rgacogne
Member

Thank you for opening this feature request. I have to say that I'm not convinced this would actually be useful because:

  • if the server does not respond and dnsdist detects a timeout, in my experience it is already too late: the initial client has already given up or sent a new query
  • it would only be useful if the server is failing in a way that is not detected by the health-check mechanism. If it happens, perhaps we can look into improving the health-check mechanism instead.

@pizchen
Contributor Author

pizchen commented Mar 13, 2025

Thank you for the quick reply!
In my view, both of the points you mentioned are arguable:

  • what if a user sets the client-side timeout higher than the timeout in dnsdist? Then the stale cache has value
  • an error response comes back very quickly, while a health check with a reasonable threshold takes at least a few seconds to detect the failure. Many queries may get a failure response during those seconds, even though a stale cache entry is available and most likely holds the right answer

In this sense, a service in dnsdist's role could provide a better experience for different user needs.
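
As a rough back-of-the-envelope illustration of the two points above, the Python sketch below estimates how many queries could receive a failure response before a health check marks a backend down, and whether a stale answer sent after the dnsdist timeout would still reach a waiting client. The model is deliberately simplistic and all numbers are assumptions chosen for illustration, not dnsdist defaults or measurements; the function names are hypothetical.

```python
def detection_window_s(check_interval_s: float, max_check_failures: int) -> float:
    """Approximate time until a backend is marked down.

    Assumes one health check per interval and that the backend is marked
    down after `max_check_failures` consecutive failed checks.
    """
    return check_interval_s * max_check_failures


def queries_failing_meanwhile(qps: float, window_s: float, backend_share: float) -> float:
    """Queries sent to the failing backend while it is still considered up."""
    return qps * window_s * backend_share


def stale_answer_in_time(dnsdist_timeout_s: float, client_timeout_s: float) -> bool:
    """A stale answer sent at the dnsdist timeout only helps a client
    that is still waiting at that point."""
    return dnsdist_timeout_s < client_timeout_s


# Illustrative values only: checks every 1 s, 3 failures needed,
# 1000 queries per second, half of them routed to the failing backend.
window = detection_window_s(check_interval_s=1, max_check_failures=3)
affected = queries_failing_meanwhile(qps=1000, window_s=window, backend_share=0.5)
```

Under these assumed values the detection window is 3 seconds and about 1500 queries would get a failure response in that time, each of which could in principle have been answered from a stale entry. Similarly, with an assumed dnsdist timeout of 2 seconds and a client timeout of 5 seconds, `stale_answer_in_time(2, 5)` is true, so the fallback answer would still arrive while the client is waiting.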
