Added support for gzipping the cached files #47
base: 4.x
In general I like the idea of making it possible to directly cache the encoded version, but I'm not 100% sure I would activate it by default. In some private projects I'm for example using …

Interesting is also that some setups may already be doing similar things and already support brotli as well; I'm not sure what happens then if we return gzip, or whether it would first require decoding the gzip and re-encoding it as brotli, which would mostly mean more work on the server. Also, some may already have optimized their gzip configuration on nginx, using a gzip_level tuned so the browser can decode it faster. As I remember, it is not good to always use level 9, as it requires more work and time for the browser to decode, with minimal gain over e.g. level 6 (which I mostly used in my projects).

If we add support for this, I think we would still need to cache both versions of the response (unencoded and encoded), or even multiple versions (none, gzip, zstd, br); sadly zstd and br are not easy to support in PHP out of the box. I currently also can't see in the pull request whether we are handling the …

I'm also not sure what happens if there are multiple cache layers (Cloudflare, Fastly, Varnish, …) in front; then we maybe also should not encode it.

I like the idea, but I personally would not activate it by default. @chirimoya @wachterjohannes what do you think?
In all of these cases, you shouldn't use this library at all. I mean, this library is meant to be used for simple PHP-only use cases. If you optimize your server setup using nginx, Varnish etc. they should cache the response and you don't need the Symfony HttpCache setup in the first place.
That I will certainly not implement. The goal of this PR is to reduce the disk usage, not to increase it.
Nginx has been our default webserver in almost all our own Sulu projects for years, but in our case Nginx was never used for caching, because Nginx caching in its free version is very limited. Combinations with Varnish or third parties are, I really think, uncommon, so we don't need to keep them in mind for this decision. Still, there are Apache and Nginx gzip configurations deployed which already set a gzip_level optimized for their content, or which use brotli, so I would not activate it by default, as it could maybe override existing configured behaviours.

What also came to my mind, and what we maybe need to keep in mind, is adding a gzip_mime_types configuration: some projects might also cache binary files which are not xsendfile responses, and formats like png or mp4 should not be gzipped, as they already have their own compression. I mostly used https://github.com/google/ngx_brotli#sample-configuration as a reference for which mime types should be encoded. But that is surely an edge case which maybe doesn't happen often.
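To illustrate the suggestion, here is a minimal sketch of what such a check could look like. The option name `gzip_mime_types` and the helper are assumptions for illustration only, not part of this PR:

```php
<?php

// Hypothetical sketch: skip compression for content types that are
// already compressed (images, video, archives).
function shouldGzip(string $contentType, array $gzipMimeTypes): bool
{
    // Strip parameters such as "; charset=utf-8".
    $mime = strtolower(trim(explode(';', $contentType)[0]));

    foreach ($gzipMimeTypes as $allowed) {
        if ($mime === $allowed) {
            return true;
        }

        // Support simple wildcards like "text/*".
        if (str_ends_with($allowed, '/*') && str_starts_with($mime, substr($allowed, 0, -1))) {
            return true;
        }
    }

    return false;
}

$gzipMimeTypes = ['text/*', 'application/json', 'application/javascript', 'image/svg+xml'];

var_dump(shouldGzip('text/html; charset=utf-8', $gzipMimeTypes)); // bool(true)
var_dump(shouldGzip('image/png', $gzipMimeTypes));                // bool(false)
```

A whitelist like the ngx_brotli sample configuration referenced above would then be the default value of that option.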
Makes sense! Still keep in mind that level 9 is maybe best in size, but not best for the browser to decode :)
Which means, in case of browsers supporting brotli, your Nginx setup compresses the raw HTML response on every single request, right? I can hardly imagine that this is faster than just outputting the …
Right, I've checked the defaults in Apache etc. and it seems like 6 is the level most configurations aim for. Changed in a9003a4.
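The size/CPU trade-off between levels can be measured with a quick standalone sketch (not part of the PR); on typical repetitive HTML, the size gain from 6 to 9 is small while the compression time grows noticeably:

```php
<?php

// Compare gzip levels: higher levels shrink the payload a bit more
// but cost more CPU to compress.
$html = str_repeat('<p>Some repetitive cacheable markup</p>', 1000);

foreach ([1, 6, 9] as $level) {
    $start = hrtime(true);
    $encoded = gzencode($html, $level);
    $micros = (hrtime(true) - $start) / 1000;

    printf("level %d: %d bytes, %.1f µs\n", $level, strlen($encoded), $micros);
}
```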
Also keep https://en.wikipedia.org/wiki/BREACH in mind, as most people use HTTPS these days.
I guess for HTML responses we could add random-sized paddings, but I have no idea how to mitigate it for all contents. Any proposals? Also, I wonder how important this really is. This should be fixed on the protocol level and not in every single application, but it seems like the IETF abandoned any proposals.
If I see correctly, Varnish also has an option to activate gzip for specific requests: https://varnish-cache.org/docs/5.1/users-guide/compression.html. Maybe you can find some information there on how they solve such issues?
When I read this, they do exactly the same as my implementation does ^^ I also find zero information in the source code of Varnish regarding BREACH attacks. All I find when searching the Internet are tickets, over and over, that are closed as irrelevant / can't do anything about it. Such as dotnet/aspnetcore#45662, for example.
Reading the Varnish docs, something else came to my mind: how does gzip-stored content work together with ESI include HTML tags in this storage? I remember that Varnish streams HTML content, stops at an ESI tag, streams the ESI, and then continues with the main HTML content. I'm not sure how they solved gzip over the whole content. If we directly return gzip content, does the ESI listener of Symfony still work? I'm not sure, but I think it is handled here: https://github.com/symfony/symfony/blob/fd530913d494e2b4e9a624ba6350371d8e801ca3/src/Symfony/Component/HttpKernel/HttpCache/Esi.php#L70 — but if getContent is now gzip, it will fail and ESI will no longer work? We use ESI directly inside Sulu, in a lot of projects.
@alexander-schranz: Good point. I don't know how recent versions of Varnish work, but with Fastly, ESI fragments cannot be compressed (they would be delivered as compressed garbage to the client).
ESI should be of no concern. The proxy will not send an …
@Toflar Ohhh, I missed that part. I thought we gzip it and return a gzip response directly, so the webserver doesn't need to do the gzipping. But as I understand it now, we gzip the content on write and ungzip it on read. So the response object created by the store looks the same whether gzip is activated or not?
Yes, this PR (in its current state) would store the contents gzipped (if supported and not already gzipped). We decompress on the fly if the client does not support gzipping. Just because, if you have thousands of pages in the cache, this generates a significant amount of data in the cache. But we could also make things a bit more configurable, e.g. only handle …
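A minimal sketch of the read path as I understand it from this description (the function and names are mine, not the PR's): the cache holds the gzipped body, clients that accept gzip get it as-is, everyone else gets it decompressed on the fly.

```php
<?php

// Sketch: pick the body and extra headers based on the client's
// Accept-Encoding header. Returns [body, extraHeaders].
function buildBody(string $cachedGzippedBody, string $acceptEncoding): array
{
    if (str_contains($acceptEncoding, 'gzip')) {
        // Deliver the stored bytes directly and declare the encoding.
        return [$cachedGzippedBody, ['Content-Encoding' => 'gzip']];
    }

    // Fallback: decompress for clients without gzip support.
    return [gzdecode($cachedGzippedBody), []];
}

$stored = gzencode('<html>cached page</html>', 6);

[$body, $headers] = buildBody($stored, 'gzip, deflate, br');
// Body stays compressed; Content-Encoding: gzip is set.

[$body, $headers] = buildBody($stored, 'identity');
// $body === '<html>cached page</html>'
```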
For the current implementation, the only thing which would then be interesting for me is whether we now need to …
```php
    return;
}

$encoded = gzencode((string) $response->getContent(), $this->options['gzip_level']);
```
Should we add a check here for the case that `->getContent()` returns `false`, like the streamed response does, for example? https://github.com/symfony/symfony/blob/fd530913d494e2b4e9a624ba6350371d8e801ca3/src/Symfony/Component/HttpFoundation/StreamedResponse.php#L129
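Such a guard could look like the following sketch (the function name is mine; this is an illustration of the suggestion, not the PR's code). Without it, the `(string)` cast turns `false` into `""` and an empty body would be gzipped silently:

```php
<?php

use Symfony\Component\HttpFoundation\Response;

// Skip compression when getContent() returns false,
// as Symfony's StreamedResponse does.
function maybeGzip(Response $response, int $level): ?string
{
    $content = $response->getContent();

    if (false === $content) {
        // Streamed responses have no buffered body to compress.
        return null;
    }

    return gzencode($content, $level);
}
```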
Just stumbled today over the following Varnish documentation about brotli: https://docs.varnish-software.com/varnish-enterprise/vmods/brotli/ — so Varnish, if brotli is enabled, seems to store content as brotli and transcode it to gzip. Thought you might be interested in that.
This brings support for gzipped cache files which should significantly reduce the used storage.
If a request supports gzip (`Accept-Encoding`), which should be the majority of all browsers, we can deliver the gzipped file right away. If not, we'll decode on the fly.

I think always gzipping is a sensible default, but I'm asking for opinions here from the heavy users of this package :)
/cc @alexander-schranz, Sulu
/cc @mnocon, Ibexa (ex. eZ publish)