
Added support for gzipping the cached files #47


Open · wants to merge 4 commits into base: 4.x

Conversation

@Toflar (Owner) commented Jan 15, 2024

This brings support for gzipped cache files, which should significantly reduce storage usage.
If a request supports gzip (via the Accept-Encoding header), which should be the case for the vast majority of browsers, we can deliver the gzipped file right away. If not, we decode on the fly.

I think always gzipping is a sensible default, but I'm asking for opinions here from the heavy users of this package :)

/cc @alexander-schranz, Sulu
/cc @mnocon, Ibexa (ex. eZ publish)
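
The Accept-Encoding negotiation described above could be sketched like this (the function name and return shape are illustrative assumptions, not the PR's actual API):

```php
<?php
// Minimal sketch of the negotiation described in the PR description.
// serveCachedBody and its return shape are illustrative, not the real API.
function serveCachedBody(string $gzippedBody, string $acceptEncoding): array
{
    // Deliver the stored gzip body directly when the client advertises support.
    if (str_contains($acceptEncoding, 'gzip')) {
        return ['body' => $gzippedBody, 'content_encoding' => 'gzip'];
    }

    // Otherwise decode on the fly for clients without gzip support.
    return ['body' => gzdecode($gzippedBody), 'content_encoding' => null];
}
```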

@Toflar Toflar marked this pull request as ready for review January 15, 2024 11:20
@alexander-schranz (Contributor) commented Jan 15, 2024

In general I like the idea of making it possible to cache the encoded version directly.

But I'm not 100% sure I would activate it by default. In some private projects, for example, I'm using nginx with Brotli (https://github.com/google/ngx_brotli) and return br encoding instead of gzip if the browser supports it: https://caniuse.com/brotli

It's also interesting that FrankenPHP supports zstd besides gzip, but I think the browser support isn't there yet: https://caniuse.com/zstd.

I'm not 100% sure about activating gzip by default: some users may already be doing something similar with Brotli support, and I'm not sure what happens then if this returns gzip, or whether the server would first have to decode the gzip and re-encode it as Brotli, which would usually be more work on the server.

Also, some may already have optimized their nginx gzip configuration with a gzip_level tuned for faster handling in the browser. As I remember, it's not good to always use level 9, as it requires more work and time for only a minimal gain over, for example, level 6 (which I used in most projects).

If we add support for this, I think we'd still need to cache both the unencoded and encoded versions of the response, or even multiple versions (none, gzip, zstd, br). Sadly, zstd and br are not easy to support in PHP out of the box. I also can't currently see in the pull request whether we handle the Accept-Encoding header correctly: if gzip is not listed there, we should not return a gzip version of the content, as some clients (mostly non-browsers) do not support gzip.

I'm also not sure what happens if there are multiple cache layers (Cloudflare, Fastly, Varnish, ...) in front; in that case we maybe also shouldn't encode it.

I like the idea, but I personally would not activate it by default. @chirimoya @wachterjohannes what do you think?

@Toflar (Owner, Author) commented Jan 15, 2024

> But I'm not 100% sure I would activate it by default. In some private projects, for example, I'm using nginx with Brotli (https://github.com/google/ngx_brotli) and return br encoding instead of gzip if the browser supports it: https://caniuse.com/brotli
>
> It's also interesting that FrankenPHP supports zstd besides gzip, but I think the browser support isn't there yet: https://caniuse.com/zstd.
>
> I'm not 100% sure about activating gzip by default: some users may already be doing something similar with Brotli support, and I'm not sure what happens then if this returns gzip, or whether the server would first have to decode the gzip and re-encode it as Brotli, which would usually be more work on the server.

In all of these cases, you shouldn't use this library at all. I mean, this library is meant for simple PHP-only use cases. If you optimize your server setup using nginx, Varnish, etc., they should cache the response and you don't need the Symfony HttpCache setup in the first place.

> If we add support for this, I think we'd still need to cache both the unencoded and encoded versions of the response.

That I will certainly not implement. The goal of this PR is to reduce the disk usage, not to increase it.

@alexander-schranz (Contributor) commented Jan 15, 2024

> In all of these cases, you shouldn't use this library at all. I mean, this library is meant for simple PHP-only use cases. If you optimize your server setup using nginx, Varnish, etc., they should cache the response and you don't need the Symfony HttpCache setup in the first place.

Nginx has been our default webserver in almost all of our own Sulu projects for years, but in our case Nginx was never used for caching, because Nginx caching is very limited in its free version. Combinations with Varnish or third parties are, I think, uncommon, so we don't need to keep them in mind for this decision. Still, there are Apache and Nginx gzip configurations deployed that already set a gzip_level optimized for their content, or that use Brotli, so I would not activate it by default as it could override existing configured behaviour.

What also came to my mind, and what we maybe need to keep in mind, is adding a gzip_mime_types configuration: some projects may also cache binary files that are not X-Sendfile responses, and formats like png or mp4 should not be gzipped as they already have their own compression. I mostly used https://github.com/google/ngx_brotli#sample-configuration as a reference for which mime types should be encoded. But that's surely an edge case that may not happen often.

> That I will certainly not implement. The goal of this PR is to reduce the disk usage, not to increase it.

Makes sense! Still keep in mind that level 9 may be best for size but not best for the browser to decode :)

@Toflar (Owner, Author) commented Jan 15, 2024

> Nginx has been our default webserver in almost all of our own Sulu projects for years, but in our case Nginx was never used for caching, because Nginx caching is very limited in its free version. Combinations with Varnish or third parties are, I think, uncommon, so we don't need to keep them in mind for this decision. Still, there are Apache and Nginx gzip configurations deployed that already set a gzip_level optimized for their content, or that use Brotli, so I would not activate it by default as it could override existing configured behaviour.

Which means that for browsers supporting Brotli, your nginx setup compresses the raw HTML response on every single request, right? I can hardly imagine that this is faster than just outputting the gzip version I've now implemented ;) But I think it depends on whether your nginx setup adjusts the Accept-Encoding header when forwarding to PHP?

> What also came to my mind, and what we maybe need to keep in mind, is adding a gzip_mime_types configuration: some projects may also cache binary files that are not X-Sendfile responses, and formats like png or mp4 should not be gzipped as they already have their own compression. I mostly used https://github.com/google/ngx_brotli#sample-configuration as a reference for which mime types should be encoded. But that's surely an edge case that may not happen often.

BinaryFileResponses were already excluded, but we're now also ignoring all responses that already have a different encoding: f1aa2d3

> Makes sense! Still keep in mind that level 9 may be best for size but not best for the browser to decode :)

Right, I've checked the defaults in Apache etc., and it seems like 6 is the level most aim for. Changed in a9003a4.
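
The size-versus-effort trade-off between levels can be eyeballed with a quick comparison (a sketch; the sample input is arbitrary):

```php
<?php
// Quick comparison of gzip levels on repetitive input: higher levels shrink
// the output with diminishing returns, at the cost of more compression work.
$data = str_repeat('<li>Some repetitive HTML content</li>', 1000);

foreach ([1, 6, 9] as $level) {
    $size = strlen(gzencode($data, $level));
    echo "level $level: $size bytes\n";
}
```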

@vidarl commented Jan 17, 2024

Also keep https://en.wikipedia.org/wiki/BREACH in mind, as most people use HTTPS these days.

@Toflar (Owner, Author) commented Feb 1, 2024

I guess for HTML responses we could add random-sized padding, but I have no idea how to mitigate this for all content types. Any proposals? Also, I wonder how important this really is. This should be fixed at the protocol level and not in every single application, but it seems the IETF abandoned all proposals.
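
For illustration, random-sized padding for HTML responses could look like this (purely a sketch of the idea, not a vetted BREACH mitigation):

```php
<?php
// Sketch only: append a random-length HTML comment so the compressed response
// size varies between otherwise identical responses. This illustrates the
// random-sized padding idea above; it is not a vetted BREACH mitigation.
function addRandomPadding(string $html): string
{
    $padding = str_repeat('.', random_int(1, 64));

    return $html . '<!-- ' . $padding . ' -->';
}
```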

@alexander-schranz (Contributor):

If I see correctly, Varnish also has an option to activate gzip for specific requests: https://varnish-cache.org/docs/5.1/users-guide/compression.html. Maybe you'll find some information there on how they solve this issue?

@Toflar (Owner, Author) commented Feb 14, 2024

Reading this, they do exactly the same as my implementation^^ I also found no information in the Varnish source code regarding BREACH attacks. All I find when searching the internet are tickets, over and over, closed as irrelevant or because nothing can be done about it, such as dotnet/aspnetcore#45662.

@alexander-schranz (Contributor):

Reading the Varnish docs, something else came to my mind: how does gzipped stored content work together with ESI include tags in this storage? I remember that Varnish streams HTML content, stops at an ESI tag, streams the ESI fragment, and then continues with the main HTML content. I'm not sure how they solved gzip over the whole content.

If we directly return gzipped content, does Symfony's ESI listener still work? I'm not sure, but I think it is handled here: https://github.com/symfony/symfony/blob/fd530913d494e2b4e9a624ba6350371d8e801ca3/src/Symfony/Component/HttpKernel/HttpCache/Esi.php#L70. If getContent() now returns gzipped data, would that fail and break ESI?

While we don't use ESI directly inside Sulu itself, we use ESI in a lot of projects.

@vidarl commented Feb 14, 2024

@alexander-schranz: Good point. I don't know how recent versions of Varnish work, but with Fastly, ESI fragments cannot be compressed (they would be delivered as compressed garbage to the client).

@Toflar (Owner, Author) commented Feb 14, 2024

ESI should be of no concern. The proxy will not send an Accept-Encoding header, so it will get the uncompressed response.

@alexander-schranz (Contributor) commented Feb 14, 2024

@Toflar Ohhh, I missed that part. I thought we gzip the content and directly return a gzipped response so that the webserver doesn't have to do the gzipping. But as I understand it now, we gzip the content on write and ungzip it on read. So the response object created by the store looks the same whether gzip is activated or not?

@Toflar (Owner, Author) commented Feb 14, 2024

Yes, this PR (in its current state) would store the contents gzipped (if supported and not already gzipped). We decompress on the fly if the client does not support gzip. The reason is simply that with thousands of pages, an uncompressed cache takes up a significant amount of disk space.

But we could also make things a bit more configurable: e.g. only handle text/html or application/json responses, and ignore gzipping if a Surrogate-Control header is present, to avoid constantly decompressing those responses.

@alexander-schranz (Contributor):

For the current implementation, the only thing that would still be interesting for me is whether needing to gzdecode most cached responses now has any effect on performance. Maybe we could do some benchmarks here.
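
A rough micro-benchmark of that decode overhead could start from something like this (a sketch; the sample body and iteration count are arbitrary assumptions):

```php
<?php
// Rough micro-benchmark sketch for the gzdecode overhead discussed above.
$body = str_repeat('<p>cached page content</p>', 5000);
$gz = gzencode($body, 6);

$start = hrtime(true);
for ($i = 0; $i < 100; $i++) {
    gzdecode($gz);
}
$elapsedMs = (hrtime(true) - $start) / 1e6;

printf("100 gzdecode calls: %.2f ms\n", $elapsedMs);
```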

return;
}

$encoded = gzencode((string) $response->getContent(), $this->options['gzip_level']);

Should we add a check here for the case that ->getContent() returns false, like a streamed response does, for example? https://github.com/symfony/symfony/blob/fd530913d494e2b4e9a624ba6350371d8e801ca3/src/Symfony/Component/HttpFoundation/StreamedResponse.php#L129
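
A guard along those lines might look like this (a sketch; maybeGzip is a hypothetical helper, not code from this PR, and the default level 6 mirrors the value discussed above):

```php
<?php
// Hypothetical helper sketching the suggested check: Response::getContent()
// may return false (Symfony's StreamedResponse always does), and gzencode()
// needs a string, so skip compression in that case.
function maybeGzip(string|false $content, int $level = 6): ?string
{
    if (false === $content) {
        return null; // nothing to compress; store the entry unencoded
    }

    return gzencode($content, $level);
}
```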

@alexander-schranz (Contributor) commented Mar 6, 2025

Just stumbled today over the following Varnish documentation about Brotli: https://docs.varnish-software.com/varnish-enterprise/vmods/brotli/

So if Brotli is enabled, Varnish seems to store the content as Brotli and transcode it to gzip. Thought you might be interested in that.
