Hi Sudheer
Sorry, one quick follow-up question.
We have other plugins declared in the plugin.config file. Should the collapsed forwarding plugin precede them or go at the end? Or does it not matter?
Thanks
Nick
On Mar 8, 2018, at 8:54 PM, Sudheer Vinukonda <***@yahoo.com> wrote:
I can see how the documentation might be slightly misleading. You are right that these settings by themselves are orthogonal to whether or not you have the collapsed forwarding plugin enabled. And configuring them would indeed avoid a Thundering Herd to the Origins, in the sense that, at most, one request per object is leaked upstream.
However, in terms of the net result to the client, as clearly described in the docs, the best these settings can achieve is to return an error to the client or a stale copy, when applicable/available (for example, an older manifest file in the case of HLS streaming). This is generally not desirable behavior for many video solutions, and this is where the collapsed_forwarding plugin comes into play. That plugin is essentially built on top of open_write_fail_action: it intercepts the error before it goes back to the client and waits until the cache is filled with the needed object. The net result, in this case, is clearly a better experience for the users and friendlier to the clients (e.g., video players).
Technically, using the collapsed_forwarding plugin would still be an "out-of-the-box" solution, as long as you compile the plugin and set it up correctly.
More info about how the plugin works is at Collapsed Forwarding Plugin - Apache Traffic Server 8.0.0 documentation <https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/collapsed_forwarding.en.html>
Hope this helps.
Thanks,
Sudheer
On Thursday, March 8, 2018, 3:22:56 PM PST, Dunkin, Nick <***@ccur.com> wrote:
Hi Sudheer,
I really think I'm missing something. Please allow me to check my understanding from the beginning.
We followed the documentation at the link I provided earlier, specifically the section on Reducing Origin Server Requests (Avoiding the Thundering Herd).
We added the required prerequisite configurations (as per the documentation):
CONFIG proxy.config.cache.enable_read_while_writer INT 1
CONFIG proxy.config.http.background_fill_active_timeout INT 0
CONFIG proxy.config.http.background_fill_completed_threshold FLOAT 0.000000
CONFIG proxy.config.cache.max_doc_size INT 0
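A quick way to double-check that the four prerequisite lines above are really present is a small script. This is only a minimal Python sketch: the helper names (parse_records, missing_prerequisites) are made up for illustration, it assumes the simple "CONFIG name TYPE value" line layout shown above, and it is not part of ATS.

```python
# Prerequisite values exactly as listed above (compared numerically).
PREREQUISITES = {
    "proxy.config.cache.enable_read_while_writer": 1.0,
    "proxy.config.http.background_fill_active_timeout": 0.0,
    "proxy.config.http.background_fill_completed_threshold": 0.0,
    "proxy.config.cache.max_doc_size": 0.0,
}


def parse_records(text):
    """Collect 'CONFIG name TYPE value' lines into a {name: value} dict."""
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        # Anything that is not a four-field CONFIG line is ignored.
        if len(parts) == 4 and parts[0] == "CONFIG":
            settings[parts[1]] = parts[3]
    return settings


def missing_prerequisites(text):
    """Return the prerequisite names that are absent or set differently."""
    settings = parse_records(text)
    problems = []
    for name, wanted in PREREQUISITES.items():
        have = settings.get(name)
        try:
            ok = have is not None and float(have) == wanted
        except ValueError:
            ok = False
        if not ok:
            problems.append(name)
    return problems


sample = """\
CONFIG proxy.config.cache.enable_read_while_writer INT 1
CONFIG proxy.config.http.background_fill_active_timeout INT 0
CONFIG proxy.config.http.background_fill_completed_threshold FLOAT 0.000000
"""
print(missing_prerequisites(sample))
```

With the sample text, only proxy.config.cache.max_doc_size is reported, because that line was left out on purpose.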
And we also chose sensible settings for each of the following configurations (as per the documentation):
CONFIG proxy.config.cache.read_while_writer.max_retries INT xxx
CONFIG proxy.config.cache.read_while_writer_retry.delay INT xxx
CONFIG proxy.config.http.cache.max_open_read_retries INT xxx
CONFIG proxy.config.http.cache.open_read_retry_time INT xxx
CONFIG proxy.config.http.cache.open_write_fail_action INT xxx
The documentation then states: "Once these are enabled, you have something that is very close, but not quite the same, to Squid's Collapsed Forwarding."
AFAIK none of this involves the Collapsed Forwarding plugin. The documentation doesn't mention the Collapsed Forwarding plugin. We don't have the Collapsed Forwarding plugin declared in our plugin.config.
It was my understanding that these settings were orthogonal to the Collapsed Forwarding plugin but provided similar functionality "out of the box".
Please can you let me know if I have misunderstood the documentation? Maybe this section of the documentation is outdated?
Many thanks for your patience,
Nick
From: Sudheer Vinukonda <***@yahoo.com>
Date: Thursday, March 8, 2018 at 5:37 PM
To: "***@trafficserver.apache.org" <***@trafficserver.apache.org>, Nick Dunkin <***@ccur.com>
Subject: Re: Parent.config and thundering herd.
AFAIK, the collapsed_forwarding plugin is still actively used in production for live and VOD streaming by a few companies, and I'm not aware of any plans to deprecate it (we did agree on deprecating the collapsed_connection plugin, which is somewhat similar in what it does but different in how it does it -- perhaps you were referring to that?).
I copied an older link earlier for open_write_fail_action, mainly because what it does hasn't changed much in 7.x. Please see the 7.x reference below.
With the open_write_fail_action feature (within the ATS core), ATS returns errors to all but one request on seeing multiple concurrent requests for the same object. For example, if you were doing live streaming and 1000 clients requested the same segment file that is not yet in the Delivery Server's cache, enabling the open_write_fail_action feature allows ATS to return a 502 to 999 clients while the remaining request fetches the segment and populates the cache. As long as the clients retry, this should mostly work. However, if you do not want to return errors to clients (we certainly did not, as it would make things much worse by causing a retry storm), the collapsed_forwarding plugin can hold those requests while they wait for the one request that was proxied to the Origin to fetch the segment and fill the cache. Once the segment is fetched and the write to cache begins, the other requests can join the party (that's where read-while-writer comes into the picture), and ATS starts streaming to all the clients at the same time.
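To make the 1000-client example concrete, here is a toy Python model of the two behaviours. It is only a sketch: the function name and the outcome labels are invented for illustration, it processes requests one by one instead of truly concurrently, and it does not reflect how ATS or the plugin is implemented internally.

```python
def simulate_cache_miss(num_clients, hold_waiters):
    """Toy model of many requests arriving for one object that is not cached.

    hold_waiters=False: every request that loses the cache write lock gets a
    502 (the open_write_fail_action behaviour described above).
    hold_waiters=True: losers are held and served once the winner has started
    filling the cache (the collapsed_forwarding behaviour described above).
    """
    outcomes = {"origin_fetch": 0, "error_502": 0, "served_after_wait": 0}
    write_lock_held = False
    for _ in range(num_clients):
        if not write_lock_held:
            # The first request wins the write lock and goes to the origin.
            write_lock_held = True
            outcomes["origin_fetch"] += 1
        elif hold_waiters:
            outcomes["served_after_wait"] += 1
        else:
            outcomes["error_502"] += 1
    return outcomes


print(simulate_cache_miss(1000, hold_waiters=False))
print(simulate_cache_miss(1000, hold_waiters=True))
```

In both cases only one request reaches the origin. The difference is what the other 999 clients see: a 502 they have to retry, or the segment after a short wait.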
Now, it's possible that you may have never used the collapsed_forwarding plugin and somehow happened to not see the problem of returning 502 errors to clients, but, it's always possible depending on the scale, concurrency (and in particular, the origin latency). Perhaps, enabling parent proxy may have exposed the problem, by somehow making the latency worse?
records.config - Apache Traffic Server 7.0.0 documentation <https://docs.trafficserver.apache.org/en/7.0.x/admin-guide/files/records.config.en.html#proxy-config-http-cache-open-write-fail-action>
On Thursday, March 8, 2018, 11:26:13 AM PST, Dunkin, Nick <***@ccur.com> wrote:
Hi Sudheer,
I'm not sure we're quite on the same page, but I'm grateful for your input. This is all for ATS ver 7.0, and the documentation I'm talking about is on this page:
https://docs.trafficserver.apache.org/en/7.0.x/admin-guide/configuration/cache-basics.en.html
In the section "Reducing Origin Server Requests (Avoiding the Thundering Herd)".
There is nothing in that section about these settings being associated with the Collapsed Forwarding plugin. In fact there is no mention of the Collapsed Forwarding plugin at all. Now I'm a little confused.
Is anyone able to clarify this for me? I thought I understood but maybe I don't.
Thanks,
Nick
From: Sudheer Vinukonda <***@yahoo.com>
Date: Thursday, March 8, 2018 at 1:36 PM
To: "***@trafficserver.apache.org" <***@trafficserver.apache.org>, Nick Dunkin <***@ccur.com>
Subject: Re: Parent.config and thundering herd.
Hmm... I'm not sure that the collapsed_forwarding plugin is deprecated. The plugin is in fact based on the settings you mentioned below, and it allows you to block multiple parallel requests for the same object from leaking upstream.
Using the settings alone, without the plugin, would not actually achieve any request coalescing for cache miss scenarios -- it would simply result in returning an error to the client. Is that what you meant by "seeing request coalescing"? Or does your use case involve a stale cache (e.g., VOD) rather than cache misses?
records.config - Apache Traffic Server 6.2.1 documentation <https://docs.trafficserver.apache.org/en/6.2.x/admin-guide/files/records.config.en.html#proxy-config-http-cache-open-write-fail-action>
proxy.config.http.cache.open_write_fail_action
Scope: CONFIG
Type: INT
Default: 0
Reloadable: Yes
Overridable: Yes
This setting indicates the action taken on failing to obtain the cache open write lock on either a cache miss or a cache hit stale. This typically happens when there is more than one request to the same cache object simultaneously. During such a scenario, all but one (which goes to the origin) request is served either a stale copy or an error depending on this setting.
* 0 = default, disable cache and goto origin server
* 1 = return a 502 error on a cache miss
* 2 = serve stale if object's age is under proxy.config.http.cache.max_stale_age <https://docs.trafficserver.apache.org/en/6.2.x/admin-guide/files/records.config.en.html#proxy-config-http-cache-max-stale-age>, else go to origin server
* 3 = return a 502 error on a cache miss or serve stale on a cache revalidate if object's age is under proxy.config.http.cache.max_stale_age <https://docs.trafficserver.apache.org/en/6.2.x/admin-guide/files/records.config.en.html#proxy-config-http-cache-max-stale-age>, else go to origin server
* 4 = return a 502 error on either a cache miss or on a revalidation
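The five values above can be read as a small decision table for a request that failed to get the write lock. The Python sketch below just restates the list. The function and argument names are hypothetical, and where the list does not say what happens (value 1 on a stale hit) the sketch says so rather than guessing.

```python
def open_write_fail_outcome(action, cache_miss, within_max_stale_age):
    """Outcome for a request that could not obtain the cache write lock.

    action: value of proxy.config.http.cache.open_write_fail_action (0-4)
    cache_miss: True for a cache miss, False for a stale hit (revalidation)
    within_max_stale_age: True if the stale object's age is under
        proxy.config.http.cache.max_stale_age (ignored on a cache miss)
    """
    if action == 0:
        return "go to origin"
    if action == 1:
        return "502" if cache_miss else "not stated in the list above"
    if action == 2:
        if not cache_miss and within_max_stale_age:
            return "serve stale"
        return "go to origin"
    if action == 3:
        if cache_miss:
            return "502"
        return "serve stale" if within_max_stale_age else "go to origin"
    if action == 4:
        return "502"
    raise ValueError("action must be between 0 and 4")


for action in range(5):
    print(action,
          open_write_fail_outcome(action, cache_miss=True, within_max_stale_age=False),
          open_write_fail_outcome(action, cache_miss=False, within_max_stale_age=True))
```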
Thanks,
Sudheer
On Thursday, March 8, 2018, 8:58:46 AM PST, Dunkin, Nick <***@ccur.com> wrote:
Hi Sudheer,
Thanks for the reply. I couldn't think of any reason either, but I wanted to check with the community.
Just for clarification: we're not using the Collapsed-Forwarding plugin explicitly. I understood that the plugin was deprecated in favor of the three configuration areas I mentioned:
* Read While Writer
* Open Read Retry Timeout
* Open Write Fail Action
We certainly don't have the Collapsed-Forwarding plugin in the plugin.config and we are seeing request coalescing.
Thanks,
Nick
From: Sudheer Vinukonda <***@yahoo.com>
Reply-To: "***@trafficserver.apache.org" <***@trafficserver.apache.org>
Date: Thursday, March 8, 2018 at 11:49 AM
To: "***@trafficserver.apache.org" <***@trafficserver.apache.org>
Subject: Re: Parent.config and thundering herd.
I haven't looked at parent proxy setup much, but at a high level, I can't think of any reason why an origin failover mechanism would impact request coalescing using the collapsed forwarding plugin. The open write fail action works based on the cache key for the object, and as long as that doesn't change, it shouldn't matter which origin the object is pulled from. As a matter of fact, we have had origin failover set up using a custom plugin, as well as request coalescing enabled, in our HLS delivery servers and didn't see any problems with it.
Is it possible that the access failures are somehow preventing the object from being downloaded or cached? If the object is never cached, then you will see problems with request coalescing.
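To illustrate the cache key point, here is a minimal Python sketch of a write lock table keyed only by cache key. Everything in it is hypothetical: the class, the example URLs, and the assumption that the cache key is simply the request URL. It is not how ATS implements its cache locks.

```python
class WriteLockTable:
    """Toy write-lock table: at most one writer per cache key."""

    def __init__(self):
        self._held = set()

    def try_acquire(self, cache_key):
        """Return True if the caller becomes the writer for cache_key."""
        if cache_key in self._held:
            return False
        self._held.add(cache_key)
        return True

    def release(self, cache_key):
        self._held.discard(cache_key)


def cache_key_for(request_url):
    # Assumption for this sketch: the key depends only on the request URL,
    # not on which origin (primary or secondary) ends up serving the fetch.
    return request_url


locks = WriteLockTable()
segment = "http://cdn.example.com/live/segment1.ts"

# First request wins the lock; imagine it is fetched from the primary origin.
first = locks.try_acquire(cache_key_for(segment))
# A second request for the same URL collides on the same key, even if the
# failover logic would have sent it to the secondary origin.
second = locks.try_acquire(cache_key_for(segment))
# A different object is unaffected.
other = locks.try_acquire(cache_key_for("http://cdn.example.com/live/segment2.ts"))
print(first, second, other)
```

Since the origin is not part of the key in this model, failover by itself would not break coalescing. A key that changes between requests for the same object would.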
Thanks,
Sudheer
On Mar 8, 2018, at 7:25 AM, Dunkin, Nick <***@ccur.com> wrote:
Hi,
We've been using the Thundering Herd protection provided by Read While Writer, Open Read Retry Timeout and Open Write Fail Action and have been getting some great results. However, the behavior seems to change when we start using parent.config to provide some simple origin failover (i.e., a simple Primary/Secondary Origin kind of thing). My initial tests are showing multiple access failures and not very much in the way of request coalescing.
I don't have all the details with me now, but at a high level, should we expect Read While Writer, Open Read Retry Timeout and Open Write Fail Action to all work in the same way when proxy.config.http.parent_proxy_routing_enable is enabled and we have a simple Primary/Secondary Origin configured with "parent_is_proxy=false"? Especially when most of the time the Primary Origin will be up and available. Are there any gotchas we should be aware of?
All this testing is with ATS 7.0 currently.
Thanks for your insight.
Nick
Nick Dunkin
Principal Engineer
o: 678.258.4071
e: ***@ccur.com
4375 River Green Pkwy # 100, Duluth, GA 30096, USA