Discussion:
Parent.config and thundering herd.
Dunkin, Nick
2018-03-08 15:25:07 UTC
Permalink
Hi,

We’ve been using the Thundering Herd protection provided by Read While Writer, Open Read Retry Timeout and Open Write Fail Action and have been getting some great results. However, the behavior seems to change when we start using parent.config to provide some simple origin failover (i.e., a simple Primary/Secondary Origin kind of thing). My initial tests are showing multiple access failures and not very much in the way of request coalescing.

I don’t have all the details with me now, but at a high level, should we expect Read While Writer, Open Read Retry Timeout and Open Write Fail Action to all work in the same way when proxy.config.http.parent_proxy_routing_enable is enabled and we have a simple Primary/Secondary Origin configured with "parent_is_proxy=false"? This matters especially because most of the time the Primary Origin will be up and available. Are there any gotchas we should be aware of?

All this testing is with ATS 7.0 currently.

Thanks for your insight.

Nick

Nick Dunkin
Principal Engineer
o: 678.258.4071
e: ***@curr.com
4375 River Green Pkwy # 100, Duluth, GA 30096, USA
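
For readers following along, the setup described in this message might look roughly like the sketch below. This is not Nick's actual configuration: the hostnames and numeric values are hypothetical placeholders, and the setting names are my best understanding of the ATS 7.x records.config and parent.config documentation, so verify them against the docs for your version.

```
# records.config (values are illustrative placeholders)
CONFIG proxy.config.cache.enable_read_while_writer INT 1
CONFIG proxy.config.http.cache.max_open_read_retries INT 5
CONFIG proxy.config.http.cache.open_read_retry_time INT 10
CONFIG proxy.config.http.cache.open_write_fail_action INT 1
CONFIG proxy.config.http.parent_proxy_routing_enable INT 1
```

```
# parent.config (hostnames are hypothetical)
# The first parent is tried first; the second is only used on failure.
dest_domain=video.example.com parent="origin1.example.com:80;origin2.example.com:80" round_robin=false go_direct=false parent_is_proxy=false
```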
Sudheer Vinukonda
2018-03-08 16:49:51 UTC
Permalink
I haven’t looked at the parent proxy setup much, but at a high level I can’t think of any reason why an origin failover mechanism would affect request coalescing with the collapsed forwarding plugin. The open write fail action works on the cache key for the object, and as long as that doesn’t change, it shouldn’t matter which origin the object is pulled from. As a matter of fact, we have had origin failover set up using a custom plugin, together with request coalescing, on our HLS delivery servers and didn’t see any problems with it.

Is it possible that the access failures are somehow preventing the object from being downloaded or cached? If the object is never cached, then you will see problems with request coalescing.

Thanks,

Sudheer
Dunkin, Nick
2018-03-08 16:58:29 UTC
Permalink
Hi Sudheer,

Thanks for the reply. I couldn’t think of any reason either, but I wanted to check with the community.

Just for clarification: we’re not using the Collapsed-Forwarding plugin explicitly. I understood that the plugin was deprecated in favor of the three configuration areas I mentioned:

* Read While Writer
* Open Read Retry Timeout
* Open Write Fail Action

We certainly don’t have the Collapsed-Forwarding plugin in the plugin.config and we are seeing request coalescing.

Thanks,

Nick

Sudheer Vinukonda
2018-03-08 18:36:45 UTC
Permalink
Hmm, I'm not sure that the collapsed_forwarding plugin is deprecated. In fact, the plugin builds on the settings you mentioned and keeps multiple parallel requests for the same object from leaking upstream.
Using the settings alone, without the plugin, would not actually achieve any request coalescing for cache miss scenarios; it would simply return an error to the client. Is that what you meant by "seeing request coalescing"? Or does your use case involve stale cache content (e.g., VOD) rather than cache misses?
records.config — Apache Traffic Server 6.2.1 documentation
https://docs.trafficserver.apache.org/en/6.2.x/admin-guide/files/records.config.en.html#proxy-config-http-cache-open-write-fail-action

proxy.config.http.cache.open_write_fail_action
Scope: CONFIG
Type: INT
Default: 0
Reloadable: Yes
Overridable: Yes

This setting indicates the action taken on failing to obtain the cache open write lock on either a cache miss or a cache hit stale. This typically happens when there is more than one request to the same cache object simultaneously. During such a scenario, all but one (which goes to the origin) request is served either a stale copy or an error depending on this setting.

- 0 = default, disable cache and goto origin server
- 1 = return a 502 error on a cache miss
- 2 = serve stale if object’s age is under proxy.config.http.cache.max_stale_age, else go to origin server
- 3 = return a 502 error on a cache miss or serve stale on a cache revalidate if object’s age is under proxy.config.http.cache.max_stale_age, else go to origin server
- 4 = return a 502 error on either a cache miss or on a revalidation

Thanks,
Sudheer
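
The five values quoted above can be read as a small decision table. The Python sketch below is only a reading aid, not ATS code: the function name and return labels are hypothetical, and the cases the documentation text does not spell out (for example, value 1 on a stale hit, or value 2 on a miss) are assumed to fall back to the origin.

```python
from typing import Optional


def on_write_lock_failure(action: int, stale_age: Optional[int], max_stale_age: int) -> str:
    """Outcome for a request that failed to obtain the cache open write lock.

    stale_age is None on a cache miss, otherwise the age in seconds of the
    stale cached copy. max_stale_age stands in for
    proxy.config.http.cache.max_stale_age.
    """
    miss = stale_age is None
    under_max_stale = (not miss) and stale_age < max_stale_age

    # Values 1, 3 and 4 return a 502 on a cache miss.
    if action in (1, 3, 4) and miss:
        return "502"
    # Values 2 and 3 serve the stale copy while it is young enough.
    if action in (2, 3) and under_max_stale:
        return "serve_stale"
    # Value 4 also returns a 502 on a revalidation.
    if action == 4 and not miss:
        return "502"
    # Value 0, and every case not covered above (an assumption), goes to origin.
    return "go_to_origin"


if __name__ == "__main__":
    for action in range(5):
        print(action,
              on_write_lock_failure(action, None, 60),
              on_write_lock_failure(action, 30, 60))
```

Read this way, only values 2 and 3 avoid both an error and an extra origin request, and only when a stale copy already exists; on a pure cache miss the choice is between a 502 and another trip to the origin.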


Dunkin, Nick
2018-03-08 19:25:52 UTC
Permalink
Hi Sudheer,

I’m not sure we’re quite on the same page, but I’m grateful for your input. This is all for ATS version 7.0, and the documentation I’m talking about is on this page:

https://docs.trafficserver.apache.org/en/7.0.x/admin-guide/configuration/cache-basics.en.html

See the section “Reducing Origin Server Requests (Avoiding the Thundering Herd)”.

There is nothing in that section about these settings being associated with the Collapsed Forwarding plugin. In fact there is no mention of the Collapsed Forwarding plugin at all. Now I’m a little confused.

Is anyone able to clarify this for me? I thought I understood but maybe I don’t.

Thanks,

Nick


Jeremy Payne
2018-03-08 21:01:16 UTC
Permalink
One thought is to turn on debug with parent routing enabled, then compare against debug with parent routing disabled.
Do you see any extra or missing steps while the cache is being looked up (read) or when the open write lock is created?
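
A minimal sketch of turning on debug output for such a comparison is shown below. The two diags settings are standard records.config entries as far as I know; the tag list is an assumption about which tags cover the cache lookup, write lock and parent selection paths, so adjust it to whatever tags your build actually emits.

```
# records.config: enable debug logging for the comparison runs
CONFIG proxy.config.diags.debug.enabled INT 1
# Tag list is an assumption; tags are matched as a regular expression
CONFIG proxy.config.diags.debug.tags STRING http|cache|parent_select
```

Capturing one run with proxy.config.http.parent_proxy_routing_enable set to 1 and one with it set to 0, for the same burst of requests, gives two logs that can be compared step by step.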
Dunkin, Nick
2018-03-08 21:54:36 UTC
Permalink
Hi Jeremy,

Sure, I can try that. I’ll report back with what I find.

Thanks,

Nick

Sudheer Vinukonda
2018-03-08 22:37:54 UTC
Permalink
AFAIK, the collapsed_forwarding plugin is still actively used in production for live and VOD streaming by a few companies, and I'm not aware of any plans to deprecate it. (We did agree on deprecating the collapsed_connection plugin, which does something similar but in a different way. Perhaps you were referring to that?)

I copied an older link earlier for open_write_fail_action, mainly because what it does hasn't changed much in 7.x. Please see the 7.x reference below.

With the open_write_fail_action feature (within the ATS core), ATS returns errors to all but one of multiple concurrent requests for the same object. For example, if you were doing live streaming and 1000 clients requested the same segment file that is not yet in the Delivery Server's cache, enabling open_write_fail_action returns a 502 to 999 clients while the remaining request fetches the segment and populates the cache. As long as the clients retry, this should mostly work.

However, if you do not want to return errors to clients (we certainly did not, as it would make things much worse by causing a retry storm), the collapsed_forwarding plugin can hold those requests while the one request that was proxied to the Origin fetches the segment and fills the cache. Once the segment is being fetched and the write to cache begins, the other requests can join the party (that's where read-while-writer comes into the picture), and ATS starts streaming to all the clients at the same time.

Now, it's possible that you never used the collapsed_forwarding plugin and simply happened not to see 502 errors returned to clients, but they can always occur depending on scale, concurrency and, in particular, origin latency. Perhaps enabling parent proxy exposed the problem by somehow making the latency worse?
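
The difference between the two behaviours described above can be illustrated with a toy model. The Python sketch below is not how ATS is implemented: the function and field names are hypothetical, every request is assumed to arrive while the first one is still fetching from the origin, and held requests are assumed to be served once the cache fill starts.

```python
def simulate(clients: int, hold_requests: bool) -> dict:
    """Count outcomes when `clients` concurrent requests miss on the same object.

    hold_requests=False models open_write_fail_action returning 502s;
    hold_requests=True models requests being held until the cache fill starts.
    """
    origin_fetches = 0
    served = 0
    errors_502 = 0
    waiting = 0
    writer_active = False

    for _ in range(clients):
        if not writer_active:
            # The first request wins the write lock and goes to the origin.
            writer_active = True
            origin_fetches += 1
            served += 1
        elif hold_requests:
            # Later requests are parked instead of being failed.
            waiting += 1
        else:
            # Later requests fail to get the write lock and receive a 502.
            errors_502 += 1

    # Parked requests join the in-progress fill (read-while-writer).
    served += waiting
    return {"origin_fetches": origin_fetches, "served": served, "errors_502": errors_502}


if __name__ == "__main__":
    print(simulate(1000, hold_requests=False))
    print(simulate(1000, hold_requests=True))
```

In both cases the model sends a single request to the origin; the difference is whether the other 999 clients see a 502 and have to retry, or wait and are served from the fill in progress.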


records.config — Apache Traffic Server 7.0.0 documentation <https://docs.trafficserver.apache.org/en/7.0.x/admin-guide/files/records.config.en.html#proxy-config-http-cache-open-write-fail-action>

On Thursday, March 8, 2018, 11:26:13 AM PST, Dunkin, Nick <***@ccur.com> wrote:

Hi Sudheer,
I’m not sure we’re quite on the same page but I’m grateful for your input.  This is all for ATS ver 7.0 and the documentation I’m talking about is on this page
https://docs.trafficserver.apache.org/en/7.0.x/admin-guide/configuration/cache-basics.en.html
In the section "Reducing Origin Server Requests (Avoiding the Thundering Herd)”
There is nothing in that section about these settings being associated with the Collapsed Forwarding plugin.  In fact there is no mention of the Collapsed Forwarding plugin at all.  Now I’m a little confused. 
Is anyone able to clarify this for me?  I thought I understood but maybe I don’t.  
Thanks,
Nick

From: Sudheer Vinukonda <***@yahoo.com>
Date: Thursday, March 8, 2018 at 1:36 PM
To: "***@trafficserver.apache.org" <***@trafficserver.apache.org>, Nick Dunkin <***@ccur.com>
Subject: Re: Parent.config and thundering herd.

Hmm... I'm not sure that the collapsed_forwarding plugin is deprecated. The plugin is in fact based on the settings you mentioned below, and it blocks multiple parallel requests for the same object from leaking upstream.
Using the settings alone, without the plugin, would not actually achieve any request coalescing for cache miss scenarios; it would simply return an error to the client. Is that what you meant by "seeing request coalescing"? Or does your use case involve not cache misses but stale cache (e.g. VOD)?
records.config — Apache Traffic Server 6.2.1 documentation <https://docs.trafficserver.apache.org/en/6.2.x/admin-guide/files/records.config.en.html#proxy-config-http-cache-open-write-fail-action>

- proxy.config.http.cache.open_write_fail_action

| Scope: | CONFIG |
| Type: | INT |
| Default: | 0 |
| Reloadable: | Yes |
| Overridable: | Yes |


This setting indicates the action taken on failing to obtain the cache open write lock on either a cache miss or a cache hit stale. This typically happens when there is more than one request to the same cache object simultaneously. During such a scenario, all but one (which goes to the origin) request is served either a stale copy or an error depending on this setting.

- 0 = default, disable cache and goto origin server
- 1 = return a 502 error on a cache miss
- 2 = serve stale if object’s age is under proxy.config.http.cache.max_stale_age, else go to origin server
- 3 = return a 502 error on a cache miss or serve stale on a cache revalidate if object’s age is under proxy.config.http.cache.max_stale_age, else go to origin server
- 4 = return a 502 error on either a cache miss or on a revalidation
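
The value list above can be read as a small decision table. The following is a rough sketch of that reading, not the ATS implementation: the function name, argument names and return labels are hypothetical, and the cases the list does not spell out (for example, a stale object with value 1) are assumed here to fall through to the origin.

```python
def on_write_lock_failure(setting: int, cache_miss: bool,
                          stale_within_max_age: bool = False) -> str:
    """Outcome for a request that failed to get the cache open-write lock.

    setting              -- value of open_write_fail_action (0..4)
    cache_miss           -- True for a miss, False for a stale hit (revalidation)
    stale_within_max_age -- stale copy is younger than max_stale_age
    """
    if setting not in (0, 1, 2, 3, 4):
        raise ValueError(f"unknown open_write_fail_action value: {setting}")
    if cache_miss:
        # Values 1, 3 and 4 return a 502 on a miss; 0 and 2 go to the origin.
        return "502" if setting in (1, 3, 4) else "go_to_origin"
    if setting == 4:
        # Value 4 also returns a 502 on a revalidation.
        return "502"
    if setting in (2, 3) and stale_within_max_age:
        return "serve_stale"
    # Assumption: everything else falls back to the origin server.
    return "go_to_origin"


print(on_write_lock_failure(1, cache_miss=True))
print(on_write_lock_failure(3, cache_miss=False, stale_within_max_age=True))
```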





Thanks,
Sudheer


On Thursday, March 8, 2018, 8:58:46 AM PST, Dunkin, Nick <***@ccur.com> wrote:

Hi Sudheer,
Thanks for the reply.  I couldn’t think of any reason either, but I wanted to check with the community.
Just for clarification: we’re not using the Collapsed-Forwarding plugin explicitly. I understood that the plugin was deprecated in favor of the three configuration areas I mentioned:
- Read While Writer
- Open Read Retry Timeout 
- Open Write Fail Action 
We certainly don’t have the Collapsed-Forwarding plugin in the plugin.config and we are seeing request coalescing.
Thanks,
Nick
Dunkin, Nick
2018-03-08 23:22:49 UTC
Permalink
Hi Sudheer,

I really think I’m missing something. Please allow me to check my understanding from the beginning.

We followed the documentation at the link I provided earlier, specifically the section on Reducing Origin Server Requests (Avoiding the Thundering Herd).

We added the required prerequisite configurations (as per the documentation):

CONFIG proxy.config.cache.enable_read_while_writer INT 1
CONFIG proxy.config.http.background_fill_active_timeout INT 0
CONFIG proxy.config.http.background_fill_completed_threshold FLOAT 0.000000
CONFIG proxy.config.cache.max_doc_size INT 0
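
A quick way to confirm that these four prerequisite lines are actually present in a records.config is sketched below. It is a minimal illustration that assumes the simple `CONFIG <name> <TYPE> <value>` layout shown above with plain numeric values; the helper names `parse_records` and `missing_prerequisites` are made up for this example.

```python
PREREQUISITES = {
    "proxy.config.cache.enable_read_while_writer": 1,
    "proxy.config.http.background_fill_active_timeout": 0,
    "proxy.config.http.background_fill_completed_threshold": 0.0,
    "proxy.config.cache.max_doc_size": 0,
}


def parse_records(text: str) -> dict:
    """Parse 'CONFIG <name> <TYPE> <value>' lines; anything else is skipped."""
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4 or parts[0] != "CONFIG":
            continue  # blank lines, comments, values containing spaces
        _, name, kind, value = parts
        if kind == "INT":
            settings[name] = int(value)  # assumes a plain integer value
        elif kind == "FLOAT":
            settings[name] = float(value)
        else:
            settings[name] = value
    return settings


def missing_prerequisites(text: str) -> list:
    """Names of prerequisite settings that are absent or have another value."""
    settings = parse_records(text)
    return [name for name, want in PREREQUISITES.items()
            if settings.get(name) != want]


sample = """\
CONFIG proxy.config.cache.enable_read_while_writer INT 1
CONFIG proxy.config.http.background_fill_active_timeout INT 0
CONFIG proxy.config.http.background_fill_completed_threshold FLOAT 0.000000
CONFIG proxy.config.cache.max_doc_size INT 0
"""
print(missing_prerequisites(sample))
```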

And we also chose sensible settings for each of the following configurations (as per the documentation):

CONFIG proxy.config.cache.read_while_writer.max_retries INT xxx
CONFIG proxy.config.cache.read_while_writer_retry.delay INT xxx
CONFIG proxy.config.http.cache.max_open_read_retries INT xxx
CONFIG proxy.config.http.cache.open_read_retry_time INT xxx
CONFIG proxy.config.http.cache.open_write_fail_action INT xxx

The documentation then states - "Once these are enabled, you have something that is very close, but not quite the same, to Squid’s Collapsed Forwarding.”

AFAIK none of this involves the Collapsed Forwarding plugin. The documentation doesn’t mention the Collapsed Forwarding plugin. We don’t have the Collapsed Forwarding plugin declared in our plugin.config.

It was my understanding that these settings were orthogonal to the Collapsed Forwarding plugin but provided similar functionality “out of the box”.

Please can you let me know if I have misunderstood the documentation? Maybe this section of the documentation is outdated?

Many thanks for your patience,

Nick

Sudheer Vinukonda
2018-03-09 01:54:03 UTC
Permalink
I can see how the documentation might be slightly misleading. You are right that these settings by themselves are orthogonal to whether or not you have the collapsed_forwarding plugin enabled. And configuring them would indeed avoid a Thundering Herd to the Origins, in the sense that at most one request per object is leaked upstream.
However, in terms of the net result to the client, as described in the docs, the best these settings can achieve is to return an error to the client, or a stale copy when applicable/available (for example, an older manifest file in the case of HLS streaming). This is generally not desirable behavior for many video solutions, and this is where the collapsed_forwarding plugin comes into play. That plugin is essentially built on top of open_write_fail_action: it intercepts the error before it goes back to the client and waits until the cache is filled with the needed object. The net result, in this case, is a clearly better experience for the users and friendlier behavior toward the clients (e.g. video players).
Technically, using the collapsed_forwarding plugin would still be an "out-of-the-box" solution, as long as you compile the plugin and set it up correctly.
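
The "intercept the error and wait" idea can be sketched as a bounded retry loop. This is an illustrative model only and does not reproduce the plugin's internals: `hold_until_cached`, `FakeCache` and their parameters are hypothetical names, and the retry count and delay are arbitrary.

```python
import time
from typing import Callable


def hold_until_cached(lookup: Callable[[], int], retries: int, delay_s: float,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry a lookup while it yields 502 instead of passing the 502 on.

    Returns the first non-502 status, or 502 once the retries are used up.
    """
    status = lookup()
    for _ in range(retries):
        if status != 502:
            break
        sleep(delay_s)  # give the one upstream fetch time to start the fill
        status = lookup()
    return status


class FakeCache:
    """Stand-in for the cache: answers 502 until the object has been filled."""

    def __init__(self, fills_after: int):
        self.lookups = 0
        self.fills_after = fills_after

    def lookup(self) -> int:
        self.lookups += 1
        return 200 if self.lookups > self.fills_after else 502


cache = FakeCache(fills_after=3)
status = hold_until_cached(cache.lookup, retries=5, delay_s=0.0)
print(status, cache.lookups)
```

The client in this model never sees the intermediate 502 responses; it only sees an error if the object is still not available after the last retry.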
More info about how the plugin works is at Collapsed Forwarding Plugin — Apache Traffic Server 8.0.0 documentation <https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/collapsed_forwarding.en.html>

Hope this helps.
Thanks,
Sudheer



Dunkin, Nick
2018-03-09 02:15:46 UTC
Permalink
Hi Sudheer

Thank you for such a detailed description. I think I understand now. I will look at putting the plugin in place to fully harden the solution.

I really appreciate your assistance.

Regards,

Nick


On Mar 8, 2018, at 8:54 PM, Sudheer Vinukonda <***@yahoo.com<mailto:***@yahoo.com>> wrote:

I can see how the documentation might be slightly misleading. You are right that these settings by themselves are orthogonal to whether or not you've collapsed forwarding plugin enabled. And configuring them would indeed avoid Thundering Herd to the Origins, in the sense that, at most, one request per object is leaked upstream.

However, in terms of the net result to the client, as clearly described in the docs, the best these settings can achieve is to return an error to the client or a stale copy, when applicable/available (for example, an older manifest file in case of HLS streaming). This is generally not a desirable behavior for many video solutions and this is where the collapsed_forwarding plugin comes into play. That plugin essentially is built on top of the open_write_fail_action, intercepts the error from going back to the client and waits until the cache is filled with the needed object. The net result, in this case, is clearly better experience to the users and friendlier to the clients (e.g video players).

Technically, using collapsed_forwarding plugin would still be an "out-of-the-box" solution, as long as you compile the plugin and set it up correctly.

More info about how the plugin works is at Collapsed Forwarding Plugin — Apache Traffic Server 8.0.0 documentation<https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/collapsed_forwarding.en.html>

<https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/collapsed_forwarding.en.html>

Collapsed Forwarding Plugin — Apache Traffic Server 8.0.0 documentation





Hope this helps.

Thanks,

Sudheer




On Thursday, March 8, 2018, 3:22:56 PM PST, Dunkin, Nick <***@ccur.com<mailto:***@ccur.com>> wrote:


Hi Sudheer,

I really think I’m missing something. Please allow me to check my understanding from the beginning.

We followed the documentation at the link I provided earlier, specifically the section on Reducing Origin Server Requests (Avoiding the Thundering Herd).

We added the required prerequisite configurations (as per the documentation):

CONFIG proxy.config.cache.enable_read_while_writer INT 1
CONFIG proxy.config.http.background_fill_active_timeout INT 0
CONFIG proxy.config.http.background_fill_completed_threshold FLOAT 0.000000
CONFIG proxy.config.cache.max_doc_size INT 0

And we also chose sensible settings for each of the following configurations (as per the documentation):

CONFIG proxy.config.cache.read_while_writer.max_retries INT xxx
CONFIG proxy.config.cache.read_while_writer_retry.delay INT xxx
CONFIG proxy.config.http.cache.max_open_read_retries INT xxx
CONFIG proxy.config.http.cache.open_read_retry_time INT xxx
CONFIG proxy.config.http.cache.open_write_fail_action INT xxx

The documentation then states - "Once these are enabled, you have something that is very close, but not quite the same, to Squid’s Collapsed Forwarding.”

AFAIK none of this involves the Collapsed Forwarding plugin. The documentation doesn’t mention the Collapsed Forwarding plugin. We don’t have the Collapsed Forwarding plugin declared in our plugin.config.

It was my understanding that these settings were orthogonal to the Collapsed Forwarding plugin but provided similar functionality “out of the box”.

Please can you let me know if I have misunderstood the documentation? Maybe this section of the documentation is outdated?

Many thanks for your patience,

Nick

From: Sudheer Vinukonda <***@yahoo.com<mailto:***@yahoo.com>>
Date: Thursday, March 8, 2018 at 5:37 PM
To: "***@trafficserver.apache.org<mailto:***@trafficserver.apache.org>" <***@trafficserver.apache.org<mailto:***@trafficserver.apache.org>>, Nick Dunkin <***@ccur.com<mailto:***@ccur.com>>
Subject: Re: Parent.config and thundering herd.

AFAIK, the collapsed_forwarding plugin is still actively used in production for live and VOD streaming by a few companies, and I'm not aware of any plans to deprecate it. (We did agree on deprecating the collapsed_connection plugin, which is somewhat similar in what it does but different in how it does it. Perhaps you were referring to that?)

I copied an older link earlier for open_write_fail_action, mainly because what it does hasn't changed much in 7.x. Please see the 7.x reference below.

With the open_write_fail_action feature (within the ATS core), ATS returns errors to all but one of multiple concurrent requests for the same object. For example, if you were doing live streaming and 1000 clients requested the same segment file that is not yet in the Delivery Server's cache, enabling open_write_fail_action lets ATS return a 502 to 999 clients while the remaining request fetches the segment and populates the cache. As long as the clients retry, this should mostly work. However, if you would rather not return errors to clients (we certainly did not, as it'd make things much worse by causing a retry storm), the collapsed_forwarding plugin can hold those requests while they wait for the one request that was proxied to the Origin to fetch the segment and fill the cache. Once the segment is fetched and writing to the cache begins, the other requests can join the party (that's where read-while-writer comes into the picture), and ATS starts streaming to all the clients at the same time.

Now, it's possible that you have never used the collapsed_forwarding plugin and simply happened not to see 502 errors returned to clients, but they are always possible depending on scale, concurrency and, in particular, origin latency. Perhaps enabling parent proxy exposed the problem by somehow making the latency worse?
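
The 1000-client example above can be sketched as a toy model. This is only an illustration of the reasoning in this message, not ATS code: the function name, the `fill_start_ms`, `retries` and `delay_ms` parameters, and the rule that a waiting request gets a 502 once its wait budget runs out are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurstResult:
    origin_fetches: int  # requests forwarded upstream
    served: int          # clients that received the object
    errors_502: int      # clients that received a 502


def simulate_miss_burst(clients: int, fill_start_ms: int,
                        retries: int = 0, delay_ms: int = 0) -> BurstResult:
    """Toy model of one burst of concurrent requests for an uncached object.

    One request wins the cache write lock and goes to the origin. Every other
    request waits up to retries * delay_ms for the cache fill to start
    (fill_start_ms, a stand-in for origin latency). If the fill has started by
    then, the request joins it (the read-while-writer case); otherwise this
    model hands it a 502, which is a simplifying assumption.
    """
    if clients <= 0:
        return BurstResult(0, 0, 0)
    losers = clients - 1
    wait_budget_ms = retries * delay_ms
    if wait_budget_ms >= fill_start_ms:
        return BurstResult(origin_fetches=1, served=clients, errors_502=0)
    return BurstResult(origin_fetches=1, served=1, errors_502=losers)


if __name__ == "__main__":
    # No holding at all: one fetch, everyone else gets an error.
    print(simulate_miss_burst(1000, fill_start_ms=200))
    # Requests are held long enough for the fill to start.
    print(simulate_miss_burst(1000, fill_start_ms=200, retries=5, delay_ms=500))
```

In this model, raising `fill_start_ms` above the wait budget flips every held request back to a 502, which is one way to picture how extra upstream latency could expose errors that were not visible before.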



records.config, Apache Traffic Server 7.0.0 documentation: https://docs.trafficserver.apache.org/en/7.0.x/admin-guide/files/records.config.en.html#proxy-config-http-cache-open-write-fail-action

On Thursday, March 8, 2018, 11:26:13 AM PST, Dunkin, Nick <***@ccur.com> wrote:

Hi Sudheer,

I’m not sure we’re quite on the same page, but I’m grateful for your input. This is all for ATS version 7.0, and the documentation I’m talking about is on this page:

https://docs.trafficserver.apache.org/en/7.0.x/admin-guide/configuration/cache-basics.en.html

in the section "Reducing Origin Server Requests (Avoiding the Thundering Herd)".

There is nothing in that section about these settings being associated with the Collapsed Forwarding plugin. In fact there is no mention of the Collapsed Forwarding plugin at all. Now I’m a little confused.

Is anyone able to clarify this for me? I thought I understood but maybe I don’t.

Thanks,

Nick


From: Sudheer Vinukonda <***@yahoo.com>
Date: Thursday, March 8, 2018 at 1:36 PM
To: "***@trafficserver.apache.org" <***@trafficserver.apache.org>, Nick Dunkin <***@ccur.com>
Subject: Re: Parent.config and thundering herd.

Hmm... I'm not sure that the collapsed_forwarding plugin is deprecated. The plugin is in fact based on the settings you mentioned below, and it blocks multiple parallel requests for the same object from leaking upstream.

Using the settings alone, without the plugin, would not actually achieve any request coalescing for cache miss scenarios; it'd simply result in an error being returned to the client. Is that what you meant by "seeing request coalescing"? Or does your use case involve not cache misses but stale cache (e.g. VOD)?

records.config, Apache Traffic Server 6.2.1 documentation: https://docs.trafficserver.apache.org/en/6.2.x/admin-guide/files/records.config.en.html#proxy-config-http-cache-open-write-fail-action





proxy.config.http.cache.open_write_fail_action
Scope: CONFIG
Type: INT
Default: 0
Reloadable: Yes
Overridable: Yes
This setting indicates the action taken on failing to obtain the cache open write lock on either a cache miss or a cache hit stale. This typically happens when there is more than one request to the same cache object simultaneously. During such a scenario, all but one (which goes to the origin) request is served either a stale copy or an error depending on this setting.

* 0 = default, disable cache and go to origin server
* 1 = return a 502 error on a cache miss
* 2 = serve stale if object’s age is under proxy.config.http.cache.max_stale_age, else go to origin server
* 3 = return a 502 error on a cache miss or serve stale on a cache revalidate if object’s age is under proxy.config.http.cache.max_stale_age, else go to origin server
* 4 = return a 502 error on either a cache miss or on a revalidation
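
The five values listed above can be read as a small decision table. The sketch below encodes that reading in Python. It is not ATS source code: the enum member names, the function name and the string results are made up for illustration, and the behaviour of value 1 on a stale hit (go to origin) is an assumption, since the listing only says what happens on a miss.

```python
from enum import IntEnum
from typing import Optional


class FailAction(IntEnum):
    """Values of proxy.config.http.cache.open_write_fail_action as listed above."""
    GO_TO_ORIGIN = 0
    ERROR_ON_MISS = 1
    STALE_ON_REVALIDATE = 2
    ERROR_ON_MISS_STALE_ON_REVALIDATE = 3
    ERROR_ON_MISS_OR_REVALIDATE = 4


def on_write_lock_failure(action: int, stale_age_s: Optional[int],
                          max_stale_age_s: int) -> str:
    """Return "origin", "502" or "stale" for a request that lost the write lock.

    stale_age_s is None for a cache miss, otherwise the age in seconds of the
    stale cached copy that needs revalidation.
    """
    action = FailAction(action)
    is_miss = stale_age_s is None

    if action == FailAction.ERROR_ON_MISS_OR_REVALIDATE:
        return "502"

    if is_miss:
        if action in (FailAction.ERROR_ON_MISS,
                      FailAction.ERROR_ON_MISS_STALE_ON_REVALIDATE):
            return "502"
        return "origin"

    # Stale hit: values 2 and 3 serve the stale copy while it is young enough.
    if action in (FailAction.STALE_ON_REVALIDATE,
                  FailAction.ERROR_ON_MISS_STALE_ON_REVALIDATE):
        if stale_age_s < max_stale_age_s:
            return "stale"
    return "origin"
```

For example, `on_write_lock_failure(3, None, 60)` models a cache miss under value 3, and `on_write_lock_failure(3, 30, 60)` models a revalidation of a 30-second-old object under the same value.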





Thanks,

Sudheer



On Thursday, March 8, 2018, 8:58:46 AM PST, Dunkin, Nick <***@ccur.com> wrote:

Hi Sudheer,

Thanks for the reply. I couldn’t think of any reason either, but I wanted to check with the community.

Just for clarification: we’re not using the Collapsed-Forwarding plugin explicitly. I understood that the plugin was deprecated in favor of the three configuration areas I mentioned:

* Read While Writer
* Open Read Retry Timeout
* Open Write Fail Action

We certainly don’t have the Collapsed-Forwarding plugin in the plugin.config and we are seeing request coalescing.

Thanks,

Nick

Dunkin, Nick
2018-03-09 02:20:51 UTC
Permalink
Hi Sudheer

Sorry, one quick follow-up question.

We have other plugins declared in the plugin.config file. Should the collapsed forwarding plugin precede them or go at the end? Or does it not matter?

Thanks

Nick




On Mar 8, 2018, at 8:54 PM, Sudheer Vinukonda <***@yahoo.com> wrote:

I can see how the documentation might be slightly misleading. You are right that these settings by themselves are orthogonal to whether or not you have the collapsed_forwarding plugin enabled. And configuring them would indeed avoid a Thundering Herd to the Origins, in the sense that at most one request per object is leaked upstream.

However, in terms of the net result to the client, as described in the docs, the best these settings can achieve is to return an error to the client, or a stale copy when one is applicable and available (for example, an older manifest file in the case of HLS streaming). This is generally not desirable behavior for many video solutions, and this is where the collapsed_forwarding plugin comes into play. That plugin is essentially built on top of open_write_fail_action: it intercepts the error before it goes back to the client and waits until the cache is filled with the needed object. The net result, in this case, is a clearly better experience for users and friendlier behavior toward clients (e.g. video players).

Technically, using the collapsed_forwarding plugin would still be an "out-of-the-box" solution, as long as you compile the plugin and set it up correctly.

More info about how the plugin works is in the Collapsed Forwarding Plugin documentation (Apache Traffic Server 8.0.0): https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/collapsed_forwarding.en.html





Hope this helps.

Thanks,

Sudheer




On Thursday, March 8, 2018, 3:22:56 PM PST, Dunkin, Nick <***@ccur.com> wrote:


Hi Sudheer,

I really think I’m missing something. Please allow me to check my understanding from the beginning.

We followed the documentation at the link I provided earlier, specifically the section on Reducing Origin Server Requests (Avoiding the Thundering Herd).

We added the required prerequisite configurations (as per the documentation):

CONFIG proxy.config.cache.enable_read_while_writer INT 1
CONFIG proxy.config.http.background_fill_active_timeout INT 0
CONFIG proxy.config.http.background_fill_completed_threshold FLOAT 0.000000
CONFIG proxy.config.cache.max_doc_size INT 0

And we also chose sensible settings for each of the following configurations (as per the documentation):

CONFIG proxy.config.cache.read_while_writer.max_retries INT xxx
CONFIG proxy.config.cache.read_while_writer_retry.delay INT xxx
CONFIG proxy.config.http.cache.max_open_read_retries INT xxx
CONFIG proxy.config.http.cache.open_read_retry_time INT xxx
CONFIG proxy.config.http.cache.open_write_fail_action INT xxx

The documentation then states: "Once these are enabled, you have something that is very close, but not quite the same, to Squid’s Collapsed Forwarding."

AFAIK none of this involves the Collapsed Forwarding plugin. The documentation doesn’t mention the Collapsed Forwarding plugin. We don’t have the Collapsed Forwarding plugin declared in our plugin.config.

It was my understanding that these settings were orthogonal to the Collapsed Forwarding plugin but provided similar functionality “out of the box”.

Please can you let me know if I have misunderstood the documentation? Maybe this section of the documentation is outdated?

Many thanks for your patience,

Nick

Sudheer Vinukonda
2018-03-09 02:37:56 UTC
Permalink
The collapsed_forwarding plugin can be configured either as a global plugin (plugin.config) or as a remap plugin (remap.config). We set it up as a remap plugin because it’s mainly relevant for HLS streaming scenarios, and the same delivery servers also serve other static objects that never run into such concurrency-related problems.

If you’d also like to set it up in remap mode, simply append it to the relevant remap rules. If you’d like to use it in global mode, it should be okay to add it as the last line of plugin.config. The doc link for the plugin should also have the setup instructions.

https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/collapsed_forwarding.en.html

Thanks,

Sudheer