Discussion:
Issue with collapsed forwarding plugin, one CONN_FAIL
Zelkowitz, Evan
2018-05-15 20:08:16 UTC
Permalink
Ive been testing the collapsed forwarding plugin here by sending it random traffic, one request with 10 duplicates of the same request. One thing Im seeing is that a lot of the time one of the requests out of those 10 will come back as an ERR_CONN_FAIL, but the pssc will be the 303 redirect.

I was just wondering if this was expected? Is it an issue or has anyone seen this?

Here is my output logging showing one of these instances:


1526413868.121 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=1 b=107782 sssc=200 sscl=107782 cfsc=FIN pfsc=FIN crc=TCP_MISS phr=DIRECT pqsn=origin-test-cache-52.net uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.123 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.124 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.125 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.126 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.126 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.127 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.127 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=60 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.121 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=303 ttms=68 b=247 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=ERR_CONNECT_FAIL phr=DIRECT pqsn=origin-test-cache-52.net uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.122 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=108 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
Chou, Peter
2018-05-15 20:29:26 UTC
Permalink
Evan,

I saw something similar with 304 instead of 303 when I tried something similar earlier. You can check my e-mail in the mailing list archive -- it was sent on 6-23-2017.

BTW, the issues I described were reproducible on my dev machine using curl and bash/perl to generate the requests. When we moved to our test environment with production level machines and using Spirent to generate traffic they were not reproducible. Perhaps some kind of timing issue in a low performance environment (the test machine and network were virtualized)?

Thanks,
Peter


From: Zelkowitz, Evan [mailto:***@comcast.com]
Sent: Tuesday, May 15, 2018 1:08 PM
To: ***@trafficserver.apache.org
Subject: Issue with collapsed forwarding plugin, one CONN_FAIL

Ive been testing the collapsed forwarding plugin here by sending it random traffic, one request with 10 duplicates of the same request. One thing Im seeing is that a lot of the time one of the requests out of those 10 will come back as an ERR_CONN_FAIL, but the pssc will be the 303 redirect.

I was just wondering if this was expected? Is it an issue or has anyone seen this?

Here is my output logging showing one of these instances:


1526413868.121 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts<https://urldefense.proofpoint.com/v2/url?u=http-3A__origin-2Dtest-2Dcache-2D52.net_muxed-2D3000-2DprDi.tJZFQeIkXN-5FQYeLWwqAeRbJpH.ts&d=DwMGaQ&c=LFYZ-o9_HUMeMTSQicvjIg&r=8c5kS62dKm3-obVyLvkwkc-kTTgV1vAsbxSPwL-yi3o&m=JfqcxDQVxf52z7KRMHnPJFR6rzqRxWgyHDZIX7TjSr8&s=EPsP8XiJkBl_l6xb6tOps4CbppYIFuyPhMpKbnZg2OU&e=> cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=1 b=107782 sssc=200 sscl=107782 cfsc=FIN pfsc=FIN crc=TCP_MISS phr=DIRECT pqsn=origin-test-cache-52.net uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.123 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts<https://urldefense.proofpoint.com/v2/url?u=http-3A__origin-2Dtest-2Dcache-2D52.net_muxed-2D3000-2DprDi.tJZFQeIkXN-5FQYeLWwqAeRbJpH.ts&d=DwMGaQ&c=LFYZ-o9_HUMeMTSQicvjIg&r=8c5kS62dKm3-obVyLvkwkc-kTTgV1vAsbxSPwL-yi3o&m=JfqcxDQVxf52z7KRMHnPJFR6rzqRxWgyHDZIX7TjSr8&s=EPsP8XiJkBl_l6xb6tOps4CbppYIFuyPhMpKbnZg2OU&e=> cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.124 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts<https://urldefense.proofpoint.com/v2/url?u=http-3A__origin-2Dtest-2Dcache-2D52.net_muxed-2D3000-2DprDi.tJZFQeIkXN-5FQYeLWwqAeRbJpH.ts&d=DwMGaQ&c=LFYZ-o9_HUMeMTSQicvjIg&r=8c5kS62dKm3-obVyLvkwkc-kTTgV1vAsbxSPwL-yi3o&m=JfqcxDQVxf52z7KRMHnPJFR6rzqRxWgyHDZIX7TjSr8&s=EPsP8XiJkBl_l6xb6tOps4CbppYIFuyPhMpKbnZg2OU&e=> cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.125 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts<https://urldefense.proofpoint.com/v2/url?u=http-3A__origin-2Dtest-2Dcache-2D52.net_muxed-2D3000-2DprDi.tJZFQeIkXN-5FQYeLWwqAeRbJpH.ts&d=DwMGaQ&c=LFYZ-o9_HUMeMTSQicvjIg&r=8c5kS62dKm3-obVyLvkwkc-kTTgV1vAsbxSPwL-yi3o&m=JfqcxDQVxf52z7KRMHnPJFR6rzqRxWgyHDZIX7TjSr8&s=EPsP8XiJkBl_l6xb6tOps4CbppYIFuyPhMpKbnZg2OU&e=> cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.126 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts<https://urldefense.proofpoint.com/v2/url?u=http-3A__origin-2Dtest-2Dcache-2D52.net_muxed-2D3000-2DprDi.tJZFQeIkXN-5FQYeLWwqAeRbJpH.ts&d=DwMGaQ&c=LFYZ-o9_HUMeMTSQicvjIg&r=8c5kS62dKm3-obVyLvkwkc-kTTgV1vAsbxSPwL-yi3o&m=JfqcxDQVxf52z7KRMHnPJFR6rzqRxWgyHDZIX7TjSr8&s=EPsP8XiJkBl_l6xb6tOps4CbppYIFuyPhMpKbnZg2OU&e=> cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.126 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts<https://urldefense.proofpoint.com/v2/url?u=http-3A__origin-2Dtest-2Dcache-2D52.net_muxed-2D3000-2DprDi.tJZFQeIkXN-5FQYeLWwqAeRbJpH.ts&d=DwMGaQ&c=LFYZ-o9_HUMeMTSQicvjIg&r=8c5kS62dKm3-obVyLvkwkc-kTTgV1vAsbxSPwL-yi3o&m=JfqcxDQVxf52z7KRMHnPJFR6rzqRxWgyHDZIX7TjSr8&s=EPsP8XiJkBl_l6xb6tOps4CbppYIFuyPhMpKbnZg2OU&e=> cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.127 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts<https://urldefense.proofpoint.com/v2/url?u=http-3A__origin-2Dtest-2Dcache-2D52.net_muxed-2D3000-2DprDi.tJZFQeIkXN-5FQYeLWwqAeRbJpH.ts&d=DwMGaQ&c=LFYZ-o9_HUMeMTSQicvjIg&r=8c5kS62dKm3-obVyLvkwkc-kTTgV1vAsbxSPwL-yi3o&m=JfqcxDQVxf52z7KRMHnPJFR6rzqRxWgyHDZIX7TjSr8&s=EPsP8XiJkBl_l6xb6tOps4CbppYIFuyPhMpKbnZg2OU&e=> cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.127 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts<https://urldefense.proofpoint.com/v2/url?u=http-3A__origin-2Dtest-2Dcache-2D52.net_muxed-2D3000-2DprDi.tJZFQeIkXN-5FQYeLWwqAeRbJpH.ts&d=DwMGaQ&c=LFYZ-o9_HUMeMTSQicvjIg&r=8c5kS62dKm3-obVyLvkwkc-kTTgV1vAsbxSPwL-yi3o&m=JfqcxDQVxf52z7KRMHnPJFR6rzqRxWgyHDZIX7TjSr8&s=EPsP8XiJkBl_l6xb6tOps4CbppYIFuyPhMpKbnZg2OU&e=> cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=60 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.121 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts<https://urldefense.proofpoint.com/v2/url?u=http-3A__origin-2Dtest-2Dcache-2D52.net_muxed-2D3000-2DprDi.tJZFQeIkXN-5FQYeLWwqAeRbJpH.ts&d=DwMGaQ&c=LFYZ-o9_HUMeMTSQicvjIg&r=8c5kS62dKm3-obVyLvkwkc-kTTgV1vAsbxSPwL-yi3o&m=JfqcxDQVxf52z7KRMHnPJFR6rzqRxWgyHDZIX7TjSr8&s=EPsP8XiJkBl_l6xb6tOps4CbppYIFuyPhMpKbnZg2OU&e=> cqhm=GET cqhv=HTTP/1.1 pssc=303 ttms=68 b=247 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=ERR_CONNECT_FAIL phr=DIRECT pqsn=origin-test-cache-52.net uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.122 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts<https://urldefense.proofpoint.com/v2/url?u=http-3A__origin-2Dtest-2Dcache-2D52.net_muxed-2D3000-2DprDi.tJZFQeIkXN-5FQYeLWwqAeRbJpH.ts&d=DwMGaQ&c=LFYZ-o9_HUMeMTSQicvjIg&r=8c5kS62dKm3-obVyLvkwkc-kTTgV1vAsbxSPwL-yi3o&m=JfqcxDQVxf52z7KRMHnPJFR6rzqRxWgyHDZIX7TjSr8&s=EPsP8XiJkBl_l6xb6tOps4CbppYIFuyPhMpKbnZg2OU&e=> cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=108 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
Zelkowitz, Evan
2018-05-16 19:51:30 UTC
Permalink
So now that Im testing it more I can see on the origin side that for each one of these ERR_CONNECT_FAILS that that is going back to my client application which is then getting the redirect and going to the origin. It doesn’t even appear to be doing the number of retries or delays that I set for the collapsed_forwarding plugin. So something seems to be going wrong with the plugin somewhere. I can see it attempting the delay, so it can see the writefail header, and it sets the ‘internal’ redirect properly but that appears to leak out back to the client before it gets attempted by ATS itself.

From: "Zelkowitz, Evan" <***@comcast.com>
Reply-To: "***@trafficserver.apache.org" <***@trafficserver.apache.org>
Date: Tuesday, May 15, 2018 at 2:08 PM
To: "***@trafficserver.apache.org" <***@trafficserver.apache.org>
Subject: [EXTERNAL] Issue with collapsed forwarding plugin, one CONN_FAIL

Ive been testing the collapsed forwarding plugin here by sending it random traffic, one request with 10 duplicates of the same request. One thing Im seeing is that a lot of the time one of the requests out of those 10 will come back as an ERR_CONN_FAIL, but the pssc will be the 303 redirect.

I was just wondering if this was expected? Is it an issue or has anyone seen this?

Here is my output logging showing one of these instances:


1526413868.121 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=1 b=107782 sssc=200 sscl=107782 cfsc=FIN pfsc=FIN crc=TCP_MISS phr=DIRECT pqsn=origin-test-cache-52.net uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.123 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.124 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.125 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.126 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.126 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.127 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.127 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=60 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.121 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=303 ttms=68 b=247 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=ERR_CONNECT_FAIL phr=DIRECT pqsn=origin-test-cache-52.net uas="CUSTOMUAS/0.0.0" xmt="-"

1526413868.122 chi=test-ip phn=test-cache-09.net php=80 shn=origin-test-cache-52.net url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=108 b=107782 sssc=000 sscl=0 cfsc=FIN pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
Jeremy Payne
2018-06-19 12:24:08 UTC
Permalink
in testing 713, i also notice the 303 sent back to the end user. this
is much different than what's seen with 621. at least with 621 a 5xx
reponse is sent after all configured retries have been exhausted.

that said.. whats the current status of this plugin? deprecated? no
longer supported ?

is it safe to say, ATS core read_while_writer timers(and related
parameters) should be the preferred protections against thundering
herd ?
maybe at one time, core parameters were not sufficient to protect the
origin, which is why the plugin was developed in the first place.
but now core has caught up, and the plugin is no longer needed ?

thanks!




On Wed, May 16, 2018 at 2:51 PM, Zelkowitz, Evan
Post by Zelkowitz, Evan
So now that Im testing it more I can see on the origin side that for each
one of these ERR_CONNECT_FAILS that that is going back to my client
application which is then getting the redirect and going to the origin. It
doesn’t even appear to be doing the number of retries or delays that I set
for the collapsed_forwarding plugin. So something seems to be going wrong
with the plugin somewhere. I can see it attempting the delay, so it can see
the writefail header, and it sets the ‘internal’ redirect properly but that
appears to leak out back to the client before it gets attempted by ATS
itself.
Date: Tuesday, May 15, 2018 at 2:08 PM
Subject: [EXTERNAL] Issue with collapsed forwarding plugin, one CONN_FAIL
Ive been testing the collapsed forwarding plugin here by sending it random
traffic, one request with 10 duplicates of the same request. One thing Im
seeing is that a lot of the time one of the requests out of those 10 will
come back as an ERR_CONN_FAIL, but the pssc will be the 303 redirect.
I was just wondering if this was expected? Is it an issue or has anyone seen this?
1526413868.121 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=1 b=107782 sssc=200 sscl=107782
cfsc=FIN pfsc=FIN crc=TCP_MISS phr=DIRECT pqsn=origin-test-cache-52.net
uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.123 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.124 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.125 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.126 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.126 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.127 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.127 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=60 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.121 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=303 ttms=68 b=247 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=ERR_CONNECT_FAIL phr=DIRECT pqsn=origin-test-cache-52.net
uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.122 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=108 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
e***@gmail.com
2018-06-19 15:07:33 UTC
Permalink
I believe the ats params provide good enough protection except for the one specific case mentioned in the collapsed forwarding plugin docs. The one case they can't protect against is multiple instantaneous hits for the same content. So this actually comes up fairly often for linear video since you have multiple clients that can all be requesting the same video chunk at the same time
Post by Jeremy Payne
in testing 713, i also notice the 303 sent back to the end user. this
is much different than what's seen with 621. at least with 621 a 5xx
reponse is sent after all configured retries have been exhausted.
that said.. whats the current status of this plugin? deprecated? no
longer supported ?
is it safe to say, ATS core read_while_writer timers(and related
parameters) should be the preferred protections against thundering
herd ?
maybe at one time, core parameters were not sufficient to protect the
origin, which is why the plugin was developed in the first place.
but now core has caught up, and the plugin is no longer needed ?
thanks!
On Wed, May 16, 2018 at 2:51 PM, Zelkowitz, Evan
Post by Zelkowitz, Evan
So now that Im testing it more I can see on the origin side that for each
one of these ERR_CONNECT_FAILS that that is going back to my client
application which is then getting the redirect and going to the origin. It
doesn’t even appear to be doing the number of retries or delays that I set
for the collapsed_forwarding plugin. So something seems to be going wrong
with the plugin somewhere. I can see it attempting the delay, so it can see
the writefail header, and it sets the ‘internal’ redirect properly but that
appears to leak out back to the client before it gets attempted by ATS
itself.
Date: Tuesday, May 15, 2018 at 2:08 PM
Subject: [EXTERNAL] Issue with collapsed forwarding plugin, one CONN_FAIL
Ive been testing the collapsed forwarding plugin here by sending it random
traffic, one request with 10 duplicates of the same request. One thing Im
seeing is that a lot of the time one of the requests out of those 10 will
come back as an ERR_CONN_FAIL, but the pssc will be the 303 redirect.
I was just wondering if this was expected? Is it an issue or has anyone seen this?
1526413868.121 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=1 b=107782 sssc=200 sscl=107782
cfsc=FIN pfsc=FIN crc=TCP_MISS phr=DIRECT pqsn=origin-test-cache-52.net
uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.123 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.124 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.125 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.126 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.126 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.127 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.127 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=60 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.121 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=303 ttms=68 b=247 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=ERR_CONNECT_FAIL phr=DIRECT pqsn=origin-test-cache-52.net
uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.122 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=108 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
Jeremy Payne
2018-06-19 16:00:47 UTC
Permalink
Evan,

In my lab testing, I've noticed inconsistent results with the
collapsed_forwarding plugin. Sometimes it leaks redirects to the
client. Sometimes it sends a 500-Ink_Api response back.
However, when increasing open_write retires to some value to allow
upstream to respond, my results have been more consistent.. Where
instead of error responses being sent back downstream, I
am now sending 200s(or something other than a 5xx).

proxy.config.http.cache.max_open_write_retries 50

So whats the difference with core waiting for the write lock to free
up, as opposed to handing off the same to the plugin ?
Post by e***@gmail.com
I believe the ats params provide good enough protection except for the one specific case mentioned in the collapsed forwarding plugin docs. The one case they can't protect against is multiple instantaneous hits for the same content. So this actually comes up fairly often for linear video since you have multiple clients that can all be requesting the same video chunk at the same time
Post by Jeremy Payne
in testing 713, i also notice the 303 sent back to the end user. this
is much different than what's seen with 621. at least with 621 a 5xx
reponse is sent after all configured retries have been exhausted.
that said.. whats the current status of this plugin? deprecated? no
longer supported ?
is it safe to say, ATS core read_while_writer timers(and related
parameters) should be the preferred protections against thundering
herd ?
maybe at one time, core parameters were not sufficient to protect the
origin, which is why the plugin was developed in the first place.
but now core has caught up, and the plugin is no longer needed ?
thanks!
On Wed, May 16, 2018 at 2:51 PM, Zelkowitz, Evan
Post by Zelkowitz, Evan
So now that Im testing it more I can see on the origin side that for each
one of these ERR_CONNECT_FAILS that that is going back to my client
application which is then getting the redirect and going to the origin. It
doesn’t even appear to be doing the number of retries or delays that I set
for the collapsed_forwarding plugin. So something seems to be going wrong
with the plugin somewhere. I can see it attempting the delay, so it can see
the writefail header, and it sets the ‘internal’ redirect properly but that
appears to leak out back to the client before it gets attempted by ATS
itself.
Date: Tuesday, May 15, 2018 at 2:08 PM
Subject: [EXTERNAL] Issue with collapsed forwarding plugin, one CONN_FAIL
Ive been testing the collapsed forwarding plugin here by sending it random
traffic, one request with 10 duplicates of the same request. One thing Im
seeing is that a lot of the time one of the requests out of those 10 will
come back as an ERR_CONN_FAIL, but the pssc will be the 303 redirect.
I was just wondering if this was expected? Is it an issue or has anyone seen this?
1526413868.121 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=1 b=107782 sssc=200 sscl=107782
cfsc=FIN pfsc=FIN crc=TCP_MISS phr=DIRECT pqsn=origin-test-cache-52.net
uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.123 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.124 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.125 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.126 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.126 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.127 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.127 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=60 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.121 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=303 ttms=68 b=247 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=ERR_CONNECT_FAIL phr=DIRECT pqsn=origin-test-cache-52.net
uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.122 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=108 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
e***@gmail.com
2018-06-19 16:14:47 UTC
Permalink
So the setting we usually use on write fail is to go to the origin. The other option is to send back 5xx or stale. In the origin case each request will end up waiting for the write lock and then once it obtains it it will end up going to the origin. So you end up with a lockstep herd all writing the same content (at least for all the requests that come in before the initial write finishes).

The idea of the plugin is instead when you get a write lock fail you know at that point that some other request is already attempting to write/update the object, so at that point you may as well just go in to a read retry loop instead of actually attempting any write since at some point the first write will finish and you will be able to read that without having to go to the origin. I believe that was the initial intent. However in my testing I also saw the redirect and 5xx client leaks, so it was too unpredictable for us to use it.

I have been working on trying to add this functionality in to the ATS core code, just another write fail option that if the write lock fails it jumps back to attempting to read from the cache instead. Its mostly working except for one random core Im getting in production testing that Im trying to track down. Its also just experimental at this stage since because of how the ATS state machine is setup its possible that even after jumping back to the read its possible to end up attempting another write and if the timing is just right the first write may finish right as another request failed to read and attempted to do another write, which will result in an origin hit. But thats something to figure out once Ive worked out the core issue
Post by Jeremy Payne
Evan,
In my lab testing, I've noticed inconsistent results with the
collapsed_forwarding plugin. Sometimes it leaks redirects to the
client. Sometimes it sends a 500-Ink_Api response back.
However, when increasing open_write retires to some value to allow
upstream to respond, my results have been more consistent.. Where
instead of error responses being sent back downstream, I
am now sending 200s(or something other than a 5xx).
proxy.config.http.cache.max_open_write_retries 50
So whats the difference with core waiting for the write lock to free
up, as opposed to handing off the same to the plugin ?
Post by e***@gmail.com
I believe the ats params provide good enough protection except for the one specific case mentioned in the collapsed forwarding plugin docs. The one case they can't protect against is multiple instantaneous hits for the same content. So this actually comes up fairly often for linear video since you have multiple clients that can all be requesting the same video chunk at the same time
Post by Jeremy Payne
in testing 713, i also notice the 303 sent back to the end user. this
is much different than what's seen with 621. at least with 621 a 5xx
reponse is sent after all configured retries have been exhausted.
that said.. whats the current status of this plugin? deprecated? no
longer supported ?
is it safe to say, ATS core read_while_writer timers(and related
parameters) should be the preferred protections against thundering
herd ?
maybe at one time, core parameters were not sufficient to protect the
origin, which is why the plugin was developed in the first place.
but now core has caught up, and the plugin is no longer needed ?
thanks!
On Wed, May 16, 2018 at 2:51 PM, Zelkowitz, Evan
Post by Zelkowitz, Evan
So now that Im testing it more I can see on the origin side that for each
one of these ERR_CONNECT_FAILS that that is going back to my client
application which is then getting the redirect and going to the origin. It
doesn’t even appear to be doing the number of retries or delays that I set
for the collapsed_forwarding plugin. So something seems to be going wrong
with the plugin somewhere. I can see it attempting the delay, so it can see
the writefail header, and it sets the ‘internal’ redirect properly but that
appears to leak out back to the client before it gets attempted by ATS
itself.
Date: Tuesday, May 15, 2018 at 2:08 PM
Subject: [EXTERNAL] Issue with collapsed forwarding plugin, one CONN_FAIL
Ive been testing the collapsed forwarding plugin here by sending it random
traffic, one request with 10 duplicates of the same request. One thing Im
seeing is that a lot of the time one of the requests out of those 10 will
come back as an ERR_CONN_FAIL, but the pssc will be the 303 redirect.
I was just wondering if this was expected? Is it an issue or has anyone seen
this?
1526413868.121 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=1 b=107782 sssc=200 sscl=107782
cfsc=FIN pfsc=FIN crc=TCP_MISS phr=DIRECT pqsn=origin-test-cache-52.net
uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.123 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.124 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.125 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.126 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.126 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.127 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.127 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=60 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.121 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=303 ttms=68 b=247 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=ERR_CONNECT_FAIL phr=DIRECT pqsn=origin-test-cache-52.net
uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.122 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=108 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
Jeremy Payne
2018-06-19 18:35:06 UTC
Permalink
Yes.. Thanks for that reminder.. The point of the plugin is to return
the transaction to open read, since once the write lock is obtained
the request is heading to the origin.

That said, I'm still noticing a timing issue with the plugin..
Meaning, if the plugin is kicked off too soon, then for whatever the
reason the plugin returns a 500 Ink-API response.
If I add a little delay via open_write_retries, then the plugin
returns the expected response.
Its a dark art of sorts.. :-)
Post by e***@gmail.com
So the setting we usually use on write fail is to go to the origin. The other option is to send back 5xx or stale. In the origin case each request will end up waiting for the write lock and then once it obtains it it will end up going to the origin. So you end up with a lockstep herd all writing the same content (at least for all the requests that come in before the initial write finishes).
The idea of the plugin is instead when you get a write lock fail you know at that point that some other request is already attempting to write/update the object, so at that point you may as well just go in to a read retry loop instead of actually attempting any write since at some point the first write will finish and you will be able to read that without having to go to the origin. I believe that was the initial intent. However in my testing I also saw the redirect and 5xx client leaks, so it was too unpredictable for us to use it.
I have been working on trying to add this functionality in to the ATS core code, just another write fail option that if the write lock fails it jumps back to attempting to read from the cache instead. Its mostly working except for one random core Im getting in production testing that Im trying to track down. Its also just experimental at this stage since because of how the ATS state machine is setup its possible that even after jumping back to the read its possible to end up attempting another write and if the timing is just right the first write may finish right as another request failed to read and attempted to do another write, which will result in an origin hit. But thats something to figure out once Ive worked out the core issue
Post by Jeremy Payne
Evan,
In my lab testing, I've noticed inconsistent results with the
collapsed_forwarding plugin. Sometimes it leaks redirects to the
client. Sometimes it sends a 500-Ink_Api response back.
However, when increasing open_write retires to some value to allow
upstream to respond, my results have been more consistent.. Where
instead of error responses being sent back downstream, I
am now sending 200s(or something other than a 5xx).
proxy.config.http.cache.max_open_write_retries 50
So whats the difference with core waiting for the write lock to free
up, as opposed to handing off the same to the plugin ?
Post by e***@gmail.com
I believe the ats params provide good enough protection except for the one specific case mentioned in the collapsed forwarding plugin docs. The one case they can't protect against is multiple instantaneous hits for the same content. So this actually comes up fairly often for linear video since you have multiple clients that can all be requesting the same video chunk at the same time
Post by Jeremy Payne
in testing 713, i also notice the 303 sent back to the end user. this
is much different than what's seen with 621. at least with 621 a 5xx
reponse is sent after all configured retries have been exhausted.
that said.. whats the current status of this plugin? deprecated? no
longer supported ?
is it safe to say, ATS core read_while_writer timers(and related
parameters) should be the preferred protections against thundering
herd ?
maybe at one time, core parameters were not sufficient to protect the
origin, which is why the plugin was developed in the first place.
but now core has caught up, and the plugin is no longer needed ?
thanks!
On Wed, May 16, 2018 at 2:51 PM, Zelkowitz, Evan
Post by Zelkowitz, Evan
So now that Im testing it more I can see on the origin side that for each
one of these ERR_CONNECT_FAILS that that is going back to my client
application which is then getting the redirect and going to the origin. It
doesn’t even appear to be doing the number of retries or delays that I set
for the collapsed_forwarding plugin. So something seems to be going wrong
with the plugin somewhere. I can see it attempting the delay, so it can see
the writefail header, and it sets the ‘internal’ redirect properly but that
appears to leak out back to the client before it gets attempted by ATS
itself.
Date: Tuesday, May 15, 2018 at 2:08 PM
Subject: [EXTERNAL] Issue with collapsed forwarding plugin, one CONN_FAIL
Ive been testing the collapsed forwarding plugin here by sending it random
traffic, one request with 10 duplicates of the same request. One thing Im
seeing is that a lot of the time one of the requests out of those 10 will
come back as an ERR_CONN_FAIL, but the pssc will be the 303 redirect.
I was just wondering if this was expected? Is it an issue or has anyone seen
this?
1526413868.121 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=1 b=107782 sssc=200 sscl=107782
cfsc=FIN pfsc=FIN crc=TCP_MISS phr=DIRECT pqsn=origin-test-cache-52.net
uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.123 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.124 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.125 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.126 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.126 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.127 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.127 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=60 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.121 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=303 ttms=68 b=247 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=ERR_CONNECT_FAIL phr=DIRECT pqsn=origin-test-cache-52.net
uas="CUSTOMUAS/0.0.0" xmt="-"
1526413868.122 chi=test-ip phn=test-cache-09.net php=80
shn=origin-test-cache-52.net
url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=108 b=107782 sssc=000 sscl=0 cfsc=FIN
pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
Loading...