Hi there!
Bug Report
When deploying an update of Concourse (using concourse/bosh), the update/recreation of the first worker did not finish for more than 4 hours.
BOSH shows the affected worker as failing (according to bosh instances).
Logging in to the worker via bosh ssh, we observed:
- there is a process running the drain script
- under the worker parent process, new jobs / child processes were still being started. None of the running jobs was an old one that had been running for the whole 4 hours.
- in the log of the worker, there is a log message containing "retiring-worker", indicating the USR2 signal was received by the worker here: https://github.com/concourse/concourse/blob/1b1b9ef171f7bc3fbb81e27fa7afa256a3c707b0/worker/beacon.go#L134
So for some reason, the drain behaviour described at https://concourse-ci.org/concourse-worker.html#gracefully-removing-a-worker does not seem to work.
We then manually sent another USR2 signal to the worker process using kill -USR2 <worker pid>.
This made the worker finish its running jobs and shut down.
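For anyone who wants to script the workaround, here is a minimal sketch in Python of what the manual kill -USR2 does. The helper name send_retire_signal is made up for illustration and is not part of Concourse or BOSH; it only sends the signal and assumes a Unix host where you already know the worker pid.

```python
import os
import signal


def send_retire_signal(pid: int) -> None:
    """Send SIGUSR2 to the given pid, same as `kill -USR2 <pid>` (Unix only).

    Hypothetical helper for illustration: it does not wait for the process
    to drain or exit, it only delivers the signal.
    """
    os.kill(pid, signal.SIGUSR2)
```

Usage would be send_retire_signal(<worker pid>), run as a user that is allowed to signal the worker process.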
You can see the log below; the worker recreation took 4 hrs 29 min:
12:53:26 Task 1708127 | 10:53:20 | Preparing deployment: Preparing deployment (00:00:03)
12:53:26 Task 1708127 | 10:53:23 | Preparing deployment: Rendering templates (00:00:02)
12:53:26 Task 1708127 | 10:53:25 | Preparing package compilation: Finding packages to compile (00:00:00)
12:53:26 Task 1708127 | 10:53:25 | Compiling packages: btrfs_tools/797f8df53d2881f034366408b3f043a57f8f4c51
12:53:26 Task 1708127 | 10:53:25 | Compiling packages: postgres-9.6.10/04ecac16e7e53e17d1a1799c0fe874f262f1960ba37514da1b3a30d1c58c13c0
12:53:26 Task 1708127 | 10:53:25 | Compiling packages: postgres-common/9e812f515167406f22e2f983a6c325b0a54e1bd6128aa44e1b8f8bc44034d01f
12:54:22 Task 1708127 | 10:53:25 | Compiling packages: postgres-11.3/c0604a42bdaa3ce61d1b13f7b1017005794c18bb1307cabb30cacb49f30b36ac
12:54:24 Task 1708127 | 10:53:25 | Compiling packages: concourse/faaac11289457bdd4fb8d177051a7d8f03d9ff63
12:56:31 Task 1708127 | 10:54:22 | Compiling packages: postgres-common/9e812f515167406f22e2f983a6c325b0a54e1bd6128aa44e1b8f8bc44034d01f (00:00:57)
12:57:45 Task 1708127 | 10:54:24 | Compiling packages: btrfs_tools/797f8df53d2881f034366408b3f043a57f8f4c51 (00:00:59)
12:57:56 Task 1708127 | 10:56:31 | Compiling packages: concourse/faaac11289457bdd4fb8d177051a7d8f03d9ff63 (00:03:06)
12:58:31 Task 1708127 | 10:57:45 | Compiling packages: postgres-9.6.10/04ecac16e7e53e17d1a1799c0fe874f262f1960ba37514da1b3a30d1c58c13c0 (00:04:20)
12:58:31 Task 1708127 | 10:57:55 | Compiling packages: postgres-11.3/c0604a42bdaa3ce61d1b13f7b1017005794c18bb1307cabb30cacb49f30b36ac (00:04:30)
12:58:31 Task 1708127 | 10:58:31 | Updating instance worker-maintenance: worker-maintenance/d8c3c95a-0353-45d8-85ad-43ba00809758 (0) (canary)
12:58:32 Task 1708127 | 10:58:31 | Updating instance db: db/e768317d-f8f0-4462-a1c3-c9c51c76385e (0) (canary)
13:00:55 Task 1708127 | 10:58:31 | Updating instance web: web/f9fe22ad-e517-4816-8495-41dd69a92e4e (0) (canary)
13:01:18 Task 1708127 | 10:58:31 | Updating instance worker: worker/e1e957d6-355e-43a6-8cde-3610b98fb1dd (0) (canary)
13:01:18 Task 1708127 | 11:00:54 | Updating instance db: db/e768317d-f8f0-4462-a1c3-c9c51c76385e (0) (canary) (00:02:23)
13:01:19 Task 1708127 | 11:01:18 | Updating instance web: web/f9fe22ad-e517-4816-8495-41dd69a92e4e (0) (canary) (00:02:47)
13:01:19 Task 1708127 | 11:01:18 | Updating instance web: web/f9fe22ad-e517-4816-8495-41dd69a92e4e (0) (canary) (00:02:47)
16:01:38 Task 1708127 | 11:01:18 | Updating instance web: web/c10c5e72-0d53-474a-9a75-be897df157df (1)
17:28:03 Task 1708127 | 11:01:18 | Updating instance web: web/c10c5e72-0d53-474a-9a75-be897df157df (1) (00:03:29)
17:28:03 Task 1708127 | 14:01:37 | Updating instance worker-maintenance: worker-maintenance/d8c3c95a-0353-45d8-85ad-43ba00809758 (0) (canary) (03:03:06)
17:31:32 Task 1708127 | 15:28:03 | Updating instance worker: worker/e1e957d6-355e-43a6-8cde-3610b98fb1dd (0) (canary) (04:29:32)
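As a sanity check on the durations, here is a small Python sketch that recomputes them from the task timestamps in the log above (the second time column). It assumes both timestamps fall on the same day; the function name elapsed is only for illustration.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Time between two HH:MM:SS stamps, assuming both are on the same day."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


# Start and end stamps taken from the "Updating instance" lines in the log.
worker = elapsed("10:58:31", "15:28:03")
maintenance = elapsed("10:58:31", "14:01:37")

print(worker)       # 4:29:32
print(maintenance)  # 3:03:06
```

Both values match the durations BOSH printed at the end of the respective lines.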
- Concourse version: v5.2.0
- Deployment type (BOSH/Docker/binary): BOSH
- Infrastructure/IaaS: AWS/GCP/Openstack