NO-JIRA: Improve unexpected reboot test output #29668

Open
wants to merge 2 commits into
base: main
from

@dgoodwin dgoodwin commented Apr 9, 2025

The test output today is pretty confusing: times are not formatted as
intended because of the use of slices. This change formats the
timestamps to be human-readable in the output, improves a couple of
variable names, and logs the boots for each node in chronological order
instead of reverse.

Old output:

{  fail [github.com/openshift/origin/test/extended/machines/cluster.go:176]: Unexpected error:
    <errors.aggregate | len:2, cap:2>: 
    [unexpected boot for node/ci-op-ihmplk3x-b6ce9-6jx4j-worker-eastus1-29cbp, got [{Boot {0 63879265172 <nil>}} {Boot {0 63879264967 <nil>}} {Boot {0 63879262740 <nil>}} {RebootRequest {0 63879262620 0xd471420}} {Boot {0 63879258488 <nil>}} {RebootRequest {0 63879258388 0xd471420}} {Boot {0 63879258175 <nil>}}], expected reboot for node/ci-op-ihmplk3x-b6ce9-6jx4j-worker-eastus1-29cbp, got [{Boot {0 63879265172 <nil>}} {Boot {0 63879264967 <nil>}} {Boot {0 63879262740 <nil>}} {RebootRequest {0 63879262620 0xd471420}} {Boot {0 63879258488 <nil>}} {RebootRequest {0 63879258388 0xd471420}} {Boot {0 63879258175 <nil>}}]]
    [
        <*errors.errorString | 0xc00217e540>{
            s: "unexpected boot for node/ci-op-ihmplk3x-b6ce9-6jx4j-worker-eastus1-29cbp, got [{Boot {0 63879265172 <nil>}} {Boot {0 63879264967 <nil>}} {Boot {0 63879262740 <nil>}} {RebootRequest {0 63879262620 0xd471420}} {Boot {0 63879258488 <nil>}} {RebootRequest {0 63879258388 0xd471420}} {Boot {0 63879258175 <nil>}}]",
        },
        <*errors.errorString | 0xc00217e580>{
            s: "expected reboot for node/ci-op-ihmplk3x-b6ce9-6jx4j-worker-eastus1-29cbp, got [{Boot {0 63879265172 <nil>}} {Boot {0 63879264967 <nil>}} {Boot {0 63879262740 <nil>}} {RebootRequest {0 63879262620 0xd471420}} {Boot {0 63879258488 <nil>}} {RebootRequest {0 63879258388 0xd471420}} {Boot {0 63879258175 <nil>}}]",
        },
    ]
occurred
Ginkgo exit error 1: exit with code 1}

@openshift-ci openshift-ci bot requested review from p0lyn0mial and sjenning April 9, 2025 17:36
openshift-ci bot commented Apr 9, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 9, 2025
openshift-trt bot commented Apr 9, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: dab9f95

Job Name | New Test Risk
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-dualstack-bgp-techpreview | Medium - "[sig-network][OCPFeatureGate:RouteAdvertisements][Feature:RouteAdvertisements][apigroup:operator.openshift.io] when using openshift ovn-kubernetes [PodNetwork] Advertising the default network [apigroup:user.openshift.io][apigroup:security.openshift.io] External host should be able to query route advertised pods by the pod IP [Suite:openshift/conformance/parallel]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-dualstack-bgp-techpreview | Medium - "[sig-network][OCPFeatureGate:RouteAdvertisements][Feature:RouteAdvertisements][apigroup:operator.openshift.io] when using openshift ovn-kubernetes [PodNetwork] Advertising the default network [apigroup:user.openshift.io][apigroup:security.openshift.io] pods should communicate with external host without being SNATed [Suite:openshift/conformance/parallel]" is a new test, and was only seen in one job.

New tests seen in this PR at sha: dab9f95

  • "[sig-network][OCPFeatureGate:RouteAdvertisements][Feature:RouteAdvertisements][apigroup:operator.openshift.io] when using openshift ovn-kubernetes [PodNetwork] Advertising the default network [apigroup:user.openshift.io][apigroup:security.openshift.io] External host should be able to query route advertised pods by the pod IP [Suite:openshift/conformance/parallel]" [Total: 1, Pass: 1, Fail: 0, Flake: 1]
  • "[sig-network][OCPFeatureGate:RouteAdvertisements][Feature:RouteAdvertisements][apigroup:operator.openshift.io] when using openshift ovn-kubernetes [PodNetwork] Advertising the default network [apigroup:user.openshift.io][apigroup:security.openshift.io] pods should communicate with external host without being SNATed [Suite:openshift/conformance/parallel]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]

@MaysaMacedo

/test e2e-metal-ipi-ovn e2e-metal-ipi-virtualmedia e2e-metal-ipi-ovn-dualstack-local-gateway

@MaysaMacedo

@dgoodwin It looks good. Did you have a chance to test it and confirm the output is really what you expect?
Can you add a link to a Jira issue or use NO-JIRA in the PR title?

@@ -137,37 +145,40 @@ var _ = g.Describe("[sig-node] Managed cluster", func() {
allTimelineEvents := []bootTimelineEntry{}
allTimelineEvents = append(allTimelineEvents, nodeBoots...)
allTimelineEvents = append(allTimelineEvents, nodeReboots...)

e2e.Logf("timeline events for %q\n%v", node.Name, formatTimeline(allTimelineEvents))

sort.Sort(sort.Reverse(byTime(allTimelineEvents)))
Contributor

Will this still reverse the output, i.e. newest first in the list?

Contributor Author

Only in the stdout log, not the actual test output. Maybe I should leave this as is so both are the same, but it's nice to see the list in chronological order in stdout.

@dgoodwin

Unfortunately this test is too rare a failure to reproduce in the PR, so we'd have to push it into the wild and wait.

@dgoodwin

Actually, I can add a bogus failure to force the test to fail. I'll try that.

@dgoodwin dgoodwin changed the title Improve unexpected reboot test output NO-JIRA: Improve unexpected reboot test output Apr 10, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 10, 2025
@openshift-ci-robot
@dgoodwin: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot commented May 15, 2025

@dgoodwin: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-gcp-fips-serial | 5beff6c | link | false | /test e2e-gcp-fips-serial
ci/prow/4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback | 5beff6c | link | false | /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback
ci/prow/e2e-aws-ovn-microshift | 5beff6c | link | true | /test e2e-aws-ovn-microshift
ci/prow/e2e-azure-ovn-upgrade | 5beff6c | link | false | /test e2e-azure-ovn-upgrade
ci/prow/e2e-metal-ipi-ovn | 5beff6c | link | false | /test e2e-metal-ipi-ovn
ci/prow/e2e-aws-ovn-single-node-serial | 5beff6c | link | false | /test e2e-aws-ovn-single-node-serial
ci/prow/e2e-metal-ipi-serial-ovn-ipv6 | 5beff6c | link | false | /test e2e-metal-ipi-serial-ovn-ipv6
ci/prow/e2e-gcp-disruptive | 5beff6c | link | false | /test e2e-gcp-disruptive
ci/prow/e2e-aws-ovn-etcd-scaling | 5beff6c | link | false | /test e2e-aws-ovn-etcd-scaling
ci/prow/e2e-aws-ovn-fips | 5beff6c | link | true | /test e2e-aws-ovn-fips
ci/prow/e2e-gcp-ovn-etcd-scaling | 5beff6c | link | false | /test e2e-gcp-ovn-etcd-scaling
ci/prow/e2e-metal-ipi-virtualmedia | 5beff6c | link | false | /test e2e-metal-ipi-virtualmedia
ci/prow/e2e-aws | 5beff6c | link | false | /test e2e-aws
ci/prow/e2e-gcp-ovn | 5beff6c | link | true | /test e2e-gcp-ovn
ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway | 5beff6c | link | false | /test e2e-metal-ipi-ovn-dualstack-local-gateway
ci/prow/e2e-openstack-ovn | 5beff6c | link | false | /test e2e-openstack-ovn
ci/prow/e2e-metal-ipi-ovn-kube-apiserver-rollout | 5beff6c | link | false | /test e2e-metal-ipi-ovn-kube-apiserver-rollout
ci/prow/e2e-aws-ovn-kube-apiserver-rollout | 5beff6c | link | false | /test e2e-aws-ovn-kube-apiserver-rollout
ci/prow/e2e-aws-ovn-cgroupsv2 | 5beff6c | link | false | /test e2e-aws-ovn-cgroupsv2
ci/prow/okd-e2e-gcp | 5beff6c | link | false | /test okd-e2e-gcp
ci/prow/e2e-aws-ovn-single-node-upgrade | 5beff6c | link | false | /test e2e-aws-ovn-single-node-upgrade
ci/prow/okd-scos-e2e-aws-ovn | 5beff6c | link | false | /test okd-scos-e2e-aws-ovn
ci/prow/e2e-aws-proxy | 5beff6c | link | false | /test e2e-aws-proxy
ci/prow/e2e-aws-ovn | 5beff6c | link | false | /test e2e-aws-ovn
ci/prow/e2e-metal-ipi-serial | 5beff6c | link | false | /test e2e-metal-ipi-serial
ci/prow/e2e-hypershift-conformance | 5beff6c | link | false | /test e2e-hypershift-conformance
ci/prow/e2e-aws-ovn-single-node | 5beff6c | link | false | /test e2e-aws-ovn-single-node
ci/prow/e2e-azure | 5beff6c | link | false | /test e2e-azure
ci/prow/e2e-vsphere-ovn-etcd-scaling | 5beff6c | link | false | /test e2e-vsphere-ovn-etcd-scaling
ci/prow/e2e-vsphere-ovn | 5beff6c | link | true | /test e2e-vsphere-ovn
ci/prow/e2e-azure-ovn-etcd-scaling | 5beff6c | link | false | /test e2e-azure-ovn-etcd-scaling
ci/prow/e2e-aws-disruptive | 5beff6c | link | false | /test e2e-aws-disruptive
ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 | 5beff6c | link | false | /test e2e-vsphere-ovn-dualstack-primaryv6
ci/prow/e2e-aws-ovn-edge-zones | 5beff6c | link | true | /test e2e-aws-ovn-edge-zones
ci/prow/e2e-vsphere-ovn-upi | 5beff6c | link | true | /test e2e-vsphere-ovn-upi
ci/prow/e2e-openstack-serial | 5beff6c | link | false | /test e2e-openstack-serial
ci/prow/e2e-metal-ipi-ovn-ipv6 | 5beff6c | link | true | /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-metal-ipi-ovn-dualstack | 5beff6c | link | false | /test e2e-metal-ipi-ovn-dualstack
ci/prow/e2e-agnostic-ovn-cmd | 5beff6c | link | false | /test e2e-agnostic-ovn-cmd
ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-techpreview | 5beff6c | link | false | /test e2e-metal-ipi-ovn-dualstack-bgp-techpreview
ci/prow/e2e-aws-ovn-serial | 5beff6c | link | true | /test e2e-aws-ovn-serial
ci/prow/e2e-aws-ovn-serial-2of2 | 5beff6c | link | true | /test e2e-aws-ovn-serial-2of2
ci/prow/e2e-aws-ovn-serial-1of2 | 5beff6c | link | true | /test e2e-aws-ovn-serial-1of2
ci/prow/e2e-aws-ovn-serial-publicnet | 5beff6c | link | true | /test e2e-aws-ovn-serial-publicnet

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-trt bot commented May 15, 2025

Job Failure Risk Analysis for sha: 5beff6c

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-gcp-disruptive Medium
[sig-node] static pods should start after being created
Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling High
[sig-api-machinery] disruption/cache-openshift-api apiserver/openshift-apiserver connection/new should be available throughout the test
This test has passed 99.58% of 4761 runs on release 4.20 [Overall] in the last week.
---
[sig-instrumentation] disruption/metrics-api connection/new should be available throughout the test
This test has passed 99.57% of 3933 runs on release 4.20 [Overall] in the last week.

Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
3 participants