
pod-cpu-hog and pod-network-loss Litmus chaos experiments are not working; helper pod fails with timeout error #5062


Open
Elezabeth23 opened this issue Mar 10, 2025 · 3 comments

Comments

@Elezabeth23

For the pod-cpu-hog experiment, the helper pod is created and runs for a long time, then ends with the error below:

time="2025-03-10T04:00:24Z" level=info msg="Helper Name: stress-chaos"
time="2025-03-10T04:00:24Z" level=info msg="[PreReq]: Getting the ENV variables"
time="2025-03-10T04:00:54Z" level=info msg="[Info]: Details of Stressor:" CPU Core=0 CPU Load=100 Timeout=180
time="2025-03-10T04:49:24Z" level=fatal msg="helper pod failed, err: could not get container id\n --- at /litmus-go/chaoslib/litmus/stress-chaos/helper/stress-helper.go:127 (prepareStressChaos) ---\nCaused by: {"source":"pod-cpu-hog-helper-hb6zz","errorCode":"HELPER_ERROR","reason":"Get \"https://xx.xx.x.x:xx/api/v1/namespaces/namespace1/pods/nodejsapp\\\": dial tcp xx.xx.x.x:xx: i/o timeout","target":"{podName: nodejsapp, namespace: cicddemo-d0}"}, resultErr: Get "https://xx.xx.x.x:xx/apis/litmuschaos.io/v1alpha1/namespaces/namespace1/chaosresults/pod-cpu-hog-bbz49p62-pod-cpu-hog\": dial tcp xx.xx.x.x:xx: i/o timeout"

The log below is from the job pod:

ERROR: time="2025-03-10T06:31:24Z" level=error msg="[Error]: CPU hog failed, err: could not run chaos in parallel mode\n --- at /litmus-go/chaoslib/litmus/stress-chaos/lib/stress-chaos.go:101 (PrepareAndInjectStressChaos) ---\nCaused by: helper pod failed\n --- at /litmus-go/pkg/utils/common/common.go:164 (HelperFailedError) ---\nCaused by: {"errorCode":"STATUS_CHECKS_ERROR","reason":"container is not completed within timeout","target":"{podName: pod-cpu-hog-helper-zlmbp, namespace: namespace1, container: pod-cpu-hog}"}"

@kyzrxx

kyzrxx commented Apr 2, 2025

@Elezabeth23 Please try setting SET_HELPER_DATA="true".

@Elezabeth23
Author

Elezabeth23 commented Apr 2, 2025

@kyzrxx
With SET_HELPER_DATA="true", which is the default value, I get the error below:
{"errorCode":"STATUS_CHECKS_ERROR","phase":"ChaosInject","reason":"container is not completed within timeout","target":"{podName: pod-memory-hog-helper-z2n6b, namespace: cicddemo-d0, container: pod-memory-hog}"}
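The errors in this thread carry a JSON payload with an `errorCode`, which separates the helper's own failure (`HELPER_ERROR`, the API timeout above) from the experiment pod's view of it (`STATUS_CHECKS_ERROR`, the helper container not completing within the timeout). When comparing several runs, a small parser can pull those fields out. This is a sketch; the `litmusError` struct is hypothetical and only mirrors the keys visible in the payloads quoted here, not a type exported by litmus-go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// litmusError mirrors the keys seen in the error payloads in this issue.
// Hypothetical illustration only; not a litmus-go type.
type litmusError struct {
	Source    string `json:"source,omitempty"`
	ErrorCode string `json:"errorCode"`
	Phase     string `json:"phase,omitempty"`
	Reason    string `json:"reason"`
	Target    string `json:"target"`
}

// parseLitmusError decodes one JSON payload copied from a log line.
func parseLitmusError(raw string) (litmusError, error) {
	var e litmusError
	err := json.Unmarshal([]byte(raw), &e)
	return e, err
}

func main() {
	raw := `{"errorCode":"STATUS_CHECKS_ERROR","phase":"ChaosInject","reason":"container is not completed within timeout","target":"{podName: pod-memory-hog-helper-z2n6b, namespace: cicddemo-d0, container: pod-memory-hog}"}`
	e, err := parseLitmusError(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("code=%s phase=%s\nreason=%s\ntarget=%s\n", e.ErrorCode, e.Phase, e.Reason, e.Target)
}
```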

I have also raised this in the Litmus Slack channel.

@kyzrxx

kyzrxx commented Apr 4, 2025

@Elezabeth23 Sorry, I meant SET_HELPER_DATA="false"; I mistyped it.

Looking at the Slack message, I see you are using OpenShift 3.x. Since there is a ResourceQuota set in your namespace, can you check the experiment pod and helper pod YAMLs?
