
[KubernetesPodOperator] Detection of different timeouts for schedule and startup state #49784


Conversation

@AutomationDev85 (Contributor)

Overview

The idea behind this PR is to let the KubernetesPodOperator distinguish between two separate startup timeouts.

For this we introduce the schedule_timeout_seconds parameter. It defines the allowed time from creating the Pod until it reaches the scheduled state, which makes it possible to account separately for events such as the scale-up of Kubernetes nodes.
The startup_timeout_seconds timeout then covers the time from entering the scheduled state until the Pod enters the running state, so the time budget for pulling an image can be specified on its own.

With these two parameters the startup time of the Pod can be controlled in finer detail: a long-running node scale-up in the cluster no longer eats into the timeout for pulling a huge image.

Since this could break users' current timeout settings, the idea is to give the new schedule_timeout_seconds parameter a default of None instead of a default int value. If the user does not set it, the value of startup_timeout_seconds is used for both phases. In the worst case this doubles the total timeout, but we think it is worth it for now to avoid a breaking change in the operator's timeout behavior. What do you think about this?
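
For illustration, a task using both parameters might look like this (a minimal sketch; the DAG id, image, and timeout values are made up, while the parameter names come from this PR):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(dag_id="pod_timeout_example", start_date=datetime(2025, 1, 1), schedule=None):
    KubernetesPodOperator(
        task_id="heavy_image_task",
        name="heavy-image-task",
        image="my-registry/huge-image:latest",  # hypothetical image
        # Phase 1: up to 10 minutes for node scale-up and scheduling.
        schedule_timeout_seconds=600,
        # Phase 2: up to 5 minutes for image pull and container start.
        startup_timeout_seconds=300,
    )
```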

Details of the change:

  • Add the schedule_timeout_seconds parameter.
  • Modify the await_pod_start function of the pod manager to detect schedule and startup timeouts separately (a sketch follows below).
  • Add and modify unit tests.
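
To illustrate the detection logic in await_pod_start, here is a simplified sketch, not the PR's actual implementation; read_pod, pod_is_scheduled, and PodStartTimeoutError are hypothetical names:

```python
import time


class PodStartTimeoutError(Exception):
    """Raised when a startup phase misses its deadline (hypothetical name)."""


def pod_is_scheduled(pod) -> bool:
    # A pod counts as scheduled once its PodScheduled condition is True.
    conditions = pod.status.conditions or []
    return any(c.type == "PodScheduled" and c.status == "True" for c in conditions)


def await_pod_start(read_pod, schedule_timeout, startup_timeout, poll_interval=2.0):
    """Wait for scheduling, then for the running state.

    ``read_pod`` is assumed to be a callable returning the current pod object.
    """
    # Backwards-compatible fallback described above: without an explicit
    # schedule_timeout_seconds, reuse startup_timeout_seconds for phase 1.
    if schedule_timeout is None:
        schedule_timeout = startup_timeout

    # Phase 1: pod creation -> scheduled (covers e.g. node scale-up).
    deadline = time.monotonic() + schedule_timeout
    while not pod_is_scheduled(read_pod()):
        if time.monotonic() > deadline:
            raise PodStartTimeoutError("Pod not scheduled within schedule_timeout_seconds")
        time.sleep(poll_interval)

    # Phase 2: scheduled -> running (covers e.g. image pull).
    deadline = time.monotonic() + startup_timeout
    while read_pod().status.phase == "Pending":
        if time.monotonic() > deadline:
            raise PodStartTimeoutError("Pod not running within startup_timeout_seconds")
        time.sleep(poll_interval)
```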

@boring-cyborg bot added the area:providers and provider:cncf-kubernetes labels on Apr 25, 2025
@AutomationDev85 force-pushed the feature/enable-schedule-timeout-kubernetes-pod-operator branch from 46520b1 to f489bca on April 25, 2025 at 12:50

@jscheffl (Contributor) left a comment:


Looks good to me, but I'd like another pair of eyes on this.

The failed compose test is a problem on main and seems to be unrelated to this PR.

@jscheffl force-pushed the feature/enable-schedule-timeout-kubernetes-pod-operator branch from f489bca to 1ab9178 on April 26, 2025 at 12:06
@nevcohen (Contributor) commented on May 6, 2025

Why not combine these three PRs?

Or at least just the first one (#49867)

@jscheffl (Contributor) commented on May 6, 2025

> Why not combine these three PRs?

Smaller PRs == easier review :-D

@nevcohen (Contributor) commented on May 6, 2025

> Smaller PRs == easier review :-D

I totally agree, but in this case they are really dependent on each other and there isn't really much extra code.

Anyway, it's not critical.

@AutomationDev85 force-pushed the feature/enable-schedule-timeout-kubernetes-pod-operator branch from 1ab9178 to 20f005e on May 12, 2025 at 07:47
@AutomationDev85 force-pushed the feature/enable-schedule-timeout-kubernetes-pod-operator branch from 20f005e to 2d3bca9 on May 12, 2025 at 07:50

@jscheffl (Contributor) left a comment:


Re-Approve. LGTM!

@jscheffl merged commit 651a6dc into apache:main on May 12, 2025 (76 checks passed)