[KubernetesPodOperator] Detection of different timeouts for schedule and startup state #49784

AutomationDev85 · 2025-04-25T12:42:57Z

Overview

This PR lets the KubernetesPodOperator detect separate timeouts for the scheduling and startup phases of a Pod.

To do this, we introduce the schedule_timeout_seconds parameter. It defines the maximum time from creating the Pod until it reaches the scheduled state. With this timeout it is possible to handle, for example, a scale-up of Kubernetes nodes in more detail.
The startup_timeout_seconds timeout then covers the time from entering the scheduled state until the Pod enters the running state. This makes it possible to set the time allowed for pulling an image more precisely.

With these two parameters, the startup time of the Pod can be controlled in more detail. A long-running scale-up of a node in the cluster no longer eats into the timeout for pulling a huge image.

Because this can break users' current timeout settings, the idea is to default the new schedule_timeout_seconds parameter to None instead of an int value. If the user does not set it, the value of startup_timeout_seconds is used for it as well. This can double the total timeout in the worst case, but we think it is worth it for now to avoid a breaking change in the operator's timeout behavior. What do you think about this?
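The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the provider's code: `resolve_timeouts` and `worst_case_wait` are hypothetical helper names introduced here, and only the two parameter names come from the PR.

```python
from __future__ import annotations


def resolve_timeouts(
    startup_timeout_seconds: int,
    schedule_timeout_seconds: int | None = None,
) -> tuple[int, int]:
    """Return (schedule_timeout, startup_timeout) in seconds.

    Hypothetical helper: when schedule_timeout_seconds is not set,
    the value of startup_timeout_seconds is reused for it.
    """
    if schedule_timeout_seconds is None:
        schedule_timeout_seconds = startup_timeout_seconds
    return schedule_timeout_seconds, startup_timeout_seconds


def worst_case_wait(schedule_timeout: int, startup_timeout: int) -> int:
    """Upper bound on the total wait: both phases use their full budget."""
    return schedule_timeout + startup_timeout


# A user who only configured startup_timeout_seconds=300 before this change:
schedule, startup = resolve_timeouts(300)
total = worst_case_wait(schedule, startup)  # 600, twice the old limit
```

Setting schedule_timeout_seconds explicitly, for example `resolve_timeouts(300, 60)`, keeps the two budgets independent and avoids the doubling.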

Details of change:

Add schedule_timeout_seconds parameter.
Modify the await_pod_start function of the pod manager to detect schedule and startup timeouts.
Add and modify unit tests.
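As a rough illustration of the two-phase check, the sketch below waits with one budget until the Pod is scheduled and a second budget until it is running. It is not the pod manager's actual `await_pod_start`: the signature, the `PodLaunchTimeout` exception, and the three state strings are assumptions made for this example. In a real cluster, scheduling is reported through the PodScheduled condition while the Pod phase is still Pending.

```python
from __future__ import annotations

import time
from typing import Callable


class PodLaunchTimeout(Exception):
    """Hypothetical exception type used only in this sketch."""


def await_pod_start(
    get_state: Callable[[], str],
    schedule_timeout: float,
    startup_timeout: float,
    check_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until get_state() returns "running".

    get_state is assumed to return "pending", "scheduled" or "running".
    """
    created_at = clock()
    scheduled_at = None
    while True:
        state = get_state()
        now = clock()
        if state == "running":
            return
        if state == "scheduled" and scheduled_at is None:
            # The startup budget starts counting only from this moment.
            scheduled_at = now
        if scheduled_at is None:
            if now - created_at > schedule_timeout:
                raise PodLaunchTimeout(
                    f"Pod was not scheduled within {schedule_timeout}s"
                )
        elif now - scheduled_at > startup_timeout:
            raise PodLaunchTimeout(
                f"Pod did not start within {startup_timeout}s of being scheduled"
            )
        sleep(check_interval)
```

Because the startup clock starts only once the Pod is scheduled, time spent waiting for a node does not count against the image pull. The `clock` and `sleep` arguments are injected so the loop can be tested without real waiting.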

providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py

jscheffl

Looks good to me, but I would like to have another pair of eyes on this.

The failed compose test is a problem on main and seems to be unrelated to this PR.

nevcohen · 2025-05-06T16:33:15Z

Why not combine these three PRs?

49867
50192

Or at least just the first one (49867)

providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py

providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/utils/pod_manager.py

jscheffl · 2025-05-06T19:48:29Z

Why not combine these three PRs?

Smaller PRs == easier review :-D

nevcohen · 2025-05-06T20:11:33Z

Smaller PRs == easier review :-D

I totally agree, but in this case they are really dependent on each other and there isn't really much extra code.

Anyway, it's not critical.

jscheffl

Re-Approve. LGTM!

AutomationDev85 requested review from jedcunningham and hussein-awala as code owners April 25, 2025 12:42

boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Apr 25, 2025

jscheffl reviewed Apr 25, 2025

providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py (outdated)

AutomationDev85 force-pushed the feature/enable-schedule-timeout-kubernetes-pod-operator branch from 46520b1 to f489bca Compare April 25, 2025 12:50

jscheffl approved these changes Apr 26, 2025

jscheffl force-pushed the feature/enable-schedule-timeout-kubernetes-pod-operator branch from f489bca to 1ab9178 Compare April 26, 2025 12:06

AutomationDev85 mentioned this pull request Apr 28, 2025

[KubernetesPodOperator] Add fail fast detection during pod startup #49867

Merged

nevcohen reviewed May 6, 2025

AutomationDev85 force-pushed the feature/enable-schedule-timeout-kubernetes-pod-operator branch from 1ab9178 to 20f005e Compare May 12, 2025 07:47

KubernetesPodOperator uses different timeouts to check for schedule timeout and startup timeout

2d3bca9

AutomationDev85 force-pushed the feature/enable-schedule-timeout-kubernetes-pod-operator branch from 20f005e to 2d3bca9 Compare May 12, 2025 07:50

jscheffl approved these changes May 12, 2025

jscheffl merged commit 651a6dc into apache:main May 12, 2025
76 checks passed

This was referenced May 14, 2025

Status of testing Providers that were prepared on May 14, 2025 #50599

Closed

Status of testing Providers that were prepared on May 20, 2025 #50818

Closed
