-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[KubernetesPodOperator] Dectection of different timeouts for schedule and startup state #49784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KubernetesPodOperator] Dectection of different timeouts for schedule and startup state #49784
Conversation
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py
Outdated
Show resolved
Hide resolved
46520b1
to
f489bca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good for me. But would like to have another pair of eyes on this.
The failed compose test is a problem on main and seems to be un-related to this PR.
f489bca
to
1ab9178
Compare
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py
Show resolved
Hide resolved
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/utils/pod_manager.py
Show resolved
Hide resolved
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/utils/pod_manager.py
Show resolved
Hide resolved
Smaller PRs == easier review :-D |
I totally agree, but in this case they are really dependent on each other and there isn't really much extra code. Anyway, it's not critical. |
1ab9178
to
20f005e
Compare
…imeout and startup timeout
20f005e
to
2d3bca9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-Approve. LGTM!
Overview
The idea behind this PR is to enable the KubernetesPodOperator with detection of different timeouts.
For this we introduce the schedule_timeout_seconds parameter. This parameter defines the time from creating the Pod till arriving the scheduled state. With this timeout if is possible to catch e.g. scale up of Kubernetes nodes more detailed.
The startup_timeout_seconds timeout is then used to check for the time from entering the scheduled state till POD enters the running state. With that it is possible to specify the time for pulling an image more detailed.
With these 2 parameters it is possible to control the startup time of the Pod more detailed. A long running scale up of a node in the cluster does not affect the timeout during pulling of a huge image.
As this can break current timeout settings of the user -> Idea is to define the new parameter schedule_timeout_seconds with None instead of a default int value. If the user does not set this parameter the same value as startup_timeout_seconds is used again. This can double the timeout in worst case but we think it is worse for the moment to have no breaking change in the timeout behavior of the operator. What do you think about this?
Details of change: