Commit 2be2b0f

fix(makefile): standardized image targets

- deleted logs/ directory and added it to .gitignore
- removed RELEASE_PYTHON_VERSION and standardized on PYTHON_VERSION makefile variable
- helper functions to parse makefile target and extract important metadata as makefile variables
- add retries to podman push in build_image makefile function
- dynamically build workbench directory / dockerfile filename based on target
- standardized makefile image targets as <accelerator>-<feature>-<scope>-<os>-<python version>
- single deploy-% target for all images
- single undeploy-% target for all images
- single test-% target for all images
- new e2e-% target that runs $* + deploy-$* + test-$* + undeploy-$*
- updated/simplified make_test.py in light of Makefile changes
- pass kustomize output to kubectl via stdin to avoid accidental check-in of personal settings
- refactored notebooks/ repo file hierarchy to consistently leverage subfolders for accelerator-specific resources
  - renamed runtimes folder to runtime to match target name
  - jupyter/cuda + jupyter/rocm
  - runtime/cuda + runtime/rocm
- updated kustomize resources for consistency
  - image name used as manifest name prefix
  - -workbench used as manifest name suffix
  - using labels transformer because commonLabels is deprecated
  - containerPort named workbench-port
  - removed spec.containers.command from codeserver/rstudio to let server start
  - images.newTag aligned with makefile target
  - added emptyDir volume mount to all workloads
  - added startupProbe to our accelerator images
  - using term "workbench" as opposed to "notebook" consistently throughout manifests
- updated various Dockerfiles to match new folder hierarchy where necessary
- refactored test_jupyter_with_papermill to support testing needs of all workbenches + runtimes
  - scripts/makefile_utils directory created
  - numerous usability enhancements to the logic
    - reduce hardcoding of "magic" strings by parsing kustomize output to identify workload names and ports
    - scan for open port and use that when verifying container starts via kubectl port-forward
  - confirms container starts for all workbenches (not just jupyter)
  - confirms required libraries installed in container (now applied to jupyter notebooks as well)
- moved all validate-xxx target logic into script for better consolidated maintenance
  - relies on makefile to pass metadata parsed from target name to avoid duplicating logic
- TODO:
  - fix any problems in GHA due to above changes
  - add NAMING.md file to explain the "rules" around our makefile target names and all the places in our development flow that are impacted
  - fix openshift/release due to above changes
  - cleanup now-defunct/legacy makefile targets

Related-to: https://issues.redhat.com/browse/RHOAIENG-23291
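The naming scheme `<accelerator>-<feature>-<scope>-<os>-<python version>` can be illustrated with a small Python sketch. This is not the Makefile helper from the commit (the Makefile diff is not rendered on this page). The `ImageTarget` class, the `parse_target` function, the two vocabularies, and the mapping of name components to fields are all assumptions inferred from the target names that appear in `make_test.py`.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed vocabularies, inferred only from target names visible in this commit.
ACCELERATORS = {"cuda", "rocm"}
FEATURES = {"jupyter", "codeserver", "rstudio", "runtime"}


@dataclass(frozen=True)
class ImageTarget:
    """Hypothetical container for the metadata encoded in a target name."""
    accelerator: Optional[str]
    feature: str
    scope: Optional[str]
    os: str
    python_version: str


def parse_target(target: str) -> ImageTarget:
    """Split '<accelerator>-<feature>-<scope>-<os>-python-<version>' into fields."""
    head, sep, python_version = target.rpartition("-python-")
    if not sep or not python_version:
        raise ValueError(f"missing '-python-<version>' suffix: {target!r}")
    parts = head.split("-")
    # The accelerator is optional and, when present, comes first.
    accelerator = parts.pop(0) if parts[0] in ACCELERATORS else None
    if not parts or parts[0] not in FEATURES:
        raise ValueError(f"unknown feature in target: {target!r}")
    feature = parts.pop(0)
    if not parts:
        raise ValueError(f"missing OS in target: {target!r}")
    os_name = parts.pop()            # the OS is the last remaining component
    scope = "-".join(parts) or None  # anything left between feature and OS
    return ImageTarget(accelerator, feature, scope, os_name, python_version)


print(parse_target("rocm-jupyter-tensorflow-ubi9-python-3.11"))
print(parse_target("codeserver-ubi9-python-3.11"))
```

Targets without an accelerator or scope, such as `codeserver-ubi9-python-3.11`, leave those fields as `None` in this sketch.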
1 parent 4e17ba3 commit 2be2b0f

134 files changed: +1393 -491 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ build/
 develop-eggs/
 dist/
 downloads/
+logs/
 eggs/
 .eggs/
 lib/

Makefile

Lines changed: 153 additions & 100 deletions
Large diffs are not rendered by default.

ci/cached-builds/make_test.py

Lines changed: 29 additions & 82 deletions
@@ -33,31 +33,8 @@ def main() -> None:
 
 def run_tests(target: str) -> None:
     prefix = target.translate(str.maketrans(".", "-"))
-    # this is a pod name in statefulset, some tests deploy individual unmanaged pods, though
-    pod = prefix + "-notebook-0"  # `$(kubectl get statefulset -o name | head -n 1)` would work too
     namespace = "ns-" + prefix
 
-    if target.startswith("runtime-"):
-        deploy = "deploy9"
-        deploy_target = target.replace("runtime-", "runtimes-")
-    elif target.startswith("rocm-runtime-"):
-        deploy = "deploy9"
-        deploy_target = target.replace("rocm-runtime-", "runtimes-rocm-")
-    elif target.startswith("rocm-jupyter-"):
-        deploy = "deploy9"
-        deploy_target = target.replace("rocm-jupyter-", "jupyter-rocm-")
-    elif target.startswith("cuda-rstudio-"):
-        deploy = "deploy"
-        os = re.match(r"^cuda-rstudio-([^-]+-).*", target)
-        deploy_target = os.group(1) + target.removeprefix("cuda-")
-    elif target.startswith("rstudio-"):
-        deploy = "deploy"
-        os = re.match(r"^rstudio-([^-]+-).*", target)
-        deploy_target = os.group(1) + target
-    else:
-        deploy = "deploy9"
-        deploy_target = target
-
     check_call(f"kubectl create namespace {namespace}", shell=True)
     check_call(f"kubectl config set-context --current --namespace={namespace}", shell=True)
     check_call(f"kubectl label namespace {namespace} fake-scc=fake-restricted-v2", shell=True)
@@ -69,24 +46,10 @@ def run_tests(target: str) -> None:
     # See https://github.com/kubernetes/kubernetes/issues/66689
     check_call("timeout 10s bash -c 'until kubectl get serviceaccount/default; do sleep 1; done'", shell=True)
 
-    check_call(f"make {deploy}-{deploy_target}", shell=True)
-    wait_for_stability(pod)
+    check_call(f"make deploy-{target}", shell=True)
 
     try:
-        if target.startswith("runtime-"):
-            check_call(f"make validate-runtime-image image={target}", shell=True)
-        elif target.startswith("rocm-runtime-"):
-            check_call(
-                f"make validate-runtime-image image={target.replace('rocm-runtime-', 'runtime-rocm-')}", shell=True
-            )
-        elif target.startswith(("rstudio-", "cuda-rstudio-")):
-            check_call(f"make validate-rstudio-image image={target}", shell=True)
-        elif target.startswith("codeserver-"):
-            check_call(f"make validate-codeserver-image image={target}", shell=True)
-        elif target.startswith("rocm-jupyter"):
-            check_call(f"make test-{target.replace('rocm-jupyter-', 'jupyter-rocm-')}", shell=True)
-        else:
-            check_call(f"make test-{target}", shell=True)
+        check_call(f"make test-{target}", shell=True)
     finally:
         # dump a lot of info to the GHA logs
         with gha_log_group("pod and statefulset info"):
@@ -109,7 +72,7 @@ def run_tests(target: str) -> None:
             # regular logs from a running (or finished) pod
             call("kubectl logs --selector=nosuchlabel!=nosuchvalue --all-pods --timestamps", shell=True)
 
-    check_call(f"make un{deploy}-{deploy_target}", shell=True)
+    check_call(f"make undeploy-{target}", shell=True)
 
     print(f"[INFO] Finished testing {target}")
 
@@ -133,22 +96,6 @@ def execute(executor: typing.Callable, args: tuple, kwargs: dict) -> int:
     return result
 
 
-# TODO(jdanek) this is a dumb impl, needs to be improved
-def wait_for_stability(pod: str) -> None:
-    """Waits for the pod to be stable. Often I'm seeing that the probes initially fail.
-    > error: Internal error occurred: error executing command in container: container is not created or running
-    > error: unable to upgrade connection: container not found ("notebook")
-    """
-    timeout = 100
-    for _ in range(3):
-        call(
-            f"timeout {timeout}s bash -c 'until kubectl wait --for=condition=Ready pods --all --timeout 5s; do sleep 1; done'",
-            shell=True,
-        )
-        timeout = 50
-        time.sleep(3)
-
-
 # https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#grouping-log-lines
 @contextlib.contextmanager
 def gha_log_group(title):
@@ -170,81 +117,81 @@ def test_make_commands_jupyter(self, mock_execute: unittest.mock.Mock) -> None:
         """Compares the commands with what we had in the openshift/release yaml"""
         run_tests("jupyter-minimal-ubi9-python-3.11")
         commands: list[str] = [c[0][1][0] for c in mock_execute.call_args_list]
-        assert "make deploy9-jupyter-minimal-ubi9-python-3.11" in commands
+        assert "make deploy-jupyter-minimal-ubi9-python-3.11" in commands
         assert "make test-jupyter-minimal-ubi9-python-3.11" in commands
-        assert "make undeploy9-jupyter-minimal-ubi9-python-3.11" in commands
+        assert "make undeploy-jupyter-minimal-ubi9-python-3.11" in commands
 
     @unittest.mock.patch("make_test.execute")
     def test_make_commands_jupyter_rocm(self, mock_execute: unittest.mock.Mock) -> None:
         """Compares the commands with what we had in the openshift/release yaml"""
         run_tests("rocm-jupyter-tensorflow-ubi9-python-3.11")
         commands: list[str] = [c[0][1][0] for c in mock_execute.call_args_list]
-        assert "make deploy9-jupyter-rocm-tensorflow-ubi9-python-3.11" in commands
-        assert "make test-jupyter-rocm-tensorflow-ubi9-python-3.11" in commands
-        assert "make undeploy9-jupyter-rocm-tensorflow-ubi9-python-3.11" in commands
+        assert "make deploy-rocm-jupyter-tensorflow-ubi9-python-3.11" in commands
+        assert "make test-rocm-jupyter-tensorflow-ubi9-python-3.11" in commands
+        assert "make undeploy-rocm-jupyter-tensorflow-ubi9-python-3.11" in commands
 
     @unittest.mock.patch("make_test.execute")
     def test_make_commands_codeserver(self, mock_execute: unittest.mock.Mock) -> None:
         """Compares the commands with what we had in the openshift/release yaml"""
         run_tests("codeserver-ubi9-python-3.11")
         commands: list[str] = [c[0][1][0] for c in mock_execute.call_args_list]
-        assert "make deploy9-codeserver-ubi9-python-3.11" in commands
-        assert "make validate-codeserver-image image=codeserver-ubi9-python-3.11" in commands
-        assert "make undeploy9-codeserver-ubi9-python-3.11" in commands
+        assert "make deploy-codeserver-ubi9-python-3.11" in commands
+        assert "make test-codeserver-ubi9-python-3.11" in commands
+        assert "make undeploy-codeserver-ubi9-python-3.11" in commands
 
     @unittest.mock.patch("make_test.execute")
     def test_make_commands_rstudio(self, mock_execute: unittest.mock.Mock) -> None:
         """Compares the commands with what we had in the openshift/release yaml"""
         run_tests("rstudio-c9s-python-3.11")
         commands: list[str] = [c[0][1][0] for c in mock_execute.call_args_list]
-        assert "make deploy-c9s-rstudio-c9s-python-3.11" in commands
-        assert "make validate-rstudio-image image=rstudio-c9s-python-3.11" in commands
-        assert "make undeploy-c9s-rstudio-c9s-python-3.11" in commands
+        assert "make deploy-rstudio-c9s-python-3.11" in commands
+        assert "make test-rstudio-c9s-python-3.11" in commands
+        assert "make undeploy-rstudio-c9s-python-3.11" in commands
 
     @unittest.mock.patch("make_test.execute")
     def test_make_commands_rsudio_rhel(self, mock_execute: unittest.mock.Mock) -> None:
         """Compares the commands with what we had in the openshift/release yaml"""
         run_tests("rstudio-rhel9-python-3.11")
         commands: list[str] = [c[0][1][0] for c in mock_execute.call_args_list]
-        assert "make deploy-rhel9-rstudio-rhel9-python-3.11" in commands
-        assert "make validate-rstudio-image image=rstudio-rhel9-python-3.11" in commands
-        assert "make undeploy-rhel9-rstudio-rhel9-python-3.11" in commands
+        assert "make deploy-rstudio-rhel9-python-3.11" in commands
+        assert "make test-rstudio-rhel9-python-3.11" in commands
+        assert "make undeploy-rstudio-rhel9-python-3.11" in commands
 
     @unittest.mock.patch("make_test.execute")
     def test_make_commands_cuda_rstudio(self, mock_execute: unittest.mock.Mock) -> None:
         """Compares the commands with what we had in the openshift/release yaml"""
         run_tests("cuda-rstudio-c9s-python-3.11")
         commands: list[str] = [c[0][1][0] for c in mock_execute.call_args_list]
-        assert "make deploy-c9s-rstudio-c9s-python-3.11" in commands
-        assert "make validate-rstudio-image image=cuda-rstudio-c9s-python-3.11" in commands
-        assert "make undeploy-c9s-rstudio-c9s-python-3.11" in commands
+        assert "make deploy-cuda-rstudio-c9s-python-3.11" in commands
+        assert "make test-cuda-rstudio-c9s-python-3.11" in commands
+        assert "make undeploy-cuda-rstudio-c9s-python-3.11" in commands
 
     @unittest.mock.patch("make_test.execute")
     def test_make_commands_cuda_rstudio_rhel(self, mock_execute: unittest.mock.Mock) -> None:
         """Compares the commands with what we had in the openshift/release yaml"""
         run_tests("cuda-rstudio-rhel9-python-3.11")
         commands: list[str] = [c[0][1][0] for c in mock_execute.call_args_list]
-        assert "make deploy-rhel9-rstudio-rhel9-python-3.11" in commands
-        assert "make validate-rstudio-image image=cuda-rstudio-rhel9-python-3.11" in commands
-        assert "make undeploy-rhel9-rstudio-rhel9-python-3.11" in commands
+        assert "make deploy-cuda-rstudio-rhel9-python-3.11" in commands
+        assert "make test-cuda-rstudio-rhel9-python-3.11" in commands
+        assert "make undeploy-cuda-rstudio-rhel9-python-3.11" in commands
 
     @unittest.mock.patch("make_test.execute")
     def test_make_commands_runtime(self, mock_execute: unittest.mock.Mock) -> None:
         """Compares the commands with what we had in the openshift/release yaml"""
         run_tests("runtime-datascience-ubi9-python-3.11")
         commands: list[str] = [c[0][1][0] for c in mock_execute.call_args_list]
-        assert "make deploy9-runtimes-datascience-ubi9-python-3.11" in commands
-        assert "make validate-runtime-image image=runtime-datascience-ubi9-python-3.11" in commands
-        assert "make undeploy9-runtimes-datascience-ubi9-python-3.11" in commands
+        assert "make deploy-runtime-datascience-ubi9-python-3.11" in commands
+        assert "make test-runtime-datascience-ubi9-python-3.11" in commands
+        assert "make undeploy-runtime-datascience-ubi9-python-3.11" in commands
 
     @unittest.mock.patch("make_test.execute")
     def test_make_commands_rocm_runtime(self, mock_execute: unittest.mock.Mock) -> None:
         """Compares the commands with what we had in the openshift/release yaml"""
         run_tests("rocm-runtime-pytorch-ubi9-python-3.11")
         commands: list[str] = [c[0][1][0] for c in mock_execute.call_args_list]
-        assert "make deploy9-runtimes-rocm-pytorch-ubi9-python-3.11" in commands
-        assert "make validate-runtime-image image=runtime-rocm-pytorch-ubi9-python-3.11" in commands
-        assert "make undeploy9-runtimes-rocm-pytorch-ubi9-python-3.11" in commands
+        assert "make deploy-rocm-runtime-pytorch-ubi9-python-3.11" in commands
+        assert "make test-rocm-runtime-pytorch-ubi9-python-3.11" in commands
+        assert "make undeploy-rocm-runtime-pytorch-ubi9-python-3.11" in commands
 
 
 if __name__ == "__main__":
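
The net effect of the `make_test.py` change is that every target now maps to the same three make invocations, with no per-prefix special cases. A minimal sketch of that mapping follows. `plan_for` is a hypothetical helper written for illustration and is not part of the commit; the namespace derivation and the command strings are taken from the diff above.

```python
def plan_for(target: str) -> dict:
    """Return the namespace and ordered make commands used to test a target."""
    # Same normalization as run_tests(): dots become dashes.
    prefix = target.translate(str.maketrans(".", "-"))
    return {
        "namespace": "ns-" + prefix,
        "commands": [
            f"make deploy-{target}",
            f"make test-{target}",
            f"make undeploy-{target}",
        ],
    }


plan = plan_for("rocm-jupyter-tensorflow-ubi9-python-3.11")
print(plan["namespace"])
for command in plan["commands"]:
    print(command)
```

The same function handles `rstudio-`, `cuda-rstudio-`, `runtime-`, and `rocm-runtime-` targets, which previously each required their own branch.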
Lines changed: 4 additions & 2 deletions
@@ -1,10 +1,12 @@
 ---
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
-namePrefix: codeserver-
+namePrefix: codeserver-ubi9-python-3-11-
 resources:
   - pod.yaml
+transformers:
+  - labels.yaml
 images:
-  - name: codeserver-workbench
+  - name: quay.io/opendatahub/workbench-images
     newName: quay.io/opendatahub/workbench-images
     newTag: codeserver-ubi9-python-3.11
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+apiVersion: builtin
+kind: LabelTransformer
+metadata:
+  name: add-labels
+labels:
+  app: codeserver-ubi9-python-3-11
+fieldSpecs:
+  - path: metadata/labels
+    create: true
+  - path: spec/template/metadata/labels
+    create: false
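
The kustomize values in these manifests follow mechanically from the makefile target name. The sketch below shows that relationship under stated assumptions: `kustomize_names` is a hypothetical helper, not code from the commit. The dot-to-dash normalization mirrors `run_tests()` in `make_test.py`, and the `-workbench` suffix comes from the commit message.

```python
def kustomize_names(target: str) -> dict[str, str]:
    """Derive the manifest names that the diff shows for a given target."""
    # Dots are replaced with dashes, as in the namePrefix and app label above.
    slug = target.translate(str.maketrans(".", "-"))
    return {
        "namePrefix": slug + "-",              # kustomization.yaml namePrefix
        "app_label": slug,                     # labels.yaml app label
        "newTag": target,                      # images.newTag keeps the dots
        "workload_name": slug + "-workbench",  # namePrefix + metadata.name
    }


for key, value in kustomize_names("codeserver-ubi9-python-3.11").items():
    print(f"{key}: {value}")
```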

codeserver/ubi9-python-3.11/kustomize/base/pod.yaml

Lines changed: 6 additions & 7 deletions
@@ -2,17 +2,16 @@
 apiVersion: v1
 kind: Pod
 metadata:
-  name: pod
-  labels:
-    app: codeserver-image
+  name: workbench
 spec:
   containers:
-    - name: codeserver
-      image: codeserver-workbench
-      command: ["/bin/sh", "-c", "while true ; do date; sleep 5; done;"]
+    - name: workbench
+      image: quay.io/opendatahub/workbench-images
       imagePullPolicy: Always
       ports:
-        - containerPort: 8585
+        - name: workbench-port
+          protocol: TCP
+          containerPort: 8787
       resources:
         limits:
           cpu: 500m

jupyter/pytorch/ubi9-python-3.11/Dockerfile.cuda renamed to jupyter/cuda/pytorch/ubi9-python-3.11/Dockerfile.cuda

Lines changed: 7 additions & 7 deletions
@@ -135,7 +135,7 @@ RUN yum install -y \
     ${NV_CUDNN_PACKAGE_DEV} \
     && yum clean all \
     && rm -rf /var/cache/yum/*
-
+
 # Set this flag so that libraries can find the location of CUDA
 ENV XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda
 
@@ -146,7 +146,7 @@ WORKDIR /opt/app-root/src
 #########################
 # cuda-jupyter-minimal #
 #########################
-FROM cuda-base AS cuda-jupyter-minimal
+FROM cuda-base AS cuda-jupyter-minimal
 
 ARG JUPYTER_REUSABLE_UTILS=jupyter/utils
 ARG MINIMAL_SOURCE_CODE=jupyter/minimal/ubi9-python-3.11
@@ -156,7 +156,7 @@ WORKDIR /opt/app-root/bin
 COPY ${JUPYTER_REUSABLE_UTILS} utils/
 
 COPY ${MINIMAL_SOURCE_CODE}/start-notebook.sh ./
-
+
 WORKDIR /opt/app-root/src
 
 ENTRYPOINT ["start-notebook.sh"]
@@ -202,7 +202,7 @@ WORKDIR /opt/app-root/src
 FROM cuda-jupyter-datascience AS cuda-jupyter-pytorch
 
 ARG DATASCIENCE_SOURCE_CODE=jupyter/datascience/ubi9-python-3.11
-ARG PYTORCH_SOURCE_CODE=jupyter/pytorch/ubi9-python-3.11
+ARG PYTORCH_SOURCE_CODE=jupyter/cuda/pytorch/ubi9-python-3.11
 
 WORKDIR /opt/app-root/bin
 
@@ -227,11 +227,11 @@ RUN echo "Installing softwares and packages" && \
     # Remove default Elyra runtime-images \
     rm /opt/app-root/share/jupyter/metadata/runtime-images/*.json && \
     # Replace Notebook's launcher, "(ipykernel)" with Python's version 3.x.y \
-    sed -i -e "s/Python.*/$(python --version | cut -d '.' -f-2)\",/" /opt/app-root/share/jupyter/kernels/python3/kernel.json && \
+    sed -i -e "s/Python.*/$(python --version | cut -d '.' -f-2)\",/" /opt/app-root/share/jupyter/kernels/python3/kernel.json && \
     # Disable announcement plugin of jupyterlab \
-    jupyter labextension disable "@jupyterlab/apputils-extension:announcements" && \
+    jupyter labextension disable "@jupyterlab/apputils-extension:announcements" && \
     # Apply JupyterLab addons \
-    /opt/app-root/bin/utils/addons/apply.sh && \
+    /opt/app-root/bin/utils/addons/apply.sh && \
     # Fix permissions to support pip in Openshift environments \
     chmod -R g+w /opt/app-root/lib/python3.11/site-packages && \
     fix-permissions /opt/app-root -P
@@ -1,13 +1,13 @@
 ---
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
-namePrefix: jupyter-pytorch-ubi9-python-3-11-
-commonLabels:
-  app: jupyter-pytorch-ubi9-python-3-11
+namePrefix: cuda-jupyter-pytorch-ubi9-python-3-11-
 resources:
   - service.yaml
   - statefulset.yaml
+transformers:
+  - labels.yaml
 images:
   - name: quay.io/opendatahub/workbench-images
     newName: quay.io/opendatahub/workbench-images
-    newTag: jupyter-pytorch-ubi9-python-3.11
+    newTag: cuda-jupyter-pytorch-ubi9-python-3.11
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+apiVersion: builtin
+kind: LabelTransformer
+metadata:
+  name: add-labels
+labels:
+  app: cuda-jupyter-pytorch-ubi9-python-3-11
+fieldSpecs:
+  - path: metadata/labels
+    create: true
+  - path: spec/template/metadata/labels
+    create: false
+  - path: spec/selector/matchLabels
+    create: false

jupyter/tensorflow/ubi9-python-3.11/kustomize/base/service.yaml renamed to jupyter/cuda/pytorch/ubi9-python-3.11/kustomize/base/service.yaml

Lines changed: 4 additions & 4 deletions
@@ -2,14 +2,14 @@
 apiVersion: v1
 kind: Service
 metadata:
-  name: notebook
+  name: workbench
   labels:
-    app: notebook
+    app: workbench
 spec:
   type: ClusterIP
   ports:
     - port: 8888
       protocol: TCP
-      targetPort: notebook-port
+      targetPort: workbench-port
   selector:
-    app: notebook
+    app: workbench
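
The commit message mentions scanning for an open local port before verifying a container through `kubectl port-forward`. The test script that does this is not shown on this page, so the following is only a sketch of one common way to do it. It asks the operating system for a free port by binding to port 0, which may differ from the scan the commit actually performs. `find_free_port` and `port_forward_command` are hypothetical names, and the service name assumes the kustomize `namePrefix` is prepended to `workbench`.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def port_forward_command(service: str, local_port: int, remote_port: int) -> list[str]:
    """Build (but do not run) a kubectl port-forward invocation."""
    return ["kubectl", "port-forward", f"service/{service}", f"{local_port}:{remote_port}"]


local_port = find_free_port()
# 8888 is the Service port from service.yaml above.
command = port_forward_command("cuda-jupyter-pytorch-ubi9-python-3-11-workbench", local_port, 8888)
print(" ".join(command))
```

The port is released before `kubectl` would bind it, so another process could claim it in between. A caller that needs robustness could retry with a fresh port on failure.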
