Outcomes
Explore how the restartPolicy attribute affects crashing pods.
Observe the behavior of a slow-starting application that has no configured probes.
Use a deployment to scale the application, and observe the behavior of a broken pod.
As the student user on the workstation machine, use the lab command to prepare your system for this exercise.
This command ensures that the following conditions are true:
The reliability-ha project exists.
The resource files are available in the course directory.
The classroom registry has the long-load container image.
The long-load container image contains an application with utility endpoints.
These endpoints perform such tasks as crashing the process and toggling the server's health status.
[student@workstation ~]$ lab start reliability-ha
Instructions
As the developer user, create a pod from a YAML manifest in the reliability-ha project.

Log in as the developer user with the developer password.

[student@workstation ~]$ oc login -u developer -p developer \
  https://api.ocp4.example.com:6443
...output omitted...

Select the reliability-ha project.

[student@workstation ~]$ oc project reliability-ha
Now using project "reliability-ha" on server "https://api.ocp4.example.com:6443".

Navigate to the lab materials directory and view the contents of the pod definition. In particular, restartPolicy is set to Always.

[student@workstation ~]$ cd DO180/labs/reliability-ha
[student@workstation reliability-ha]$ cat long-load.yaml
apiVersion: v1
kind: Pod
metadata:
  name: long-load
spec:
  containers:
  - image: registry.ocp4.example.com:8443/redhattraining/long-load:v1
    name: long-load
    securityContext:
      allowPrivilegeEscalation: false
  restartPolicy: Always

Create a pod by using the oc apply command.

[student@workstation reliability-ha]$ oc apply -f long-load.yaml
pod/long-load created

Send a request to the pod to confirm that it is running and responding.

[student@workstation reliability-ha]$ oc exec long-load -- \
  curl -s localhost:3000/health
Ok
Trigger the pod to crash, and observe that the restartPolicy instructs the cluster to restart the container in the pod.

Observe that the pod is running and has not restarted.

[student@workstation reliability-ha]$ oc get pods
NAME        READY   STATUS    RESTARTS   AGE
long-load   1/1     Running   0          1m

Send a request to the /destruct endpoint in the application. This request triggers the process to crash.

[student@workstation reliability-ha]$ oc exec long-load -- \
  curl -s localhost:3000/destruct
command terminated with exit code 52

Observe that the pod is running and restarted one time.

[student@workstation reliability-ha]$ oc get pods
NAME        READY   STATUS    RESTARTS      AGE
long-load   1/1     Running   1 (34s ago)   4m16s

Delete the long-load pod.

[student@workstation reliability-ha]$ oc delete pod long-load
pod "long-load" deleted

The pod is not re-created, because it was created manually and not by a workload resource such as a deployment.
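By contrast, a pod that a workload resource owns is replaced when it is deleted, because the controller reconciles the cluster with the desired replica count. The following minimal deployment manifest is only an illustrative sketch, not one of the lab files; its labels and replica count are assumptions:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: long-load
spec:
  replicas: 1                 # desired number of pods; the controller replaces any that are deleted
  selector:
    matchLabels:
      app: long-load          # must match the pod template labels
  template:
    metadata:
      labels:
        app: long-load
    spec:
      containers:
      - name: long-load
        image: registry.ocp4.example.com:8443/redhattraining/long-load:v1

Deleting a pod that such a deployment manages causes its replica set to start a replacement pod; you observe this behavior with the lab's own deployment later in this exercise.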
Use a restart policy of Never to create the pod, and observe that it is not re-created on crashing.

Modify the long-load.yaml file so that the restartPolicy is set to Never.

...output omitted...
  restartPolicy: Never

Create the pod with the updated YAML file.

[student@workstation reliability-ha]$ oc apply -f long-load.yaml
pod/long-load created

Send a request to the pod to confirm that the pod is running and that the application is responding.

[student@workstation reliability-ha]$ oc exec long-load -- \
  curl -s localhost:3000/health
Ok

Send a request to the /destruct endpoint in the application to crash it.

[student@workstation reliability-ha]$ oc exec long-load -- \
  curl -s localhost:3000/destruct
command terminated with exit code 52

Observe that the pod is not restarted and is in an error state.

[student@workstation reliability-ha]$ oc get pods
NAME        READY   STATUS   RESTARTS   AGE
long-load   0/1     Error    0          2m36s

Delete the long-load pod.

[student@workstation reliability-ha]$ oc delete pod long-load
pod "long-load" deleted
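At this point, the exercise has demonstrated two of the possible restart policies. For reference, the restartPolicy field accepts three values. The following annotated excerpt is a summary sketch and is not part of the lab files:

spec:
  # restartPolicy applies to all containers in the pod:
  #   Always    - restart the container after any exit (the default)
  #   OnFailure - restart only when the container exits with a nonzero status
  #   Never     - do not restart; a failed container leaves the pod in an Error state
  restartPolicy: OnFailure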
In the next steps, you add a startup delay to the application to simulate a slow start. Because the cluster does not know when the application inside the pod is ready to receive requests, it considers the pod ready as soon as the container starts. Adding this readiness information by using probes is covered in a later exercise.
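As a preview, the following excerpt sketches what a readiness probe for this application could look like. The endpoint, port, and timing values are illustrative assumptions, and you do not add a probe in this exercise:

spec:
  containers:
  - name: long-load
    image: registry.ocp4.example.com:8443/redhattraining/long-load:v1
    readinessProbe:             # the pod is marked ready only after this check succeeds
      httpGet:
        path: /health           # assumed: the application's health endpoint
        port: 3000
      initialDelaySeconds: 5    # illustrative timing values
      periodSeconds: 5

With such a probe, the cluster would not mark the pod as ready until the /health endpoint returned a success response.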
Update the long-load.yaml file to add a startup delay and use a restart policy of Always. Set the START_DELAY variable to 60,000 milliseconds (one minute) so that the file looks like the following excerpt:

...output omitted...
spec:
  containers:
  - image: registry.ocp4.example.com:8443/redhattraining/long-load:v1
    imagePullPolicy: Always
    securityContext:
      allowPrivilegeEscalation: false
    name: long-load
    env:
    - name: START_DELAY
      value: "60000"
  restartPolicy: Always

Note
Although numbers are a valid YAML type, environment variables must be passed as strings. YAML syntax is also indentation-sensitive.
For these reasons, ensure that your file matches the preceding example exactly.
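For example, an unquoted numeric value typically fails validation when you apply the manifest, because the environment variable value must be a string; the quoted form is accepted. This comparison is a sketch, and the exact error text can vary:

# Invalid: the value is a YAML number, so applying the manifest fails validation
- name: START_DELAY
  value: 60000

# Valid: quoting makes the value a YAML string
- name: START_DELAY
  value: "60000"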
Apply the YAML file to create the pod, and proceed to the next step within one minute.

[student@workstation reliability-ha]$ oc apply -f long-load.yaml
pod/long-load created

Within a minute of pod creation, verify the status of the pod. The status shows as ready even though the application is not. Try to send a request to the application, and observe that it fails.

[student@workstation reliability-ha]$ oc get pods
NAME        READY   STATUS    RESTARTS   AGE
long-load   1/1     Running   0          16s
[student@workstation reliability-ha]$ oc exec long-load -- \
  curl -s localhost:3000/health
app is still starting

After waiting a minute for the application to start, send another request to the pod to confirm that it is running and responding.

[student@workstation reliability-ha]$ oc exec long-load -- \
  curl -s localhost:3000/health
Ok
Use a deployment to scale up the number of deployed pods. Observe that a broken pod causes failed requests, even though the deployment keeps the desired number of pods running.
Review the long-load-deploy.yaml file, which defines a deployment, a service, and a route. The deployment creates three replicas of the application pod.

In each pod, a START_DELAY environment variable is set to 15,000 milliseconds (15 seconds). In each pod, the application responds that it is not ready until after the delay.

[student@workstation reliability-ha]$ cat long-load-deploy.yaml
...output omitted...
    spec:
      containers:
      - image: registry.ocp4.example.com:8443/redhattraining/long-load:v1
        imagePullPolicy: Always
        name: long-load
        env:
        - name: START_DELAY
          value: "15000"
...output omitted...

Start the load test script, which sends a request to the /health API endpoint of the application every two seconds. Leave the script running in a visible terminal window; a sketch of such a polling script follows the command.

[student@workstation reliability-ha]$ ./load-test.sh
...output omitted...
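The load-test.sh script is provided in the lab directory, and its contents are not shown in this exercise. As an assumption about its behavior, a minimal equivalent that polls the application's /health endpoint through the route every two seconds might look like the following sketch:

#!/usr/bin/env bash
# Poll the application's /health endpoint every two seconds and print each response.
# The route URL matches the one used later in this exercise; the script's real
# target endpoint is an assumption.
URL="long-load-reliability-ha.apps.ocp4.example.com/health"
while true; do
  curl -s "${URL}"   # prints "Ok", "app is still starting", or "app is unhealthy"
  echo
  sleep 2
done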
In a new terminal window, apply the ~/DO180/labs/reliability-ha/long-load-deploy.yaml file.

[student@workstation ~]$ oc apply -f \
  ~/DO180/labs/reliability-ha/long-load-deploy.yaml
deployment.apps/long-load created
service/long-load created
route.route.openshift.io/long-load created

Watch the output of the load test script as the pods and the application instances start. After a delay, the requests succeed.

...output omitted...
Ok
Ok
Ok
...output omitted...
By using the /togglesick API endpoint of the application, put one of the three pods into a broken state.

[student@workstation ~]$ curl \
  long-load-reliability-ha.apps.ocp4.example.com/togglesick
no output expected

Watch the output of the load test script as some requests start failing. Because of the load balancer, the exact order of the output is random.

...output omitted...
Ok
app is unhealthy
app is unhealthy
Ok
Ok
...output omitted...
Press Ctrl+C to end the load test script.
Return to the /home/student/ directory.

[student@workstation reliability-ha]$ cd /home/student/
[student@workstation ~]$