Outcomes
You should be able to monitor the memory usage of an application, and set a memory limit for a pod.
As the student user on the workstation machine, use the lab command to prepare your system for this exercise.
This command ensures that all resources are available for this exercise.
It also creates the reliability-limits project and the /home/student/DO180/labs/reliability-limits/resources.txt file.
The resources.txt file contains some commands that you use during the exercise.
You can use the file to copy and paste these commands.
[student@workstation ~]$ lab start reliability-limits
Instructions
Log in to the OpenShift cluster as the developer user with the developer password. Use the reliability-limits project.

Log in to the OpenShift cluster.

[student@workstation ~]$ oc login -u developer -p developer \
  https://api.ocp4.example.com:6443
Login successful.
...output omitted...

Set the reliability-limits project as the active project.

[student@workstation ~]$ oc project reliability-limits
...output omitted...
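As an optional sanity check that is not part of the exercise, you can confirm the active user and project before continuing. The output shown assumes that the login and project commands above succeeded.

[student@workstation ~]$ oc whoami
developer
[student@workstation ~]$ oc project
Using project "reliability-limits" on server "https://api.ocp4.example.com:6443".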
Create the leakapp deployment from the ~/DO180/labs/reliability-limits/leakapp.yml file that the lab command prepared. The application has a bug, and leaks 1 MiB of memory every second.

Review the ~/DO180/labs/reliability-limits/leakapp.yml resource file. The memory limit is set to 35 MiB. Do not change the file.

...output omitted...
        resources:
          requests:
            memory: 20Mi
          limits:
            memory: 35Mi

Use the oc apply command to create the application. Ignore the warning message.

[student@workstation ~]$ oc apply -f \
  ~/DO180/labs/reliability-limits/leakapp.yml
deployment.apps/leakapp created

Wait for the pod to start. You might have to rerun the command several times for the pod to report a Running status. The name of the pod on your system probably differs.

[student@workstation ~]$ oc get pods
NAME                      READY   STATUS    RESTARTS   AGE
leakapp-99bb64c8d-hk26k   1/1     Running   0          12s
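For reference, the requests and limits values that you reviewed live inside the container definition of a Deployment. The following sketch only illustrates where that snippet fits in a manifest; the metadata name, labels, container name, and image are placeholders and are not copied from leakapp.yml, so do not apply this sketch in place of the provided file.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: leakapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: leakapp
  template:
    metadata:
      labels:
        app: leakapp
    spec:
      containers:
      - name: leakapp                               # container name is assumed
        image: registry.example.com/leakapp:latest  # placeholder image
        resources:
          requests:
            memory: 20Mi    # amount that the scheduler reserves for the pod
          limits:
            memory: 35Mi    # hard cap; exceeding it triggers an OOM kill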
Watch the pod. OpenShift restarts the pod after 30 seconds.

Use the watch command to monitor the oc get pods command. Wait for OpenShift to restart the pod, and then press Ctrl+C to quit the watch command.

[student@workstation ~]$ watch oc get pods
Every 2.0s: oc get pods          workstation: Wed Mar 8 07:27:45 2023

NAME                      READY   STATUS    RESTARTS      AGE
leakapp-99bb64c8d-hk26k   1/1     Running   1 (15s ago)   48s

Retrieve the container status to verify that OpenShift restarted the pod due to an Out-Of-Memory (OOM) event.

[student@workstation ~]$ oc get pods leakapp-99bb64c8d-hk26k \
  -o jsonpath='{.status.containerStatuses[0].lastState}' | jq .
{
  "terminated": {
    "containerID": "cri-o://5800...1d04",
    "exitCode": 137,
    "finishedAt": "2023-03-08T12:29:24Z",
    "reason": "OOMKilled",
    "startedAt": "2023-03-08T12:28:53Z"
  }
}
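The exit code 137 is 128 plus signal 9 (SIGKILL): the kernel's OOM killer terminated the process when the container exceeded its memory limit. If jq is not available, the same information appears in the Last State section of the oc describe pod output; the exact layout of the output can differ between versions.

[student@workstation ~]$ oc describe pod leakapp-99bb64c8d-hk26k
...output omitted...
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
...output omitted...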
Observe the pod status for a few minutes, until the CrashLoopBackOff status is displayed. During this period, OpenShift restarts the pod several times because of the memory leak.

Between each restart, OpenShift sets the pod status to CrashLoopBackOff, waits an increasing amount of time between retries, and then restarts the pod. The delay between restarts gives the operator the opportunity to fix the issue. After several retries, OpenShift finally sets the CrashLoopBackOff wait timer to five minutes. During this wait time, the application is not available to your customers.

[student@workstation ~]$ watch oc get pods
Every 2.0s: oc get pods          workstation: Wed Mar 8 07:33:15 2023

NAME                      READY   STATUS             RESTARTS      AGE
leakapp-99bb64c8d-hk26k   0/1     CrashLoopBackOff   4 (82s ago)   5m25s

Press Ctrl+C to quit the watch command.

Fixing the memory leak would resolve the issue. However, it might take some time for the developers to fix the bug. In the meantime, set the memory limit to 600 MiB. With this setting, the pod can run for ten minutes before the application reaches the limit (the application leaks 1 MiB every second, so a 600 MiB limit gives roughly 600 seconds of run time).
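To see the back-off behavior from the cluster's point of view, you can also list recent events in the project; expect "Back-off restarting failed container" entries for the leakapp pod. The exact event text and ordering can vary between OpenShift versions.

[student@workstation ~]$ oc get events --sort-by=.lastTimestamp
...output omitted...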
Use the oc set resources command to set the new limit. Ignore the warning message.

[student@workstation ~]$ oc set resources deployment/leakapp \
  --limits memory=600Mi
deployment.apps/leakapp resource requirements updated

Wait for the pod to start. You might have to rerun the command several times for the pod to report a Running status. The name of the pod on your system probably differs.

[student@workstation ~]$ oc get pods
NAME                      READY   STATUS    RESTARTS   AGE
leakapp-6bc64dfcd-86fpc   1/1     Running   0          12s

Wait two minutes to verify that OpenShift no longer restarts the pod every 30 seconds.

[student@workstation ~]$ watch oc get pods
Every 2.0s: oc get pods          workstation: Wed Mar 8 07:38:15 2023

NAME                      READY   STATUS    RESTARTS   AGE
leakapp-6bc64dfcd-86fpc   1/1     Running   0          3m12s

Press Ctrl+C to quit the watch command.
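The oc set resources command changes only the live deployment; the leakapp.yml file on disk still defines the old 35 MiB limit. As an optional check, you can read the new limit back from the deployment to confirm that it was applied.

[student@workstation ~]$ oc get deployment leakapp \
  -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}{"\n"}'
600Mi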
Review the memory that the pod consumes. You might have to rerun the command several times for the metrics to be available. The memory usage on your system probably differs.
[student@workstation ~]$ oc adm top pods
NAME                      CPU(cores)   MEMORY(bytes)
leakapp-6bc64dfcd-86fpc   0m           174Mi

Optional. Wait about 10 minutes from the creation time until the application reaches the out-of-memory error. After this period, OpenShift restarts the pod, because it reached the 600 MiB memory limit.

Open a new terminal window, and then run the watch command to monitor the oc adm top pods command.

[student@workstation ~]$ watch oc adm top pods
Every 2.0s: oc adm top pods          workstation: Wed Mar 8 07:38:55 2023

NAME                      CPU(cores)   MEMORY(bytes)
leakapp-6bc64dfcd-86fpc   0m           176Mi

Leave the command running and do not interrupt it.
Note
You might see a message that metrics are not yet available. If so, wait some time and try again.
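If other workloads run in the reliability-limits project, you can usually narrow the metrics output with a label selector, as supported by the underlying kubectl top command. The app=leakapp label is an assumption about the labels that leakapp.yml applies; adjust it to match the labels in your deployment.

[student@workstation ~]$ oc adm top pods -l app=leakapp
...output omitted...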
In the first terminal, run the watch command to monitor the oc get pods command. Watch the output of the oc adm top pods command in the second terminal. When the memory usage reaches 600 MiB, the OOM subsystem kills the process inside the container, and OpenShift restarts the pod.

[student@workstation ~]$ watch oc get pods
Every 2.0s: oc get pods          workstation: Wed Mar 8 07:46:35 2023

NAME                      READY   STATUS    RESTARTS     AGE
leakapp-6bc64dfcd-86fpc   1/1     Running   1 (3s ago)   9m58s

Press Ctrl+C to quit the watch command.

Press Ctrl+C to quit the watch command in the second terminal. Close this second terminal when done.
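If you ran the optional step, you can confirm that this restart was also caused by an OOM kill, in the same way as earlier in the exercise. The pod name below comes from the example output; use the name from your system.

[student@workstation ~]$ oc get pods leakapp-6bc64dfcd-86fpc \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'
OOMKilled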