Headroom Lab

To demonstrate the effect headroom has on pod scheduling time, we are going to create two different VNGs: one with manual headroom capacity and one without. We will then run two new deployments on the cluster, each directed at a different VNG. The deployment targeting the VNG with headroom capacity will be scheduled almost immediately, while the other will remain pending until a new node is provisioned.


1. Create VNG with Headroom

Navigate back to the Spot Console within your Ocean cluster and open the Virtual Node Groups tab.

Press the “+ Create VNG” button.

In the pop-up screen, choose Configure Manually. This derives all of the configuration from the default VNG while allowing you to override any parameters you choose.

Specify example-3-1 as the VNG Name, and in the Node Selection section add the following Node Labels:

Key = env

Value = ocean-workshop

Click “Add Node Label” to create a dropdown for a second Node Label:

Key = example

Value = 3-1

Scroll down on the VNG setup page, open the Advanced section, and update your headroom values as shown below:

Reserve: 3
CPU (millicores): 100
Memory (MiB): 256
GPU: 0

Your final result should look like this:

Click “Create” at the bottom of the page.

We now have our headroom VNG ready.

2. Create Additional VNG with No Headroom

Now let’s create another VNG, this one without any Headroom capacity.

Press the “+ Create VNG” button again.

In the pop-up screen, choose Configure Manually, just as before, to derive the configuration from the default VNG.

Specify example-3-2 as the VNG Name, and in the Node Selection section add the following Node Labels:

Key = env

Value = ocean-workshop

Click “Add Node Label” to create a dropdown for a second Node Label:

Key = example

Value = 3-2

This time, do not configure any headroom in the Advanced section. Click “Create” at the bottom of the page.
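
Since this second VNG has no headroom, Ocean has nothing to reserve for it and should not launch a node on its behalf yet. As an optional sanity check from the CLI (the VNG node labels are applied to the nodes Ocean launches, so a label query should currently return nothing):

kubectl get nodes -l env=ocean-workshop,example=3-2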

Now let’s check what headroom looks like in the cluster. Navigate to the Nodes tab in the cluster. Look for a node created under the “example-3-1” virtual node group and hover your mouse over its “Resource Allocation” graph. We should expect to see a node with 0.3 vCPU reserved, which matches our headroom configuration: 3 units × 100 millicores = 300m (0.3 vCPU), along with 3 × 256 MiB = 768 MiB of memory.

This headroom serves as an extra buffer of CPU/Memory capacity so if a new pod were to come into the cluster targeting this VNG, the Kubernetes scheduler would immediately be able to schedule that pod without waiting for a new node to be spun up (assuming the headroom is large enough).
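
If you prefer the CLI, you can find the headroom node by the labels we set on the VNG and confirm that almost none of its capacity is requested by pods yet (the headroom reservation lives on the Ocean side, so from Kubernetes’ point of view the node simply looks mostly empty). Replace the node name with one returned by the first command:

kubectl get nodes -l env=ocean-workshop,example=3-1
kubectl describe node <node-name> | grep -A 8 "Allocated resources"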

Another way to check the headroom, and to understand every action Ocean takes, is the Log tab within Ocean.

Navigate to the “Log” Tab within your Ocean cluster.

We should see a recent scaling event that looks similar to this one:

Spotinst Kubernetes Controller, Instances Launched. Launched 1 instances. Ids: [sir-kfjyyeik] (view details)

Click on “view details”. This will launch a popup explaining why we launched a new node (for headroom) and all the details for this scaling event.

Now that we have verified our headroom is in place, let’s deploy the following yaml file to put it to the test.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-3-1
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx-dev
        image: nginx
        resources:
          requests:
            memory: "100Mi"
            cpu: "256m"
      nodeSelector:
        example: 3-1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-3-2
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx-dev
        image: nginx
        resources:
          requests:
            memory: "100Mi"
            cpu: "256m"
      nodeSelector:
        example: 3-2

Expected Behavior: This YAML file creates two deployments, targeting VNG 3-1 and VNG 3-2 respectively. The example-3-1 pods should be scheduled immediately, while the example-3-2 pods will have to wait for a new node to be created and registered with EKS before they can be scheduled. Although the new node typically arrives in under two minutes, many businesses demand the fastest possible scaling, which is exactly what headroom provides.

To apply this file, navigate back to your Cloud9 IDE and run the following:

kubectl apply -f /home/ec2-user/environment/spot-workshop-template/headroom-example.yaml

Your output should mirror the one shown below:

➜  kubectl apply -f headroom-example.yaml                                                                                                                                                                                                          
deployment.apps/example-3-1 created
deployment.apps/example-3-2 created
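
Optionally, if you want to watch the scheduling difference unfold in real time, keep a watch running in a second terminal (press Ctrl+C to stop):

kubectl get pods -w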

Now run the following command:

kubectl get pods,nodes

We’ll now see that the pods from the first deployment (example-3-1) are scheduled and running, while the pods from the second deployment (example-3-2) remain in a Pending state:

➜  kubectl get pods,nodes                                                                                                                                                                                                                   
NAME                               READY   STATUS    RESTARTS   AGE
pod/example-1-7f8b5549bb-xnsfq     1/1     Running   0          111m
pod/example-2-1-64d9d4877d-rv2pg   1/1     Running   0          63m
pod/example-2-2-7d55b5b8cf-dfkpw   0/1     Pending   0          63m
pod/example-3-1-7c8db8d4cc-4njwh   1/1     Running   0          9s
pod/example-3-1-7c8db8d4cc-ldj8z   1/1     Running   0          9s
pod/example-3-1-7c8db8d4cc-x6wwg   1/1     Running   0          9s
pod/example-3-2-c47d8b56d-fc79p    0/1     Pending   0          7s
pod/example-3-2-c47d8b56d-hnq25    0/1     Pending   0          7s
pod/example-3-2-c47d8b56d-lcr4r    0/1     Pending   0          7s

NAME                                                STATUS   ROLES    AGE     VERSION
node/ip-192-168-24-193.us-west-2.compute.internal   Ready    <none>   62m     v1.19.6-eks-49a6c0
node/ip-192-168-46-105.us-west-2.compute.internal   Ready    <none>   3h15m   v1.19.6-eks-49a6c0
node/ip-192-168-70-3.us-west-2.compute.internal     Ready    <none>   28m     v1.19.6-eks-49a6c0
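
While the example-3-2 pods are Pending, you can also confirm the reason directly from Kubernetes: the scheduler reports that no node matches the pods’ node selector. Use a pod name from your own output:

kubectl describe pod example-3-2-c47d8b56d-fc79p | grep -A 5 Events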

After waiting two minutes, let’s run that command again and verify the example 3-2 pods got scheduled.

➜  kubectl get pods,nodes
NAME                               READY   STATUS              RESTARTS   AGE
pod/example-1-7f8b5549bb-xnsfq     1/1     Running             0          112m
pod/example-2-1-64d9d4877d-rv2pg   1/1     Running             0          64m
pod/example-2-2-7d55b5b8cf-dfkpw   0/1     Pending             0          64m
pod/example-3-1-7c8db8d4cc-4njwh   1/1     Running             0          112s
pod/example-3-1-7c8db8d4cc-ldj8z   1/1     Running             0          112s
pod/example-3-1-7c8db8d4cc-x6wwg   1/1     Running             0          112s
pod/example-3-2-c47d8b56d-fc79p    0/1     ContainerCreating   0          110s
pod/example-3-2-c47d8b56d-hnq25    0/1     ContainerCreating   0          110s
pod/example-3-2-c47d8b56d-lcr4r    0/1     ContainerCreating   0          110s

NAME                                                STATUS   ROLES    AGE     VERSION
node/ip-192-168-17-86.us-west-2.compute.internal    Ready    <none>   52s     v1.19.6-eks-49a6c0
node/ip-192-168-24-193.us-west-2.compute.internal   Ready    <none>   63m     v1.19.6-eks-49a6c0
node/ip-192-168-46-105.us-west-2.compute.internal   Ready    <none>   3h17m   v1.19.6-eks-49a6c0
node/ip-192-168-70-3.us-west-2.compute.internal     Ready    <none>   30m     v1.19.6-eks-49a6c0

That’s a wrap for our headroom lab. Advance to the next page for our recap.