Scheduling

Assigning Pods to Nodes

You can constrain a pod so that it can only run on a particular set of node(s). There are several ways to do this, and the recommended approaches all use label selectors to facilitate the selection. Usually such constraints are unnecessary, because the scheduler automatically does a reasonable placement (for example, spreading pods across nodes so as not to place a pod on a node with insufficient resources). However, there are cases where you need to control which node a pod lands on, for example to ensure that a pod is deployed on a machine with an SSD attached, or to co-locate the pods of two different services that communicate heavily in the same availability zone.

Methods

  • The nodeSelector field matched against node labels
  • Affinity / anti-affinity
  • The nodeName field
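
For reference, below are minimal sketches of the affinity and nodeName variants (the nodeSelector variant is demonstrated in detail in the sections that follow); the disktype=ssd label is hypothetical:

apiVersion: v1
kind: Pod
metadata:
  name: affinity-example
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: only schedule on nodes carrying the matching label.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype        # hypothetical label key
                operator: In
                values:
                  - ssd
  containers:
    - name: app
      image: nginx:1.14.2
---
apiVersion: v1
kind: Pod
metadata:
  name: nodename-example
spec:
  # nodeName bypasses the scheduler entirely and binds the pod to this exact node.
  nodeName: ip-10-83-80-162.ap-northeast-2.compute.internal
  containers:
    - name: app
      image: nginx:1.14.2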

1. Node labels

Like many other Kubernetes objects, nodes have labels. You can attach labels manually, and Kubernetes also populates a standard set of labels on every node in the cluster. See Well-Known Labels, Annotations and Taints for a list of commonly used node labels.

  • List the nodes
$ kubectl get nodes
NAME                                              STATUS   ROLES    AGE     VERSION
ip-10-83-80-162.ap-northeast-2.compute.internal   Ready    <none>   6d15h   v1.21.12-eks-5308cf7
ip-10-83-82-103.ap-northeast-2.compute.internal   Ready    <none>   6d15h   v1.21.12-eks-5308cf7
ip-10-83-84-128.ap-northeast-2.compute.internal   Ready    <none>   6d15h   v1.21.12-eks-5308cf7
  • List the nodes with their labels
$ kubectl get nodes --show-labels
NAME                                              STATUS   ROLES    AGE     VERSION                LABELS
ip-10-83-80-162.ap-northeast-2.compute.internal   Ready    <none>   6d15h   v1.21.12-eks-5308cf7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup-image=ami-0918f823d29c638d9,eks.amazonaws.com/nodegroup=black-nodegroup-20220708083127543500000015,eks.amazonaws.com/sourceLaunchTemplateId=lt-01141f4c6a453c7f0,eks.amazonaws.com/sourceLaunchTemplateVersion=1,failure-domain.beta.kubernetes.io/region=ap-northeast-2,failure-domain.beta.kubernetes.io/zone=ap-northeast-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-83-80-162.ap-northeast-2.compute.internal,kubernetes.io/os=linux,node.kubernetes.io/instance-type=m5.2xlarge,topology.ebs.csi.aws.com/zone=ap-northeast-2a,topology.kubernetes.io/region=ap-northeast-2,topology.kubernetes.io/zone=ap-northeast-2a

ip-10-83-82-103.ap-northeast-2.compute.internal   Ready    <none>   6d15h   v1.21.12-eks-5308cf7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup-image=ami-0918f823d29c638d9,eks.amazonaws.com/nodegroup=black-nodegroup-20220708083127543500000015,eks.amazonaws.com/sourceLaunchTemplateId=lt-01141f4c6a453c7f0,eks.amazonaws.com/sourceLaunchTemplateVersion=1,failure-domain.beta.kubernetes.io/region=ap-northeast-2,failure-domain.beta.kubernetes.io/zone=ap-northeast-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-83-82-103.ap-northeast-2.compute.internal,kubernetes.io/os=linux,node.kubernetes.io/instance-type=m5.2xlarge,topology.ebs.csi.aws.com/zone=ap-northeast-2b,topology.kubernetes.io/region=ap-northeast-2,topology.kubernetes.io/zone=ap-northeast-2b

ip-10-83-84-128.ap-northeast-2.compute.internal   Ready    <none>   6d15h   v1.21.12-eks-5308cf7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup-image=ami-0918f823d29c638d9,eks.amazonaws.com/nodegroup=black-nodegroup-20220708083127543500000015,eks.amazonaws.com/sourceLaunchTemplateId=lt-01141f4c6a453c7f0,eks.amazonaws.com/sourceLaunchTemplateVersion=1,failure-domain.beta.kubernetes.io/region=ap-northeast-2,failure-domain.beta.kubernetes.io/zone=ap-northeast-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-83-84-128.ap-northeast-2.compute.internal,kubernetes.io/os=linux,node.kubernetes.io/instance-type=m5.2xlarge,topology.ebs.csi.aws.com/zone=ap-northeast-2c,topology.kubernetes.io/region=ap-northeast-2,topology.kubernetes.io/zone=ap-northeast-2c
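  • Labels can also be used to filter the node list directly; for example, using the standard topology label shown above:
$ kubectl get nodes -l topology.kubernetes.io/zone=ap-northeast-2a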
  • Describe a node by hostname
$ kubectl describe nodes ip-10-83-80-162.ap-northeast-2.compute.internal
Name:               ip-10-83-80-162.ap-northeast-2.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.2xlarge
                    beta.kubernetes.io/os=linux
                    eks.amazonaws.com/capacityType=ON_DEMAND
                    eks.amazonaws.com/nodegroup=black-nodegroup-20220708083127543500000015
                    eks.amazonaws.com/nodegroup-image=ami-0918f823d29c638d9
                    eks.amazonaws.com/sourceLaunchTemplateId=lt-01141f4c6a453c7f0
                    eks.amazonaws.com/sourceLaunchTemplateVersion=1
                    failure-domain.beta.kubernetes.io/region=ap-northeast-2
                    failure-domain.beta.kubernetes.io/zone=ap-northeast-2a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-83-80-162.ap-northeast-2.compute.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=m5.2xlarge
                    topology.ebs.csi.aws.com/zone=ap-northeast-2a
                    topology.kubernetes.io/region=ap-northeast-2
                    topology.kubernetes.io/zone=ap-northeast-2a
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0f255f242c1b4616e","efs.csi.aws.com":"i-0f255f242c1b4616e"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 08 Jul 2022 17:32:26 +0900
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-83-80-162.ap-northeast-2.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Fri, 15 Jul 2022 08:58:03 +0900
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 15 Jul 2022 08:55:51 +0900   Fri, 08 Jul 2022 17:32:26 +0900   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 15 Jul 2022 08:55:51 +0900   Fri, 08 Jul 2022 17:32:26 +0900   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 15 Jul 2022 08:55:51 +0900   Fri, 08 Jul 2022 17:32:26 +0900   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Fri, 15 Jul 2022 08:55:51 +0900   Fri, 08 Jul 2022 17:32:47 +0900   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.83.80.162
  Hostname:     ip-10-83-80-162.ap-northeast-2.compute.internal
  InternalDNS:  ip-10-83-80-162.ap-northeast-2.compute.internal
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         8
  ephemeral-storage:           20959212Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      32408676Ki
  pods:                        58
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         7910m
  ephemeral-storage:           18242267924
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      31391844Ki
  pods:                        58
System Info:
  Machine ID:                 ec2cf9fd6e8ff9955c3f7269a4a9d3da
  System UUID:                ec2cf9fd-6e8f-f995-5c3f-7269a4a9d3da
  Boot ID:                    c48572a6-8e6d-427b-b5b4-f9f21e8e0838
  Kernel Version:             5.4.196-108.356.amzn2.x86_64
  OS Image:                   Amazon Linux 2
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.13
  Kubelet Version:            v1.21.12-eks-5308cf7
  Kube-Proxy Version:         v1.21.12-eks-5308cf7
ProviderID:                   aws:///ap-northeast-2a/i-0f255f242c1b4616e
Non-terminated Pods:          (19 in total)
  Namespace                   Name                                                            CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                                            ------------  ----------  ---------------  -------------  ---
  code-server                 code-server-84f85bfbb7-q9rs5                                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         42h
  gatekeeper-system           gatekeeper-controller-manager-77768dcc76-fmggf                  100m (1%)     1 (12%)     256Mi (0%)       512Mi (1%)     19h
  kube-system                 alb-controller-aws-load-balancer-controller-579798fdbf-5w8zb    0 (0%)        0 (0%)      0 (0%)           0 (0%)         6d14h
  kube-system                 aws-cluster-autoscaler-84bd9c55fb-k28ps                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         6d14h
  kube-system                 aws-node-lsfh6                                                  25m (0%)      0 (0%)      0 (0%)           0 (0%)         6d15h
  kube-system                 coredns-6dbb778559-5vn52                                        100m (1%)     0 (0%)      70Mi (0%)        170Mi (0%)     6d15h
  kube-system                 ebs-csi-node-q6df6                                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         6d14h
  kube-system                 efs-csi-controller-664994d876-9wqqx                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         17h
  kube-system                 efs-csi-node-4x5tj                                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         17h
  kube-system                 kube-proxy-26nsg                                                100m (1%)     0 (0%)      0 (0%)           0 (0%)         6d15h
  linkerd-viz                 tap-766dd477f8-x2m94                                            200m (2%)     100m (1%)   70Mi (0%)        500Mi (1%)     16h
  linkerd                     linkerd-destination-7c8564ff97-g8gf6                            300m (3%)     100m (1%)   120Mi (0%)       750Mi (2%)     16h
  linkerd                     linkerd-identity-67bcfd69d4-l94zv                               200m (2%)     100m (1%)   30Mi (0%)        500Mi (1%)     16h
  linkerd                     linkerd-proxy-injector-6f464ddc76-98wj4                         200m (2%)     100m (1%)   70Mi (0%)        500Mi (1%)     16h
  litmus                      subscriber-cd959f546-4g8kx                                      125m (1%)     225m (2%)   300Mi (0%)       500Mi (1%)     21h
  litmus                      workflow-controller-856d568f68-wt6dj                            125m (1%)     225m (2%)   300Mi (0%)       500Mi (1%)     21h
  log-stack                   fluentd-ccwzk                                                   300m (3%)     300m (3%)   1Gi (3%)         1Gi (3%)       46h
  log-stack                   opensearch-cluster-coordinate-0                                 1 (12%)       1 (12%)     2Gi (6%)         2Gi (6%)       46h
  log-stack                   opensearch-cluster-master-0                                     1 (12%)       1 (12%)     2Gi (6%)         2Gi (6%)       46h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         3775m (47%)   4150m (52%)
  memory                      6336Mi (20%)  9052Mi (29%)
  ephemeral-storage           1000Mi (5%)   2Gi (11%)
  hugepages-1Gi               0 (0%)        0 (0%)
  hugepages-2Mi               0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
Events:                       <none>
  • The labels applied to the node are listed under Labels in the output above.
  • The Non-terminated Pods section shows the pods currently scheduled on the node.
  • The Allocated resources section shows current usage: CPU requests already stand at 3775m (47%).
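  • If only a single field is needed, jsonpath avoids scanning the full describe output; for example, the node's allocatable CPU:
$ kubectl get node ip-10-83-80-162.ap-northeast-2.compute.internal -o jsonpath='{.status.allocatable.cpu}'
7910m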

Deploying a pod to a specific node using a label

For the test, let's deploy a single nginx app.

  • Create a namespace (scheduling-test)
$ kubectl create namespace scheduling-test
namespace/scheduling-test created
  • Add a label
kubectl label nodes [node_name] [key]=[value]

$ kubectl label nodes ip-10-83-80-162.ap-northeast-2.compute.internal key=mytest-node
node/ip-10-83-80-162.ap-northeast-2.compute.internal labeled

# Delete the label (the command takes the node name, not the label value)
kubectl label nodes ip-10-83-80-162.ap-northeast-2.compute.internal key-
  • Verify the label (key=mytest-node)
$ kubectl describe nodes ip-10-83-80-162.ap-northeast-2.compute.internal
Name:               ip-10-83-80-162.ap-northeast-2.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.2xlarge
                    beta.kubernetes.io/os=linux
                    eks.amazonaws.com/capacityType=ON_DEMAND
                    eks.amazonaws.com/nodegroup=black-nodegroup-20220708083127543500000015
                    eks.amazonaws.com/nodegroup-image=ami-0918f823d29c638d9
                    eks.amazonaws.com/sourceLaunchTemplateId=lt-01141f4c6a453c7f0
                    eks.amazonaws.com/sourceLaunchTemplateVersion=1
                    failure-domain.beta.kubernetes.io/region=ap-northeast-2
                    failure-domain.beta.kubernetes.io/zone=ap-northeast-2a
                    key=mytest-node
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-83-80-162.ap-northeast-2.compute.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=m5.2xlarge
                    topology.ebs.csi.aws.com/zone=ap-northeast-2a
                    topology.kubernetes.io/region=ap-northeast-2
                    topology.kubernetes.io/zone=ap-northeast-2a
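  • As a quicker check, the -L (--label-columns) flag of kubectl get prints the label value as its own column:
$ kubectl get nodes -L key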
  • nodeselectortest.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
  namespace: scheduling-test
  labels:
    app: my-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-nginx
  template:
    metadata:
      labels:
        app: my-nginx
    spec:
      containers:
        - name: my-nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "500m"
            limits:
              cpu: "1000m"
      nodeSelector:
        key: mytest-node
---
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  namespace: scheduling-test
  labels:
    run: my-nginx
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  selector:
    app: my-nginx
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: scheduling-ingress
  namespace: scheduling-test
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: nginxtest.black.cloud.hancom.com
      http:
        paths:
          - backend:
              serviceName: my-nginx
              servicePort: 80
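
Note: the extensions/v1beta1 Ingress API is still served on this v1.21 cluster but is removed in Kubernetes v1.22; on newer clusters, an equivalent manifest would look like this (same names as above):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: scheduling-ingress
  namespace: scheduling-test
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: nginxtest.black.cloud.hancom.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-nginx
                port:
                  number: 80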
  • Deploy
$ kubectl apply -f .\nodeselectortest.yaml
deployment.apps/my-nginx created
service/my-nginx created 
ingress.extensions/scheduling-ingress created
  • Check pod status
$ kubectl get pods -n scheduling-test
NAME                        READY   STATUS    RESTARTS   AGE
my-nginx-5b5f4bdd49-qwk7x   1/1     Running   0          9s 
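  • To confirm the pod actually landed on the labeled node, -o wide adds a NODE column:
$ kubectl get pods -n scheduling-test -o wide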
  • Check node status: the new pod's 500m request raises the node's CPU requests from 3775m (47%) to 4275m (54%).
......
Non-terminated Pods:          (20 in total)
  Namespace                   Name                                                            CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                                            ------------  ----------  ---------------  -------------  ---
  code-server                 code-server-84f85bfbb7-q9rs5                                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         43h
  gatekeeper-system           gatekeeper-controller-manager-77768dcc76-fmggf                  100m (1%)     1 (12%)     256Mi (0%)       512Mi (1%)     20h
  kube-system                 alb-controller-aws-load-balancer-controller-579798fdbf-5w8zb    0 (0%)        0 (0%)      0 (0%)           0 (0%)         6d15h
  kube-system                 aws-cluster-autoscaler-84bd9c55fb-k28ps                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         6d15h
  kube-system                 aws-node-lsfh6                                                  25m (0%)      0 (0%)      0 (0%)           0 (0%)         6d15h
  kube-system                 coredns-6dbb778559-5vn52                                        100m (1%)     0 (0%)      70Mi (0%)        170Mi (0%)     6d16h
  kube-system                 ebs-csi-node-q6df6                                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         6d15h
  kube-system                 efs-csi-controller-664994d876-9wqqx                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         17h
  kube-system                 efs-csi-node-4x5tj                                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         17h
  kube-system                 kube-proxy-26nsg                                                100m (1%)     0 (0%)      0 (0%)           0 (0%)         6d15h
  linkerd-viz                 tap-766dd477f8-x2m94                                            200m (2%)     100m (1%)   70Mi (0%)        500Mi (1%)     16h
  linkerd                     linkerd-destination-7c8564ff97-g8gf6                            300m (3%)     100m (1%)   120Mi (0%)       750Mi (2%)     16h
  linkerd                     linkerd-identity-67bcfd69d4-l94zv                               200m (2%)     100m (1%)   30Mi (0%)        500Mi (1%)     16h
  linkerd                     linkerd-proxy-injector-6f464ddc76-98wj4                         200m (2%)     100m (1%)   70Mi (0%)        500Mi (1%)     16h
  litmus                      subscriber-cd959f546-4g8kx                                      125m (1%)     225m (2%)   300Mi (0%)       500Mi (1%)     22h
  litmus                      workflow-controller-856d568f68-wt6dj                            125m (1%)     225m (2%)   300Mi (0%)       500Mi (1%)     22h
  log-stack                   fluentd-ccwzk                                                   300m (3%)     300m (3%)   1Gi (3%)         1Gi (3%)       47h
  log-stack                   opensearch-cluster-coordinate-0                                 1 (12%)       1 (12%)     2Gi (6%)         2Gi (6%)       47h
  log-stack                   opensearch-cluster-master-0                                     1 (12%)       1 (12%)     2Gi (6%)         2Gi (6%)       47h
  scheduling-test             my-nginx-5b5f4bdd49-qwk7x                                       500m (6%)     1 (12%)     0 (0%)           0 (0%)         90s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         4275m (54%)   5150m (65%)
  memory                      6336Mi (20%)  9052Mi (29%)
  ephemeral-storage           1000Mi (5%)   2Gi (11%)
  hugepages-1Gi               0 (0%)        0 (0%)
  hugepages-2Mi               0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0

Let's modify the Deployment to increase its resource requests and limits

  • Delete the existing resources
$ kubectl delete -f nodeselectortest.yaml 
deployment.apps "my-nginx" deleted
service "my-nginx" deleted 
ingress.extensions "scheduling-ingress" deleted
  • Increase the CPU request in nodeselectortest.yaml. With 3775m of the node's 7910m allocatable CPU already requested, a 5000m request can no longer fit on the labeled node.
          resources:
            requests:
              cpu: "5000m"
            limits:
              cpu: "6000m"
  • Check the current state
$ kubectl describe nodes ip-10-83-80-162.ap-northeast-2.compute.internal
.....
Non-terminated Pods:          (19 in total)
  Namespace                   Name                                                            CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                                            ------------  ----------  ---------------  -------------  ---
  code-server                 code-server-84f85bfbb7-q9rs5                                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         43h
  gatekeeper-system           gatekeeper-controller-manager-77768dcc76-fmggf                  100m (1%)     1 (12%)     256Mi (0%)       512Mi (1%)     20h
  kube-system                 alb-controller-aws-load-balancer-controller-579798fdbf-5w8zb    0 (0%)        0 (0%)      0 (0%)           0 (0%)         6d15h
  kube-system                 aws-cluster-autoscaler-84bd9c55fb-k28ps                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         6d15h
  kube-system                 aws-node-lsfh6                                                  25m (0%)      0 (0%)      0 (0%)           0 (0%)         6d16h
  kube-system                 coredns-6dbb778559-5vn52                                        100m (1%)     0 (0%)      70Mi (0%)        170Mi (0%)     6d16h
  kube-system                 ebs-csi-node-q6df6                                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         6d15h
  kube-system                 efs-csi-controller-664994d876-9wqqx                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         18h
  kube-system                 efs-csi-node-4x5tj                                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         18h
  kube-system                 kube-proxy-26nsg                                                100m (1%)     0 (0%)      0 (0%)           0 (0%)         6d16h
  linkerd-viz                 tap-766dd477f8-x2m94                                            200m (2%)     100m (1%)   70Mi (0%)        500Mi (1%)     16h
  linkerd                     linkerd-destination-7c8564ff97-g8gf6                            300m (3%)     100m (1%)   120Mi (0%)       750Mi (2%)     16h
  linkerd                     linkerd-identity-67bcfd69d4-l94zv                               200m (2%)     100m (1%)   30Mi (0%)        500Mi (1%)     16h
  linkerd                     linkerd-proxy-injector-6f464ddc76-98wj4                         200m (2%)     100m (1%)   70Mi (0%)        500Mi (1%)     16h
  litmus                      subscriber-cd959f546-4g8kx                                      125m (1%)     225m (2%)   300Mi (0%)       500Mi (1%)     22h
  litmus                      workflow-controller-856d568f68-wt6dj                            125m (1%)     225m (2%)   300Mi (0%)       500Mi (1%)     22h
  log-stack                   fluentd-ccwzk                                                   300m (3%)     300m (3%)   1Gi (3%)         1Gi (3%)       47h
  log-stack                   opensearch-cluster-coordinate-0                                 1 (12%)       1 (12%)     2Gi (6%)         2Gi (6%)       47h
  log-stack                   opensearch-cluster-master-0                                     1 (12%)       1 (12%)     2Gi (6%)         2Gi (6%)       47h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         3775m (47%)   4150m (52%)
  memory                      6336Mi (20%)  9052Mi (29%)
  ephemeral-storage           1000Mi (5%)   2Gi (11%)
  hugepages-1Gi               0 (0%)        0 (0%)
  hugepages-2Mi               0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
Events:                       <none>
  • Deploy again and check pod status
$ kubectl apply -f nodeselectortest.yaml
deployment.apps/my-nginx created
service/my-nginx created 
ingress.extensions/scheduling-ingress created

$ kubectl get pods -n scheduling-test
NAME                        READY   STATUS    RESTARTS   AGE
my-nginx-7cbcf9fb8d-lqcpg   0/1     Pending   0          13s
  • Inspect the pending pod's events
$ kubectl describe pods/my-nginx-7cbcf9fb8d-lqcpg -n scheduling-test

Events:
  Type     Reason             Age                    From                Message
  ----     ------             ----                   ----                -------
  Warning  FailedScheduling   5m28s (x2 over 5m29s)  default-scheduler   0/3 nodes are available: 1 Insufficient cpu, 2 node(s) didn't match Pod's node affinity/selector.
  Normal   TriggeredScaleUp   5m26s                  cluster-autoscaler  pod triggered scale-up: [{eks-black-nodegroup-20220708083127543500000015-58c0ee77-b6ac-cd2a-5d95-c6d8c8c059b3 3->4 (max: 6)}]
  Warning  FailedScheduling   4m15s (x3 over 4m43s)  default-scheduler   0/4 nodes are available: 1 Insufficient cpu, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling   3m55s (x2 over 4m5s)   default-scheduler   0/4 nodes are available: 1 Insufficient cpu, 3 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling   2m51s (x3 over 3m19s)  default-scheduler   0/5 nodes are available: 1 Insufficient cpu, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling   2m31s (x2 over 2m41s)  default-scheduler   0/5 nodes are available: 1 Insufficient cpu, 4 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling   93s (x3 over 2m1s)     default-scheduler   0/6 nodes are available: 1 Insufficient cpu, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling   72s (x2 over 82s)      default-scheduler   0/6 nodes are available: 1 Insufficient cpu, 5 node(s) didn't match Pod's node affinity/selector.
  Normal   NotTriggerScaleUp  24s (x10 over 2m35s)   cluster-autoscaler  pod didn't trigger scale-up: 1 max node group size reached

Findings

No unusual behavior such as draining or rescheduling of running pods was observed. The CPU shortage triggered the cluster autoscaler, which added nodes up to the node group's max, but because none of the new nodes carry the key=mytest-node label, the pod still failed to schedule.
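
One way to avoid this is to define the label at the node-group level, so that autoscaled nodes are created with it already applied. A sketch for an eksctl-managed node group (the cluster and node group names here are illustrative, not the ones used above):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster              # hypothetical cluster name
  region: ap-northeast-2
managedNodeGroups:
  - name: labeled-nodegroup     # hypothetical node group name
    instanceType: m5.2xlarge
    desiredCapacity: 3
    labels:
      key: mytest-node          # the same label the nodeSelector expects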

Behavior after applying priorityClassName

  • Apply a priorityClass to the pod spec (a sketch for defining a custom PriorityClass follows the output below)
      nodeSelector:
        key: mytest-node
      priorityClassName: system-cluster-critical
  • The deployment succeeds and existing pods are preempted; compare with the earlier listing, where the four linkerd pods were still on the node
Non-terminated Pods:          (16 in total)
  Namespace                   Name                                                            CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                                            ------------  ----------  ---------------  -------------  ---
  code-server                 code-server-84f85bfbb7-q9rs5                                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         43h
  gatekeeper-system           gatekeeper-controller-manager-77768dcc76-fmggf                  100m (1%)     1 (12%)     256Mi (0%)       512Mi (1%)     20h
  kube-system                 alb-controller-aws-load-balancer-controller-579798fdbf-5w8zb    0 (0%)        0 (0%)      0 (0%)           0 (0%)         6d15h
  kube-system                 aws-cluster-autoscaler-84bd9c55fb-k28ps                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         6d15h
  kube-system                 aws-node-lsfh6                                                  25m (0%)      0 (0%)      0 (0%)           0 (0%)         6d16h
  kube-system                 coredns-6dbb778559-5vn52                                        100m (1%)     0 (0%)      70Mi (0%)        170Mi (0%)     6d16h
  kube-system                 ebs-csi-node-q6df6                                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         6d15h
  kube-system                 efs-csi-controller-664994d876-9wqqx                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         18h
  kube-system                 efs-csi-node-4x5tj                                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         18h
  kube-system                 kube-proxy-26nsg                                                100m (1%)     0 (0%)      0 (0%)           0 (0%)         6d16h
  litmus                      subscriber-cd959f546-4g8kx                                      125m (1%)     225m (2%)   300Mi (0%)       500Mi (1%)     22h
  litmus                      workflow-controller-856d568f68-wt6dj                            125m (1%)     225m (2%)   300Mi (0%)       500Mi (1%)     22h
  log-stack                   fluentd-ccwzk                                                   300m (3%)     300m (3%)   1Gi (3%)         1Gi (3%)       47h
  log-stack                   opensearch-cluster-coordinate-0                                 1 (12%)       1 (12%)     2Gi (6%)         2Gi (6%)       47h
  log-stack                   opensearch-cluster-master-0                                     1 (12%)       1 (12%)     2Gi (6%)         2Gi (6%)       47h
  scheduling-test             my-nginx-7c9849cffb-nr6gn                                       5 (63%)       6 (75%)     0 (0%)           0 (0%)         99s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         7875m (99%)   9750m (123%)
  memory                      6046Mi (19%)  6802Mi (22%)
  ephemeral-storage           1000Mi (5%)   2Gi (11%)
  hugepages-1Gi               0 (0%)        0 (0%)
  hugepages-2Mi               0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
Events:                       <none>
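
system-cluster-critical is one of the built-in PriorityClasses intended for cluster-critical components; for application workloads it is generally better to define a dedicated class. A minimal sketch (the name and value are illustrative):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority           # hypothetical name
value: 1000000                  # pods with higher values can preempt lower-priority pods
globalDefault: false
description: "For workloads that are allowed to preempt lower-priority pods."

A pod references it the same way as above, via priorityClassName: high-priority.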

Conclusion

We confirmed that lower-priority pods were preempted and moved off the node to make room for the higher-priority pod.
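
As a follow-up check, the preemption should also be visible as events on the victim pods while those events are retained; for example (assuming the scheduler's standard Preempted event reason):

$ kubectl get events -A --field-selector reason=Preempted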