k8s Study Notes - Basics - Resource Scheduling

Overview: an introduction to Replication Controller and ReplicaSet, followed by an introduction to and basic usage of Deployment, StatefulSet, and DaemonSet, covering creation, scaling, updates, rollback, and related operations.

Replication Controller and ReplicaSet

Replication Controller (RC) and ReplicaSet (RS) are two simple ways to deploy Pods. In production, Pods are mostly managed and deployed through higher-level resources such as Deployment, so this section only briefly introduces the RC and RS deployment styles.

Replication Controller

A Replication Controller (RC for short) ensures that the number of Pod replicas reaches the desired value, i.e., the number defined by the RC. In other words, a Replication Controller ensures that a Pod, or a homogeneous set of Pods, is always available.

If there are more Pods than the desired value, the Replication Controller terminates the extra Pods; if there are too few, it starts more Pods to reach the desired value. Unlike manually created Pods, Pods maintained by a Replication Controller are automatically replaced when they fail, are deleted, or are terminated. Therefore, even if an application needs only a single Pod, it should still be managed by a Replication Controller or a similar mechanism. A Replication Controller is similar to a process supervisor, except that instead of watching individual processes on a single node, it watches multiple Pods across multiple nodes.

An example Replication Controller definition:

apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

ReplicaSet

ReplicaSet is the next-generation Replication Controller with support for set-based label selectors. It is mainly used by Deployment to orchestrate the creation, deletion, and updating of Pods. The only difference from Replication Controller is that ReplicaSet supports set-based selectors (matchExpressions), while RC supports only equality-based selectors. Although a ReplicaSet can be used on its own, it is generally recommended to let a Deployment manage ReplicaSets automatically, unless your Pods never need updates or other orchestration.

An example ReplicaSet definition:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  # modify replicas according to your case
  replicas: 3
  selector:
    matchLabels:
      tier: frontend
    matchExpressions:
      - {key: tier, operator: In, values: [frontend]}
  template:
    metadata:
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google_samples/gb-frontend:v3
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: GET_HOSTS_FROM
          value: dns
          # If your cluster config does not include a dns service, then to
          # instead access environment variables to find service host
          # info, comment out the 'value: dns' line above, and uncomment the
          # line below.
          # value: env
        ports:
        - containerPort: 80

Creating and deleting a Replication Controller or ReplicaSet is not much different from doing so for a Pod. Replication Controller is almost never used in production anymore, and ReplicaSet is rarely used on its own; Pods are managed through the higher-level resources Deployment, DaemonSet, and StatefulSet.

Deployment

Deployment Concepts

A Deployment is used to deploy stateless services and is the most commonly used controller. It is typically used to manage a company's stateless microservices, such as configserver, zuul, or Spring Boot applications. It can manage multiple Pod replicas and provides seamless migration, automatic scaling up and down, automatic disaster recovery, one-command rollback, and more.

Although a ReplicaSet can ensure that the specified number of Pod replicas is running at any given time, a Deployment is a higher-level concept: it manages ReplicaSets and provides declarative updates for Pods and ReplicaSets, along with many other useful features. In practice, it is therefore recommended to use Deployment instead of ReplicaSet.

If the desired state is described in a Deployment object, the Deployment controller changes the actual state to the desired state at a controlled rate. You can also create new ReplicaSets through a Deployment, or delete an existing Deployment and adopt all of its resources with a new Deployment.

Creating a Deployment

# Create imperatively:
kubectl create deployment nginx --image=nginx:1.15.2

# Query:
kubectl get deployment

# Edit:
kubectl edit deployment nginx

Create from a file:

Create a Deployment file named dc-nginx.yaml to deploy three Nginx Pods:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80

Explanation of the example:

  • nginx-deployment: the name of the Deployment.

  • replicas: the number of Pod replicas to create.

  • selector: defines how the Deployment finds the Pods to manage; it must match the template's labels.

  • The template field contains the following fields:

    • app: nginx labels the Pods.

    • spec: indicates that the Pod runs one container named nginx.

    • image: the image used to run this Pod.

    • ports: the port the container uses to send and receive traffic.

Create this Deployment with kubectl create:

[root@k8s-master01 2.2.8.1]# kubectl create -f dc-nginx.yaml 
deployment.apps/nginx-deployment created

View the Deployment with kubectl get or kubectl describe:

[root@k8s-master01 2.2.8.1]# kubectl get deploy
NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   3         3         3            1           60s

Where:

  • NAME: the name of the Deployment in the cluster.

  • DESIRED: the desired number of application replicas.

  • CURRENT: the number of replicas currently running.

  • UP-TO-DATE: the number of replicas that have been updated to reach the desired state.

  • AVAILABLE: the number of application replicas available to users; it is 1 here because some Pods are still being created.

  • AGE: how long the application has been running.

Check the Deployment's rollout status:

[root@k8s-master01 2.2.8.1]# kubectl rollout status deployment/nginx-deployment
deployment "nginx-deployment" successfully rolled out

Check the Deployment again:

[root@k8s-master01 2.2.8.1]# kubectl get deploy
NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   3         3         3            3           11m

View the ReplicaSet created by this Deployment:

[root@k8s-master01 2.2.8.1]# kubectl get rs
NAME                          DESIRED   CURRENT   READY   AGE
nginx-deployment-5c689d88bb   3         3         3       12m

View the Pods created by this Deployment:

[root@k8s-master01 2.2.8.1]# kubectl get pods --show-labels
NAME                                READY   STATUS    RESTARTS   AGE   LABELS
nginx-deployment-5c689d88bb-6b95k   1/1     Running   0          13m   app=nginx,pod-template-hash=5c689d88bb
nginx-deployment-5c689d88bb-9z5z2   1/1     Running   0          13m   app=nginx,pod-template-hash=5c689d88bb
nginx-deployment-5c689d88bb-jc8hr   1/1     Running   0          13m   app=nginx,pod-template-hash=5c689d88bb

Updating a Deployment

When upgrading or iterating an application, Pods are usually updated gradually through the Deployment with a rolling update.

For example, update the Nginx Pods to use the image nginx:1.9.1:

[root@k8s-master01 2.2.8.1]# kubectl set image deployment nginx-deployment nginx=nginx:1.9.1 --record
deployment.extensions/nginx-deployment image updated

Alternatively, edit the Deployment directly; the effect is the same:

[root@k8s-master01 2.2.8.1]# kubectl edit deployment.v1.apps/nginx-deployment
deployment.apps/nginx-deployment edited

Check the update status with kubectl rollout status:

[root@k8s-master01 2.2.8.1]# kubectl rollout status deployment.v1.apps/nginx-deployment
Waiting for deployment "nginx-deployment" rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for deployment "nginx-deployment" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "nginx-deployment" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "nginx-deployment" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "nginx-deployment" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "nginx-deployment" rollout to finish: 1 old replicas are pending termination...
deployment "nginx-deployment" successfully rolled out

View the ReplicaSets:

[root@k8s-master01 2.2.8.1]# kubectl get rs 
NAME                          DESIRED   CURRENT   READY   AGE
nginx-deployment-5c689d88bb   0         0         0       34m
nginx-deployment-6987cdb55b   3         3         3       5m14s

View the Deployment's details with describe:

[root@k8s-master01 2.2.8.1]# kubectl describe deploy nginx-deployment
Name:                   nginx-deployment
Namespace:              default
CreationTimestamp:      Thu, 24 Jan 2019 15:15:15 +0800
Labels:                 app=nginx
Annotations:            deployment.kubernetes.io/revision: 2
                        kubernetes.io/change-cause: kubectl set image deployment nginx-deployment nginx=nginx:1.9.1 --record=true
Selector:               app=nginx
Replicas:               3 desired | 3 updated | 3 total | 3 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=nginx
  Containers:
   nginx:
    Image:        nginx:1.9.1
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   nginx-deployment-6987cdb55b (3/3 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  36m    deployment-controller  Scaled up replica set nginx-deployment-5c689d88bb to 3
  Normal  ScalingReplicaSet  7m16s  deployment-controller  Scaled up replica set nginx-deployment-6987cdb55b to 1
  Normal  ScalingReplicaSet  5m18s  deployment-controller  Scaled down replica set nginx-deployment-5c689d88bb to 2
  Normal  ScalingReplicaSet  5m18s  deployment-controller  Scaled up replica set nginx-deployment-6987cdb55b to 2
  Normal  ScalingReplicaSet  4m35s  deployment-controller  Scaled down replica set nginx-deployment-5c689d88bb to 1
  Normal  ScalingReplicaSet  4m34s  deployment-controller  Scaled up replica set nginx-deployment-6987cdb55b to 3
  Normal  ScalingReplicaSet  3m30s  deployment-controller  Scaled down replica set nginx-deployment-5c689d88bb to 0

The describe output shows that, on first creation, the Deployment created a ReplicaSet named nginx-deployment-5c689d88bb and scaled it directly to 3 replicas. When the Deployment was updated, it created a new ReplicaSet, nginx-deployment-6987cdb55b, scaled it up to 1, and then scaled the old ReplicaSet down to 2, so that at least 2 Pods were available and at most 4 Pods existed at any time. It continued scaling the new and old ReplicaSets up and down with the same rolling-update strategy until the new ReplicaSet reached 3 replicas and the old one was scaled down to 0.

Rolling Back a Deployment

When a new version turns out to be unstable, it can be rolled back. By default, every Deployment's rollout history is kept in the system, so a rollback is possible at any time.

Suppose we perform a few more updates:

[root@k8s-master01 2.2.8.1]# kubectl set image deployment nginx-deployment nginx=dotbalo/canary:v1 --record
[root@k8s-master01 2.2.8.1]# kubectl set image deployment nginx-deployment nginx=dotbalo/canary:v2 --record

View the rollout history with kubectl rollout history:

[root@k8s-master01 2.2.8.1]# kubectl rollout history deployment/nginx-deployment
deployment.extensions/nginx-deployment 
REVISION  CHANGE-CAUSE
1         <none>
2         kubectl set image deployment nginx-deployment nginx=nginx:1.9.1 --record=true
3         kubectl set image deployment nginx-deployment nginx=dotbalo/canary:v1 --record=true
4         kubectl set image deployment nginx-deployment nginx=dotbalo/canary:v2 --record=true

To view the details of a specific revision, use the --revision parameter:

[root@k8s-master01 2.2.8.1]# kubectl rollout history deployment.v1.apps/nginx-deployment --revision=3
deployment.apps/nginx-deployment with revision #3
Pod Template:
  Labels:	app=nginx
	pod-template-hash=645959bf6b
  Annotations:	kubernetes.io/change-cause: kubectl set image deployment nginx-deployment nginx=dotbalo/canary:v1 --record=true
  Containers:
   nginx:
    Image:	dotbalo/canary:v1
    Port:	80/TCP
    Host Port:	0/TCP
    Environment:	<none>
    Mounts:	<none>
  Volumes:	<none>

Roll back to the previous revision with kubectl rollout undo:

[root@k8s-master01 2.2.8.1]# kubectl rollout undo deployment.v1.apps/nginx-deployment
deployment.apps/nginx-deployment

Check the history again: revision 5 has returned to canary:v1:

deployment.extensions/nginx-deployment 
REVISION  CHANGE-CAUSE
1         <none>
2         kubectl set image deployment nginx-deployment nginx=nginx:1.9.1 --record=true
4         kubectl set image deployment nginx-deployment nginx=dotbalo/canary:v2 --record=true
5         kubectl set image deployment nginx-deployment nginx=dotbalo/canary:v1 --record=true

Use the --to-revision parameter to roll back to a specific revision:

[root@k8s-master01 2.2.8.1]# kubectl rollout undo deployment/nginx-deployment --to-revision=2
deployment.extensions/nginx-deployment

Scaling a Deployment

When traffic grows and three Pods can no longer support the workload, the Deployment can be scaled out.

Use kubectl scale to adjust the number of Pod replicas dynamically, for example to 5:

[root@k8s-master01 2.2.8.1]# kubectl scale deployment.v1.apps/nginx-deployment --replicas=5
deployment.apps/nginx-deployment scaled

Check the Pods; there are now 5:

[root@k8s-master01 2.2.8.1]# kubectl get po
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-5f89547d9c-5r56b   1/1     Running   0          90s
nginx-deployment-5f89547d9c-htmn7   1/1     Running   0          25s
nginx-deployment-5f89547d9c-nwxs2   1/1     Running   0          99s
nginx-deployment-5f89547d9c-rpwlg   1/1     Running   0          25s
nginx-deployment-5f89547d9c-vlr5p   1/1     Running   0          95s

Pausing and Resuming Deployment Updates

A Deployment supports pausing updates, which is useful for making several modifications before rolling them out together.

Pause the Deployment's updates with kubectl rollout pause:

[root@k8s-master01 2.2.8.1]# kubectl rollout pause deployment/nginx-deployment
deployment.extensions/nginx-deployment paused

Then make changes to the Deployment, for example update the image and then set resource limits:

[root@k8s-master01 2.2.8.1]# kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1
deployment.apps/nginx-deployment image updated
[root@k8s-master01 2.2.8.1]# kubectl set resources deployment.v1.apps/nginx-deployment -c=nginx --limits=cpu=200m,memory=512Mi
deployment.apps/nginx-deployment resource requirements updated

rollout history shows that no new rollout has happened:

[root@k8s-master01 2.2.8.1]# kubectl rollout history deployment.v1.apps/nginx-deployment
deployment.apps/nginx-deployment 
REVISION  CHANGE-CAUSE
1         <none>
5         kubectl set image deployment nginx-deployment nginx=dotbalo/canary:v1 --record=true
7         kubectl set image deployment nginx-deployment nginx=dotbalo/canary:v2 --record=true
8         kubectl set image deployment nginx-deployment nginx=dotbalo/canary:v2 --record=true

Resume the Deployment's updates with kubectl rollout resume:

[root@k8s-master01 2.2.8.1]# kubectl rollout resume deployment.v1.apps/nginx-deployment
deployment.apps/nginx-deployment resumed

You can see that the resumed Deployment created a new ReplicaSet:

[root@k8s-master01 2.2.8.1]# kubectl get rs
NAME                          DESIRED   CURRENT   READY   AGE
nginx-deployment-57895845b8   5         5         4       11s

And the Deployment's image has changed to nginx:1.9.1:

[root@k8s-master01 2.2.8.1]# kubectl describe deploy nginx-deployment
Name:                   nginx-deployment
Namespace:              default
CreationTimestamp:      Thu, 24 Jan 2019 15:15:15 +0800
Labels:                 app=nginx
Annotations:            deployment.kubernetes.io/revision: 9
                        kubernetes.io/change-cause: kubectl set image deployment nginx-deployment nginx=dotbalo/canary:v2 --record=true
Selector:               app=nginx
Replicas:               5 desired | 5 updated | 5 total | 5 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=nginx
  Containers:
   nginx:
    Image:      nginx:1.9.1
    Port:       80/TCP
    Host Port:  0/TCP

Notes on Updating a Deployment

Cleanup policy

  • By default, the revision history keeps 10 old ReplicaSets; the rest are garbage-collected in the background. The number of ReplicaSets to retain can be set with .spec.revisionHistoryLimit; when it is set to 0, no history is kept.

Update strategy

  • .spec.strategy.type==Recreate: recreate, i.e., delete the old Pods first and then create new ones.

  • .spec.strategy.type==RollingUpdate: rolling update; maxUnavailable and maxSurge can be specified to control the rolling-update process.

    • .spec.strategy.rollingUpdate.maxUnavailable: the maximum number of Pods that may be unavailable during a rolling update. Optional; defaults to 25%; may be an absolute number or a percentage. If maxSurge is 0, this value cannot be 0.
    • .spec.strategy.rollingUpdate.maxSurge: the maximum number of Pods that may be created above the desired count. Optional; defaults to 25%; may be an absolute number or a percentage. If maxUnavailable is 0, this value cannot be 0.
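
Put together in one manifest, the cleanup and update-strategy settings above look roughly like this (a sketch; the values are illustrative, not recommendations):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  revisionHistoryLimit: 5   # keep only the 5 most recent old ReplicaSets
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%   # at most 25% of Pods unavailable during the update
      maxSurge: 25%         # at most 25% extra Pods above the desired count
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15.2
```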

StatefulSet

StatefulSet (sts for short) is commonly used to deploy stateful applications that need ordered startup. For example, when containerizing a Spring Cloud project, Eureka is a good fit for StatefulSet deployment: each Eureka instance gets a unique, fixed identifier, no extra Service needs to be configured per instance, and the other Spring Boot applications can register directly through Eureka's Headless Service.

  • If the Eureka StatefulSet resource is named eureka, its Pods are eureka-0, eureka-1, eureka-2.

  • Service: a headless service with no ClusterIP, e.g. eureka-svc.

  • The Pods are then addressable as eureka-0.eureka-svc.NAMESPACE_NAME, eureka-1.eureka-svc, …

Basic StatefulSet Concepts

StatefulSet is the workload API object mainly used for managing stateful applications. In production, it can be used to deploy ElasticSearch clusters, MongoDB clusters, or clusters that require persistence, such as RabbitMQ, Redis, Kafka, and ZooKeeper.

Like a Deployment, a StatefulSet manages Pods based on an identical container spec. The difference is that a StatefulSet maintains a sticky identity for each Pod. The Pods are created from the same spec, but they are not interchangeable: each Pod has a persistent identifier, generally of the form StatefulSetName-Number, that it keeps across rescheduling. For example, a StatefulSet named redis-sentinel that specifies three Pods creates Pods named redis-sentinel-0, redis-sentinel-1, and redis-sentinel-2. Pods created by a StatefulSet usually communicate through a Headless Service. Unlike a normal Service, a Headless Service has no ClusterIP; communication goes through the Endpoints directly. The Headless DNS name generally takes the form:

statefulSetName-{0..N-1}.serviceName.namespace.svc.cluster.local

Where:

  • serviceName is the name of the Headless Service; a Headless Service name must be specified when creating a StatefulSet;

  • 0..N-1 is the Pod's ordinal, from 0 to N-1;

  • statefulSetName is the name of the StatefulSet;

  • namespace is the namespace the service is in;

  • .cluster.local is the Cluster Domain.

Suppose a project needs a master-slave Redis deployment on Kubernetes. A StatefulSet is an excellent fit: when a StatefulSet starts, the next container is scheduled only after the previous one has fully started, and each container's identifier is fixed, so the identifier can be used to determine the Pod's role.

For example, with a StatefulSet named redis-ms deploying a master-slave Redis, the first container's identifier is redis-ms-0, and the hostname inside that Pod is also redis-ms-0. The role can then be decided by hostname: the container whose hostname is redis-ms-0 acts as the Redis master, and the rest are slaves. The slaves can connect to the master using the master's Headless Service address, which never changes. The Redis slave configuration would then look like this:

port 6379
slaveof redis-ms-0.redis-ms.public-service.svc.cluster.local 6379
tcp-backlog 511
timeout 0
tcp-keepalive 0
……

Here redis-ms-0.redis-ms.public-service.svc.cluster.local is the Redis master's Headless Service address. Within the same namespace, redis-ms-0.redis-ms is sufficient; the public-service.svc.cluster.local suffix can be omitted.

StatefulSet Components

A simple StatefulSet definition looks like this:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web

Where:

  • kind: Service defines a Headless Service named nginx. The Pods' DNS entries will take the form nginx-0.nginx.default.svc.cluster.local (and so on for the other Pods); since no Namespace is specified, it is deployed in default.

  • kind: StatefulSet defines a StatefulSet named web; replicas is the number of Pod replicas to deploy, 2 in this example.

Creating the StatefulSet

[root@k8s-master01 2.2.7]# kubectl create -f sts-web.yaml 
service/nginx created
statefulset.apps/web created
[root@k8s-master01 2.2.7]# kubectl get sts
NAME   DESIRED   CURRENT   AGE
web    2         2         12s
[root@k8s-master01 2.2.7]# kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   7d2h
nginx        ClusterIP   None         <none>        80/TCP    16s
[root@k8s-master01 2.2.7]# kubectl get po -l app=nginx
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          2m5s
web-1   1/1     Running   0          115s

Scaling a StatefulSet Up and Down

Like a Deployment, a StatefulSet can be scaled up or down by updating the replicas field, or with kubectl scale or kubectl patch.

  1. Scaling up

Increase the StatefulSet created above to 5 replicas (before scaling up, make sure the required storage is ready: pre-created static PVs, dynamic PV provisioning, or emptyDir):

[root@k8s-master01 2.2.7]# kubectl scale sts web --replicas=5
statefulset.apps/web scaled

Check the Pod status:

[root@k8s-master01 2.2.7]# kubectl get po
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          2m58s
web-1   1/1     Running   0          2m48s
web-2   1/1     Running   0          116s
web-3   1/1     Running   0          79s
web-4   1/1     Running   0          53s

You can also watch the change dynamically with the following commands.

Watch in one terminal:

[root@k8s-master01 2.2.7]# kubectl get pods -w -l app=nginx
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          4m37s
web-1   1/1     Running   0          4m27s
web-2   1/1     Running   0          3m35s
web-3   1/1     Running   0          2m58s
web-4   1/1     Running   0          2m32s

In another terminal, change the replica count to 3:

[root@k8s-master01 ~]# kubectl patch sts web -p '{"spec":{"replicas":3}}'
statefulset.apps/web patched

The first terminal now shows the web-4 and web-3 Pods being terminated:

[root@k8s-master01 2.2.7]# kubectl get pods -w -l app=nginx
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          4m37s
web-1   1/1     Running   0          4m27s
web-2   1/1     Running   0          3m35s
web-3   1/1     Running   0          2m58s
web-4   1/1     Running   0          2m32s
web-0   1/1   Running   0     5m8s
web-0   1/1   Running   0     5m11s
web-4   1/1   Terminating   0     3m36s
web-4   0/1   Terminating   0     3m38s
web-4   0/1   Terminating   0     3m47s
web-4   0/1   Terminating   0     3m47s
web-3   1/1   Terminating   0     4m13s
web-3   0/1   Terminating   0     4m14s
web-3   0/1   Terminating   0     4m22s
web-3   0/1   Terminating   0     4m22s

Update Strategies

  1. OnDelete strategy

The OnDelete update strategy implements the legacy (pre-1.7) behavior; it was the default in early StatefulSet API versions, while in apps/v1 the default is RollingUpdate. With this strategy, when the StatefulSet's .spec.template field is modified, the StatefulSet controller does not update the Pods automatically; the Pods must be deleted manually for the controller to create new ones.
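
In a manifest, this strategy is selected like so:

```yaml
spec:
  updateStrategy:
    type: OnDelete
```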

  2. RollingUpdate strategy

The RollingUpdate strategy updates all the Pods in a StatefulSet, rolling through them in reverse ordinal order.

For example, patch a StatefulSet named web to use the RollingUpdate strategy:

[root@k8s-master01 2.2.7]# kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate"}}}'
statefulset.apps/web patched

View the changed StatefulSet:

[root@k8s-master01 2.2.7]# kubectl get sts web -o yaml | grep -A 1 "updateStrategy"
  updateStrategy:
    type: RollingUpdate

Then change the container image to trigger a rolling update:

[root@k8s-master01 2.2.7]# kubectl patch statefulset web --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"dotbalo/canary:v1"}]'
statefulset.apps/web patched

As mentioned above, the Pods in a StatefulSet are updated in reverse ordinal order. The StatefulSet controller terminates each Pod and waits for it to become Running and Ready before moving on; it will not update the next Pod until the current one is Running and Ready. It will, however, still recreate any Pod that fails during the update, using that Pod's current version: Pods that have already received the updated spec are recreated at the updated version, while Pods that have not yet received it are recreated at the previous version.

During the update, kubectl rollout status sts/<name> shows the rolling-update status:

[root@k8s-master01 2.2.7]# kubectl rollout status sts/web
Waiting for 1 pods to be ready...
waiting for statefulset rolling update to complete 1 pods at revision web-56b5798f76...
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
waiting for statefulset rolling update to complete 2 pods at revision web-56b5798f76...
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
statefulset rolling update complete 3 pods at revision web-56b5798f76...

View the updated images:

[root@k8s-master01 2.2.7]# for p in 0 1 2; do kubectl get po web-$p --template '{{range $i, $c := .spec.containers}}{{$c.image}}{{end}}'; echo; done
dotbalo/canary:v1
dotbalo/canary:v1
dotbalo/canary:v1

  3. Partitioned (staged) updates

A StatefulSet can be updated in stages with the RollingUpdate strategy's partition parameter. With a partition, all Pods whose ordinal is less than the partition keep their current version; only Pods whose ordinal is greater than or equal to the partition are updated. This makes it easy to implement canary releases (gray releases) or staged feature rollouts. Note: a canary release is a release technique that transitions smoothly between the old and the new version.
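
The same setting can also be written directly in the manifest instead of applied with kubectl patch, for example:

```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 3   # only Pods with ordinal >= 3 are updated
```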

For example, set a partition of 3 by patching the StatefulSet directly:

# kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":3}}}}'
statefulset "web" patched

Then patch the container image again:

# kubectl patch statefulset web --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"k8s.gcr.io/nginx-slim:0.7"}]'
statefulset "web" patched

Delete a Pod to trigger an update:

kubectl delete po web-2
pod "web-2" deleted

Because Pod web-2's ordinal is less than the partition 3, the Pod is not updated; it is recreated with the previous image.

Change the partition to 2; web-2 is then updated automatically (because the update strategy was changed earlier), but web-0 and web-1 are not:

# kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'
statefulset "web" patched

Continuing in this way gives a staged rollout, similar to a gray/canary release. The final result:

[root@k8s-master01 2.2.7]# for p in 0 1 2; do kubectl get po web-$p --template '{{range $i, $c := .spec.containers}}{{$c.image}}{{end}}'; echo; done
dotbalo/canary:v1
dotbalo/canary:v1
dotbalo/canary:v2

Deleting a StatefulSet

A StatefulSet can be deleted in two ways: cascading and non-cascading. With non-cascading deletion, the StatefulSet's Pods are not deleted; with cascading deletion, both the StatefulSet and its Pods are deleted.

  1. Non-cascading deletion

When deleting a StatefulSet with kubectl delete sts xxx, passing the --cascade=false parameter (in newer kubectl versions the equivalent is --cascade=orphan) performs a non-cascading deletion, which deletes the StatefulSet without deleting its Pods:

[root@k8s-master01 2.2.7]# kubectl get po 
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          16m
web-1   1/1     Running   0          16m
web-2   1/1     Running   0          11m
[root@k8s-master01 2.2.7]# kubectl delete statefulset web --cascade=false
statefulset.apps "web" deleted
[root@k8s-master01 2.2.7]# kubectl get sts
No resources found.
[root@k8s-master01 2.2.7]# kubectl get po
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          16m
web-1   1/1     Running   0          16m
web-2   1/1     Running   0          11m

Since the StatefulSet has been deleted, deleting one of its Pods now will not recreate it:

[root@k8s-master01 2.2.7]# kubectl get po
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          16m
web-1   1/1     Running   0          16m
web-2   1/1     Running   0          11m
[root@k8s-master01 2.2.7]# kubectl delete po web-0
pod "web-0" deleted
[root@k8s-master01 2.2.7]# kubectl get po
NAME    READY   STATUS    RESTARTS   AGE
web-1   1/1     Running   0          18m
web-2   1/1     Running   0          12m

When this StatefulSet is created again, web-0 is recreated, web-1 is left alone because it already exists, and because the StatefulSet's original replicas is 2, web-2 is deleted, as shown below (ignore the AlreadyExists error):

[root@k8s-master01 2.2.7]# kubectl create -f sts-web.yaml 
statefulset.apps/web created
Error from server (AlreadyExists): error when creating "sts-web.yaml": services "nginx" already exists
[root@k8s-master01 2.2.7]# kubectl get po
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          32s
web-1   1/1     Running   0          19m

  2. Cascading deletion

Omitting the --cascade=false parameter performs a cascading deletion:

[root@k8s-master01 2.2.7]# kubectl delete statefulset web
statefulset.apps "web" deleted
[root@k8s-master01 2.2.7]# kubectl get po
No resources found.

The -f parameter can also be used to delete the StatefulSet and the Service together (this file defines both the sts and the svc):

[root@k8s-master01 2.2.7]# kubectl delete -f sts-web.yaml 
service "nginx" deleted
Error from server (NotFound): error when deleting "sts-web.yaml": statefulsets.apps "web" not found
[root@k8s-master01 2.2.7]#

DaemonSet

A DaemonSet, similar to a system daemon, deploys one Pod on every node that matches its criteria.

DaemonSet Concepts

A DaemonSet ensures that all (or some) nodes run one replica of a Pod. When new nodes join the cluster, a Pod is added for them; when nodes are removed from the cluster, those Pods are garbage-collected. Deleting a DaemonSet deletes all the Pods it created.

Some typical uses of a DaemonSet:

  • Running a cluster storage daemon on every node, such as Glusterd or Ceph.

  • Running a log-collection daemon on every node, such as Fluentd or Logstash.

  • Running a monitoring daemon on every node, such as Prometheus Node Exporter, Collectd, the Datadog agent, the New Relic agent, or Ganglia gmond.

Writing a DaemonSet Spec

A DaemonSet definition looks roughly like the following, for example a fluentd DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-es-v2.0.4
  namespace: logging
  labels:
    k8s-app: fluentd-es
    version: v2.0.4
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-es
      version: v2.0.4
  template:
    metadata:
      labels:
        k8s-app: fluentd-es
        kubernetes.io/cluster-service: "true"
        version: v2.0.4
      # This annotation ensures that fluentd does not get evicted if the node
      # supports critical pod annotation based priority scheme.
      # Note that this does not guarantee admission on the nodes (#40573).
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
    spec:
      serviceAccountName: fluentd-es
      containers:
      - name: fluentd-es
        image: k8s.gcr.io/fluentd-elasticsearch:v2.0.4
        env:
        - name: FLUENTD_ARGS
          value: --no-supervisor -q
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config-volume
          mountPath: /etc/fluent/config.d
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "true"
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config-volume
        configMap:
          name: fluentd-es-config-v0.1.4

  (1) Required fields

Like all other Kubernetes configuration, a DaemonSet needs the apiVersion, kind, and metadata fields, and it also needs a .spec section.

  (2) Pod template

The only required field under .spec is .spec.template. It is a Pod template, configured the same way as a Pod, except that it is nested and has no apiVersion or kind fields.

In addition to the required Pod fields, the Pod template in a DaemonSet must specify appropriate labels.

The Pod template in a DaemonSet must have a RestartPolicy of Always, which is also the default.

  (3) Pod Selector

The .spec.selector field is a Pod Selector; it serves the same purpose as .spec.selector in other resources.

.spec.selector is an object consisting of two fields:

  • matchLabels, which works like the .spec.selector of a ReplicationController, matching Pods whose labels equal the given key/value pairs.

  • matchExpressions, which allows building more complex selectors by specifying a key, a list of values, and an operator relating the key to the values.

If both fields are specified, the result is the AND (logical conjunction) of the two.

.spec.selector must match .spec.template.metadata.labels. If neither is specified, they default to being equivalent; if both are configured and do not match, the configuration is rejected by the API.
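
Combining both selector forms, kept consistent with the template labels as required above, looks roughly like this (labels borrowed from the fluentd example):

```yaml
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-es
    matchExpressions:                # ANDed with matchLabels above
    - {key: version, operator: In, values: ["v2.0.4"]}
  template:
    metadata:
      labels:                        # must satisfy the whole selector
        k8s-app: fluentd-es
        version: v2.0.4
```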

  (4) Deploying Pods to specific nodes

If .spec.template.spec.nodeSelector is specified, the DaemonSet controller creates Pods only on nodes that match the node selector, for example only on nodes whose disk type is ssd (the label must be defined on the nodes beforehand):

containers:
- name: nginx
  image: nginx
  imagePullPolicy: IfNotPresent
nodeSelector:
  disktype: ssd

Tip: node selectors work the same way with other controllers.

Creating a DaemonSet

In production, ordinary business applications generally do not need to be deployed as DaemonSets. Typically only components such as Fluentd (log collection), Ingress (cluster traffic entry point), Calico (cluster networking), and Node Exporter (metrics collection) need to be deployed as a DaemonSet on every node. This section only demonstrates how a DaemonSet is used.

For example, create an nginx ingress (file: https://github.com/dotbalo/k8s/blob/master/nginx-ingress/ingress.yaml):

# kubectl create -f nginx-ds.yaml 
namespace/ingress-nginx created
configmap/nginx-configuration created
configmap/tcp-services created
configmap/udp-services created
serviceaccount/nginx-ingress-serviceaccount created
clusterrole.rbac.authorization.k8s.io/nginx-ingress-clusterrole created
role.rbac.authorization.k8s.io/nginx-ingress-role created
rolebinding.rbac.authorization.k8s.io/nginx-ingress-role-nisa-binding created
clusterrolebinding.rbac.authorization.k8s.io/nginx-ingress-clusterrole-nisa-binding created
daemonset.extensions/nginx-ingress-controller created

A Pod is created on every node:

# kubectl get po -n ingress-nginx
NAME                                 READY   STATUS    RESTARTS   AGE
nginx-ingress-controller-fjkf2   1/1     Running   0           44s
nginx-ingress-controller-gfmcv   1/1     Running   0           44s
nginx-ingress-controller-j89qc   1/1     Running   0           44s
nginx-ingress-controller-sqsk2   1/1     Running   0           44s
nginx-ingress-controller-tgdt6   1/1     Running   0           44s
[root@k8s-master01 2.2.8]# kubectl get po -n ingress-nginx -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE
nginx-ingress-controller-fjkf2   1/1     Running   0          50s   192.168.20.30   k8s-node01     <none>
nginx-ingress-controller-gfmcv   1/1     Running   0          50s   192.168.20.21   k8s-master02   <none>
nginx-ingress-controller-j89qc   1/1     Running   0          50s   192.168.20.22   k8s-master03   <none>
nginx-ingress-controller-sqsk2   1/1     Running   0          50s   192.168.20.31   k8s-node02     <none>
nginx-ingress-controller-tgdt6   1/1     Running   0          50s   192.168.20.20   k8s-master01   <none>

Updating and Rolling Back a DaemonSet

If node labels are modified, the DaemonSet immediately adds Pods to newly matching nodes and deletes Pods from nodes that no longer match.

Since Kubernetes 1.6, rolling updates can be performed on a DaemonSet; future Kubernetes versions are expected to support controlled node-by-node updates.

For DaemonSet rolling updates, see: https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/.

DaemonSet update strategies are similar to StatefulSet's: OnDelete and RollingUpdate.
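
The strategy is configured the same way as for a StatefulSet; a sketch for a DaemonSet (rollingUpdate.maxUnavailable is optional and defaults to 1):

```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # replace the Pod on one node at a time
```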

Check the update strategy of the DaemonSet created in the previous section:

# kubectl get ds/nginx-ds -o go-template='{{.spec.updateStrategy.type}}{{"\n"}}'
RollingUpdate

Tip: for other DaemonSets, make sure the update strategy is RollingUpdate before performing a rolling update.

  1. Imperative updates

kubectl edit ds/<daemonset-name>
kubectl patch ds/<daemonset-name> -p=<strategic-merge-patch>

  2. Updating the image

kubectl set image ds/<daemonset-name> <container-name>=<container-new-image> --record=true

  3. Checking the update status

kubectl rollout status ds/<daemonset-name>

  4. Listing all revisions

kubectl rollout history daemonset <daemonset-name>

  5. Rolling back to a specific revision

kubectl rollout undo daemonset <daemonset-name> --to-revision=<revision>

Updating and rolling back a DaemonSet works much the same as for a Deployment.