
    Deploying Node Feature Discovery on a Kubernetes Cluster to Detect Node Features

    1. Overview

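    Node Feature Discovery (NFD) is a project originally created by Intel that helps a Kubernetes cluster manage node resources more intelligently. It detects the features of each node (for example CPU model, GPU model, memory size) and reports them to the Kubernetes API server (kube-apiserver), which records them as labels on the node. The scheduler (kube-scheduler) can then use these labels to place a Pod on the node best suited to its workload, provided the Pod declares a node selector or node affinity that refers to them.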

    GitHub: https://github.com/kubernetes-sigs/node-feature-discovery
    Docs: https://kubernetes-sigs.github.io/node-feature-discovery/master/get-started/index.html
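
    To make the role of labels in scheduling concrete, here is a minimal Python sketch of how an equality-based nodeSelector is matched against node labels: every key/value pair in the selector must be present on the node. The function name and the NVIDIA selector (PCI vendor ID 10de) are illustrative assumptions and do not come from the cluster described in this post; the node labels are a small subset of the labels NFD produces later in this post.

```python
def matches_node_selector(node_labels, node_selector):
    """Return True if every selector key/value pair is present on the node.

    This mirrors the semantics of a Pod's spec.nodeSelector: all pairs must
    match exactly; extra labels on the node are ignored.
    """
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# A subset of the labels that NFD sets on the example node in this post.
node_labels = {
    "kubernetes.io/arch": "amd64",
    "feature.node.kubernetes.io/cpu-cpuid.AVX512F": "true",
    "feature.node.kubernetes.io/pci-0300_15ad.present": "true",
}

# Selector for a workload that needs AVX-512 support.
needs_avx512 = {"feature.node.kubernetes.io/cpu-cpuid.AVX512F": "true"}

# Hypothetical selector for a node with an NVIDIA display device (assumption).
needs_nvidia = {"feature.node.kubernetes.io/pci-0300_10de.present": "true"}

if __name__ == "__main__":
    print(matches_node_selector(node_labels, needs_avx512))
    print(matches_node_selector(node_labels, needs_nvidia))
```

    The first selector matches the example node and the second does not, so a Pod using the second selector would stay Pending on this single-node cluster.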

    2. Component Architecture

    NFD consists of two components, NFD-Master and NFD-Worker:

    NFD-Master: a Pod managed by a Deployment that communicates with the Kubernetes API server. It receives node features from NFD-Worker and updates the Node object (labels, annotations) accordingly.

    NFD-Worker: a Pod managed by a DaemonSet that detects the features of the node it runs on and passes them to NFD-Master. An NFD-Worker should run on every node.

    The feature sources that can be detected include:

    • CPU
    • IOMMU
    • Kernel
    • Memory
    • Network
    • PCI
    • Storage
    • System
    • USB
    • Custom (rule-based custom features)
    • Local (hooks for user-specific features)

    3. Installation

    (1) Check the cluster node status before installation

    [root@master-10 ~]# kubectl get nodes
    NAME                  STATUS   ROLES                         AGE   VERSION
    master-10.20.31.105   Ready    control-plane,master,worker   31h   v1.21.5
    

    Node details; pay attention to the labels and annotations.

    [root@master-10 ~]# kubectl describe nodes master-10.20.31.105 
    Name:               master-10.20.31.105
    Roles:              control-plane,master,worker
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=master-10.20.31.105
                        kubernetes.io/os=linux
                        node-role.kubernetes.io/control-plane=
                        node-role.kubernetes.io/master=
                        node-role.kubernetes.io/worker=
                        node.kubernetes.io/exclude-from-external-load-balancers=
    Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"c6:fb:4b:8a:bb:12"}
                        flannel.alpha.coreos.com/backend-type: vxlan
                        flannel.alpha.coreos.com/kube-subnet-manager: true
                        flannel.alpha.coreos.com/public-ip: 10.20.31.105
                        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                        node.alpha.kubernetes.io/ttl: 0
                        volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp:  Tue, 12 Mar 2024 21:01:31 -0400
    Taints:             <none>
    ........
    

    (2) Install the component

    [root@master-10 opt]# kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.14.2
    namespace/node-feature-discovery created
    customresourcedefinition.apiextensions.k8s.io/nodefeaturerules.nfd.k8s-sigs.io created
    customresourcedefinition.apiextensions.k8s.io/nodefeatures.nfd.k8s-sigs.io created
    serviceaccount/nfd-master created
    serviceaccount/nfd-worker created
    role.rbac.authorization.k8s.io/nfd-worker created
    clusterrole.rbac.authorization.k8s.io/nfd-master created
    rolebinding.rbac.authorization.k8s.io/nfd-worker created
    clusterrolebinding.rbac.authorization.k8s.io/nfd-master created
    configmap/nfd-master-conf created
    configmap/nfd-worker-conf created
    service/nfd-master created
    deployment.apps/nfd-master created
    daemonset.apps/nfd-worker created
    

    (3) Check the component status

    [root@master-10 opt]# kubectl get pods -n=node-feature-discovery 
    NAME                          READY   STATUS    RESTARTS   AGE
    nfd-master-5c4684f5cb-hvjjb   1/1     Running   0          4m11s
    nfd-worker-cpwx6              1/1     Running   0          4m11s
    

    (4) View the component logs

    The log shows that nfd-worker runs feature discovery once per minute by default (SleepInterval is 60000000000 ns, that is 60 s).

    [root@master-10 ~]# kubectl logs -f -n=node-feature-discovery nfd-worker-rlf5t 
    I0314 06:30:32.003264       1 main.go:66] "-server is deprecated, will be removed in a future release along with the deprecated gRPC API"
    I0314 06:30:32.003372       1 nfd-worker.go:219] "Node Feature Discovery Worker" version="v0.14.2" nodeName="master-10.20.31.105" namespace="node-feature-discovery"
    I0314 06:30:32.003589       1 nfd-worker.go:520] "configuration file parsed" path="/etc/kubernetes/node-feature-discovery/nfd-worker.conf"
    I0314 06:30:32.004500       1 nfd-worker.go:552] "configuration successfully updated" configuration={"Core":{"Klog":{},"LabelWhiteList":{},"NoPublish":false,"FeatureSources":["all"],"Sources":null,"LabelSources":["all"],"SleepInterval":{"Duration":60000000000}},"Sources":{"cpu":{"cpuid":{"attributeBlacklist":["BMI1","BMI2","CLMUL","CMOV","CX16","ERMS","F16C","HTT","LZCNT","MMX","MMXEXT","NX","POPCNT","RDRAND","RDSEED","RDTSCP","SGX","SGXLC","SSE","SSE2","SSE3","SSE4","SSE42","SSSE3","TDX_GUEST"]}},"custom":[],"fake":{"labels":{"fakefeature1":"true","fakefeature2":"true","fakefeature3":"true"},"flagFeatures":["flag_1","flag_2","flag_3"],"attributeFeatures":{"attr_1":"true","attr_2":"false","attr_3":"10"},"instanceFeatures":[{"attr_1":"true","attr_2":"false","attr_3":"10","attr_4":"foobar","name":"instance_1"},{"attr_1":"true","attr_2":"true","attr_3":"100","name":"instance_2"},{"name":"instance_3"}]},"kernel":{"KconfigFile":"","configOpts":["NO_HZ","NO_HZ_IDLE","NO_HZ_FULL","PREEMPT"]},"local":{},"pci":{"deviceClassWhitelist":["03","0b40","12"],"deviceLabelFields":["class","vendor"]},"usb":{"deviceClassWhitelist":["0e","ef","fe","ff"],"deviceLabelFields":["class","vendor","device"]}}}
    I0314 06:30:32.004796       1 metrics.go:70] "metrics server starting" port=8081
    I0314 06:30:32.019135       1 nfd-worker.go:562] "starting feature discovery..."
    I0314 06:30:32.019364       1 nfd-worker.go:577] "feature discovery completed"
    I0314 06:31:32.021520       1 nfd-worker.go:562] "starting feature discovery..."
    I0314 06:31:32.021695       1 nfd-worker.go:577] "feature discovery completed"
    I0314 06:32:32.027970       1 nfd-worker.go:562] "starting feature discovery..."
    I0314 06:32:32.028141       1 nfd-worker.go:577] "feature discovery completed"
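
    The one-minute interval can be checked directly from the timestamps in the log above. The following Python sketch parses the klog-style header of each line and measures the time between consecutive "starting feature discovery" messages. klog timestamps carry no year, so the year is passed in as a parameter (2024 is taken from the dates shown in this post); the function name and the regular expression are illustrative assumptions.

```python
import re
from datetime import datetime

# klog header: severity letter, MMDD, time with microseconds, PID, file:line]
KLOG_RE = re.compile(r"^[IWEF](\d{4} \d{2}:\d{2}:\d{2}\.\d{6})\s+\d+\s+\S+\]\s+(.*)$")


def discovery_intervals(lines, year=2024):
    """Return the seconds between consecutive 'starting feature discovery' lines."""
    starts = []
    for line in lines:
        match = KLOG_RE.match(line.strip())
        if match and "starting feature discovery" in match.group(2):
            stamp = f"{year}{match.group(1)}"
            starts.append(datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f"))
    return [(later - earlier).total_seconds() for earlier, later in zip(starts, starts[1:])]


# Lines copied from the nfd-worker log shown above.
SAMPLE_LOG = [
    'I0314 06:30:32.019135       1 nfd-worker.go:562] "starting feature discovery..."',
    'I0314 06:30:32.019364       1 nfd-worker.go:577] "feature discovery completed"',
    'I0314 06:31:32.021520       1 nfd-worker.go:562] "starting feature discovery..."',
    'I0314 06:31:32.021695       1 nfd-worker.go:577] "feature discovery completed"',
    'I0314 06:32:32.027970       1 nfd-worker.go:562] "starting feature discovery..."',
]

if __name__ == "__main__":
    for seconds in discovery_intervals(SAMPLE_LOG):
        print(f"{seconds:.3f} s between discovery runs")
```

    On the sample lines, both intervals come out at roughly 60 seconds, which agrees with the SleepInterval value in the configuration dump.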
    

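    The log shows that nfd-master updates the Node object (labels, annotations) within the first minute after startup and then, by default, once every hour (ResyncPeriod is 3600000000000 ns, that is 1 h). In other words, if a user mistakenly edits the node's feature labels or annotations by hand, it can take up to one hour before nfd-master corrects them automatically.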

    [root@master-10 ~]# kubectl logs -n=node-feature-discovery nfd-master-5c4684f5cb-hvjjb 
    I0314 06:23:08.190218       1 nfd-master.go:213] "Node Feature Discovery Master" version="v0.14.2" nodeName="master-10.20.31.105" namespace="node-feature-discovery"
    I0314 06:23:08.190356       1 nfd-master.go:1214] "configuration file parsed" path="/etc/kubernetes/node-feature-discovery/nfd-master.conf"
    I0314 06:23:08.190912       1 nfd-master.go:1274] "configuration successfully updated" configuration=<
    	DenyLabelNs: {}
    	EnableTaints: false
    	ExtraLabelNs: {}
    	Klog: {}
    	LabelWhiteList: {}
    	LeaderElection:
    	  LeaseDuration:
    	    Duration: 15000000000
    	  RenewDeadline:
    	    Duration: 10000000000
    	  RetryPeriod:
    	    Duration: 2000000000
    	NfdApiParallelism: 10
    	NoPublish: false
    	ResourceLabels: {}
    	ResyncPeriod:
    	  Duration: 3600000000000
     >
    I0314 06:23:08.190928       1 nfd-master.go:1338] "starting the nfd api controller"
    I0314 06:23:08.191105       1 node-updater-pool.go:79] "starting the NFD master node updater pool" parallelism=10
    I0314 06:23:08.860810       1 metrics.go:115] "metrics server starting" port=8081
    I0314 06:23:08.861033       1 component.go:36] [core][Server #1] Server created
    I0314 06:23:08.861050       1 nfd-master.go:347] "gRPC server serving" port=8080
    I0314 06:23:08.861084       1 component.go:36] [core][Server #1 ListenSocket #2] ListenSocket created
    I0314 06:23:09.860886       1 nfd-master.go:694] "will process all nodes in the cluster"
    I0314 06:23:09.923362       1 nfd-master.go:1086] "node updated" nodeName="master-10.20.31.105"
    I0314 07:23:09.224254       1 nfd-master.go:1086] "node updated" nodeName="master-10.20.31.105"
    I0314 08:23:09.081362       1 nfd-master.go:1086] "node updated" nodeName="master-10.20.31.105"
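
    The Duration values in the two configuration dumps are Go time.Duration values, which are integer counts of nanoseconds. A small Python sketch for converting them into readable intervals follows; the function name is an illustrative assumption, and the constants are copied from the logs above.

```python
from datetime import timedelta


def go_duration_to_timedelta(nanoseconds):
    """Convert a Go time.Duration (integer nanoseconds) to a timedelta.

    timedelta has microsecond resolution, so sub-microsecond precision is dropped.
    """
    return timedelta(microseconds=nanoseconds // 1000)


# Values copied from the nfd-worker and nfd-master configuration dumps.
DURATIONS = {
    "worker SleepInterval": 60_000_000_000,
    "master ResyncPeriod": 3_600_000_000_000,
    "master LeaseDuration": 15_000_000_000,
    "master RenewDeadline": 10_000_000_000,
    "master RetryPeriod": 2_000_000_000,
}

if __name__ == "__main__":
    for name, value in DURATIONS.items():
        print(f"{name}: {go_duration_to_timedelta(value)}")
```

    The conversion confirms the intervals discussed in this step: the worker sleeps for one minute between discovery runs and the master resyncs once per hour.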

    (5) View the node feature information

    NFD has now written the node's feature information into its labels and annotations. The default label prefix is feature.node.kubernetes.io/.

    [root@master-10 opt]# kubectl describe node master-10.20.31.105 
    Name:               master-10.20.31.105
    Roles:              control-plane,master,worker
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        feature.node.kubernetes.io/cpu-cpuid.ADX=true
                        feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                        feature.node.kubernetes.io/cpu-cpuid.CMPXCHG8=true
                        feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                        feature.node.kubernetes.io/cpu-cpuid.FXSR=true
                        feature.node.kubernetes.io/cpu-cpuid.FXSROPT=true
                        feature.node.kubernetes.io/cpu-cpuid.HLE=true
                        feature.node.kubernetes.io/cpu-cpuid.HYPERVISOR=true
                        feature.node.kubernetes.io/cpu-cpuid.LAHF=true
                        feature.node.kubernetes.io/cpu-cpuid.MOVBE=true
                        feature.node.kubernetes.io/cpu-cpuid.MPX=true
                        feature.node.kubernetes.io/cpu-cpuid.OSXSAVE=true
                        feature.node.kubernetes.io/cpu-cpuid.RTM=true
                        feature.node.kubernetes.io/cpu-cpuid.SYSCALL=true
                        feature.node.kubernetes.io/cpu-cpuid.SYSEE=true
                        feature.node.kubernetes.io/cpu-cpuid.X87=true
                        feature.node.kubernetes.io/cpu-cpuid.XSAVE=true
                        feature.node.kubernetes.io/cpu-cpuid.XSAVEC=true
                        feature.node.kubernetes.io/cpu-cpuid.XSAVEOPT=true
                        feature.node.kubernetes.io/cpu-cpuid.XSAVES=true
                        feature.node.kubernetes.io/cpu-hardware_multithreading=false
                        feature.node.kubernetes.io/cpu-model.family=6
                        feature.node.kubernetes.io/cpu-model.id=85
                        feature.node.kubernetes.io/cpu-model.vendor_id=Intel
                        feature.node.kubernetes.io/kernel-config.NO_HZ=true
                        feature.node.kubernetes.io/kernel-config.NO_HZ_FULL=true
                        feature.node.kubernetes.io/kernel-version.full=3.10.0-1160.105.1.el7.x86_64
                        feature.node.kubernetes.io/kernel-version.major=3
                        feature.node.kubernetes.io/kernel-version.minor=10
                        feature.node.kubernetes.io/kernel-version.revision=0
                        feature.node.kubernetes.io/pci-0300_15ad.present=true
                        feature.node.kubernetes.io/system-os_release.ID=centos
                        feature.node.kubernetes.io/system-os_release.VERSION_ID=7
                        feature.node.kubernetes.io/system-os_release.VERSION_ID.major=7
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=master-10.20.31.105
                        kubernetes.io/os=linux
                        node-role.kubernetes.io/control-plane=
                        node-role.kubernetes.io/master=
                        node-role.kubernetes.io/worker=
                        node.kubernetes.io/exclude-from-external-load-balancers=
    Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"c6:fb:4b:8a:bb:12"}
                        flannel.alpha.coreos.com/backend-type: vxlan
                        flannel.alpha.coreos.com/kube-subnet-manager: true
                        flannel.alpha.coreos.com/public-ip: 10.20.31.105
                        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                        nfd.node.kubernetes.io/feature-labels:
                          cpu-cpuid.ADX,cpu-cpuid.AESNI,cpu-cpuid.AVX,cpu-cpuid.AVX2,cpu-cpuid.AVX512BW,cpu-cpuid.AVX512CD,cpu-cpuid.AVX512DQ,cpu-cpuid.AVX512F,cpu-...
                        nfd.node.kubernetes.io/master.version: v0.14.2
                        nfd.node.kubernetes.io/worker.version: v0.14.2
                        node.alpha.kubernetes.io/ttl: 0
                        volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp:  Tue, 12 Mar 2024 21:01:31 -0400
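
    The label keys in the output above appear to follow the pattern <prefix><source>-<feature>[.<attribute>], where the source corresponds to one of the feature sources listed in section 2. The Python sketch below splits a key along that pattern and groups labels by source. It assumes the pattern holds for built-in labels only; labels from the custom or local sources may use different names. The function names are illustrative.

```python
NFD_PREFIX = "feature.node.kubernetes.io/"


def parse_nfd_label(key):
    """Split an NFD label key into (source, feature, attribute).

    Returns None for labels outside the NFD prefix. The attribute is None
    when the name has no '.' part (e.g. cpu-hardware_multithreading).
    """
    if not key.startswith(NFD_PREFIX):
        return None
    name = key[len(NFD_PREFIX):]
    source, _, rest = name.partition("-")
    feature, _, attribute = rest.partition(".")
    return source, feature, attribute or None


def group_by_source(labels):
    """Group NFD label names (without the prefix) by their feature source."""
    groups = {}
    for key in sorted(labels):
        parsed = parse_nfd_label(key)
        if parsed is not None:
            groups.setdefault(parsed[0], []).append(key[len(NFD_PREFIX):])
    return groups


# A few labels copied from the kubectl describe output above.
SAMPLE_LABELS = {
    "kubernetes.io/arch": "amd64",
    "feature.node.kubernetes.io/cpu-cpuid.AVX2": "true",
    "feature.node.kubernetes.io/cpu-hardware_multithreading": "false",
    "feature.node.kubernetes.io/kernel-version.full": "3.10.0-1160.105.1.el7.x86_64",
    "feature.node.kubernetes.io/pci-0300_15ad.present": "true",
    "feature.node.kubernetes.io/system-os_release.VERSION_ID.major": "7",
}

if __name__ == "__main__":
    for source, names in group_by_source(SAMPLE_LABELS).items():
        print(source, names)
```

    Grouping by source makes it easier to see which detectors contributed labels on a given node; on the example node these are cpu, kernel, pci and system.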
    

    4. Use Cases

    The main use of Node Feature Discovery (NFD) is to enable feature-aware node scheduling in a Kubernetes cluster. Common scenarios include:

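    1. Smart node scheduling: NFD gives the Kubernetes scheduler information about node features and resources, so it can choose the node best suited to a particular workload. For example, if a Pod needs strong GPU support, the scheduler can use NFD labels to select a node with a suitable GPU model.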

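    2. Resource constraints and optimization: with node capabilities exposed as labels, cluster administrators can better understand the resources available on each node and apply resource constraints and optimizations accordingly.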

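    3. Hardware-aware workload scheduling: some workloads need a specific type or configuration of hardware. NFD lets the scheduler select nodes that have the required hardware features for those workloads.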

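    4. Cluster scalability and performance: by placing workloads on suitable nodes, NFD can improve the overall performance and efficiency of the cluster. It helps avoid wasted resources and lets workloads make full use of the available hardware.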

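    5. Cluster automation: NFD can be integrated into automation workflows such as automated deployment or scaling. With NFD labels, an automation system knows more about node features and resources and can act on that information.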
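
    Scenarios 1 and 3 in the list above depend on Pods declaring which node features they require. One way to do that is a required node affinity on NFD labels. The Python sketch below builds such a Pod manifest as a plain dictionary and prints it as JSON, which kubectl apply -f accepts as well as YAML. The Pod name, the image reference and the function name are hypothetical placeholders; the label key is one of those shown in section 3.

```python
import json


def pod_with_required_features(name, image, required_labels):
    """Build a Pod manifest that may only be scheduled on nodes carrying
    all of the given label key/value pairs (required node affinity)."""
    expressions = [
        {"key": key, "operator": "In", "values": [value]}
        for key, value in sorted(required_labels.items())
    ]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "affinity": {
                "nodeAffinity": {
                    "requiredDuringSchedulingIgnoredDuringExecution": {
                        # All expressions inside one term must match (logical AND).
                        "nodeSelectorTerms": [{"matchExpressions": expressions}]
                    }
                }
            },
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    pod = pod_with_required_features(
        name="avx512-demo",                    # hypothetical Pod name
        image="example.invalid/demo:latest",   # hypothetical image
        required_labels={"feature.node.kubernetes.io/cpu-cpuid.AVX512F": "true"},
    )
    print(json.dumps(pod, indent=2))
```

    A plain nodeSelector would be enough for a single equality check; node affinity is used here because it extends naturally to several labels and to operators other than In.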

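    Overall, Node Feature Discovery (NFD) makes a Kubernetes cluster more aware of its nodes, so it can adapt to different kinds of workloads and node features and improve the cluster's performance, reliability and efficiency.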

     

    5. Summary

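    If your Kubernetes cluster needs to schedule workloads according to node hardware features, or otherwise needs to be aware of and make use of node hardware resources, installing Node Feature Discovery (NFD) is worthwhile. If all nodes in the cluster have similar hardware and hardware differences do not need to be considered, NFD is not needed.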
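
    Whether the nodes are in fact similar can be checked from the labels themselves, for example after a trial installation of NFD. The Python sketch below lists the NFD label keys whose values differ between nodes; an empty result suggests that the nodes are homogeneous as far as NFD can tell. The node names, label values and function names are made up for illustration.

```python
NFD_PREFIX = "feature.node.kubernetes.io/"


def nfd_labels(labels):
    """Keep only the labels that carry the default NFD prefix."""
    return {key: value for key, value in labels.items() if key.startswith(NFD_PREFIX)}


def differing_features(nodes):
    """Return the sorted NFD label keys whose value is not identical on every node.

    `nodes` maps a node name to that node's full label dictionary. A label
    that is missing on some node counts as a difference.
    """
    per_node = [nfd_labels(labels) for labels in nodes.values()]
    all_keys = set().union(*per_node)
    return sorted(
        key for key in all_keys
        if len({node.get(key) for node in per_node}) > 1
    )


# Hypothetical two-node cluster (names and labels are assumptions).
SAMPLE_NODES = {
    "node-a": {
        "kubernetes.io/hostname": "node-a",
        "feature.node.kubernetes.io/cpu-cpuid.AVX2": "true",
        "feature.node.kubernetes.io/cpu-cpuid.AVX512F": "true",
    },
    "node-b": {
        "kubernetes.io/hostname": "node-b",
        "feature.node.kubernetes.io/cpu-cpuid.AVX2": "true",
    },
}

if __name__ == "__main__":
    print(differing_features(SAMPLE_NODES))
```

    In this made-up cluster only the AVX512F label differs, so NFD would matter only for workloads that depend on AVX-512.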

    posted @ 2024-03-14 16:32  人艱不拆_zmc